How To Mine Twitter Streams from PHP in Real Time
Need to mine Twitter for tweets related to certain keywords?
Twitter provides a fairly simple streaming interface to the flood of tweets it receives, letting you specify the keywords you want to track and receive the matching tweets in real time.
To do this, I created a simple PHP class that can run in the background, collecting tweets for certain keywords:
ctwitter_stream.php
<?php
class ctwitter_stream
{
private $m_username;
private $m_password;
public function __construct()
{
//
// set a time limit to unlimited
//
set_time_limit(0);
}
//
// set the login details
//
public function login($_username, $_password)
{
$this->m_username = $_username;
$this->m_password = $_password;
}
//
// process a tweet object from the stream
//
private function process_tweet(array $_data)
{
print_r($_data);
return true;
}
//
// the main stream manager
//
public function start(array $_keywords)
{
while(1)
{
$fp = fsockopen("ssl://stream.twitter.com", 443, $errno, $errstr, 30);
if (!$fp)
{
echo "ERROR: Twitter Stream Error: failed to open socket: $errstr ($errno)\n";
} else
{
//
// build the request
//
$request = "GET /1/statuses/filter.json?track=";
$request .= urlencode(implode(',', $_keywords)) . " HTTP/1.1\r\n";
$request .= "Host: stream.twitter.com\r\n";
$request .= "Authorization: Basic ";
$request .= base64_encode($this->m_username . ':' . $this->m_password);
$request .= "\r\n\r\n";
//
// write the request
//
fwrite($fp, $request);
//
// set it to non-blocking
//
stream_set_blocking($fp, 0);
while(!feof($fp))
{
$read = array($fp);
$write = null;
$except = null;
//
// select, waiting up to 10 minutes for a tweet; if we don't get one,
// then reconnect, because it's possible something went wrong.
//
$res = stream_select($read, $write, $except, 600, 0);
if ( ($res == false) || ($res == 0) )
{
break;
}
//
// read the JSON object from the socket
//
$json = fgets($fp);
if ( ($json !== false) && (strlen($json) > 0) )
{
//
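// decode the JSON line into a PHP array; header and chunk-size lines
// decode to null or a scalar, so only arrays are passed on
//
$data = json_decode($json, true);
if (is_array($data))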
{
//
// process it
//
$this->process_tweet($data);
}
}
}
fclose($fp);
}
sleep(10);
}
return;
}
}
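The read loop above hands every line from the socket to json_decode(). As a minimal sketch of that step in isolation, the helper below (decode_stream_line() is a hypothetical name, not part of the class) returns a tweet array only when the line really is a JSON object or array. It assumes the stream may also carry blank keep-alive lines, HTTP header lines, and chunk-size lines, all of which it skips.

```php
<?php
// Hypothetical helper, not part of ctwitter_stream: returns the decoded
// line as an array, or null when the line is not a JSON object or array.
function decode_stream_line(string $line): ?array
{
    $line = trim($line);
    if ($line === '') {
        // assumed blank keep-alive line; nothing to decode
        return null;
    }

    $data = json_decode($line, true);

    // header lines fail to decode (null); bare numbers decode to scalars
    return is_array($data) ? $data : null;
}

var_dump(decode_stream_line("\r\n"));                       // NULL
var_dump(decode_stream_line("512\r\n"));                    // NULL
var_dump(decode_stream_line('{"text":"hello"}' . "\r\n"));  // array with one 'text' key
```

Checking is_array() instead of truthiness matters here because process_tweet() is declared to accept only an array.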
The process_tweet() method is called for each matching tweet; just modify that method to process the tweet however you want (load it into a database, print it to the screen, email it, etc.). The keyword matching isn't perfect: if you track a multi-word phrase, a tweet can match without containing the words in that exact order, but you can check for that yourself in process_tweet().
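One way to do that exact-phrase check is sketched below. The helper name tweet_contains_phrase() is hypothetical, and the sketch assumes the decoded tweet carries its body under the 'text' key.

```php
<?php
// Hypothetical helper, not part of ctwitter_stream: returns true when the
// tweet text contains the exact phrase, ignoring case.
function tweet_contains_phrase(array $tweet, string $phrase): bool
{
    // assumption: the decoded tweet exposes its body under the 'text' key
    if (!isset($tweet['text']) || !is_string($tweet['text']) || $phrase === '') {
        return false;
    }

    return stripos($tweet['text'], $phrase) !== false;
}

// stand-in for a decoded tweet; a real one carries many more fields
$sample = array('text' => 'Just joined the Facebook Developers group');

var_dump(tweet_contains_phrase($sample, 'facebook developers')); // bool(true)
var_dump(tweet_contains_phrase($sample, 'developers facebook')); // bool(false)
```

Calling this at the top of process_tweet() and returning early on false would drop the loose matches. stripos() is byte-based, so for non-ASCII phrases mb_stripos() from the mbstring extension is probably the better fit.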
Then create a simple PHP application to run the collector:
<?php
require 'ctwitter_stream.php';
$t = new ctwitter_stream();
$t->login('your twitter username', 'your twitter password');
$t->start(array('facebook', 'fbook', 'fb'));
Just provide your Twitter account username/password, and then an array of keywords/strings to search for.
Since this application runs continuously in the background, it isn't meant to be run via a web request; run it from the command line of your Unix or Windows box.
According to the Twitter documentation, the default access level allows up to 400 keywords, so you can track all sorts of things at the same time. If you need more details, see the Twitter streaming API documentation.
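To stay under that limit, the keyword list can be checked before the request is built. This is a rough sketch: build_track_parameter() is a hypothetical helper, and the default of 400 is simply the figure quoted above, so adjust it to whatever your access level allows.

```php
<?php
// Hypothetical helper: cleans a keyword list and returns the URL-encoded
// value for the "track" parameter. The default limit of 400 is the
// default-access figure quoted in the text, treated here as an assumption.
function build_track_parameter(array $keywords, int $limit = 400): string
{
    // trim each keyword, then drop empty strings and duplicates
    $clean = array();
    foreach ($keywords as $keyword) {
        $keyword = trim((string) $keyword);
        if ($keyword !== '' && !in_array($keyword, $clean, true)) {
            $clean[] = $keyword;
        }
    }

    if (count($clean) === 0) {
        throw new InvalidArgumentException('at least one keyword is required');
    }
    if (count($clean) > $limit) {
        throw new InvalidArgumentException(
            sprintf('%d keywords given, limit is %d', count($clean), $limit)
        );
    }

    return urlencode(implode(',', $clean));
}

echo build_track_parameter(array('facebook', ' fbook ', 'fb', 'fb')), "\n";
// prints: facebook%2Cfbook%2Cfb
```

The return value matches what start() appends after "track=", so the helper could replace the inline urlencode(implode(...)) call there.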
This class uses PHP's SSL stream transport, so you'll need the OpenSSL extension enabled for it to work.
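Both requirements (command line, OpenSSL) can be checked before the collector starts. The sketch below is one possible pre-flight check; stream_preflight_errors() is a hypothetical name, and the inputs are passed in as arguments so the logic can be tried without changing your PHP setup.

```php
<?php
// Hypothetical pre-flight check: returns a list of problems that would stop
// the collector from running, or an empty array when everything looks fine.
function stream_preflight_errors(string $sapi, bool $hasOpenSsl): array
{
    $errors = array();

    if ($sapi !== 'cli') {
        $errors[] = 'run this script from the command line, not via a web request';
    }
    if (!$hasOpenSsl) {
        $errors[] = 'the OpenSSL extension is required for ssl:// sockets';
    }

    return $errors;
}

// PHP_SAPI and extension_loaded() describe the running interpreter
$errors = stream_preflight_errors(PHP_SAPI, extension_loaded('openssl'));

foreach ($errors as $error) {
    echo 'ERROR: ', $error, "\n";
}
```

If the list is non-empty, exiting before calling start() avoids a reconnect loop that can never succeed.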
Net_DNS2 Version 1.2.0
I've released a new version of the PEAR Net_DNS2 library; you can install it now through the command line PEAR installer:
pear install Net_DNS2
Or download it directly from the Google Code page here.
This release includes a significant speed-up of the local cache by using JSON to encode the cache data, rather than the PHP serialize function. Using JSON loses the class information of the objects, but the data remains the same, and the serialization time is about half.
A Query lookup against Google DNS, no cache:
time: 0.0340800285339
Net_DNS2_RR_A Object
(
    [address] => 199.59.148.82
    [name] => twitter.com
    [type] => A
    [class] => IN
    [ttl] => 28
    [rdlength] => 4
    [rdata] =>
)
with cache + serialize:
time: 0.00258994102478
Net_DNS2_RR_A Object
(
    [address] => 199.59.148.82
    [name] => twitter.com
    [type] => A
    [class] => IN
    [ttl] => 28
    [rdlength] => 4
    [rdata] =>
)
with cache + json:
time: 0.00178384780884
stdClass Object
(
    [address] => 199.59.148.82
    [name] => twitter.com
    [type] => A
    [class] => IN
    [ttl] => 28
    [rdlength] => 4
    [rdata] =>
)
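The change from Net_DNS2_RR_A to stdClass in the last result is the trade-off described above. It can be reproduced with plain PHP, without the library. In this sketch, example_rr_a is a hypothetical stand-in class, not the real Net_DNS2_RR_A.

```php
<?php
// Hypothetical stand-in for a cached record; not the real Net_DNS2_RR_A class.
class example_rr_a
{
    public $address = '199.59.148.82';
    public $name = 'twitter.com';
    public $type = 'A';
    public $ttl = 28;
}

$record = new example_rr_a();

// round trip through each encoding
$from_serialize = unserialize(serialize($record));
$from_json = json_decode(json_encode($record));

echo get_class($from_serialize), "\n"; // example_rr_a
echo get_class($from_json), "\n";      // stdClass

// the data itself survives the JSON round trip
var_dump($from_json->address === $record->address); // bool(true)
```

Code that only reads properties keeps working with the JSON cache, while code that calls methods on cached records or checks instanceof would need the serialize-based cache.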
Version 1.2.0
This version changes the way some exceptions are thrown, and may break your code!
- added numeric error codes to the Lookups class, and had each method that throws an exception throw a numeric error code along with the message.
- dropped all references to InvalidArgumentException; we only use the Net_DNS2_Exception from now on.
- added the CAA, URI, TALINK, CDS and TA resource records. Some of these are experimental, but are pretty straightforward.
- fixed a bug in formatString(); my version was only putting double quotes around strings that have spaces, but apparently ALL strings should have double quotes around them. This is how BIND does it.
- re-organized the Net_DNS2_Lookups initialization code; it no longer creates a global object of itself.
- fixed a bug in the caching code; in some cases it wouldn't cache the same content more than once.
- added an option to use JSON to serialize the cache data rather than using the PHP serialize function. JSON is much faster, but loses the class definition, and becomes a stdClass object.
- fixed a handful of cases where I was using double quotes (") where a single quote (') would be fine.
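Since every failure now arrives as a Net_DNS2_Exception with a numeric code, calling code can catch one class and branch on getCode(). The following is a sketch under assumptions: describe_lookup_error() is a hypothetical helper, the library is assumed to be loaded beforehand (for example with require_once 'Net/DNS2.php'), and the lookup part is skipped when it isn't.

```php
<?php
// Hypothetical helper: formats any exception's numeric code and message.
function describe_lookup_error(Exception $e): string
{
    return sprintf('lookup failed [%d]: %s', $e->getCode(), $e->getMessage());
}

// The lookup only runs when Net_DNS2 has already been loaded; otherwise
// this block is skipped and the script simply ends.
if (class_exists('Net_DNS2_Resolver')) {
    try {
        $r = new Net_DNS2_Resolver(array('nameservers' => array('8.8.8.8')));
        $r->query('example.com', 'A');
    } catch (Net_DNS2_Exception $e) {
        // one exception class to catch; the numeric code says what went wrong
        echo describe_lookup_error($e), "\n";
    }
}
```

Any existing catch blocks for InvalidArgumentException around Net_DNS2 calls would need to be changed to Net_DNS2_Exception after upgrading.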
Net_DNS2 Version 1.1.0
Net_DNS2 version 1.1.0 is now available for download from the PEAR site, or can be installed using the "pear" command line client like:
pear install Net_DNS2
This release includes support for signing DNS updates and zone transfers (AXFRs) using SIG(0), a private/public key authentication mechanism. The OpenSSL PHP extension is required for this feature to work.
//
// create a new Updater object
//
$u = new Net_DNS2_Updater('example.com', array('nameservers' => array('192.168.0.1')));
//
// add a SIG(0) to authenticate the request
//
$u->signSIG0('/etc/namedb/Kexample.com.+001+15765.private');
//
// send the update request.
//
$u->update();
This release also adds support for the ATMA resource record, a method for publishing ATM addresses via DNS:
@ IN SOA name1.data.example.com. name4.data.example.com. (
1994041800 ; Serial - date
1800 ; Refresh - 30 minutes
300 ; Retry - 5 minutes
604800 ; Expire - 7 days
3600 ) ; Minimum - 1 hour
IN NS name1.data.example.com.
IN NS ns.example.com.
;
salmon IN ATMA 39.246f.000e7c9c031200010001.000012345678.00
It also adds a simple new local cache system, which uses either shared memory (via the PHP Shmop extension) or a flat file:
$r = new Net_DNS2_Resolver(array(
'cache_type' => 'shared',
'cache_file' => '/tmp/net_dns2.cache',
'cache_size' => 100000
));
Caching is disabled by default, and is only used for DNS queries (and not for updates), but can drastically improve query performance.
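If the Shmop extension may not be present on every machine, the cache options can be chosen at run time. This is a minimal sketch: pick_cache_options() is a hypothetical helper, and the 'file' value for the flat-file backend is an assumption to verify against the library documentation. The other option names come from the example above.

```php
<?php
// Hypothetical helper: builds the cache options for the resolver.
// 'shared', 'cache_file' and 'cache_size' come from the example in the text;
// the 'file' value for the flat-file backend is an assumption.
function pick_cache_options(bool $hasShmop, string $cacheFile, int $cacheSize = 100000): array
{
    return array(
        'cache_type' => $hasShmop ? 'shared' : 'file',
        'cache_file' => $cacheFile,
        'cache_size' => $cacheSize,
    );
}

$options = pick_cache_options(extension_loaded('shmop'), '/tmp/net_dns2.cache');
print_r($options);

// With Net_DNS2 loaded, the array can be handed to the resolver:
// $r = new Net_DNS2_Resolver($options);
```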
For more details, see the Net_DNS2 Google Code Page.
Perl Text-to-Speech using Cepstral voices (libswift)
I've released two new Perl modules:
Speech::Swift - a Perl interface to the Cepstral text-to-speech engine, Swift.
and
Speech::Swift::Simple - a simplified interface to Speech::Swift
This code requires the libswift shared library, which is included with every voice downloaded from Cepstral.
The reason for two releases is that the Speech::Swift module exports all (well, almost all) of the underlying functions of the libswift.so library, while Speech::Swift::Simple has a simplified interface that generates speech in just a few function calls.
For example:
#!/usr/bin/perl
use Speech::Swift::Simple;
#
# create a new Speech::Swift::Simple with one channel audio, and 16bit encoding.
#
my $s = Speech::Swift::Simple->new(
channels => 1,
encoding => Speech::Swift::AUDIO_ENCODING_PCM16
);
#
# set the voice to use by name
#
$s->set_voice("Allison");
#
# synthesize the text, and return it as a Speech::Swift::Simple::Wav object
#
my $wav = $s->generate("My name is allison");
#
# write the wav object to a file.
#
$wav->write("test.wav");
Or use the Speech::Swift library directly, for a more low-level interface.
The audio output is always a WAV file; you can use one of the many audio modules available from CPAN, like Audio::GSM or Audio::MPEG, to re-encode the audio as needed.
Both Perl modules are available for download from CPAN now.
First Release of PHP Swift TTS Extension
I'm happy to announce the first release of the Swift Text-To-Speech PHP extension; Swift is the free TTS engine provided with any Cepstral TTS voice. A lot of Asterisk fans out there will recognize the Cepstral Allison voice as the default voice for Asterisk installations.
The extension will only work on systems supported by the Swift engine, and has only been tested (so far) on Linux (CentOS).
The extension generates audio from the text provided, and the audio can be exported in several different formats, including:
- PCM (RAW audio)
- u-law / a-law (logarithmically encoded RAW audio)
- WAV (PCM audio with a WAV header)
- GSM (when compiled with the libgsm library)
- MP3 (when compiled with the libmp3lame library)
A simple example of how to use it:
//
// create the new TTS object
//
$tts = new SwiftTTS();
//
// set a voice to use for generation
//
$tts->setVoice("Allison");
//
// generate text, and return a stream for the audio
//
$s = $tts->generate("hello my name is allison", SwiftTTS::FORMAT_WAV);
if ($s !== false)
{
//
// write the stream contents to a file
//
file_put_contents("audio.wav", $s);
}
For more details, and to download the current version, see the Google Code page.
