This is a quick update to my post about a year ago, with details on how to mine Twitter streams in real-time using PHP. This new code includes updates for the v1.1 API, including authentication using OAuth.
The first thing you need to do is sign in to the Twitter developer portal with your Twitter account here: https://dev.twitter.com/user/login
Once you’ve logged in, click on your profile icon in the top right-hand corner, select “My applications”, and create a new application if you don’t already have one.
Select the option to create the access token as well, as the requests need to be signed by a Twitter account.
The Code
ctwitter_stream.php
<?php

class ctwitter_stream
{
    private $m_oauth_consumer_key;
    private $m_oauth_consumer_secret;
    private $m_oauth_token;
    private $m_oauth_token_secret;

    private $m_oauth_nonce;
    private $m_oauth_signature;
    private $m_oauth_signature_method = 'HMAC-SHA1';
    private $m_oauth_timestamp;
    private $m_oauth_version = '1.0';

    public function __construct()
    {
        //
        // set a time limit to unlimited
        //
        set_time_limit(0);
    }

    //
    // set the login details
    //
    public function login($_consumer_key, $_consumer_secret, $_token, $_token_secret)
    {
        $this->m_oauth_consumer_key     = $_consumer_key;
        $this->m_oauth_consumer_secret  = $_consumer_secret;
        $this->m_oauth_token            = $_token;
        $this->m_oauth_token_secret     = $_token_secret;

        //
        // generate a nonce; we're just using a random md5() hash here.
        //
        $this->m_oauth_nonce = md5(mt_rand());

        return true;
    }

    //
    // process a tweet object from the stream
    //
    private function process_tweet(array $_data)
    {
        print_r($_data);

        return true;
    }

    //
    // the main stream manager
    //
    public function start(array $_keywords)
    {
        while(1)
        {
            $fp = fsockopen("ssl://stream.twitter.com", 443, $errno, $errstr, 30);
            if (!$fp)
            {
                echo "ERROR: Twitter Stream Error: failed to open socket";
            } else
            {
                //
                // build the data and store it so we can get a length;
                // implode() takes the separator first (PHP 8 rejects the legacy order)
                //
                $data = 'track=' . rawurlencode(implode(',', $_keywords));

                //
                // store the current timestamp
                //
                $this->m_oauth_timestamp = time();

                //
                // generate the base string based on all the data
                //
                $base_string = 'POST&' .
                    rawurlencode('https://stream.twitter.com/1.1/statuses/filter.json') . '&' .
                    rawurlencode('oauth_consumer_key=' . $this->m_oauth_consumer_key . '&' .
                        'oauth_nonce=' . $this->m_oauth_nonce . '&' .
                        'oauth_signature_method=' . $this->m_oauth_signature_method . '&' .
                        'oauth_timestamp=' . $this->m_oauth_timestamp . '&' .
                        'oauth_token=' . $this->m_oauth_token . '&' .
                        'oauth_version=' . $this->m_oauth_version . '&' .
                        $data);

                //
                // generate the secret key to use to hash
                //
                $secret = rawurlencode($this->m_oauth_consumer_secret) . '&' .
                    rawurlencode($this->m_oauth_token_secret);

                //
                // generate the signature using HMAC-SHA1
                //
                // hash_hmac() requires PHP >= 5.1.2 or PECL hash >= 1.1
                //
                $raw_hash = hash_hmac('sha1', $base_string, $secret, true);

                //
                // base64 then urlencode the raw hash
                //
                $this->m_oauth_signature = rawurlencode(base64_encode($raw_hash));

                //
                // build the OAuth Authorization header
                //
                $oauth = 'OAuth oauth_consumer_key="' . $this->m_oauth_consumer_key . '", ' .
                    'oauth_nonce="' . $this->m_oauth_nonce . '", ' .
                    'oauth_signature="' . $this->m_oauth_signature . '", ' .
                    'oauth_signature_method="' . $this->m_oauth_signature_method . '", ' .
                    'oauth_timestamp="' . $this->m_oauth_timestamp . '", ' .
                    'oauth_token="' . $this->m_oauth_token . '", ' .
                    'oauth_version="' . $this->m_oauth_version . '"';

                //
                // build the request
                //
                $request  = "POST /1.1/statuses/filter.json HTTP/1.1\r\n";
                $request .= "Host: stream.twitter.com\r\n";
                $request .= "Authorization: " . $oauth . "\r\n";
                $request .= "Content-Length: " . strlen($data) . "\r\n";
                $request .= "Content-Type: application/x-www-form-urlencoded\r\n\r\n";
                $request .= $data;

                //
                // write the request
                //
                fwrite($fp, $request);

                //
                // set it to non-blocking
                //
                stream_set_blocking($fp, 0);

                while(!feof($fp))
                {
                    $read   = array($fp);
                    $write  = null;
                    $except = null;

                    //
                    // select, waiting up to 10 minutes for a tweet; if we don't get one,
                    // then reconnect, because it's possible something went wrong.
                    //
                    $res = stream_select($read, $write, $except, 600, 0);
                    if ( ($res == false) || ($res == 0) )
                    {
                        break;
                    }

                    //
                    // read the JSON object from the socket
                    //
                    $json = fgets($fp);

                    //
                    // look for a HTTP response code
                    //
                    if (strncmp($json, 'HTTP/1.1', 8) == 0)
                    {
                        $json = trim($json);
                        if ($json != 'HTTP/1.1 200 OK')
                        {
                            echo 'ERROR: ' . $json . "\n";
                            fclose($fp);
                            return false;
                        }
                    }

                    //
                    // if there is some data, then process it
                    //
                    if ( ($json !== false) && (strlen($json) > 0) )
                    {
                        //
                        // decode the line to a PHP array; anything that does not
                        // decode to an array (headers, bare numbers) is skipped
                        //
                        $tweet = json_decode($json, true);
                        if (is_array($tweet))
                        {
                            //
                            // process it
                            //
                            $this->process_tweet($tweet);
                        }
                    }
                }

                //
                // only close the socket if it was actually opened
                //
                fclose($fp);
            }

            sleep(10);
        }

        return;
    }
}
The “process_tweet()” method will be called once for each matching tweet; just modify that method to process the tweet however you want (load it into a database, print it to screen, email it, etc.). The keyword matching isn’t perfect: if you search for a string of words, it won’t necessarily match the words in that exact order, but you can check that yourself from the process_tweet() method.
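As a rough illustration of that last point, an exact-phrase check could look something like the sketch below. The function name tweet_contains_phrase() is made up for this example and is not part of the class; it also assumes the tweet text lives under the 'text' key, and it only does a byte-wise, case-insensitive match, so non-ASCII case folding is not handled.

```php
<?php
//
// Hypothetical helper (not part of ctwitter_stream): returns true only when
// the decoded tweet contains the exact phrase, ignoring ASCII case.
//
function tweet_contains_phrase(array $_tweet, $_phrase)
{
    //
    // not every stream message is a tweet; skip anything without a text field
    //
    if (!isset($_tweet['text']) || !is_string($_tweet['text']))
    {
        return false;
    }

    //
    // an empty phrase would match everything, so treat it as "no match"
    //
    if ($_phrase === '')
    {
        return false;
    }

    return stripos($_tweet['text'], $_phrase) !== false;
}
```

Inside process_tweet() you could then return early when tweet_contains_phrase($_data, 'some phrase') is false.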
Then create a simple PHP application to run the collector:
<?php

require 'ctwitter_stream.php';

$t = new ctwitter_stream();

$t->login('consumer_key', 'consumer secret', 'access token', 'access secret');

$t->start(array('facebook', 'fbook', 'fb'));
You’ll need to provide the Consumer Key, Consumer Secret, Access Token, and the Access Secret, all of which are available from the Details section of your Application.
This new class uses the PHP hash_hmac() function for OAuth, which is available only in PHP 5.1.2 and up, and in the PECL hash extension 1.1 and up.
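For reference, the signing step on its own is small. The sketch below mirrors what the class does with hash_hmac(); the function name oauth_sign() is just an illustrative name, and it returns the base64 signature before the final rawurlencode() that the class applies when building the Authorization header.

```php
<?php
//
// Hypothetical stand-alone version of the signing step: HMAC-SHA1 over the
// base string, keyed with "consumer secret & token secret", then base64.
//
function oauth_sign($_base_string, $_consumer_secret, $_token_secret)
{
    //
    // both secrets are percent-encoded and joined with an ampersand
    //
    $key = rawurlencode($_consumer_secret) . '&' . rawurlencode($_token_secret);

    //
    // the fourth argument asks hash_hmac() for raw binary output
    //
    return base64_encode(hash_hmac('sha1', $_base_string, $key, true));
}
```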
You can also download the file here: http://mikepultz.com/uploads/ctwitter_stream.php.zip
Hi Mike–
This is very cool. Simple, clearly coded and efficient. Works great for me.
But I was wondering if you have any idea why, when I add follow like so, it returns a 401 Unauthorized error:
//
// build the data and store it so we can get a length
//
$data = "";
if(count($_keywords))
{
    $data .= 'track=' . rawurlencode(implode($_keywords, ','));
}
if(count($_user_ids))
{
    if(count($_keywords))
    {
        $data .= "&";
    }
    $data .= 'follow=' . rawurlencode(implode($_user_ids, ','));
}
That seems like it should work as far as I can tell, but I must be missing something.
Thanks!
Matt W.
Also, I think there is an error around line 62 in the start method. I think it should be
$data = 'track=' . rawurlencode(implode(",", $_keywords));
rather than
$data = 'track=' . rawurlencode(implode($_keywords, ','));
Does it work either way? The PHP docs indicate the former order:
=====
implode($glue, array $pieces)
Join array elements with a string
Parameters:
glue string
Defaults to an empty string. This is not the preferred usage of implode as glue would be the second parameter and thus, the bad prototype would be used.
pieces array
The array of strings to implode.
Okay, I got follow and track requests working– forgot about alphabetical key ordering before generating signature. This isn’t the cleanest way to do it, but it does work.
public function start($_keywords = array(), $_user_ids = array(), $count = 0)
{
    if(!is_array($_keywords))
    {
        $_keywords = array();
    }
    if(!is_array($_user_ids))
    {
        $_user_ids = array();
    }
    while(1)
    {
        $fp = fsockopen("ssl://stream.twitter.com", 443, $errno, $errstr, 30);
        if (!$fp)
        {
            echo "ERROR: Twitter Stream Error: failed to open socket";
        } else
        {
            //
            // build the data and store it so we can get a length
            //
            $track = "";
            $follow = "";
            $request_data = "";
            if(count($_user_ids) > 0)
            {
                $follow = 'follow=' . rawurlencode(implode(",", $_user_ids));
                $request_data .= $follow;
            }
            if(count($_keywords) > 0)
            {
                if(count($_user_ids))
                {
                    $request_data .= "&";
                }
                $track = 'track=' . rawurlencode(implode(",", $_keywords));
                $request_data .= $track;
            }
            //
            // store the current timestamp
            //
            $this->m_oauth_timestamp = time();
            //
            // generate the base string based on all the data
            //
            $base_string = 'POST&' . rawurlencode('https://stream.twitter.com/1.1/statuses/filter.json') . '&';
            if(strlen($follow))
            {
                $base_string .= rawurlencode($follow . '&');
            }
            $base_string .= rawurlencode('oauth_consumer_key=' . $this->m_oauth_consumer_key . '&' .
                'oauth_nonce=' . $this->m_oauth_nonce . '&' .
                'oauth_signature_method=' . $this->m_oauth_signature_method . '&' .
                'oauth_timestamp=' . $this->m_oauth_timestamp . '&' .
                'oauth_token=' . $this->m_oauth_token . '&' .
                'oauth_version=' . $this->m_oauth_version);
            if(strlen($track))
            {
                $base_string .= rawurlencode('&' . $track);
            }
            //
            // generate the secret key to use to hash
            //
            $secret = rawurlencode($this->m_oauth_consumer_secret) . '&' .
                rawurlencode($this->m_oauth_token_secret);
            //
            // generate the signature using HMAC-SHA1
            //
            // hash_hmac() requires PHP >= 5.1.2 or PECL hash >= 1.1
            //
            $raw_hash = hash_hmac('sha1', $base_string, $secret, true);
            //
            // base64 then urlencode the raw hash
            //
            $this->m_oauth_signature = rawurlencode(base64_encode($raw_hash));
            //
            // build the OAuth Authorization header
            //
            $oauth = 'OAuth oauth_consumer_key="' . $this->m_oauth_consumer_key . '", ' .
                'oauth_nonce="' . $this->m_oauth_nonce . '", ' .
                'oauth_signature="' . $this->m_oauth_signature . '", ' .
                'oauth_signature_method="' . $this->m_oauth_signature_method . '", ' .
                'oauth_timestamp="' . $this->m_oauth_timestamp . '", ' .
                'oauth_token="' . $this->m_oauth_token . '", ' .
                'oauth_version="' . $this->m_oauth_version . '"';
            //
            // build the request
            //
            $request = "POST /1.1/statuses/filter.json HTTP/1.1\r\n";
            $request .= "Host: stream.twitter.com\r\n";
            $request .= "Authorization: " . $oauth . "\r\n";
            $request .= "Content-Length: " . strlen($request_data) . "\r\n";
            $request .= "Content-Type: application/x-www-form-urlencoded\r\n\r\n";
            $request .= $request_data;
            //
            // write the request
            //
            fwrite($fp, $request);
            //
            // set it to non-blocking
            //
            stream_set_blocking($fp, 0);
            while(!feof($fp))
            {
                $read = array($fp);
                $write = null;
                $except = null;
                //
                // select, waiting up to 10 minutes for a tweet; if we don't get one, then
                // then reconnect, because it's possible something went wrong.
                //
                $res = stream_select($read, $write, $except, 600, 0);
                if ( ($res == false) || ($res == 0) )
                {
                    break;
                }
                //
                // read the JSON object from the socket
                //
                $json = fgets($fp);
                //
                // look for a HTTP response code
                //
                if (strncmp($json, 'HTTP/1.1', 8) == 0)
                {
                    $json = trim($json);
                    if ($json != 'HTTP/1.1 200 OK')
                    {
                        echo 'ERROR: ' . $json . "\n";
                        return false;
                    }
                }
                //
                // if there is some data, then process it
                //
                if ( ($json !== false) && (strlen($json) > 0) )
                {
                    //
                    // decode the socket to a PHP array
                    //
                    $data = json_decode($json, true);
                    if ($data)
                    {
                        //
                        // process it
                        //
                        $this->process_tweet($data);
                    }
                }
            }
        }
        fclose($fp);
        sleep(10);
    }
    return;
}
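The alphabetical-ordering requirement mentioned in this comment can also be handled generically instead of placing follow and track by hand. The sketch below is one way to do it: put every request parameter and every oauth_* value into one array, percent-encode, sort by key, and join. The function name oauth_base_string() is an assumption for illustration and does not exist in the class; it also assumes no parameter name is repeated.

```php
<?php
//
// Hypothetical helper: build an OAuth 1.0 signature base string from the
// HTTP method, the request URL (without query string), and all parameters.
//
function oauth_base_string($_method, $_url, array $_params)
{
    //
    // percent-encode every key and value first
    //
    $encoded = array();
    foreach ($_params as $key => $value)
    {
        $encoded[rawurlencode($key)] = rawurlencode($value);
    }

    //
    // sort by encoded key, comparing as strings, so "follow", "language"
    // and "locations" land before "oauth_*", and "track" lands after
    //
    ksort($encoded, SORT_STRING);

    $pairs = array();
    foreach ($encoded as $key => $value)
    {
        $pairs[] = $key . '=' . $value;
    }

    //
    // method, URL and the joined parameter string, each encoded once more
    //
    return strtoupper($_method) . '&' . rawurlencode($_url) . '&' .
        rawurlencode(implode('&', $pairs));
}
```

The same non-OAuth parameters then go into the POST body, so the request and the signature are always built from one list.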
Thnx for your script. It was really helpful for me.
Hi there!
I’m using your library but I’m stuck. I want to retrieve the user object (profile, picture, user ID and all that stuff). I tried to modify your class, but I just can’t.
Please, help me.
@La_JennyLove
Hi, this is great! Easiest Twitter stream script to follow. Can you help me loop through the $_data array to show only the “text” field? If I use the following, it fails:
private function process_tweet(array $_data)
{
    //print_r($_data);
    foreach ($_data as $tweet)
    {
        echo "{$tweet->text}\n";
    }
    // return true;
}
thanks so much
Hi,
I’m trying to make a GET request with streams, but nothing works. Could you please check? http://pastebin.com/X1DK4LnL
thanks a lot 🙂
Hey Rob,
If you look at the output from print_r(), and at the process_tweet() function definition, you’ll see that $_data is an array, not an object.
Also, this function is called once per matching tweet, so it would just be $_data['text'];
Mike
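To make that concrete, here is a small sketch of pulling a few fields out of the array that process_tweet() receives, including some of the nested user details asked about earlier in the thread. The function name summarize_tweet() is invented for this example; the keys used ('text', 'user', 'id_str', 'screen_name', 'profile_image_url') are the ones that show up in the print_r() output, but treat any of them as possibly missing.

```php
<?php
//
// Hypothetical helper: reduce one decoded stream message to a few fields.
// Returns null for messages that are not tweets (no 'text' key).
//
function summarize_tweet(array $_data)
{
    if (!isset($_data['text']))
    {
        return null;
    }

    //
    // the author details are a nested array under 'user'
    //
    $user = (isset($_data['user']) && is_array($_data['user'])) ? $_data['user'] : array();

    return array(
        'text'              => $_data['text'],
        'user_id'           => isset($user['id_str']) ? $user['id_str'] : null,
        'screen_name'       => isset($user['screen_name']) ? $user['screen_name'] : null,
        'profile_image_url' => isset($user['profile_image_url']) ? $user['profile_image_url'] : null,
    );
}
```

Calling summarize_tweet($_data) from inside process_tweet() and echoing the 'text' entry would print one line per tweet.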
Hey Bruno,
You added “count=100” to your $request2 string, but you didn’t add it to the $base_string above, so I assume it’s failing to authenticate.
On the Twitter developer tools site, there’s an OAuth testing tool where you can pick an API call, and it will show you the OAuth request for it, including how the base string should look. I’d suggest you check against that after adding the count=100, to make sure your base string matches.
Mike
count=100 was just a try; previously I used it the right way, but the result was the same. However, I discovered the problem: firehose isn’t allowed for most developers (except data resellers -.-), so no way…
Last question, I swear 🙂
I want to send a POST request with statuses/filter, and it works with one parameter but not with two!
The POST request generated by Twitter is this: http://pastebin.com/vUS0tAL7
My basic code is: http://pastebin.com/2RMLynps
To add a language filter based on Twitter’s POST request, I shouldn’t change $data but change it this way: http://pastebin.com/zuyy3JYS
But nothing works. I’ve no ideas left, please help me 🙂
Bruno,
The main start() function takes an array of keywords, for example:
$t->start(array('facebook', 'fbook', 'fb'));
If you put one value in there, it will only search on one; if you put ten, it will search for all ten. This is exactly how the code is designed.
As for the language filter: first off, since it’s your first URL argument, it should be ? and not &
$request = "POST /1.1/statuses/filter.json?language=it HTTP/1.1\r\n";
Also, I’m not sure if that’s the right place for it when doing a POST; it may have to be in the POST’d data
$data = 'track=' . rawurlencode(implode($_keywords, ',')) . '&language=it';
Mike
I know that the code is designed to use $_keywords, but I was making lots of attempts, so I replaced it to be faster.
Adding ?language=it or &language=it to the request doesn’t work. Adding it to the data was my first idea, but it’s wrong; in fact the Twitter code adds it to the request with &, which is strange… (http://pastebin.com/vUS0tAL7)
Hello,
I created the program using the example provided. However, when I run my application http://127.0.0.1:8081/twitterstreamtest/index.php, which calls cstream_twitter.php, it only returns $data as HTTP/1.1 200 OK
And so i see the following error message:
Catchable fatal error: Argument 1 passed to ctwitter_stream::process_tweet() must be an array, string given, called in C:\AppServ\www\twitterstreamtest\ctwitter_stream.php on line 185 and defined in C:\AppServ\www\twitterstreamtest\ctwitter_stream.php on line 44
Please help!
As usual, very interesting content, Mike.
It seems that Twitter has become too sensitive, because I’m getting error 420 after running this script twice.
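HTTP 420 is generally Twitter’s way of saying a client is connecting too often, and the class always waits a fixed sleep(10) before reconnecting. One possible mitigation is to wait longer after each failed attempt. The sketch below is only that: the function name reconnect_delay() and the starting delay and cap (60 and 960 seconds) are illustrative assumptions, not values taken from the original code.

```php
<?php
//
// Hypothetical helper: number of seconds to sleep before reconnect attempt
// number $_attempt (0 for the first retry), doubling each time up to a cap.
//
function reconnect_delay($_attempt, $_base_seconds = 60, $_max_seconds = 960)
{
    //
    // clamp the attempt counter so the shift below cannot overflow
    //
    $attempt = min(max(0, (int)$_attempt), 20);

    return (int)min($_base_seconds * (1 << $attempt), $_max_seconds);
}
```

In start(), a counter could be incremented on each failed connection, reset after a successful 200 response, and passed to sleep(reconnect_delay($attempt)).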
Hey, I am using your tweet class to get tweets for a hashtag using the live stream API. I am running this as a cron job every hour to get tweets. What’s happening is that it gets data the first time, and after that it doesn’t get any data. May I know why? Is it related to an already-open socket?
Hey Venkat,
Well, the library is designed to start up and run in the background as a live streaming service, and not really as a cron job. You’ll notice in the start() function in the library, there is a while(1) loop, and inside that, it opens the socket, and waits on select() for new data (streaming).
The Twitter API (from what I can remember) will only let you open one connection with a single key; is it possible it’s running the first time, and then staying running in the background, and then the second time it runs via cron, it’s failing, because the first instance is still running?
I would say, just let it run in the background rather than running it via cron, if you can. Otherwise you’re going to have to change the library, but then you’re going to end up missing data between polls.
Mike
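If cron is the only option for starting the collector, one common way to avoid a second copy fighting the first for the connection is an exclusive lock file. This is a sketch under assumptions: the function name acquire_single_instance_lock() is hypothetical, and the lock path in the usage note is just a placeholder.

```php
<?php
//
// Hypothetical helper: try to take an exclusive, non-blocking lock on a file.
// Returns the open handle on success (keep it open while the script runs),
// or false if the file cannot be opened or another process holds the lock.
//
function acquire_single_instance_lock($_lock_path)
{
    //
    // 'c' opens for writing and creates the file without truncating it
    //
    $handle = fopen($_lock_path, 'c');
    if ($handle === false)
    {
        return false;
    }

    //
    // LOCK_NB makes flock() return immediately instead of waiting
    //
    if (!flock($handle, LOCK_EX | LOCK_NB))
    {
        fclose($handle);
        return false;
    }

    return $handle;
}
```

A runner script could call acquire_single_instance_lock('/tmp/ctwitter_stream.lock') first and exit quietly when it returns false, so the hourly cron entry only restarts the collector if it has died.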
will rate limit be a problem while using stream api or not?
What if I want to change a limit or time period of this fetching?
Can I search based on geo-location using this library ?
Please tell me how i can do this.
Thanks for great script, it works great and fast!
I’m trying to include condition that only geo-enabled tweets would pass through, and I tried many methods, but none of them was successful.
With this line unmodified, everything works fine:
$data = 'track=' . rawurlencode(implode($_keywords, ','));
But when I try to add search for geo-enabled tweets:
$data = 'track=' . rawurlencode(implode($_keywords, ',')) . '&locations=-180,-90,180,90';
like this, or in different way, it always gives back this error:
ERROR: HTTP/1.1 401 Unauthorized false
Do you know how to solve this? Thank you!
Hi Mike,
I also wanted to use additional parameters like geo-location or language but I keep getting the following error:
ERROR: HTTP/1.1 401 Unauthorized
Without the additional parameters it works just fine. I tried to append the new parameters to the $data string in ctwitter_stream::start().
I also tried to append them to this url: https://stream.twitter.com/1.1/statuses/filter.json like this: https://stream.twitter.com/1.1/statuses/filter.json?language=en
I don’t know if this is the right way.. Do you have any ideas?
Thanks.
Tamás
I read the Twitter streaming API documentation and your article above for my interest-based query on Twitter data, but I could not find the information I want. What I want is to get top tweets from Twitter based on location, category (news, entertainment, sports ...), rt_count values, and type (tweets containing an image, video, or link). Is there any Twitter API support for this, or do I need to write my own algorithm? Can you please suggest a direction?
Thanks for the script – really useful!
Tony
Hi,
Can anyone share an example using this class? (a link or email it to uomian2004@yahoo.com)
I copied the class, and then used the code provided. It keeps on running and never stops.
I changed the loop to run only 100 times, but the program is still running without producing any results.
I must be doing something stupid, but at the moment I am not able to figure out what I am doing wrong.
Thanks in advance
Hey Adil,
The script is supposed to run forever; it’s designed to connect to the twitter streaming API, which is a “live” stream of tweets. The idea is to add your “per-tweet” handling code in the “process_tweet()” method, and then let this script run in the background.
Try putting a common search term in (like “facebook” or something), and then start the script up with just print_r($_data) in the process_tweet() method (as per the example), and you should see tweets coming in.
Mike
I have copied your code.. it works for a few results and then i get this (randomly)
PHP Catchable fatal error: Argument 1 passed to ctwitter_stream::process_tweet() must be of the type array, integer given, called in /root/twitter/ctwitter_stream.php on line 188 and defined in /root/twitter/ctwitter_stream.php on line 48
any ideas?
I appreciate your help! and your awesome code! 🙂
Hey Chris,
So the process_tweet() method types the argument passed to it as an array
private function process_tweet(array $_data);
which should always be the case; the only thing I can think of, is that json_decode() is failing (for some reason), so it’s passing a bool into it.
You can try changing process_tweet() to:
private function process_tweet($_data)
and then check the value of $_data first, and in any case where it’s not an array (maybe is_array() on it), then spit out what it is, to try and work back and find out where the error is coming from.
Mike
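A sketch of that kind of check is below. One possible source of the “integer given” error (an assumption, not something confirmed in this thread) is a line on the socket that is valid JSON but not an object, such as a bare number; json_decode() turns that into an integer, which then passes a plain if ($data) test. The function name decode_stream_line() is made up for this example.

```php
<?php
//
// Hypothetical helper: decode one line read from the stream and return it
// only if it is a JSON object/array; everything else becomes null.
//
function decode_stream_line($_line)
{
    //
    // fgets() returns false when there was nothing to read
    //
    if (!is_string($_line))
    {
        return null;
    }

    //
    // blank lines carry no data
    //
    $line = trim($_line);
    if ($line === '')
    {
        return null;
    }

    //
    // HTTP headers decode to null; bare numbers decode to int/float;
    // only arrays are safe to hand to process_tweet(array $_data)
    //
    $decoded = json_decode($line, true);

    return is_array($decoded) ? $decoded : null;
}
```

With this in place the read loop would call process_tweet() only when decode_stream_line($json) returns something other than null.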
Hi,
How can I change the timezone in your code? It is always returning tweets from the US and Canada.
[0] => pr 24 14:07:28 +0000 2008","favourites_count":319,"utc_offset":-14400,"time_zone":"Eastern Time (US & Canada)","geo_enabled":true,"verified":true,"statuses_count":325078,"lang":"en","contributors_enabled":false,"is_translator":false,"is_translation_enabled":true,"profile_background_color":"2E7060","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/662779208\/drm9hjltviedp5of1mft.png","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/662779208\/drm9hjltviedp5of1mft.png","profile_background_tile":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/456395600636829697\/7gQUqiUI_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/456395600636829697\/7gQUqiUI_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/14511951\/1393823682","profile_link_color":"015E50","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"E3E2DE","profile_text_color":"000000","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":2,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[{"url":"http:\/\/t.co\/Vzf6Hd95j8","expanded_url":"http:\/\/huff.to\/1knXJXH","display_url":"huff.to\/1knXJXH","indices":[64,86]}],"user_mentions":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"lang":"en"},"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[{"url":"http:\/\/t.co\/Vzf6Hd95j8","expanded_url":"http:\/\/huff.to\/1knXJXH","display_url":"huff.to\/1knXJXH","indices":[84,106]}],"user_mentions":[{"screen_name":"HuffingtonPost","name":"Huffington Post","id":14511951,"id_str":"14511951","indices":[3,18]}]},"favorited":false,"retweeted":false,"possibly_sensitive":true,"filter_level":"medium","lang":"en"}
I can show only one tweet at a time?
I’m kind of new to the Twitter API, though I do appreciate the code. My challenge is how to use geo-location with this code, and also how to split up the array contents. My aim is to insert the tweets into a database. Thanks.
The script worked great until a couple months ago. Now I always get a 401 error and I just created a new developer account to verify it wasn’t my account + regen’d my tokens
Nice explanation! Really helps in understanding the basics of twitter streams.
I applied for the keys and tokens yesterday and put them in your code, but I still get a 401 response… Can you please tell me why? I’m not sure whether it’s because Twitter now uses OAuth 1.1 instead of OAuth 1.0.
Hey Mike,
Thanks for the code sample and I hope you’re still paying attention to this post.
I was wondering if you had any hints as to a problem I’m running into. Theoretically, you should be able to switch the ‘track’ for either ‘follow’ or ‘locations’ and switch your start array to match those arguments and it should work correct?
Whenever I change from track to either of the other two mandatory options it kicks back a 401 ‘authentication required.’ Any ideas as to why this would not work?
Hey Kameron,
Not sure if you’ve figured this out yet- but yes, you can change the type of request you’re doing, as long as you encode things properly, and your account has access for the requests you’re making.
And for example, if you changed it from a “filter” request, to something else, you need to make sure to change it in both places (both when building the oauth values, and when making the actual request).
If you didn’t figure it out yet, just post your code,
Mike
Thank you so much for getting back to me.
This is my data declaration in the back end
$data = 'locations=' . rawurlencode(implode($_keywords, ','));
and my start on the front end looks like this
$t->start(array(-106.77,35.00,-106.43,35.25));
Am I missing something?
Thanks again for any help!
You can also see my entire code here
Any ideas Mike?
How to close the connection if I do not want to run it forever ?
I will restart connection fresh for next use
The library is designed to run in the background as a daemon and collect tweets; this is how the “fire hose” Twitter service is meant to be used.
If you want to run one-off versions, you’ll have to redesign the library.
Mike
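As a starting point for that kind of redesign, the read loop needs some condition to break on. The sketch below shows one possible shape for it; the class name stream_stop_condition and its methods are hypothetical and not part of the library, and where exactly to call it inside start() is left to whoever adapts the code.

```php
<?php
//
// Hypothetical helper: tells a read loop when to stop, either after a number
// of tweets or after a number of seconds, whichever comes first.
//
class stream_stop_condition
{
    private $m_deadline;
    private $m_max_tweets;
    private $m_seen = 0;

    //
    // $_start is only there so the clock can be injected for testing
    //
    public function __construct($_max_seconds, $_max_tweets, $_start = null)
    {
        $start = ($_start === null) ? time() : $_start;

        $this->m_deadline   = $start + $_max_seconds;
        $this->m_max_tweets = $_max_tweets;
    }

    //
    // call once for every tweet handed to process_tweet()
    //
    public function count_tweet()
    {
        $this->m_seen++;
    }

    //
    // true once either limit has been reached
    //
    public function should_stop($_now = null)
    {
        $now = ($_now === null) ? time() : $_now;

        return ($this->m_seen >= $this->m_max_tweets) || ($now >= $this->m_deadline);
    }
}
```

A modified start() would check should_stop() on each pass of the inner loop, then close the socket and return instead of sleeping and reconnecting. A shorter stream_select() timeout than the current 600 seconds would be needed for the time limit to be noticed promptly.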
Hey Mike,
Great example of clean and well documented code!
I am using this as a basis for learning the Twitter streaming api, and you have saved me weeks of frustration!
Thanks for the insights and for sharing,
Allen
Hi Mike,
Thanks for this script. Please, I tried to run the script but my browser is just rolling. I also tried to save the $_data into a database table just to test if tweet is returned, but nothing is saved and the browser still keeps rolling without any error. Please what is the problem?
Very very good script!
Thank you very much from Italy!
Paolo
Hi,
I am using your code, and it worked without any problems until yesterday. I found that sometimes, when the tweet contains Arabic content, the script fails. Is there a UTF-8 function that I have to use, or is it an issue on Twitter’s side?