Google Speech API – Full Duplex PHP Version

So this is a follow-up to my earlier post about how to use the Google Speech Recognition API built into Google Chrome.

Since my last post, Chrome has had some significant upgrades to this feature, specifically around the length of audio you can pass to the API. The old version only let you pass very short clips (a few seconds), but the new API is a full-duplex streaming API. This means it actually uses two HTTP connections: one POST request to upload the content as a “live” chunked stream, and a second GET request to fetch the results. That makes much more sense for longer audio samples, or for streaming audio.
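The two connections are tied together by a shared “pair” value. The 403 error quoted in the comments shows the GET going to /speech-api/full-duplex/v1/down?pair=1331547998992979. The sketch below (assuming PHP 7+) builds a matching pair of URLs. full_duplex_urls() is a hypothetical helper, not a function from the class. The www.google.com host is borrowed from the v1 URL in the comments, and the up path with its key and lang parameters is an assumption for illustration, not a documented interface.

```php
<?php
// Hypothetical helper: builds matching "up" (POST) and "down" (GET) URLs that
// share one pair ID. Only the down path and its pair parameter come from the
// error message in the comments; the up path and the key/lang parameters are
// ASSUMPTIONS for illustration.
function full_duplex_urls(string $apiKey, string $lang = 'en-US'): array
{
    // 16 random digits (no leading zero), the same shape as the pair ID in
    // the quoted error message
    $pair = (string) random_int(1, 9);
    for ($i = 1; $i < 16; $i++) {
        $pair .= (string) random_int(0, 9);
    }

    $base = 'https://www.google.com/speech-api/full-duplex/v1/';

    return [
        'pair' => $pair,
        // GET: the recognition results are read from this connection
        'down' => $base . 'down?' . http_build_query(['pair' => $pair]),
        // POST: the FLAC audio is streamed up on this connection (assumed params)
        'up'   => $base . 'up?' . http_build_query([
            'key'  => $apiKey,
            'pair' => $pair,
            'lang' => $lang,
        ]),
    ];
}
```

According to alvin's comment, the GET on the down URL has to be opened before the audio is POSTed to the up URL. In practice, this means both requests run at the same time, for example with PHP's curl_multi functions.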

I created a simple PHP class to access this API. While it likely won’t make sense for anybody who wants to do a real-time stream, it should satisfy most cases where people just want to send “longer” audio clips.

Before you can use this PHP class, you must get a developer API key from Google. The class does not include one, and I cannot give you one. They’re free and easy to get: just go to the Google APIs site and sign up for one.

[Screenshot: the Google APIs console]

Then download the class (linked below) and start with a simple example:

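<?php
// Load the class (see the download link below)
require 'google_speech.php';

// Pass your own Google developer API key
$s = new cgoogle_speech('put your API key here');

// '@' means "read this file"; then the language tag and the sample rate in Hz
$output = $s->process('@test.flac', 'en-US', 8000);

print_r($output);
?>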

Audio can be passed as a filename (by putting an ‘@’ sign in front of the file name) or as raw FLAC content. The second argument is an IETF language tag, and it defaults to ‘en-US’. I’ve only tested English and French, but I assume other languages work. The third argument is the sample rate in Hz, and it defaults to 8000.
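Here is a minimal sketch of that ‘@’ convention, assuming PHP 7+. resolve_flac_audio() is a hypothetical name and does not describe the class's internals. The helper reads the file itself and returns raw bytes, which can then be sent as a plain string with CURLOPT_POSTFIELDS. This sidesteps cURL's own @filename upload syntax, which PHP 5.5 deprecated and which is the source of the “Deprecated: curl_setopt()” warning one reader reports. The helper also rejects anything without the 4-byte “fLaC” marker that every FLAC stream begins with, such as a WAV file (which starts with “RIFF”).

```php
<?php
// Hypothetical helper: '@' + path means "read this file", anything else is
// treated as raw FLAC bytes. The result is meant to be posted as a raw string
// body, not through cURL's deprecated '@filename' upload syntax.
function resolve_flac_audio(string $audio): string
{
    if ($audio !== '' && $audio[0] === '@') {
        $path = substr($audio, 1);
        $bytes = @file_get_contents($path); // suppress the warning; we throw instead
        if ($bytes === false) {
            throw new RuntimeException("Cannot read audio file: $path");
        }
    } else {
        $bytes = $audio; // already raw content
    }

    // Every FLAC stream starts with the 4-byte marker "fLaC"
    if (strncmp($bytes, 'fLaC', 4) !== 0) {
        throw new InvalidArgumentException('Audio does not look like FLAC (no "fLaC" marker)');
    }
    return $bytes;
}
```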

** The sample rate you pass must match your file. If it doesn’t, you’ll either get nothing back or a really bad transcription. **
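One way to catch a mismatch before uploading is to read the rate out of the file itself. This sketch assumes PHP 7+ and relies on the FLAC format specification, which requires STREAMINFO to be the first metadata block. It decodes the 20-bit sample rate, 3-bit channel count and 5-bit bit depth packed at byte offset 18. flac_stream_info() is a hypothetical helper.

```php
<?php
// Hypothetical helper: read sample rate, channels and bit depth from a FLAC
// file's STREAMINFO block, which directly follows the "fLaC" marker and a
// 4-byte metadata block header.
function flac_stream_info(string $flac): array
{
    // "fLaC" (4) + block header (4) + STREAMINFO (34) = 42 bytes minimum
    if (strlen($flac) < 42 || strncmp($flac, 'fLaC', 4) !== 0) {
        throw new InvalidArgumentException('Not a FLAC stream');
    }
    // Block type is the low 7 bits of the header byte; STREAMINFO is type 0
    if ((ord($flac[4]) & 0x7F) !== 0) {
        throw new InvalidArgumentException('First metadata block is not STREAMINFO');
    }
    // Bytes 18..21: sample rate (20 bits), channels - 1 (3 bits),
    // bits per sample - 1 (5 bits), then the top of the sample count
    $b = array_values(unpack('C4', substr($flac, 18, 4)));

    return [
        'sample_rate'     => ($b[0] << 12) | ($b[1] << 4) | ($b[2] >> 4),
        'channels'        => (($b[2] >> 1) & 0x07) + 1,
        'bits_per_sample' => ((($b[2] & 0x01) << 4) | ($b[3] >> 4)) + 1,
    ];
}
```

Compare sample_rate with the third argument you pass to process(). One reader in the comments found that a file with a matching rate of 22050 Hz still returned nothing until it was converted to 8000 Hz. A match therefore appears necessary but may not be sufficient.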

The output is returned as an array and should look something like this:

Array
(
    [0] => Array
        (
            [alternative] => Array
                (
                    [0] => Array
                        (
                            [transcript] => my CPU is a neural net processor a learning computer
                            [confidence] => 0.74177068
                        )
                    [1] => Array
                        (
                            [transcript] => my CPU is the neuron that process of learning
                        )
                    [2] => Array
                        (
                            [transcript] => my CPU is the neural net processor a learning
                        )
                    [3] => Array
                        (
                            [transcript] => my CPU is the neuron that process a balloon
                        )
                    [4] => Array
                        (
                            [transcript] => my CPU is the neural net processor a living
                        )
                )
            [final] => 1
        )
)
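If you just want the text, a small helper can pick the best alternative from that array. In the sample output, only the first alternative has a confidence value, so this sketch treats a missing confidence as 0 and keeps the first alternative on ties. A longer clip might come back as several result entries (an assumption, since the sample above has only one); in that case, their best transcripts are joined in order. best_transcript() is a hypothetical helper and assumes PHP 7+.

```php
<?php
// Hypothetical helper: return the most confident transcript of each result
// entry, joined with spaces; '' when nothing was recognized (the
// "empty array" case many readers report).
function best_transcript(array $output): string
{
    $parts = [];
    foreach ($output as $result) {
        $bestText = null;
        $bestConf = -1.0;
        foreach ($result['alternative'] ?? [] as $alt) {
            // Only some alternatives carry a confidence; treat missing as 0
            $conf = (float) ($alt['confidence'] ?? 0.0);
            if (isset($alt['transcript']) && $conf > $bestConf) {
                $bestText = $alt['transcript'];
                $bestConf = $conf;
            }
        }
        if ($bestText !== null) {
            $parts[] = trim($bestText);
        }
    }
    return implode(' ', $parts);
}
```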

Get the PHP class here: http://mikepultz.com/uploads/google_speech.php.zip

66 thoughts on “Google Speech API – Full Duplex PHP Version”

  1. Pingback: Accessing Google Speech API / Chrome 11 | mike pultz

  2. Dmitriy

    I wanted to get a developer key from Google, but the Speech API service isn’t there anymore!

  3. mike Post author

    It’s possible it’s only available in the US & Canada (I’m in Canada). The site doesn’t say anything specific, though, so I can’t be 100% sure.

    You could try creating and logging in with a Google account using a US web proxy, and see if the option shows up.

    Mike

  4. Juanan

    I have used a SOCKS connection (with a US-located Amazon AWS instance as a proxy) to no avail. I suspect that Google is opening Speech API keys to a selected group of users. Mike, do you know of any other users who have the Speech API available in their console? Thanks.

  5. Rohit

    join the chromium-dev list and the speech api option will be available 🙂

  6. James

    Got an API key but it just keeps returning an empty array. Any ideas on where to troubleshoot?

    Thanks

  7. mike Post author

    Hey James,

    I would start by confirming that your audio is correctly encoded as FLAC, and that the bitrate of your audio file correctly matches what you’re passing to the library.

    If all of that is correct, I would try passing a simple 16 or 8 kbps sample and see if that does it.

    Mike

  8. Juanan

    @Rohit: you were right! I joined the chromium-dev list and the Speech API option became available. Now I’ll give Mike’s PHP class a try. Thanks!

  9. Pingback: Google Speech API

  10. Michael

    I am also getting an empty array returned. I have verified that the file I am attempting to process is FLAC at 8 kHz:

    Input File : ‘mt1KxV.flac’
    Channels : 1
    Sample Rate : 8000
    Precision : 16-bit
    Duration : 00:00:08.44 = 67520 samples ~ 633 CDDA sectors
    File Size : 39.6k
    Bit Rate : 37.5k
    Sample Encoding: 16-bit FLAC
    Comment : ‘Comment=Processed by SoX’

    Any thoughts?

    PS. When I check the API query quota, it doesn’t show a change. Makes me think either: 1) the quota doesn’t update in real-time, or 2) it’s just not getting processed…

    thoughts?

  11. Michael

    …figured it out. I was trying to use a server key, since I am using this with an Asterisk server. However, looking at how you make the request, it looks like a referrer key (the default) is what’s required.

    Got it working. Great code.

    Thanks.

  12. alvin

    I have managed to get this working, but I don’t understand the advantage of this being duplex. The API does not seem to allow pipelining, and the response seems to be ready as soon as the POST completes. Because the GET request must be made before the POST, it seems like they’re requiring two socket connections rather than just one.

  13. mike Post author

    I haven’t played with it too much, but I assume the value of two sockets is so you could *technically* do “live” (well, “continuous”) streaming. The assumption is that it would return multiple chunks as responses, maybe based on silence events, etc.

    The PHP class I wrote doesn’t support this, so the only value in using this new API is that 1) I assume at some point they will discontinue the old service, and 2) you can send longer audio clips.

    Has anybody tried sending *really* long audio files, or sending it continuous streams?

    Mike

  14. Alkor

    It is working, but always returns error 403:

    =================

    Error 403 (Forbidden)!!1

    403. That’s an error.
    Your client does not have permission to get URL /speech-api/full-duplex/v1/down?pair=1331547998992979 from this server. That’s all we know.
    =================

    What should I do to fix this issue? The Speech API is enabled in the console, and I am using a browser API key.

    Thank you in advance.

  15. Lucas

    Hi @Michael,

    I also got an empty array, but I didn’t understand what “server key” and “referrer key” mean. Would you (or anyone else who knows) mind explaining?

    I think my situation is like yours: FLAC file, rate = 8000, and the API query quota didn’t change.
    I converted the FLAC file from a .wav file using some Java code, which lets me set the FLAC’s sample rate.

    Thank you

  16. Nitish Reddy

    Please start using the term “sample rate” rather than “bit rate” 🙁

  17. Lucas

    Does anyone have a sample sound file (.flac) I could use to check whether my own file is the problem?

    Thank you

  18. Dorian

    Hey Lucas. To get a .flac file you can use to test the code with, do the following:
    – Go to http://audio.online-convert.com/convert-to-flac,
    – Give it an mp3 with some example speech, by clicking on the ‘Choose file’ button,
    – Put in the settings 16 Bit, 8000 Hz, and Mono (in that order),
    – Press the ‘Convert’ button. If you choose a large mp3, this could take a couple minutes.
    – When the page gives you the option to save the new .flac file, maybe call it something the PHP file expects like ‘test.flac’.

    Happy coding 🙂

  19. Chris

    This seems to be limited to 10 – 15 second audio files. I get an Internal Server Error 500 if I send anything longer. Is anyone else seeing this?

  20. Nobuhiro Kumai

    The API only accepts audio files of 15 seconds or less. Is there any way to make it longer? I would like to convert longer audio files, around 2 or 3 minutes, to text. Has anyone tried it yet?

  21. Tact

    I’ve been using this Python version and have seen the same problem… when I try to stream a 1-minute file, the connection drops around the 15-second point.

    https://gist.github.com/offlinehacker/5780124

    There’s something else going on that we’re not aware of. The HTTP parameters are listed pretty clearly in the Chrome code…
    https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/speech/google_streaming_remote_engine.cc&q=full-duplex&sq=package:chromium&dr=C&l=316-363

    Not sure about the headers, or whether it reconnects. No luck with Wireshark, since it’s all over SSL.

  22. Chang

    I adapted your Perl code, but got this: HTTP::Response=HASH(0x3e8c550)
    What’s this? Could you please help?
    By the way, I’m in mainland China.
    Thanks a lot!

  23. Jason Smith

    Hey, is it possible to process a .wav file instead of .flac? If so, could you please tell me how?

    Thanks in advance

  24. mike Post author

    No. From the Chrome source code, it looks like it relies on the FLAC format.

    Mike

  25. Shauzab

    Please explain how to avoid getting an empty array. What key should I put in? I have a perfect FLAC file, but I am still getting an empty array. The site doesn’t even process it; it doesn’t load and only gives this: Array().
    What should I do? Please help.

  26. Uma

    Mike,
    I joined chromium-dev, got the Speech API key, created a sample .flac at 8000 Hz and tried the code above. I get no errors, just an empty array. Could you please make your sample .flac available for download and testing?

    thanks

  27. Brad

    How do you join chromium-dev? Not sure what this means, would love help!

    Looking fwd to playing with speech API.

  28. Luke Brown

    I can confirm as a few others have suggested that signing up to the chromium-dev lists does indeed give you speech as an option. This information is valid and tested as of the date of this post. I haven’t yet tried using the API, though.

  29. Philip

    Great explanation! It works fine for me. But can I ask for some help?

    I’m trying to auto-transcribe some long audio files with this API and generate subtitle files (for searching, not for display). That is, I don’t need just one long paragraph; I need to process small chunks of audio at a time and add the text to the subtitle file.

    The problem is, when I split the audio, I inevitably split some words in half. Is there a way to continuously send short FLAC files, as sort of an audio stream, so split words are recombined before being transcribed? Could there be sort of an “open stream” mode for this, which could be manually closed by the script?

    But I would still need the individual results for each audio segment. What’s a good way to approach this?

    Thanks!

  30. awning contractor singapore

    I like what you guys tend to be up too. This type of clever work and
    reporting! Keep up the terrific works guys I’ve added you guys to
    my personal blogroll.

  31. Fanglin

    @Lucas, the referrer key Michael means here is the browser key. Use that, not the server key, and it should work.

    Fanglin

  32. Marco

    Hi, I am writing my own script and have gotten past the “Your client does not have permission to get URL /speech-api/full-duplex/v1/” error. When you create a key for server applications, you enter an IP address for the API key. In my case, that same IP address had to be passed to the server as a query parameter (e.g. userIp=172.25.112.216).
    I hope this helps someone!
    (Google tells you about the parameter when you create a new server key: “Per-user limits will be enforced using the address found in each request’s userIp parameter (if specified).” In my case, public or private IP addresses never worked.)
    Now I am wrestling with the up/down-stream… fun 🙂

  33. Marco

    FYI – I have the “recognize” API working (=> https://gist.github.com/alotaiba/1730160). If you use the curl code and put in your API key as well as your IP address as a parameter, it works just fine. As with the “full-duplex” API in this example, the up-stream works with the API key and my IP address parameter, but I could not get the down-stream to work…


    curl -X POST \
    --data-binary @my_file.flac \
    --user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7' \
    --header 'Content-Type: audio/x-flac; rate=8000;' \
    'https://www.google.com/speech-api/v1/recognize?client=chromium&lang=en-US&maxresults=10&key=&userIp='

  34. Chad Smith

    Sorry, just saw the link at the top to this new article. Is this the latest one?

  35. Josue

    hello!

    I’m trying to make this work and finally found your class, but it gives me:
    Deprecated: curl_setopt(): The usage of the @filename API for file uploading is deprecated. Please use the CURLFile

    :/

  36. brian

    I am trying to post the transcription to a specified plugin in wo. I’d like to create an interface that lets visitors to my website click record, have my server send a request to the Google voice API, and then return the results to a web page. Any ideas?

  37. Mary Joy

    Hi Mike,
    Can you please provide the sample FLAC file you used for testing?
    I did everything (got the Speech API key (browser key), created a sample .flac at 8000 Hz, and tried the code above) but still got an empty Array().

    Can you please help?

    Thanks

  38. teo

    I’m getting an empty array.
    Checked:
    – My flac file’s sample rate matches the parameter passed (22050).
    – I’m using a proper api key (browser key, not a server key)
    – I have activated the speech api for my application

    Is there at least a way I can debug and get more information about what’s going wrong??

  39. teo

    OK, converting the file to 8000 Hz made it work.
    So please note: not only must the sample rate match the declared rate, but not all sample rates work, either.

  40. Matteo

    I’ve found out how to overcome the 10-15 seconds limit.
    It’s not actually a duration or size limit, but Google will return an empty result if you upload the audio too quickly.
    My guess is that the service is designed to do real-time speech recognition; so if audio data is sent at a much higher-than-real-time rate, the server detects this as abuse (or is unable to process the request) and interrupts the connection returning an empty array.

    I’ve solved this by SLOWING DOWN THE UPLOAD, that is, by limiting the transfer rate to around 30 kBytes/sec, which in my case is approximately the bitrate of the audio files (this may vary depending on sample rate and other factors).
    By limiting the rate of the upload to 30 kBytes per second I’ve been able to get transcriptions of files of up to three minutes! (I haven’t tried more).

    Of course, this means the processing takes as long as the duration of the audio.
    I haven’t tried to fine-tune the upload rate limit. Maybe it can be pushed pretty much higher.

    If you’re using PHP 5.4 or newer (with cURL 7.15.5 or newer), it will be easy: you just have to add these lines to Mike’s code:
    curl_setopt($this->m_up_handle, CURLOPT_MAX_SEND_SPEED_LARGE, 30000);
    curl_setopt($this->m_up_handle, CURLOPT_LOW_SPEED_TIME, 9999);
    curl_setopt($this->m_dn_handle, CURLOPT_LOW_SPEED_TIME, 9999);

    (Note that the last two are needed so that the connections don’t time out on the client side. Also remember to increase PHP’s execution time limit.)

    If you’re using PHP 5.3 or higher but below 5.4, limiting curl transfer rate can be achieved with the hack explained in this StackOverflow answer:
    http://stackoverflow.com/questions/21152315/is-it-possible-to-slow-down-a-php-curl-request#answer-21263055
    (note the particular answer, not the accepted one)

    For older versions of PHP it becomes more complicated…

  41. Vignesh

    I think v2 speech-api has changed 🙁

    Your client does not have permission to get URL /speech-api/v2/recognize?output=json&lang=en-us&key=my_api_key_here from this server. Invalid key. That’s all we know.

  42. Daniel Hemmerich

    Did Google recently remove the ability to activate this API? I do not see it in my list of available APIs. I do see the “Site Verification API” from the picture in this article, but right below it is “Static Maps API”.
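Matteo's workaround caps the upload at a fixed 30000 bytes/sec, which he describes as roughly the bitrate of his files. The sketch below derives the cap from the file instead. It assumes PHP 7+ and that you already know the clip's duration, for example from sox as in Michael's comment. realtime_send_speed() and its $headroom factor are hypothetical and not part of the class. For Michael's 8 kHz file (39.6k over 8.44 seconds), it gives about 4.7 kB/s. That is well under 30000, so the fixed value would run several times faster than real time for low-rate audio. The comments don't establish whether that matters.

```php
<?php
// Hypothetical helper: bytes per second of audio, i.e. the upload rate that
// matches real time. $headroom scales it (1.0 = exactly real time).
function realtime_send_speed(int $fileBytes, float $durationSeconds, float $headroom = 1.0): int
{
    if ($fileBytes <= 0 || $durationSeconds <= 0 || $headroom <= 0) {
        throw new InvalidArgumentException('Size, duration and headroom must be positive');
    }
    // Round up so the cap never falls below the real-time rate
    return (int) ceil($fileBytes / $durationSeconds * $headroom);
}
```

The result would go where Matteo's comment passes 30000 to CURLOPT_MAX_SEND_SPEED_LARGE on the upload handle.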
