This is a follow-up to my earlier post about using the Google Speech Recognition API built into Google Chrome.
Since that post, Chrome has had some significant upgrades to this feature, specifically around the length of audio you can pass to the API. The old version would only accept very short clips (a few seconds), but the new API is a full-duplex streaming API. This means it uses two HTTP connections: a POST request to upload the content as a “live” chunked stream, and a second GET request to fetch the results. That makes much more sense for longer audio samples or for streaming audio.
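To make the two-connection idea concrete, here is a minimal sketch of how the paired URLs might be built. Only the host and the `/speech-api/full-duplex/v1/down?pair=...` path appear in error output quoted in the comments; the `up` path, the `key` and `lang` parameters, the token format, and the helper name `speech_duplex_urls()` are assumptions for illustration, not the class's actual code.

```php
<?php
// Sketch only: builds the two URLs that share one "pair" token.
// The "up" path and its query parameters are assumptions.

function speech_duplex_urls($api_key, $lang = 'en-US')
{
    $base = 'https://www.google.com/speech-api/full-duplex/v1';

    // One random token used by both requests so the server can match
    // the upload (POST) with the results stream (GET). Hypothetical format.
    $pair = bin2hex(random_bytes(8));

    return array(
        'pair' => $pair,
        'up'   => $base . '/up?' . http_build_query(array(
            'key'  => $api_key,
            'pair' => $pair,
            'lang' => $lang,
        )),
        'down' => $base . '/down?' . http_build_query(array('pair' => $pair)),
    );
}

$urls = speech_duplex_urls('put your API key here');
echo $urls['up'], "\n", $urls['down'], "\n";
```

The point of the sketch is only that both requests carry the same token; everything else about the requests is handled inside the class.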
I created a simple PHP class to access this API; while this likely won’t make sense for anybody that wants to do a real-time stream, it should satisfy most cases where people just want to send “longer” audio clips.
Before you can use this PHP class, you must get a developer API key from Google. The class does not include one, and I cannot give you one. They’re free and easy to get: just go to the Google APIs site and sign up for one.
Then download the class below, and start with a simple example:
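<?php
require 'google_speech.php';

// pass your own Google developer API key
$s = new cgoogle_speech('put your API key here');

// '@' prefix = read from file; then the language tag and the sample rate in Hz
$output = $s->process('@test.flac', 'en-US', 8000);

print_r($output);
?>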
Audio can be passed as a filename (by prefixing the file name with an ‘@’ sign) or as raw FLAC content. The second argument is an IETF language tag; I’ve only been able to test English and French, but I assume others work. It defaults to ‘en-US’. The third argument is the sample rate, which defaults to 8000.
** Your sample rate must match your file. If it doesn’t, you’ll either get nothing returned or a really bad transcription. **
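One way to avoid a mismatch is to read the sample rate straight from the file before calling process(). The sketch below parses the STREAMINFO block at the start of a FLAC file; `flac_stream_info()` is a hypothetical helper and not part of google_speech.php, and it assumes a plain FLAC stream with no leading tag data before the `fLaC` marker.

```php
<?php
// Sketch only: read sample rate, channels and bit depth from a FLAC header.
// flac_stream_info() is a hypothetical helper, not part of the class.

function flac_stream_info($data)
{
    // 4-byte "fLaC" marker + 4-byte block header + 34-byte STREAMINFO block.
    if (strlen($data) < 42 || substr($data, 0, 4) !== 'fLaC') {
        return null;
    }
    // Low 7 bits of byte 4 hold the block type; STREAMINFO is type 0.
    if ((ord($data[4]) & 0x7F) !== 0) {
        return null;
    }
    // Bytes 18..21 hold: 20-bit rate, 3-bit channels-1, 5-bit bits-1.
    $b = array_values(unpack('C4', substr($data, 18, 4)));

    return array(
        'sample_rate' => ($b[0] << 12) | ($b[1] << 4) | ($b[2] >> 4),
        'channels'    => (($b[2] >> 1) & 0x07) + 1,
        'bits'        => ((($b[2] & 0x01) << 4) | ($b[3] >> 4)) + 1,
    );
}

if (is_file('test.flac')) {
    $info = flac_stream_info(file_get_contents('test.flac', false, null, 0, 42));
    if ($info !== null && $info['sample_rate'] !== 8000) {
        echo "Pass {$info['sample_rate']} as the third argument, not 8000\n";
    }
}
```

The value found this way can be passed as the third argument instead of a hard-coded number.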
The output will return as an array, and should look something like this:
Array
(
    [0] => Array
        (
            [alternative] => Array
                (
                    [0] => Array
                        (
                            [transcript] => my CPU is a neural net processor a learning computer
                            [confidence] => 0.74177068
                        )
                    [1] => Array
                        (
                            [transcript] => my CPU is the neuron that process of learning
                        )
                    [2] => Array
                        (
                            [transcript] => my CPU is the neural net processor a learning
                        )
                    [3] => Array
                        (
                            [transcript] => my CPU is the neuron that process a balloon
                        )
                    [4] => Array
                        (
                            [transcript] => my CPU is the neural net processor a living
                        )
                )
            [final] => 1
        )
)
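In most cases you only want the top-ranked alternative. A minimal sketch of pulling it out of the returned array follows; `best_transcript()` is a hypothetical helper, and the array shape is assumed to match the sample output shown here.

```php
<?php
// Sketch only: extract the first alternative from the process() result.
// best_transcript() is a hypothetical helper, not part of the class.

function best_transcript(array $output)
{
    foreach ($output as $result) {
        if (empty($result['alternative'][0]['transcript'])) {
            continue;
        }
        $top = $result['alternative'][0];

        return array(
            'transcript' => $top['transcript'],
            // In the sample output only the first alternative has a confidence.
            'confidence' => isset($top['confidence']) ? (float) $top['confidence'] : null,
        );
    }

    return null; // empty array: nothing was recognised
}

// Same shape as the sample output, trimmed to two alternatives.
$output = array(array(
    'alternative' => array(
        array(
            'transcript' => 'my CPU is a neural net processor a learning computer',
            'confidence' => 0.74177068,
        ),
        array('transcript' => 'my CPU is the neuron that process of learning'),
    ),
    'final' => 1,
));

$best = best_transcript($output);
echo $best['transcript'], "\n";
```

Returning null for an empty array makes the "nothing came back" case easy to check for explicitly.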
Get the PHP class here: http://mikepultz.com/uploads/google_speech.php.zip
I wanted to get a developer key from Google, but the Speech API service isn’t listed there!
Same problem here. There is no Speech API available in the Google APIs Console.
http://imageshack.us/photo/my-images/577/umry.png/
Could it be that it’s only available for US users?
It’s possible it’s only US & Canada (I’m in Canada)- it doesn’t say anything specifically on the site though, so I can’t be 100% sure.
You could try creating and logging in with a Google account using a US web proxy, and see if the option shows up.
Mike
I have used a SOCKS connection (using a US-located Amazon AWS instance as a proxy) to no avail. I suspect that Google is only issuing Speech API keys to a selected group of users. Mike, do you know any other user who has the Speech API available in their console? Thanks.
Join the chromium-dev list and the Speech API option will be available :)
Got an API key but it just keeps returning an empty array. Any ideas on where to troubleshoot?
Thanks
Hey James,
I would start by confirming that your audio is correctly encoded as FLAC, and that the bitrate of your audio file correctly matches what you’re passing to the library.
If all of that is correct, I would try passing a simple 16 or 8 kbps sample and see if that does it.
Mike
@Rohit: you were right! I joined chromium-dev list and the speech api option became available. Now, I’ll give it a try to Mike’s PHP class. Thanks!
I am also getting an empty array returned. I have verified the file I am attempting to process is flac with 8khz:
Input File : ‘mt1KxV.flac’
Channels : 1
Sample Rate : 8000
Precision : 16-bit
Duration : 00:00:08.44 = 67520 samples ~ 633 CDDA sectors
File Size : 39.6k
Bit Rate : 37.5k
Sample Encoding: 16-bit FLAC
Comment : ‘Comment=Processed by SoX’
Any thoughts?
PS. When I check the API query quota, it doesn’t show a change. Makes me think either: 1) the quota doesn’t update in real-time, or 2) it’s just not getting processed…
thoughts?
…figured it out. I was trying to use a server key since I am using this with an asterisk server, however, looking at how you make the request, it looks like a referrer key (the default) is what’s required.
Got it working. Great code.
Thanks.
I have managed to get this working but don’t understand the advantages of this being duplex since the API does not seem to allow pipelining and the response seems to be ready at the time of post completion. Because the GET request must be made before the POST, it seems like they’re requiring an extra socket connection rather than just one.
I haven’t played with it too much, but I assume the value of two sockets, is so you could *technically* do “live”- well- “continuous” streaming. With the assumption that it would return multiple chunks as responses- maybe based on silence events, etc.
The PHP class I wrote doesn’t support this- so the only value in using this new API is 1) I assume at some point they will discontinue the old service, and 2) you can send longer audio clips.
Has anybody tried sending *really* long audio files, or sending it continuous streams?
Mike
It is working, but always returns error 403:
=================
Error 403 (Forbidden)!!1
403. That’s an error.
Your client does not have permission to get URL
/speech-api/full-duplex/v1/down?pair=1331547998992979
from this server. That’s all we know.
=================
What should I do to fix this issue? Speech API is allowed in console, and I am using Browser API Key.
Thank you in advance.
Hi @Michael,
I also got an empty array, but I didn’t understand what “server key” and “referrer key” mean. Would you, or someone else who knows, mind explaining more about them?
I think my situation is like yours: FLAC file, rate = 8000, and the API query quota didn’t show any change.
I got the FLAC file from a .wav file. I used some Java code to convert it, and that code lets me set the FLAC’s rate.
Thank you
Please start using the term “sample rate” rather than “bit rate” :)
Does someone have a sound file (.flac) which I could use like an example in order to check if my sound file is wrong?
Thank you
Hey Lucas. To get a .flac file you can use to test the code with, do the following:
– Go to http://audio.online-convert.com/convert-to-flac,
– Give it an mp3 with some example speech, by clicking on the ‘Choose file’ button,
– Put in the settings 16 Bit, 8000 Hz, and Mono (in that order),
– Press the ‘Convert’ button. If you choose a large mp3, this could take a couple minutes.
– When the page gives you the option to save the new .flac file, maybe call it something the PHP file expects like ‘test.flac’.
Happy coding :)
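If you would rather convert locally, the same 16-bit, 8000 Hz, mono settings can be expressed as a SoX command. The sketch below only builds the command string; it assumes the `sox` binary is installed, `sox_flac_command()` is a hypothetical helper, and a WAV input is used because MP3 support in SoX depends on how it was built.

```php
<?php
// Sketch only: build a SoX command for 16-bit, mono FLAC at a given rate.
// Assumes `sox` is installed; sox_flac_command() is a hypothetical helper.

function sox_flac_command($input, $output, $rate = 8000)
{
    // Options placed before the output file name apply to the output:
    // -r sample rate, -c channel count, -b bits per sample.
    return sprintf(
        'sox %s -r %d -c 1 -b 16 %s',
        escapeshellarg($input),
        (int) $rate,
        escapeshellarg($output)
    );
}

$cmd = sox_flac_command('speech.wav', 'test.flac');
echo $cmd, "\n";

// To actually run the conversion:
// exec($cmd, $lines, $exit_code);
```

Whatever rate you convert to is the rate to pass as the third argument to process().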
Yup, you’re right- typo.
Mike
This seems to be limited to 10 – 15 second audio files. I get an Internal Server Error 500 if I send anything longer. Is anyone else seeing this?
@Dorian, thank you very much!! I got an excellent flac sound file
“The API only accepts audio files of 15 seconds or less.” Is there any way to make it longer? I would like to convert longer audio files to text, like 2 or 3 minutes. Has anyone tried it yet?
I’ve been using this Python version, and have seen the same problem… when I try to stream a file 1 minute in length, the connection drops around the 15 second point.
https://gist.github.com/offlinehacker/5780124
There’s something else going on that we’re not aware of. The HTTP parameters are listed pretty clearly in the Chrome code…
https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/speech/google_streaming_remote_engine.cc&q=full-duplex&sq=package:chromium&dr=C&l=316-363
Not sure about the headers, or if it runs reconnects. No luck with Wireshark since it’s all over SSL.
I simulated your Perl code, but got this: HTTP::Response=HASH(0x3e8c550)
What is this? Could you please help?
By the way, I’m in China (mainland).
thanks a lot !
Hey, is it possible to process a .wav file instead of .flac? If yes, will you please tell me how?
Thanks in advance
No- from the Chrome source code, it looks like it relies on the FLAC format.
Mike
Please explain how to avoid getting an empty array. What key should I put in? I have a perfect FLAC file, but I am still getting an empty array. The site doesn’t seem to process anything; the page just returns this: Array().
What should I do? Please help.
Mike,
I joined chromium-dev, got the speech API key, created a sample .flac with 8000Hz and tried the code above. I get no errors but an empty array. Can you please set up your sample .flac for download and testing?
thanks
How do you join chromium-dev? Not sure what this means, would love help!
Looking fwd to playing with speech API.
I can confirm as a few others have suggested that signing up to the chromium-dev lists does indeed give you speech as an option. This information is valid and tested as of the date of this post. I haven’t yet tried using the API, though.
Hi Brad,
To join the chromium-dev group you need to go here: https://groups.google.com/a/chromium.org/forum/#!forum/chromium-dev
With your Google account already set, you will see a button saying “join the group to post” (something like that)
Great explanation! It works fine for me. But can I ask for some help?
I’m trying to auto-transcribe some long audio files with this API and generate subtitle files (for searching, not for display). That is, I don’t need just one long paragraph; I need to process small chunks of audio at a time and add the text to the subtitle file.
The problem is, when I split the audio, I inevitably split some words in half. Is there a way to continuously send short FLAC files, as sort of an audio stream, so split words are recombined before being transcribed? Could there be sort of an “open stream” mode for this, which could be manually closed by the script?
But I would still need the individual results for each audio segment. What’s a good way to approach this?
Thanks!
I like what you guys tend to be up to. This type of clever work and reporting! Keep up the terrific work, guys. I’ve added you to my personal blogroll.
@Lucas, the referrer key Michael means here is the Browser Key. Using that, not the server key, should work.
Fanglin
Bummer.. It only allows 50 requests/day.
Hi, I am writing my own script and have gotten past the
Your client does not have permission to get URL /speech-api/full-duplex/v1/
error. When you create a key for server applications, you enter an IP address for the API key. In my case, that same IP address had to be passed to the server as a query parameter (e.g. userIp=172.25.112.216). I hope this helps someone!
(Google tells you about the parameter when you create a new server key: “Per-user limits will be enforced using the address found in each request’s userIp parameter, (if specified)”. In my case, public or private IP addresses never worked.)
Now I am haggling with the up/down-stream… fun :)
FYI, I have the “recognize” API working (=> https://gist.github.com/alotaiba/1730160). If you use the curl code and put in your API key as well as your IP address as a parameter, it works just fine. As for the “full-duplex” API in this example, the up-stream works with the API key and my IP address parameter, but I could not get the down-stream to work…
curl -X POST \
--data-binary @my_file.flac \
--user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7' \
--header 'Content-Type: audio/x-flac; rate=8000;' \
'https://www.google.com/speech-api/v1/recognize?client=chromium&lang=en-US&maxresults=10&key=&userIp='
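For anyone calling this from PHP rather than the shell, here is a rough equivalent of the curl command using PHP's cURL extension. It is a sketch under the same assumptions as the command itself: the URL and parameters are copied from it, `recognize_request()` is a hypothetical helper, and the key and IP address shown are placeholders.

```php
<?php
// Sketch only: the curl command expressed with PHP's cURL extension.
// recognize_request() is a hypothetical helper; values are placeholders.

function recognize_request($api_key, $user_ip, $lang = 'en-US', $rate = 8000)
{
    $query = http_build_query(array(
        'client'     => 'chromium',
        'lang'       => $lang,
        'maxresults' => 10,
        'key'        => $api_key,
        'userIp'     => $user_ip,
    ));

    return array(
        'url'     => 'https://www.google.com/speech-api/v1/recognize?' . $query,
        'headers' => array('Content-Type: audio/x-flac; rate=' . (int) $rate . ';'),
    );
}

$req = recognize_request('put your API key here', '172.25.112.216');

// Only send the request if cURL is loaded and the sample file exists.
if (function_exists('curl_init') && is_file('my_file.flac')) {
    $ch = curl_init($req['url']);
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        // A string body is sent as-is, like --data-binary.
        CURLOPT_POSTFIELDS     => file_get_contents('my_file.flac'),
        CURLOPT_HTTPHEADER     => $req['headers'],
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7',
        CURLOPT_RETURNTRANSFER => true,
    ));
    echo curl_exec($ch), "\n";
}
```

Passing the audio as a string to CURLOPT_POSTFIELDS, together with an explicit Content-Type header, keeps the body as raw FLAC rather than form data.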
Hello all,
Did anybody do a port of the PHP class to Java ?
Thanks!
There is no “speech api” in the console. I’m in the USA…?
Sorry, I just saw the link at the top to this new article. Is this the latest one now?
hello!
I’m trying to make this work and finally found your class, but it gives me
Deprecated: curl_setopt(): The usage of the @filename API for file uploading is deprecated. Please use the CURLFile
:/
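One possible workaround for this notice is to read the file yourself and pass the raw FLAC content, which the post says process() also accepts. The sketch below assumes the notice is triggered by an ‘@filename’ value reaching curl_setopt() inside the class; that has not been verified against the class source, and `load_audio()` is a hypothetical helper. CURLFile itself is meant for multipart form uploads, so it may not be a drop-in replacement for a raw audio body.

```php
<?php
// Sketch only: resolve the '@filename' form before calling process(),
// so the class receives raw FLAC bytes. load_audio() is hypothetical.

function load_audio($audio)
{
    if (strlen($audio) > 1 && $audio[0] === '@') {
        $path = substr($audio, 1);
        if (!is_readable($path)) {
            throw new RuntimeException("Cannot read audio file: $path");
        }
        return file_get_contents($path);
    }

    return $audio; // already raw FLAC content
}

// Usage with the class from the post:
// $s = new cgoogle_speech('put your API key here');
// $output = $s->process(load_audio('@test.flac'), 'en-US', 8000);
```

Real FLAC data begins with the bytes `fLaC`, so raw content will not be mistaken for an ‘@’ path.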
I am trying to post the transcription to a specified plugin in wo. I’d like to create an interface that lets visitors to my website click record, have my server send a request to the Google voice API, and then return the results to a web page. Any ideas?
Hi Mike,
Can you please provide the sample FLAC file you used for testing?
I did everything (got the Speech API key (browser key), created a sample .flac at 8000 Hz, and tried the code above) but still got an empty Array().
Can you please help?
Thanks
I’m getting an empty array.
Checked:
– My flac file’s sample rate matches the parameter passed (22050).
– I’m using a proper api key (browser key, not a server key)
– I have activated the speech api for my application
Is there at least a way I can debug and get more information about what’s going wrong??
Ok, by converting the file to 8000Hz it worked.
So please note: it’s not enough for the sample rate to match the declared rate. Not all sample rates work, either.
I’ve found out how to overcome the 10-15 seconds limit.
It’s not actually a duration or size limit, but Google will return an empty result if you upload the audio too quickly.
My guess is that the service is designed to do real-time speech recognition; so if audio data is sent at a much higher-than-real-time rate, the server detects this as abuse (or is unable to process the request) and interrupts the connection returning an empty array.
I’ve solved this by SLOWING DOWN THE UPLOAD, that is, by limiting the transfer rate to around 30 kBytes/sec, which in my case is approximately the bitrate of the audio files (this may vary depending on sample rate and other factors).
By limiting the rate of the upload to 30 kBytes per second I’ve been able to get transcriptions of files of up to three minutes! (I haven’t tried more).
Of course, this means that the processing takes as long as the duration of the audio.
I haven’t tried to fine-tune the upload rate limit. Maybe it can be pushed much higher.
If you’re using PHP 5.4 or newer (with cURL 7.15.5 or newer), it will be easy: you just have to add these lines to Mike’s code:
curl_setopt($this->m_up_handle, CURLOPT_MAX_SEND_SPEED_LARGE, 30000);
curl_setopt($this->m_up_handle, CURLOPT_LOW_SPEED_TIME, 9999);
curl_setopt($this->m_dn_handle, CURLOPT_LOW_SPEED_TIME, 9999);
(note that the last two are needed so that the connections don’t time out on the client side. Also remember to increase PHP’s execution time limit).
If you’re using PHP 5.3 or higher but below 5.4, limiting curl transfer rate can be achieved with the hack explained in this StackOverflow answer:
http://stackoverflow.com/questions/21152315/is-it-possible-to-slow-down-a-php-curl-request#answer-21263055
(note the particular answer, not the accepted one)
For older versions of PHP it becomes more complicated…
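Rather than hard-coding 30000, the cap could be derived from each file's size and duration so the upload runs at roughly real-time speed. This is a sketch only: `realtime_upload_rate()` and its headroom factor are hypothetical, and whether the server tolerates anything faster than real time is untested.

```php
<?php
// Sketch only: derive an upload cap (bytes/sec) from file size and duration.
// realtime_upload_rate() and the headroom factor are assumptions.

function realtime_upload_rate($file_bytes, $duration_seconds, $headroom = 1.0)
{
    if ($duration_seconds <= 0) {
        throw new InvalidArgumentException('Duration must be positive');
    }

    // Average bytes per second of audio, scaled by an optional factor.
    return (int) ceil($file_bytes / $duration_seconds * $headroom);
}

// Figures from the SoX output earlier in the thread: about 39.6k in 8.44 s.
$rate = realtime_upload_rate(39600, 8.44);
echo $rate, " bytes/sec\n";

// Then, inside the class:
// curl_setopt($this->m_up_handle, CURLOPT_MAX_SEND_SPEED_LARGE, $rate);
```

A headroom above 1.0 would upload slightly faster than real time, which is where any fine-tuning of the limit would happen.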
I think the v2 speech-api has changed
Your client does not have permission to get URL
/speech-api/v2/recognize?output=json&lang=en-us&key=my_api_key_here
from this server. Invalid key. That’s all we know.
Did Google recently remove the ability to activate this API? I do not see it in my list of available APIs. I do see the “Site Verification API” from the picture in this article, but right below it is “Static Maps API”.
@Daniel Hemmerich I think no, I can see “Speech API” there.