Accessing Google Speech API / Chrome 11

I’ve posted an updated version of this article, using the new full-duplex streaming API, here: http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/

Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it, one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome the addition.

If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s hard to do without really good language models, which aren’t something you’d be able to build into a browser.

I found the files I was looking for in the Chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, the audio can be either FLAC or Speex, but the Speex looks like some sort of specially modified version. I’m not sure what it is, but it just didn’t look quite right.

If that’s the case, there should be no reason why I can’t just POST something to it myself.

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So, a quick few lines of Perl (or PHP, or just use wget on the command line):

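#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

# Read the whole FLAC file in binary mode so the audio bytes are not altered.
my $file = shift @ARGV or die "usage: $0 <file.flac>\n";
open(my $fh, "<", $file) or die "cannot open $file: $!\n";
binmode($fh);
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# POST the raw audio; the rate in the Content-Type header describes the clip.
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    die "request failed: " . $response->status_line . "\n";
}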

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a short WAV file and then converted it to FLAC on the command line (see SoX for more info).
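Going by reports in the comments below, the rate in the Content-Type header has to match the audio, so it can help to read the sample rate out of the FLAC file instead of hard-coding it. Here is a minimal Python sketch of that idea, assuming the file starts with a standard STREAMINFO metadata block as the FLAC format describes. The function names flac_sample_rate and content_type_for are my own inventions for illustration, not part of any library.

```python
def flac_sample_rate(data: bytes) -> int:
    """Return the sample rate stored in a FLAC file's STREAMINFO block."""
    if len(data) < 21 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Byte 4 carries the metadata block type in its low 7 bits; 0 means STREAMINFO.
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # STREAMINFO starts at offset 8. The 20-bit sample rate follows 10 bytes
    # of block-size and frame-size fields, so it begins at offset 18.
    return (data[18] << 12) | (data[19] << 4) | (data[20] >> 4)


def content_type_for(data: bytes) -> str:
    """Build the Content-Type header value used in the script above."""
    return "audio/x-flac; rate=%d" % flac_sample_rate(data)
```

To use it, read the clip with open(path, "rb").read() and pass the bytes to content_type_for() before making the POST.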

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
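Pulling the recognized text out of that response takes only a few lines. The following is a rough Python sketch, assuming the response always has the shape shown above; best_utterance is a hypothetical helper name. I don't know what the individual status values mean, so the sketch only checks whether any hypotheses came back.

```python
import json


def best_utterance(body: str):
    """Return (utterance, confidence) for the top hypothesis, or None if there is none."""
    result = json.loads(body)
    hypotheses = result.get("hypotheses", [])
    if not hypotheses:
        return None
    # Not every hypothesis carries a confidence, so treat a missing one as 0.0.
    best = max(hypotheses, key=lambda h: h.get("confidence", 0.0))
    return best["utterance"], best.get("confidence")
```

An empty hypotheses list, like the ones several people report in the comments, simply yields None here.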

I’m not sure if Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

265 thoughts on “Accessing Google Speech API / Chrome 11”

mike Post author 2013/03/06 at 10:04 am

Yeah, new versions of Chrome use a streaming service that lets you pass chunked data longer than 40 seconds, but it’s a lot more involved than just making a single HTTP POST.

I’m working on an example to use it directly, but haven’t finished it yet.

Mike
Ramesh 2013/03/11 at 8:30 am

Thanks for replying,

OK, I am waiting for your example, Mike, of passing chunked data longer than 40 seconds.
Johannes 2013/03/21 at 10:29 am

Note to @pozy

# curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US" -F myfile="@C:\input.flac" -k -o"C:\output.txt"

Works excellent! Just some notes:

1) copy & paste: use different quote signs

2) make sure rate=16000 corresponds to the sample rate of your recording (Audacity: set it before recording!)

3) I had slightly better results recording mono

Does anybody know anything about:

* Done waiting for 100-continue

I’m losing milliseconds waiting for that…
Manish 2013/03/25 at 6:13 am

Hi Mike
I came across a speech API demo from Google at “http://www.google.com/intl/en/chrome/demos/speech.html”. Does it use the same URL for speech recognition? (https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US)
mike Post author 2013/03/29 at 7:02 pm

Hey Manish,

I believe this is using their new streaming server, not the single-POST server. I’m working on an example to use this new service, but it’s much more complicated than the old single-POST system.

Mike
eman 2013/03/30 at 2:59 pm

After the audio is collected from the mic, does it get stored in a buffer or in a file? And is it possible to access or get hold of that file?
Parikshit 2013/04/02 at 10:19 pm

Hi guys, I am trying to run the above script with my input file, but I am not able to do so…
I converted a WAV file into FLAC using the online converter: http://www.convertfiles.com

I am getting the response below:

{"status":5,"id":"89834e12f70d8f9e4d42cbf8f524bf8b-1","hypotheses":[]}

My file is a simple “hello world” file.
mike Post author 2013/04/03 at 1:22 pm

Usually when I’ve seen this, it’s because the rate in the Content-Type header doesn’t match the sample rate of the audio.

Mike
Pingback: Using the Google Speech Recognition Engine (Google Speech API) | 长叶子的树
Anon 2013/04/17 at 4:51 pm

Any updates on this?
Brianb 2013/04/19 at 5:43 am

+1 for seeing chunked demo!
JJ 2013/04/19 at 12:19 pm

Hi,

When I tried to send files 10 seconds long, the server returned error 500: Internal Server Error.

Is there anything I can do?

Thanks
Anon 2013/04/29 at 4:15 pm

+1,000,000 for a demo!!!!
MaxZhang 2013/05/09 at 8:08 am

Can it not recognize digits?
I uploaded a file that contains characters and digits, but only the characters were recognized. Why?
Hee-dong Yoon 2013/05/14 at 6:08 am

Hello 🙂

I have a problem, and I need your help to solve it.

The problem is as follows:

I am using C++ and made the same request as the Perl example, but the result is as follows:

{"status":5,"id":"a8be65bfad9baf7f1717f3de46e3926e-1","hypotheses":[]}

The dictation result is empty and the status value is 5.

I want to know what this status means. I would also appreciate a reference (URL) that describes the status values.

Thank you for reading.
Roger 2013/05/16 at 2:07 pm

Any progress on the full duplex example?
Paul 2013/05/21 at 11:25 am

Hi Mike,

Any luck with the full-duplex api? I’m receiving nothing but Error 403’s…

Hope to hear from ya,
Paul
suryateja 2013/05/22 at 3:32 am

thanks for the post…I’ve been looking for it…But does anyone know how to make a python script out of your ruby code??
Vijay 2013/05/24 at 3:00 am

Could anybody please provide a WAV audio file that contains just the word “hello”, so I can test my application?
mike Post author 2013/05/25 at 7:37 pm

Hey Everyone,

I haven’t had a chance to dig into the full-duplex example again; I’ve been swamped at the “real” job 😉

I’m hoping to have some more time later this week.

Mike
Anon 2013/06/03 at 2:15 pm

Hey Mike – have you had space open up to take on the challenge?
Sean 2013/06/07 at 3:30 pm

Is there a legit API for this yet that anyone knows of? All I see is https://www.google.com/intl/en/chrome/demos/speech.html

I’m looking for a paid or free API for speech to text. I am transcribing phone calls, so the length can vary from a minute to 30 minutes.
robert rowntree 2013/06/14 at 7:01 pm

full duplex : https://gist.github.com/offlinehacker/5780124
robert rowntree 2013/06/14 at 7:03 pm

https://gist.github.com/offlinehacker/5780124 is gist for full duplex
robert rowntree 2013/06/15 at 9:43 am

http://src.chromium.org/svn/trunk/src/content/browser/speech/google_streaming_remote_engine.cc

server-side. that is the google service.
robert rowntree 2013/06/16 at 7:44 pm

rob@ beacon$ curl "https://www.google.com/speech-api/full-duplex/v1/down?pair=12345678901234567" & curl -X POST "https://www.google.com/speech-api/full-duplex/v1/up?lang=en-US&lm=dictation&client=chromium&pair=12345678901234567&key=…..MU" --header "Transfer-Encoding: chunked" --header "Content-Type: audio/x-flac; rate=22050" --data-binary @11.rec

{"result":[]}
rob@ beacon$ {"result":[{"alternative":[{"transcript":"hi how are you we have to go down to the store and see if we can get the groceries for this week so we can bring them back in the car","confidence":0.971865}],"final":true}],"result_index":0}

duplex session on curl works fine
Anon 2013/06/24 at 5:15 pm

Any way to take advantage of this sans-API key? I think there’s a 50 per day limit to the key, so useless for any applications other than testing.
ildar 2013/06/25 at 7:03 am

robert rowntree: thanks for the duplex example. But I got this error: Error 411 (Length Required)!!
POST requests require a Content-length header. That’s all we know.

Johannes’ example doesn’t work either; it gives "status":5

🙁
Mukund Kumar 2013/07/09 at 5:08 am

I have done the same thing using Python, but when I pass the FLAC file, the response contains an empty hypotheses list, like this:
{"status":5,"id":"4322b1f93df3c60bbd9fd3d5c7a22eca-1","hypotheses":[]}
What is the problem? Any suggestions are welcome.
Thanks in advance.
Pingback: Google Speech API – Full Duplex PHP Version | mike pultz
mike Post author 2013/07/11 at 5:41 pm

updated version here: http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/
mike Post author 2013/07/11 at 5:42 pm

Hey everyone,

I’ve posted an updated full-duplex version here: http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/
robert rowntree 2013/07/12 at 12:48 pm

Full Log of successful CURL session at this link:

https://gist.github.com/rowntreerob/849d6a40aec758505686

That may help with some of the unexpected ( empty ) responses that have been posted.
robert rowntree 2013/07/12 at 1:06 pm

https://gist.github.com/rowntreerob/7de7a9761edf980347b1

Android sample above: the relevant parts that mimic the curl example exactly. The Android test works with either AMR-NB @ 8000 or FLAC @ 22050. For audio streams longer than 15 seconds, there are additional issues with the rate of delivery of successive audio chunks. Note that if you map bytes from a bigger audio file into memory and then feed that block to the HTTP I/O channel on the UP connection, you may have to approximate a delivery rate that mimics the bit rate implied by your choice of encoding and sample rate for the audio stream. I did that by introducing an intermittently sleeping thread that writes about 2K at a time to the channel on the UP connection before sleeping for an appropriate, rough interval.
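For anyone who wants to try the pacing idea outside Android, here is a rough Python sketch of it (not the code from the Gist). It assumes you already know roughly how many bytes per second your encoded audio occupies; paced_chunks and its parameters are hypothetical names chosen for illustration.

```python
import time


def paced_chunks(audio: bytes, bytes_per_second: float, chunk_size: int = 2048, sleep=time.sleep):
    """Yield about 2K of audio at a time, pausing so delivery roughly tracks real time."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        yield chunk
        # Pause for roughly as long as this chunk lasts at the given byte rate.
        sleep(len(chunk) / bytes_per_second)
```

The generator can be handed to any HTTP client that accepts an iterable request body for chunked uploads. The sleep parameter exists only so the pause can be swapped out when experimenting.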
kumogami 2013/07/16 at 2:48 pm

Thanks for sharing. <3
Danny 2013/08/07 at 10:07 am

@robert rowntree
I have been trying to build the Android Sample you posted. However, what are the values for the variables below?
API_DOWN_URL
PAIR
API_UP_URL_p1
API_UP_URL_p2
MIN
MAX
robert rowntree 2013/08/12 at 11:12 am

@Danny

private static final long MIN = 10000000;
private static final long MAX = 900000009999999L;
PAIR = MIN + (long)(Math.random() * ((MAX - MIN) + 1L));
DOWN = root + dwn
UP = root + up_p1 + PAIR + up_p2 + api_key

root = https://www.google.com/speech-api/full-duplex/v1/
dwn = down?maxresults=1&pair=
up_p1 = up?lang=en-US&lm=dictation&client=chromium&pair=
up_p2 = &key=
api_key = get a key from Google
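The same URL construction can be sketched in Python. This is only an illustration of the pieces listed above, assuming the pair value is appended to the down URL as well, since that string ends in "pair=". build_urls is a hypothetical helper, and the API key is a placeholder you would have to supply yourself.

```python
import random
from typing import Optional, Tuple

ROOT = "https://www.google.com/speech-api/full-duplex/v1/"
MIN_PAIR = 10000000
MAX_PAIR = 900000009999999


def build_urls(api_key: str, pair: Optional[int] = None) -> Tuple[str, str]:
    """Return the (down, up) URLs that share one randomly chosen pair value."""
    if pair is None:
        # Same inclusive range as the MIN/MAX expression above.
        pair = random.randint(MIN_PAIR, MAX_PAIR)
    down = f"{ROOT}down?maxresults=1&pair={pair}"
    up = f"{ROOT}up?lang=en-US&lm=dictation&client=chromium&pair={pair}&key={api_key}"
    return down, up
```

Both connections must use the same pair value so that the service can match the upload to the download.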
robert rowntree 2013/08/12 at 11:24 am

@Danny

I added a comment to the original Gist that has your answer.
David Favor 2013/09/09 at 12:39 pm

The Google Speech API switch appears to have disappeared from the Google API page.

Anyone know how to get an API Key now?
Daria 2013/09/18 at 9:08 am

I am trying to recognize a short phrase, less than 4 seconds long. tmp.flac is 16-bit, 16 kHz, mono. I send a POST request but get no response at all. libcurl outputs “Empty reply from server”. This is my code:

FILE *f;
f = fopen("tmp.flac", "r");
if (f == NULL) {
    perror(NULL);
}

long uploaded_len = 0;
std::string filecontent = "";
if (f)
{
    fseek(f, 0, SEEK_END);
    uploaded_len = ftell(f);
    fclose(f);
    f = fopen("tmp.flac", "rb");
    char c = 0;
    for (int i = 0; i < uploaded_len; ++i) {
        fread(&c, sizeof(char), 1, f);
        filecontent += c;
    }
    fclose(f);
}

_curl = curl_easy_init();
CURLcode res;

if (_curl && f) {
    curl_easy_setopt(_curl, CURLOPT_ERRORBUFFER, errorBuffer);
    curl_easy_setopt(_curl, CURLOPT_URL, "https://www.google.com/speech-api/v1/recognize?lang=ru-RU&xjerr=1&client=chromium");
    curl_easy_setopt(_curl, CURLOPT_VERBOSE, 1);
    curl_easy_setopt(_curl, CURLOPT_POST, 1);
    curl_easy_setopt(_curl, CURLOPT_POSTFIELDS, filecontent.c_str());

    struct curl_slist *url_list = 0;
    url_list = curl_slist_append(url_list, "Content-type: audio/x-flac; rate=16000");

    char lng[256];
    sprintf(lng, "%i", uploaded_len);
    std::string tmpstr = "Content-length: " + std::string(lng);
    url_list = curl_slist_append(url_list, tmpstr.c_str());

    curl_easy_setopt(_curl, CURLOPT_POSTFIELDS, filecontent.c_str());
    curl_easy_setopt(_curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_easy_setopt(_curl, CURLOPT_HEADER, 1);
    curl_easy_setopt(_curl, CURLOPT_HTTPHEADER, url_list);
    curl_easy_setopt(_curl, CURLOPT_WRITEFUNCTION, writer);
    curl_easy_setopt(_curl, CURLOPT_WRITEDATA, &buffer);
    res = curl_easy_perform(_curl);

    if (res == CURLE_OK) {
        std::cout << buffer << "\n";
    }
    else {
        std::cout << "Error! " << errorBuffer << std::endl;
    }

    curl_slist_free_all(url_list);
    curl_easy_cleanup(_curl);
}

What's wrong? Why do I get an empty reply from the server?
Daria 2013/09/19 at 2:29 am

I found what I was doing wrong. I was missing the following line:
curl_easy_setopt(_curl, CURLOPT_POSTFIELDSIZE, uploaded_len);
The request has to include the file size, but setting it in the headers doesn’t work.
Chang 2013/09/24 at 12:52 am

I tried your Perl code, but got this: HTTP::Response=HASH(0x3e8c550)
What is this? Could you please help?
By the way, I’m in China (mainland).
Thanks a lot!
mike Post author 2013/09/28 at 10:39 am

Hey Chang,

I’m not sure what that is. Usually when it says that, it’s a Perl hash object. You could try iterating through it like a hash to see what it looks like, or use Data::Dumper to print it out directly.

Mike
irux 2013/10/20 at 11:05 pm

Hi, I tried to do that with curl, but the server’s response contains words that I didn’t say. I recorded the audio very well, and it still doesn’t work.
Nissim 2013/10/22 at 6:22 am

Hello Speech Recognition Expert,
As a hobby, I’d like to run Google speech recognition on my small Linux box. The audio comes from a USB microphone at 16 kHz, 16-bit.
It can be compressed to a FLAC file, I presume.
Do you have experience with that?
If so, can I hire someone to help?
Please call me on Skype: nissm.test
Thank you
Nissim.zur@gmail.com
Marcus 2013/11/06 at 12:49 pm

How would it be with wget? Thanks.
Pingback: Using the Google Speech Recognition Engine (Google Speech API) |
Shobhit 2014/02/27 at 7:54 pm

It works really well, with very high accuracy, almost 100%. But it is too slow: it takes a long time to get a reply from the server. I used wget to send the request. Has anyone else experienced the same?

mike pultz

personal and professional blog of mike pultz, technology specialist and serial entrepreneur.
