Accessing Google Speech API / Chrome 11

I’ve posted an updated version of this article here, using the new full-duplex streaming API.

Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it one really interesting new feature: support for the HTML5 speech input API. This means that you'll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome it here.

If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether the audio is sent back to Google for processing. I know I've seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the recognition was really good, and that's really hard to do without really good language models, which is not something you'd be able to build into a browser.

I found the files I was looking for in the Chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Judging by their audio encoder code, the audio can be either FLAC or Speex, though the Speex looks like some sort of specially modified version; I'm not sure what it is, but it just didn't look quite right.

If that's the case, there should be no reason I can't just POST something to it myself, right?

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So, a few quick lines of Perl (or PHP, or just use wget on the command line):

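#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

# Slurp the whole FLAC file in binary mode (":raw" keeps Windows from mangling it)
my $file = shift @ARGV or die "usage: $0 file.flac\n";
open(my $fh, "<:raw", $file) or die "can't open $file: $!\n";
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# POST the raw audio; the Content-Type header carries the sample rate
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    die $response->status_line, "\n";
}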

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).
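If you'd rather script the conversion step as well, here is a minimal Perl sketch that builds the SoX command and runs it. It assumes the `sox` binary is on your PATH. Resampling to 16 kHz keeps the file consistent with the `rate=16000` the script declares in its Content-Type; forcing mono (`-c 1`) is my assumption, not something the service documents. The helper name `sox_to_flac_cmd` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a SoX command that resamples to $rate Hz (default 16000), downmixes
# to mono, and writes FLAC (SoX picks the format from the .flac extension).
# Options placed before the output file apply to the output.
sub sox_to_flac_cmd {
    my ($in, $out, $rate) = @_;
    $rate //= 16000;
    return ('sox', $in, '-r', $rate, '-c', '1', $out);
}

if (@ARGV == 2) {
    my @cmd = sox_to_flac_cmd(@ARGV);
    # The list form of system() bypasses the shell, so odd file names are safe
    system(@cmd) == 0 or die "sox failed: $?\n";
}
```

Saved as, say, `to_flac.pl`, it would be run as `./to_flac.pl i_like_pickles.wav i_like_pickles.flac`.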

To run the Perl script, just do:

[root@prague mike]# ./speech i_like_pickles.flac

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
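To pull the recognized text out of that response in Perl, a sketch like the following should work, using the core JSON::PP module. It treats an empty `hypotheses` list as "nothing recognized" rather than interpreting `status`, because the status codes aren't documented (the comments show status 5 coming back with no hypotheses). In the sample response, only the first hypothesis carries a `confidence`. `best_hypothesis` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Return (utterance, confidence) for the top hypothesis, or an empty list
# if the service returned no hypotheses at all.
sub best_hypothesis {
    my ($json_bytes) = @_;
    my $resp = decode_json($json_bytes);   # expects raw UTF-8 bytes
    my $hyps = $resp->{hypotheses} || [];
    return unless @$hyps;
    # confidence may be missing on lower-ranked entries; the first one had it
    return ($hyps->[0]{utterance}, $hyps->[0]{confidence});
}
```

In the script, `print $response->content;` could become `my ($text, $conf) = best_hypothesis($response->content);`. `content` returns the undecoded bytes, which is what `decode_json` wants.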

I'm not sure if Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

265 thoughts on “Accessing Google Speech API / Chrome 11”

rani 2012/07/24 at 12:09 am

Hey, I'm working in Java and getting the same output as Mike, i.e.:
{"status":5,"id":"f90432629c25087d95ffd780f7d52838-1","hypotheses":[]}

please help:(
ax206geek 2012/07/24 at 8:22 pm

This was executed from a Raspberry Pi: https://gist.github.com/3023698. Yes, this works on the Pi too
rani 2012/07/24 at 11:54 pm

Sorry, I didn't understand 🙁
var 2012/07/25 at 10:40 pm

great work!

but I'm sad because the speech API can't recognize long sentences…
rani 2012/07/25 at 11:11 pm

Yeah, I know it can't handle long sentences. I just want to try it with short sentences; please help me.

Awaiting reply……
shiva 2012/07/26 at 11:24 am

How about porting that code to C#? Can anyone help?
Al 2012/07/26 at 12:58 pm

I implemented the following script that splits audio files into 8-second chunks and sends each one to the service. If the response code is not 200, I retry 3 times, waiting 0.5 seconds in between. This script uses SoX and assumes that the /tmp directory is writable. This is not the best implementation, and variables were created out of convenience.
Hope this helps:

$filename = '/path/to/file';
$url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
// Convert to FLAC
$converted_file_name = '/tmp/audio_flac.flac';
$conversion_command = "sox $filename $converted_file_name";
exec($conversion_command);
// get the length of the FLAC file
$length_command = "soxi -D $converted_file_name";

exec($length_command, $out, $ret);
$length = $out;
$length = $length[0];
$i = 0;
$x = 0;
// Trimmed file length
$trim_lenght = 8;

$result = array();
// the file that will hold the trimmed audio data
$temp_trimmed_file = '/tmp/trimmed.flac';
// Temp file to convert the sample rate to 16000
$temp_trimmed_file_2 = '/tmp/trimmed_rate.flac';
while( $i < $length) {
$command = "sox $converted_file_name $temp_trimmed_file trim 0:$i 0:$trim_lenght";
exec($command);
// Convert to 16000 sample rate
$command2 = "sox $temp_trimmed_file -r 16000 $temp_trimmed_file_2";
exec($command2);

$audio = file_get_contents($temp_trimmed_file_2);
$speech_info_request = curl_init();
curl_setopt($speech_info_request, CURLOPT_URL, $url);
curl_setopt($speech_info_request, CURLOPT_HTTPHEADER, array('Content-Type: audio/x-flac; rate=16000' ));
curl_setopt($speech_info_request, CURLOPT_POST, TRUE);
curl_setopt($speech_info_request, CURLOPT_POSTFIELDS, $audio);
curl_setopt($speech_info_request, CURLOPT_RETURNTRANSFER, 1);
$speech_info_response = curl_exec($speech_info_request);

$responseCode = curl_getinfo($speech_info_request,CURLINFO_HTTP_CODE);

if($responseCode==200) {
$jsonObj = json_decode($speech_info_response, TRUE);
$result[$x] = $jsonObj['hypotheses'][0]['utterance'];
}
elseif($responseCode != 200) {
// Try 3 times
$count = 0;
while($count $value) {
$script .= ' ' . $value;
}
return $script;
}
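Al's retry loop doesn't survive in the code above, so here is a separate Perl sketch of the logic he describes: if the response code isn't 200, send again, pausing half a second between attempts. To keep it testable, the request is passed in as a code reference that returns a status code and a body. `post_with_retry` is a hypothetical name, and reading "retry 3 times" as one initial attempt plus three retries is my interpretation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # allows fractional delays like 0.5

# Call $send until it reports HTTP 200, retrying up to $retries more times
# with $delay seconds between attempts. $send must return (code, body).
# Returns the last (code, body) seen, so the caller can report a failure.
sub post_with_retry {
    my ($send, $retries, $delay) = @_;
    $retries //= 3;
    $delay   //= 0.5;
    my ($code, $body);
    for my $try (0 .. $retries) {
        ($code, $body) = $send->();
        last if $code == 200;
        sleep($delay) if $try < $retries;
    }
    return ($code, $body);
}
```

With LWP::UserAgent, the code reference could be `sub { my $r = $ua->post($url, Content_Type => 'audio/x-flac; rate=16000', Content => $audio); return ($r->code, $r->content) }`.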
Al 2012/07/26 at 1:54 pm

I am not sure why the code was chopped in my previous comment. Here it is again in a function:
function google_transcribe($pathtofile) {
$filename = $pathtofile;
$url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
// Convert to FLAC
$converted_file_name = '/tmp/audio_flac.flac';
$conversion_command = "sox $filename $converted_file_name";
exec($conversion_command);
// get the length of the FLAC file
$length_command = "soxi -D $converted_file_name";

exec($length_command, $out, $ret);
$length = $out;
$length = $length[0];
$i = 0;
$x = 0;
// Trimmed file length
$trim_lenght = 8;

$result = array();

$temp_trimmed_file = '/tmp/trimmed.flac';
// Temp file to convert the sample rate to 16000
$temp_trimmed_file_2 = '/tmp/trimmed_rate.flac';
while( $i < $length) {
$command = "sox $converted_file_name $temp_trimmed_file trim 0:$i 0:$trim_lenght";
exec($command);
// Convert to 16000 sample rate
$command2 = "sox $temp_trimmed_file -r 16000 $temp_trimmed_file_2";
exec($command2);

$audio = file_get_contents($temp_trimmed_file_2);
$speech_info_request = curl_init();
curl_setopt($speech_info_request, CURLOPT_URL, $url);
curl_setopt($speech_info_request, CURLOPT_HTTPHEADER, array('Content-Type: audio/x-flac; rate=16000' ));
curl_setopt($speech_info_request, CURLOPT_POST, TRUE);
curl_setopt($speech_info_request, CURLOPT_POSTFIELDS, $audio);
curl_setopt($speech_info_request, CURLOPT_RETURNTRANSFER, 1);
$speech_info_response = curl_exec($speech_info_request);

$responseCode = curl_getinfo($speech_info_request,CURLINFO_HTTP_CODE);

if($responseCode==200) {
$jsonObj = json_decode($speech_info_response, TRUE);
$result[$x] = $jsonObj['hypotheses'][0]['utterance'];
}
elseif($responseCode != 200) {
// Try 3 times
$count = 0;
while($count $value) {
$script .= ' ' . $value;
}
return $script;
}
Dave 2012/07/26 at 4:52 pm

Hi Al,

I apologize in advance for this dumb question, but how do I run the script you posted? What environment should I be in?

Thanks!
Dave 2012/07/27 at 10:22 am

Sorry, let me rephrase. How do you run the function? Is there a basic script I can call to do so? Can you provide sample syntax for running it?
Cupidvogel 2012/08/04 at 5:27 am

Hi, another question, how did you obtain the URL of the page to where the POST request is made? It is not mentioned anywhere in the source code of the speech test page, plus the source code doesn’t have a form element inside which the input can be found, so how can a POST request be made anyway? Plus where did you get the additional parameters to append to the form-action-page URL in your code?
Amin 2012/08/06 at 5:04 pm

Hey Mike,
Great post here. Can you explain how Google Now works? I assume it uses this API as a module within a larger piece of software that converts the search results into speech, right?
mike Post author 2012/08/06 at 6:55 pm

It’s listed in the Chrome source code:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/google_one_shot_remote_engine.cc?view=markup

Mike
mike Post author 2012/08/06 at 7:00 pm

Well, I outlined exactly how it works in the post, but here it is again:

Chrome records your audio via your mic, converts it to FLAC, and then does an HTTP POST to Google's system for the conversion, which returns a JSON object with the result.

What they use on their side to do the actual voice recognition, I have no idea.

Mike
ankkit 2012/08/14 at 5:11 am

Do you know which languages this supports? Is it the same list as Google Translate's (http://support.google.com/translate/)?
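The recognition language appears to be selected by the `lang` query parameter in the request URL (`lang=en-US` in the post's script). Here is a small Perl sketch that builds the URL for a different language tag. The other parameters are copied from the post. Which tags the service actually accepts isn't documented, so whether it matches the Translate list is untested. `recognize_url` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the recognizer URL for a language tag such as "en-US" or "fr-FR".
# xjerr and client are copied unchanged from the URL used in the post.
sub recognize_url {
    my ($lang) = @_;
    $lang //= 'en-US';
    # Percent-encode anything outside RFC 3986's unreserved characters
    (my $enc = $lang) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return 'https://www.google.com/speech-api/v1/recognize'
         . "?xjerr=1&client=chromium&lang=$enc";
}
```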
appsforios 2012/08/28 at 2:02 pm

Has anyone successfully been able to contact Google for permission to use this in a commercial app?!?!?!
Omar BELKHODJA 2012/09/21 at 12:37 pm

Found something awesome! Everything is provided as a shell script, but I've had no time to test it 🙁

http://wiki.openmoko.org/wiki/Google_Voice_Recognition
Tim Wood 2012/10/01 at 5:45 pm

@sushant Google is your best bet; the first result is great: http://www.freefileconvert.com. If you're looking for an automated tool on your server, try the FLAC libs: http://flac.sourceforge.net/
4mla1fn 2012/10/17 at 3:49 pm

Is this service still available? I tried the wget examples (on Linux), but all it says is "connecting to http://www.google.com|173.194.73.104|:433…". I also tried the PHP example Mike and others have posted, but the response is empty.
calvin 2012/12/10 at 9:44 pm

Yes, it is still working. I copied the HTML to a local file and it ran great! Thanks.
dky 2012/12/22 at 7:39 am

Could anyone upload a file to test the script?
Thanks in advance
pozy 2012/12/27 at 10:46 pm

No need for a script: just convert the WAV into FLAC format and run curl.exe as in the example below. You will receive the result from Google in JSON format, which you can then parse.

curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US" --data-binary "@C:\input.flac" -k -o "C:\output.txt"
Luke 2012/12/30 at 6:58 pm

I've tried this, but I get an empty response ("hypotheses": []).

Is there any way you can post the sound file that's known to work?

**********CAN YOU UPLOAD i_like_pickles.flac FOR US????

I created my FLAC file with LAME instead of SoX, so the problem could be with that. I installed SoX successfully, but the sox command for some reason does not work. LAME works great for MP3, but I'm not sure whether LAME is actually encoding to MP3 and just giving the file a .flac extension! Thanks
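One way to settle the LAME question is to inspect the file itself. A real FLAC stream starts with the four bytes `fLaC`, followed by a STREAMINFO metadata block that records the sample rate, channel count and bit depth. The Perl sketch below reads those fields, which lets you confirm that the file really is FLAC and that its rate matches the `rate=` in your Content-Type. The bit offsets follow the STREAMINFO layout in the FLAC format specification. The helper names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inspect the first 42 bytes of a file: the "fLaC" marker (4 bytes), the first
# metadata block header (4 bytes) and the STREAMINFO body (34 bytes).
# Returns { rate, channels, bits }, or nothing if it doesn't look like FLAC.
sub flac_streaminfo {
    my ($bytes) = @_;
    return unless defined $bytes && length($bytes) >= 42;
    return unless substr($bytes, 0, 4) eq 'fLaC';

    # Low 7 bits of the block header's first byte are the type; 0 = STREAMINFO
    return unless (ord(substr($bytes, 4, 1)) & 0x7F) == 0;

    # STREAMINFO starts at offset 8; after 10 bytes of block/frame sizes,
    # bytes 18..21 pack sample rate (20 bits), channels-1 (3), bits-1 (5), ...
    my ($b0, $b1, $b2, $b3) = unpack('C4', substr($bytes, 18, 4));
    return {
        rate     => ($b0 << 12) | ($b1 << 4) | ($b2 >> 4),
        channels => (($b2 >> 1) & 0x07) + 1,
        bits     => ((($b2 & 0x01) << 4) | ($b3 >> 4)) + 1,
    };
}

# Read just enough of a file to call flac_streaminfo on it.
sub read_flac_info {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "can't open $path: $!\n";
    my $bytes = '';
    read($fh, $bytes, 42);
    close($fh);
    return flac_streaminfo($bytes);
}

if (@ARGV == 1) {
    if (my $info = read_flac_info($ARGV[0])) {
        print "FLAC: $info->{rate} Hz, $info->{channels} channel(s), $info->{bits}-bit\n";
        print "Content-Type: audio/x-flac; rate=$info->{rate}\n";
    } else {
        print "$ARGV[0] does not start with a FLAC STREAMINFO header\n";
    }
}
```

An MP3 with a .flac extension will typically start with `ID3` or an MPEG frame sync instead of `fLaC`, so it falls into the "does not start with" branch.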
Manish 2012/12/31 at 11:26 am

Hi,
Can anyone suggest how I can use this in iOS (Objective-C)? Right now I am using the code below, and it gives me the error "Content-Type media type is not audio" (error 400).

Here is the code I am using:

-(void)SpeechFromGoogle{
NSURL *url = [NSURL URLWithString:@"https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"];

ASIFormDataRequest *request = [ASIFormDataRequest requestWithURL:url];
NSString *homeDirectory = [NSHomeDirectory() stringByAppendingPathComponent:@"Documents"];
NSString *filePath = [NSString stringWithFormat:@"%@/%@", homeDirectory, @"testFLAC.flac"];

NSData *myData = [NSData dataWithContentsOfFile:filePath];
[request addPostValue:myData forKey:@"Content"];
[request addPostValue:@"audio/x-flac; rate=16000" forKey:@"Content-Type"];
[request startSynchronous];

NSLog(@"req: %@", [request responseString]);
}
mike Post author 2012/12/31 at 4:43 pm

The Content-Type header needs to be added as a header, not a value:

[request addRequestHeader:@"Content-Type" value:@"audio/x-flac; rate=16000"];

Mike
Manish 2013/01/01 at 6:01 am

Thanks for the quick reply, Mike, but I still have the same error. Here is exactly what I'm getting on the console after making the changes:
Error:

Content-Type media type is not audio

Content-Type media type is not audio
Error 400
mike Post author 2013/01/01 at 7:46 pm

Hey Manish,

Any time that error has come up, it's been because the Content-Type header isn't being set properly. There's obviously no documentation on this, so there's no way to be sure.

Mike
Manish 2013/01/02 at 5:01 am

Thanks Mike,
I'll look into it and get back to you when I'm done.
Manish 2013/01/03 at 7:40 am

Hi Mike,
Here is a full working example for Objective-C:

-(void)SpeechFromGoogle{

NSString *homeDirectory = [NSHomeDirectory() stringByAppendingPathComponent:@"Documents"];
NSString *filePath = [NSString stringWithFormat:@"%@/%@", homeDirectory, @"test.flac"];

NSData *myData = [NSData dataWithContentsOfFile:filePath];

NSMutableURLRequest *request = [[NSMutableURLRequest alloc]
initWithURL:[NSURL
URLWithString:@"https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"]];

[request setHTTPMethod:@"POST"];

//set headers: the FLAC sample rate goes in the Content-Type value

[request addValue:@"audio/x-flac; rate=16000" forHTTPHeaderField:@"Content-Type"];

[request setHTTPBody:myData];

[request setValue:[NSString stringWithFormat:@"%lu", (unsigned long)[myData length]] forHTTPHeaderField:@"Content-Length"];

NSHTTPURLResponse* urlResponse = nil;
NSError *error = nil;
NSData *responseData = [NSURLConnection sendSynchronousRequest:request returningResponse:&urlResponse error:&error];
if (responseData == nil) {
    NSLog(@"Request failed: %@", error);
    return;
}
NSString *result = [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding];

NSLog(@"The answer is: %@",result);
}
Nick 2013/01/10 at 7:23 am

Hi Mike,

How can I connect to the Google Speech API from VB?

Thanks.
sandeep 2013/01/10 at 11:54 am

Can I get PHP code for the same?

Thanks
MiBo 2013/01/11 at 6:25 am

I think I'm doing something wrong. The code you provided, Manish, works perfectly, but the response is never what I said in the audio file. I recorded a WAV and then converted it to FLAC. Could I be doing something wrong when converting? How did you guys record and convert the audio file?
mike Post author 2013/01/11 at 9:33 am

Sandeep, Nick: you should look back in the comments; there have been many examples in many languages.

Mike
Manish 2013/01/17 at 11:01 am

Hi MiBo ,
Try a short recording first. I use Audacity for recording.
Binev 2013/01/22 at 6:04 am

Does it still work? When I click the link provided (https://www.google.com/speech-api/v1/recognize), it gives me the response: HTTP method GET is not supported by this URL

Error 405
I want to use the API from a C++ or C# application. Is it still accessible?
mike Post author 2013/01/22 at 8:50 am

You have to POST to the URL.

Mike
NIck 2013/01/22 at 9:09 am

hi mike,

Where are the previous comments? I can't see the past comments.

thanks,,

=D
mike Post author 2013/01/22 at 12:44 pm

Near the bottom of the post, there's an "Older Comments" link:

http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/#comments
Anon 2013/02/03 at 1:22 pm

Has anyone successfully gotten the full-duplex url working, with streaming audio? If so, could you please provide a code sample?
sandeep 2013/02/07 at 8:22 am

Thanks guys!

I am trying to access it using JavaScript/jQuery, but I couldn't. Does anyone know how to call the Google API server using JS/jQuery?
mike Post author2013/02/07 at 9:52 am

Hey Sandeep,

I'm not sure how you'd access it via AJAX; XSS aside, you need to be able to POST raw audio data.

Mike
rowntreerob 2013/02/13 at 7:52 pm

https://github.com/GoogleChrome/webplatform-samples/blob/master/webspeechdemo/webspeechdemo.html

has some newer code.
Ramesh 2013/03/04 at 8:37 am

Hi Mike,
Thanks for the tutorial. I have tried both methods, i.e. using curl on Windows, and using SoX to convert a WAV file to FLAC and then wget to get a text file on Ubuntu. It worked fine for me….
But with both methods, if the audio is longer than 40 seconds, it doesn't produce anything, just a blank text file.
Is this a limitation of the API, or is there some other reason…

So I need either a method that can convert longer speech to text,
or a way to split audio files on Ubuntu using SoX (or any other command) so that I can split them and feed the pieces to the Google API.
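The roughly 40-second ceiling reported here doesn't appear to be documented, but splitting is straightforward with SoX. Here is a Perl sketch along the lines of Al's PHP approach earlier in the thread. It asks `soxi -D` for the duration, cuts the file into 8-second windows with `sox ... trim START LENGTH`, and resamples each piece to 16 kHz FLAC. The 8-second window is Al's choice, not a known limit, and the `/tmp/chunk_NNN.flac` naming and helper names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a duration (in seconds) into [start, length] windows of at most $chunk seconds.
sub chunk_plan {
    my ($duration, $chunk) = @_;
    $chunk //= 8;
    my @plan;
    for (my $start = 0; $start < $duration; $start += $chunk) {
        my $left = $duration - $start;
        push @plan, [$start, $left < $chunk ? $left : $chunk];
    }
    return @plan;
}

# SoX command that cuts one window out of $in and writes it to $out at 16 kHz.
# "-r 16000" sits before the output file, so it applies to the output;
# "trim START LENGTH" is an effect, so it comes after the output file.
sub sox_trim_cmd {
    my ($in, $out, $start, $len) = @_;
    return ('sox', $in, '-r', '16000', $out, 'trim', $start, $len);
}

if (@ARGV == 1) {
    my $in = $ARGV[0];
    # soxi -D prints the duration in seconds
    open(my $soxi, '-|', 'soxi', '-D', $in) or die "can't run soxi: $!\n";
    chomp(my $duration = <$soxi> // '');
    close($soxi);
    die "no duration for $in\n" unless length $duration;

    my $n = 0;
    for my $w (chunk_plan($duration, 8)) {
        my $out = sprintf('/tmp/chunk_%03d.flac', $n++);
        system(sox_trim_cmd($in, $out, @$w)) == 0 or die "sox failed: $?\n";
        print "$out\n";   # each chunk can then be POSTed on its own
    }
}
```

Each chunk's transcript can then be joined in order, which is what Al's script does with its `$result` array.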
James 2013/03/06 at 9:20 am

I am getting the 40 second time limit as well – is there an API that allows larger file lengths? I’d love to be able to transcribe my voicemails through this service.

mike pultz

personal and professional blog of mike pultz, technology specialist and serial entrepreneur.
