I’ve posted an updated version of this article here, using the new full-duplex streaming API.
Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it, one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome it in the browser.
If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:
http://slides.html5rocks.com/#speech-input
Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s really hard to do without really good language models, which are not something you’d be able to build into a browser.
I found the files I was looking for in the chromium source repo:
http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/
It looks like the audio is collected from the mic, and then passed via an HTTPS POST to a Google web service, which responds with a JSON object with the results. Looking through their audio encoder code, it looks like the audio can be either FLAC or Speex, but the Speex appears to be some sort of specially modified version. I’m not sure what it is, but it just didn’t look quite right.
If that’s the case, there should be no reason why I can’t just POST something to it myself?
The URL listed in speech_recognition_request.cc is:
https://www.google.com/speech-api/v1/recognize
So a quick few lines of Perl (or PHP, or just use wget on the command line):
#!/usr/bin/perl

require LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
my $audio = "";

# Read the FLAC file given on the command line as raw bytes
open(FILE, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
binmode(FILE);
while (<FILE>) {
    $audio .= $_;
}
close(FILE);

# POST the audio as the request body, with the FLAC content type and sample rate
my $ua = LWP::UserAgent->new;
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success) {
    print $response->content;
}

1;
This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).
To run it, just do:
[root@prague mike]# ./speech i_like_pickles.flac
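For readers who would rather not use Perl, here is a minimal Python sketch of the same request. The URL, query parameters and Content-Type value are copied from the script above; the function names `build_request` and `recognize` are made up for illustration, and the sketch assumes the endpoint still behaves the way this post describes.

```python
import urllib.request

# Endpoint and query parameters copied from the Perl script above.
SPEECH_URL = ("https://www.google.com/speech-api/v1/recognize"
              "?xjerr=1&client=chromium&lang=en-US")


def build_request(audio: bytes, rate: int = 16000) -> urllib.request.Request:
    """Build (but do not send) a POST whose body is the raw FLAC bytes."""
    return urllib.request.Request(
        SPEECH_URL,
        data=audio,
        # The content type goes in a real HTTP header, not in a form field.
        headers={"Content-Type": f"audio/x-flac; rate={rate}"},
        method="POST",
    )


def recognize(path: str) -> str:
    """Read a FLAC file in binary mode, POST it, and return the response body."""
    with open(path, "rb") as f:
        request = build_request(f.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `recognize("i_like_pickles.flac")` would send the file; nothing goes over the network until that call is made.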
The response is pretty straightforward JSON:
{
  "status": 0,
  "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
  "hypotheses": [
    { "utterance": "i like pickles", "confidence": 0.9012539 },
    { "utterance": "i like pickle" }
  ]
}
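Pulling the text out of that response takes only a few lines. The sketch below is in Python and assumes, based only on the sample above, that the first entry in `hypotheses` is the preferred one; the helper name `best_utterance` is invented for this example. It returns `None` when the list is empty.

```python
import json
from typing import Optional


def best_utterance(body: str) -> Optional[str]:
    """Return the utterance of the first hypothesis, or None if there are none."""
    result = json.loads(body)
    hypotheses = result.get("hypotheses", [])
    if not hypotheses:
        return None
    return hypotheses[0]["utterance"]


# The sample response shown above, as a single string.
sample = (
    '{"status": 0, "id": "b3447b5d98c5653e0067f35b32c0a8ca-1", '
    '"hypotheses": [{"utterance": "i like pickles", "confidence": 0.9012539}, '
    '{"utterance": "i like pickle"}]}'
)

print(best_utterance(sample))  # prints: i like pickles
```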
I’m not sure if Google is intending this to be a public, usable web service API, but it works, and it has all sorts of possibilities!
Hey, I’m working in Java and getting the same output as Mike, i.e.:
{"status":5,"id":"f90432629c25087d95ffd780f7d52838-1","hypotheses":[]}
Please help :(
This was executed from a Raspberry Pi: https://gist.github.com/3023698. Yes, this works on the Pi too
Sorry, I didn’t understand 🙁
great work!
but I’m sad because the speech API can’t do long-sentence recognition…
Yeah, I know it can’t be done for long sentences. I just want to try it with short sentences. Please help me.
Awaiting reply……
How about applying that code to C#? Can anyone help?
I implemented the following script that splits audio files into chunks of 8 seconds and sends them to the service. If the response code is not 200, it retries 3 times, waiting 0.5 seconds in between. This script uses sox and assumes that the /tmp directory is writable. This is not the best implementation, and the variables were created out of convenience.
Hope this helps:
$filename = '/path/to/file';
$url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
// Convert to FLAC
$converted_file_name = '/tmp/audio_flac.flac';
$conversion_command = "sox $filename $converted_file_name";
exec($conversion_command);
// Get the length of the FLAC file in seconds
$length_command = "soxi -D $converted_file_name";
exec($length_command, $out, $ret);
$length = $out;
$length = $length[0];
$i = 0;
$x = 0;
// Trimmed file length in seconds
$trim_lenght = 8;
$result = array();
// The file that will hold the trimmed audio data
$temp_trimmed_file = '/tmp/trimmed.flac';
// Temp file to convert the sample rate to 16000
$temp_trimmed_file_2 = '/tmp/trimmed_rate.flac';
while( $i < $length) {
$command = "sox $converted_file_name $temp_trimmed_file trim 0:$i 0:$trim_lenght";
exec($command);
// Convert to 16000 sample rate
$command2 = "sox $temp_trimmed_file -r 16000 $temp_trimmed_file_2";
exec($command2);
$audio = file_get_contents($temp_trimmed_file_2);
$speech_info_request = curl_init();
curl_setopt($speech_info_request, CURLOPT_URL, $url);
curl_setopt($speech_info_request, CURLOPT_HTTPHEADER, array('Content-Type: audio/x-flac; rate=16000' ));
curl_setopt($speech_info_request, CURLOPT_POST, TRUE);
curl_setopt($speech_info_request, CURLOPT_POSTFIELDS, $audio);
curl_setopt($speech_info_request, CURLOPT_RETURNTRANSFER, 1);
$speech_info_response = curl_exec($speech_info_request);
$responseCode = curl_getinfo($speech_info_request,CURLINFO_HTTP_CODE);
if($responseCode==200) {
$jsonObj = json_decode($speech_info_response, TRUE);
$result[$x] = $jsonObj['hypotheses'][0]['utterance'];
}
elseif($responseCode != 200) {
// Try 3 times
$count = 0;
while($count $value) {
$script .= ' ' . $value;
}
return $script;
}
I am not sure why the code was chopped in my previous comment. Here it is again in a function:
function google_transcribe($pathtofile) {
    $filename = $pathtofile;
    $url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

    // Convert to FLAC
    $converted_file_name = '/tmp/audio_flac.flac';
    $conversion_command = "sox $filename $converted_file_name";
    exec($conversion_command);

    // Get the length of the FLAC file in seconds
    $length_command = "soxi -D $converted_file_name";
    exec($length_command, $out, $ret);
    $length = $out[0];

    $i = 0;
    $x = 0;
    // Trimmed file length in seconds
    $trim_length = 8;
    $result = array();
    $temp_trimmed_file = '/tmp/trimmed.flac';
    // Temp file to convert the sample rate to 16000
    $temp_trimmed_file_2 = '/tmp/trimmed_rate.flac';

    while ($i < $length) {
        $command = "sox $converted_file_name $temp_trimmed_file trim 0:$i 0:$trim_length";
        exec($command);
        // Convert to 16000 sample rate
        $command2 = "sox $temp_trimmed_file -r 16000 $temp_trimmed_file_2";
        exec($command2);
        $audio = file_get_contents($temp_trimmed_file_2);

        $speech_info_request = curl_init();
        curl_setopt($speech_info_request, CURLOPT_URL, $url);
        curl_setopt($speech_info_request, CURLOPT_HTTPHEADER, array('Content-Type: audio/x-flac; rate=16000'));
        curl_setopt($speech_info_request, CURLOPT_POST, TRUE);
        curl_setopt($speech_info_request, CURLOPT_POSTFIELDS, $audio);
        curl_setopt($speech_info_request, CURLOPT_RETURNTRANSFER, 1);
        $speech_info_response = curl_exec($speech_info_request);
        $responseCode = curl_getinfo($speech_info_request, CURLINFO_HTTP_CODE);

        // Try 3 more times if the first attempt failed.
        // NOTE: the posted code was cut off inside this loop. Everything from
        // here to the foreach below is a reconstruction based on the
        // description above (3 retries, 0.5 seconds apart), not the original.
        $count = 0;
        while ($count < 3 && $responseCode != 200) {
            usleep(500000);
            $speech_info_response = curl_exec($speech_info_request);
            $responseCode = curl_getinfo($speech_info_request, CURLINFO_HTTP_CODE);
            $count++;
        }
        curl_close($speech_info_request);

        if ($responseCode == 200) {
            $jsonObj = json_decode($speech_info_response, TRUE);
            $result[$x] = $jsonObj['hypotheses'][0]['utterance'];
        }

        // Move on to the next chunk
        $i += $trim_length;
        $x++;
    }

    // Join the per-chunk transcripts into one string
    $script = '';
    foreach ($result as $key => $value) {
        $script .= ' ' . $value;
    }
    return $script;
}
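The retry behaviour described above (try once, then up to three more times with half a second in between) can be separated from the HTTP details. Here is a small Python sketch of just that part. The `send` callable is a stand-in for whatever actually performs the POST and returns a status code and body, and all names here are invented for illustration.

```python
import time
from typing import Callable, Optional, Tuple


def post_with_retries(send: Callable[[], Tuple[int, str]],
                      retries: int = 3,
                      delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call send() once, then up to `retries` more times, until it returns HTTP 200.

    Returns the response body on success, or None if every attempt failed.
    """
    for attempt in range(retries + 1):
        status, body = send()
        if status == 200:
            return body
        if attempt < retries:
            # Wait before the next attempt, but not after the last one.
            sleep(delay)
    return None
```

Passing `sleep` as a parameter is only there so the function can be exercised without really waiting.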
Hi Al,
I apologize in advance for this dumb question.. but how do I run the script you posted?? What environment should I be in?
Thanks!
Sorry – let me rephrase.. How do you run the function? Is there a basic script I can call to do so? Can you provide sample syntax for running it?
Hi, another question, how did you obtain the URL of the page to where the POST request is made? It is not mentioned anywhere in the source code of the speech test page, plus the source code doesn’t have a form element inside which the input can be found, so how can a POST request be made anyway? Plus where did you get the additional parameters to append to the form-action-page URL in your code?
Hey Mike,
Great post here. Can you explain how Google Now works? I assume it uses this API as a module within bigger software that converts the search results into speech. Right?
It’s listed in the Chrome source code:
http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/google_one_shot_remote_engine.cc?view=markup
Mike
Well, I outlined exactly how it works in the post, but in short: Chrome records your audio via your mic, converts it to FLAC, and then does an HTTP POST to their system for the conversion, which returns a JSON object with the result.
What they use on their side to do the actual voice recognition, I have no idea.
Mike
Do you know which languages this supports? Is it the same list as http://support.google.com/translate/
Has anyone successfully been able to contact Google for permission to use this in a commercial app?!?!?!
Found something awesome! Everything is provided as a shell script, but I’ve had no time to test it 🙁
http://wiki.openmoko.org/wiki/Google_Voice_Recognition
@sushant Google is your best bet – great first result http://www.freefileconvert.com – if you’re looking for automated tool on your server try the flac libs http://flac.sourceforge.net/
Is this service still available? I tried the wget examples (on Linux), but all it says is “connecting to http://www.google.com|173.194.73.104|:433…”. I also tried the PHP example Mike and others have posted, but the return is empty.
Yes, it is still working, I copied the html to a local file and it ran great! Thanks.
Could anyone upload a file to test the script?
Thanks in advance
No need to use a script: just convert the WAV into FLAC format and use curl.exe as in the example below. You will receive the JSON response from Google, which you can then parse.
curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US" -F myfile="@C:\input.flac" -k -o "C:\output.txt"
I’ve tried this, but I get an empty response (“hypotheses”: [])
Is there any way you can post the sound file that’s known to work?
**********CAN YOU UPLOAD i_like_pickles.flac FOR US????
I created my Flac file with LAME instead of SOX, so the problem could be with that. I installed SOX successfully, but the SOX command for some reason does not work. LAME works great for MP3, but I’m not sure if LAME is actually encoding to MP3 and just giving the file a .flac extension! Thanks
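One quick way to check whether a file is really FLAC, whatever its extension says, is to look at its first four bytes: a native FLAC stream begins with the ASCII marker `fLaC`. The Python sketch below does only that. The function names are made up for this example, and the check says nothing about sample rate or whether the service will accept the audio.

```python
def looks_like_flac(data: bytes) -> bool:
    """True if the data starts with the 4-byte FLAC stream marker 'fLaC'."""
    return data[:4] == b"fLaC"


def file_looks_like_flac(path: str) -> bool:
    """Read only the first four bytes of the file and test them."""
    with open(path, "rb") as f:
        return looks_like_flac(f.read(4))
```

A file that fails this check, for example one that starts with an `ID3` tag, was most likely not written as plain FLAC.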
Hi,
Can anyone suggest how I can use this in iOS (Objective-C)? Right now I am using this and it gives me the error “Content-Type media type is not audio”, error 400.
Here is code that i am using
-(void)SpeechFromGoogle{
    NSURL *url = [NSURL URLWithString:@"https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"];
    ASIFormDataRequest *request = [ASIFormDataRequest requestWithURL:url];
    NSString *homeDirectory = [NSHomeDirectory() stringByAppendingPathComponent:@"Documents"];
    NSString *filePath = [NSString stringWithFormat:@"%@/%@", homeDirectory, @"testFLAC.flac"];
    NSData *myData = [NSData dataWithContentsOfFile:filePath];
    [request addPostValue:myData forKey:@"Content"];
    [request addPostValue:@"audio/x-flac; rate=16000" forKey:@"Content-Type"];
    [request startSynchronous];
    NSLog(@"req: %@", [request responseString]);
}
The Content-Type header needs to be added as a header, not a value:
[request addRequestHeader:@"Content-Type" value:@"audio/x-flac; rate=16000"];
Mike
Thanks for the quick reply, Mike, but I still have the same error. Here is exactly what I am getting on the console after making the change:
Error:
Content-Type media type is not audio
Content-Type media type is not audio
Error 400
Hey Manish,
Anytime that error has come up, it’s been because the Content-Type header isn’t being set properly. There’s obviously no documentation on this, so there’s no way to be sure.
Mike
Thanks Mike ,
Will look at it and get back to you if i am done.
Hi Mike,
Here is full working example for objective-c
-(void)SpeechFromGoogle{
    NSString *homeDirectory = [NSHomeDirectory() stringByAppendingPathComponent:@"Documents"];
    NSString *filePath = [NSString stringWithFormat:@"%@/%@", homeDirectory, @"test.flac"];
    NSData *myData = [NSData dataWithContentsOfFile:filePath];

    NSMutableURLRequest *request = [[NSMutableURLRequest alloc]
        initWithURL:[NSURL
        URLWithString:@"https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"]];
    [request setHTTPMethod:@"POST"];

    // Set headers: the audio type is a real HTTP header, and the raw FLAC bytes are the body
    [request addValue:@"audio/x-flac; rate=16000" forHTTPHeaderField:@"Content-Type"];
    [request setHTTPBody:myData];
    [request setValue:[NSString stringWithFormat:@"%lu", (unsigned long)[myData length]] forHTTPHeaderField:@"Content-Length"];

    NSHTTPURLResponse *urlResponse = nil;
    NSError *error = nil;
    NSData *responseData = [NSURLConnection sendSynchronousRequest:request returningResponse:&urlResponse error:&error];
    NSString *result = [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding];
    NSLog(@"The answer is: %@", result);
}
Hi Mike,
How can I connect to the Google speech API in VB?
thanks..
Can I get php code for same,
Thanks
I think I’m doing something wrong. The code you provided, Manish, works perfectly, but the response I get is never what I said in the audio file. I recorded a WAV file and then converted it to FLAC. Could it be that I’m doing something wrong when converting? How did you guys record and convert the audio file?
Sandeep, Nick- You should look back in the comments- there have been many examples in many languages.
Mike
Hi MiBo ,
Try a short recording first. I use Audacity for recording.
Does it still work because when I click at the link provided (https://www.google.com/speech-api/v1/recognize) it gives me a response: HTTP method GET is not supported by this URL
Error 405
I want to use the api for C++ or C# application. Is it still accessible?
You have to POST to the URL.
Mike
hi mike,
Where are the previous comments? I cannot view the past comments.
thanks,,
=D
Near the bottom of the post, there’s an “Older Comments” link:
http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/#comments
Has anyone successfully gotten the full-duplex url working, with streaming audio? If so, could you please provide a code sample?
Thanks guys!
I am trying to access it using JavaScript/jQuery, but I could not. Does anyone know how to call the Google API server using JS/jQuery?
Hey Sandeep,
I’m not sure how you’d access it via AJAX- XSS aside, you need to be able to post RAW audio data.
Mike
https://github.com/GoogleChrome/webplatform-samples/blob/master/webspeechdemo/webspeechdemo.html
has some newer code.
Hi Mike,
Thanks for the tutorial. I have tried both methods, i.e. using curl on Windows, and using sox to convert the WAV file to FLAC and then wget to get the text file on Ubuntu. It worked fine for me.
But with both methods, if the audio is longer than 40 seconds, it doesn’t produce anything, just a blank text file.
Is this a drawback of the API, or is there some other reason?
So I need to know whether there is any other method to convert speech to text,
or a way to split audio files on Ubuntu using sox or any other command, so that I can split the audio and give the pieces as input to the Google API.
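On the splitting question, sox’s `trim` effect takes a start offset and a length, so a long file can be cut into short pieces before each one is sent. The Python sketch below only builds the command lines and runs nothing. The 8-second chunk size comes from the PHP comment earlier in this thread, the output file naming is invented for illustration, and the total duration is assumed to come from `soxi -D`.

```python
from typing import List


def sox_split_commands(src: str, duration: float, chunk: float = 8.0,
                       prefix: str = "chunk") -> List[List[str]]:
    """Build one sox command per chunk: resample to 16 kHz and trim to the chunk."""
    commands = []
    start = 0.0
    index = 0
    while start < duration:
        # The last chunk may be shorter than the others.
        length = min(chunk, duration - start)
        out = f"{prefix}_{index:03d}.flac"
        commands.append(
            ["sox", src, "-r", "16000", out, "trim", str(start), str(length)]
        )
        start += chunk
        index += 1
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine with sox installed.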
I am getting the 40 second time limit as well – is there an API that allows larger file lengths? I’d love to be able to transcribe my voicemails through this service.