Accessing Google Speech API / Chrome 11

I’ve posted an updated version of this article here, using the new full-duplex streaming API.

Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it, one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret what you say. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome the new feature.

If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google for processing. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s hard to do without really good language models, which aren’t something you’d be able to build into a browser.

I found the files I was looking for in the Chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, the audio can be either FLAC or Speex, though the Speex appears to be some sort of specially modified version. I’m not sure what it is, but it just didn’t look quite right.

If that’s the case, there should be no reason why I can’t just POST something to it myself?

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So, a quick few lines of Perl (or PHP, or just use wget on the command line):

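#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
my $file = shift or die "Usage: $0 audio.flac\n";

# Read the whole FLAC file in binary mode
open(my $fh, "<", $file) or die "Cannot open $file: $!\n";
binmode($fh);
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# POST the raw audio; the Content-Type header carries the format and sample rate
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    print STDERR $response->status_line, "\n";
}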

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
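
As a rough illustration of consuming that response, here is a minimal Python sketch. It assumes the first entry in "hypotheses" is the service’s preferred guess (the sample suggests this, but nothing documents it), and best_hypothesis is a name made up for this example.

```python
import json

# Sample response copied from the output shown in the post.
SAMPLE = """
{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
"""


def best_hypothesis(response_text):
    """Return (utterance, confidence) for the first hypothesis, or None if there is none.

    Hypothetical helper: assumes the first hypothesis is the preferred one.
    """
    data = json.loads(response_text)
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    top = hypotheses[0]
    # Not every hypothesis in the sample carries a confidence, so treat it as optional.
    return top["utterance"], top.get("confidence")


if __name__ == "__main__":
    print(best_hypothesis(SAMPLE))
```

An empty "hypotheses" list makes the helper return None rather than raise.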

I’m not sure if Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

265 thoughts on “Accessing Google Speech API / Chrome 11”

  1. rani

    Hey, I’m working in Java and getting the same output as Mike, i.e.
    {"status":5,"id":"f90432629c25087d95ffd780f7d52838-1","hypotheses":[]}

    Please help :(

  2. rani

    Yeah, I know it can’t be done for long sentences. I just want to try it with short sentences. Please help me.

    Awaiting reply...

  3. Al

    I implemented the following script, which splits audio files into chunks of 8 seconds and sends them to the service. If the response code is not 200, I retry 3 times, waiting 0.5 seconds in between. The script uses sox and assumes that the tmp directory is writable. This is not the best implementation, and the variables were created out of convenience.
    Hope this helps:

    $filename = '/path/to/file';
    $url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    // Convert to flac
    $converted_file_name = '/tmp/audio_flac.flac';
    $conversion_command = "sox $filename $converted_file_name";
    exec($conversion_command);
    // Get the length of the flac file
    $length_command = "soxi -D $converted_file_name";

    exec($length_command, $out, $ret);
    $length = $out;
    $length = $length[0];
    $i = 0;
    $x = 0;
    // Trimmed file length
    $trim_length = 8;

    $result = array();
    // The file that will hold the trimmed audio data
    $temp_trimmed_file = '/tmp/trimmed.flac';
    // Temp file to convert the sample rate to 16000
    $temp_trimmed_file_2 = '/tmp/trimmed_rate.flac';
    while ($i < $length) {
        $command = "sox $converted_file_name $temp_trimmed_file trim 0:$i 0:$trim_length";
        exec($command);
        // Convert to 16000 sample rate
        $command2 = "sox $temp_trimmed_file -r 16000 $temp_trimmed_file_2";
        exec($command2);

        $audio = file_get_contents($temp_trimmed_file_2);
        $speech_info_request = curl_init();
        curl_setopt($speech_info_request, CURLOPT_URL, $url);
        curl_setopt($speech_info_request, CURLOPT_HTTPHEADER, array('Content-Type: audio/x-flac; rate=16000'));
        curl_setopt($speech_info_request, CURLOPT_POST, TRUE);
        curl_setopt($speech_info_request, CURLOPT_POSTFIELDS, $audio);
        curl_setopt($speech_info_request, CURLOPT_RETURNTRANSFER, 1);
        $speech_info_response = curl_exec($speech_info_request);

        $responseCode = curl_getinfo($speech_info_request, CURLINFO_HTTP_CODE);

        if ($responseCode == 200) {
            $jsonObj = json_decode($speech_info_response, TRUE);
            $result[$x] = $jsonObj['hypotheses'][0]['utterance'];
        }
        elseif ($responseCode != 200) {
            // Try 3 times
            $count = 0;
            while($count [... code chopped when the comment was posted ...] $value) {
                $script .= ' ' . $value;
            }
            return $script;
    }

  4. Al

    I am not sure why the code was chopped in my previous comment. Here it is again in a function:
    function google_transcribe($pathtofile) {
        $filename = $pathtofile;
        $url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
        // Convert to flac
        $converted_file_name = '/tmp/audio_flac.flac';
        $conversion_command = "sox " . escapeshellarg($filename) . " $converted_file_name";
        exec($conversion_command);
        // Get the length of the flac file in seconds
        $length_command = "soxi -D $converted_file_name";

        exec($length_command, $out, $ret);
        $length = (float) $out[0];
        $i = 0;
        $x = 0;
        // Length of each trimmed chunk in seconds
        $trim_length = 8;

        $result = array();

        $temp_trimmed_file = '/tmp/trimmed.flac';
        // Temp file to convert the sample rate to 16000
        $temp_trimmed_file_2 = '/tmp/trimmed_rate.flac';
        while ($i < $length) {
            $command = "sox $converted_file_name $temp_trimmed_file trim 0:$i 0:$trim_length";
            exec($command);
            // Convert to 16000 sample rate
            $command2 = "sox $temp_trimmed_file -r 16000 $temp_trimmed_file_2";
            exec($command2);

            $audio = file_get_contents($temp_trimmed_file_2);
            $speech_info_request = curl_init();
            curl_setopt($speech_info_request, CURLOPT_URL, $url);
            curl_setopt($speech_info_request, CURLOPT_HTTPHEADER, array('Content-Type: audio/x-flac; rate=16000'));
            curl_setopt($speech_info_request, CURLOPT_POST, TRUE);
            curl_setopt($speech_info_request, CURLOPT_POSTFIELDS, $audio);
            curl_setopt($speech_info_request, CURLOPT_RETURNTRANSFER, 1);
            $speech_info_response = curl_exec($speech_info_request);

            $responseCode = curl_getinfo($speech_info_request, CURLINFO_HTTP_CODE);

            // NOTE: the posted code was chopped from the retry loop onward.
            // Everything from here to the final foreach is a reconstruction
            // based on the description in the previous comment
            // (3 retries, 0.5 seconds apart), not the original code.
            $count = 0;
            while ($responseCode != 200 && $count < 3) {
                usleep(500000);
                $speech_info_response = curl_exec($speech_info_request);
                $responseCode = curl_getinfo($speech_info_request, CURLINFO_HTTP_CODE);
                $count++;
            }

            if ($responseCode == 200) {
                $jsonObj = json_decode($speech_info_response, TRUE);
                if (!empty($jsonObj['hypotheses'])) {
                    $result[$x] = $jsonObj['hypotheses'][0]['utterance'];
                }
            }
            // Move on to the next chunk
            $i += $trim_length;
            $x++;
        }

        // Join the chunk transcripts into one string
        $script = '';
        foreach ($result as $key => $value) {
            $script .= ' ' . $value;
        }
        return $script;
    }
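
The chunk-and-retry idea described in these two comments can be sketched on its own, without sox or the network. The Python below is only an illustration under stated assumptions: chunk_offsets and with_retries are hypothetical names, the last chunk is shortened to fit the file (the PHP always asks sox for 8 seconds), and the attempt count here is the total number of tries rather than retries after a first failure.

```python
import time


def chunk_offsets(total_seconds, chunk_seconds=8):
    """Return (start, length) pairs, in seconds, that cover the whole recording."""
    offsets = []
    start = 0
    while start < total_seconds:
        # The last chunk is shortened so it never runs past the end of the file.
        offsets.append((start, min(chunk_seconds, total_seconds - start)))
        start += chunk_seconds
    return offsets


def with_retries(send, attempts=3, delay=0.5):
    """Call send() until it reports HTTP 200 or the attempts run out.

    send is any callable returning (status_code, body).
    Returns the body on success, or None if every attempt failed.
    """
    for attempt in range(attempts):
        status, body = send()
        if status == 200:
            return body
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before trying again
    return None


if __name__ == "__main__":
    print(chunk_offsets(20))
```

For a 20-second file this yields three chunks starting at 0, 8 and 16 seconds, the last one 4 seconds long. Each pair could then be handed to a sox trim command like the one in the PHP above.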

  5. Dave

    Hi Al,

    I apologize in advance for this dumb question.. but how do I run the script you posted?? What environment should I be in?

    Thanks!

  6. Dave

    Sorry – let me rephrase.. How do you run the function? Is there a basic script I can call to do so? Can you provide sample syntax for running it?

  7. Cupidvogel

    Hi, another question, how did you obtain the URL of the page to where the POST request is made? It is not mentioned anywhere in the source code of the speech test page, plus the source code doesn’t have a form element inside which the input can be found, so how can a POST request be made anyway? Plus where did you get the additional parameters to append to the form-action-page URL in your code?

  8. Pingback: Proxy Servers » Google Speech API doesn’t give correct result when audio is sent in file

  9. Pingback: Google Chrome Speech-To-Text API Kullanımı | GDG ANKARA

  10. Amin

    Hey Mike,
    Great post here. Can you explain how Google Now works? I assume it uses this API as a module within a bigger piece of software that converts the search results into speech, right?

  11. mike Post author

    Well, I outlined exactly how it works in the post, but in short:

    Chrome records your audio via your mic, converts it to FLAC, and then does an HTTP POST to their system for the conversion, and then returns a JSON object with the result.

    What they use on their side to do the actual VR, I have no idea.

    Mike

  12. Pingback: Dolphin Browser – navegue sem as mãos | danielcamargo.com

  13. appsforios

    Has anyone successfully been able to contact Google for permission to use this in a commercial app?!?!?!

  14. Pingback: Does Anyone Uses Google Speech API in Production? | appsgoogleplus.com

  15. 4mla1fn

    is this service still available? i tried the wget examples (on linux) but all it says is “connecting to http://www.google.com|173.194.73.104|:433…”. i also tried the php example mike and others have posted, but the return is empty.

  16. calvin

    Yes, it is still working, I copied the html to a local file and it ran great! Thanks.

  17. pozy

    No need to use a script; just convert the WAV into FLAC format and use curl.exe as in the example below. You will receive the JSON response from Google, which you can then parse.

    curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US" -F myfile="@C:\input.flac" -k -o "C:\output.txt"

  18. Luke

    I’ve tried this, but I get an empty response ("hypotheses": [])

    Is there any way you can post the sound file that’s known to work?

    **********CAN YOU UPLOAD i_like_pickles.flac FOR US????

    I created my Flac file with LAME instead of SOX, so the problem could be with that. I installed SOX successfully, but the SOX command for some reason does not work. LAME works great for MP3, but I’m not sure if LAME is actually encoding to MP3 and just giving the file a .flac extension! Thanks

  19. Manish

    Hi,
    Can anyone suggest how I can use this in iOS (Objective-C)? Right now I am using the code below, and it gives me the error “Content-Type media type is not audio” (error 400).

    Here is the code that I am using:

    -(void)SpeechFromGoogle{
    NSURL *url = [NSURL URLWithString:@"https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"];

    ASIFormDataRequest *request = [ASIFormDataRequest requestWithURL:url];
    NSString *homeDirectory = [NSHomeDirectory() stringByAppendingPathComponent:@"Documents"];
    NSString *filePath = [NSString stringWithFormat:@"%@/%@", homeDirectory, @"testFLAC.flac"];

    NSData *myData = [NSData dataWithContentsOfFile:filePath];
    [request addPostValue:myData forKey:@"Content"];
    [request addPostValue:@"audio/x-flac; rate=16000" forKey:@"Content-Type"];
    [request startSynchronous];

    NSLog(@"req: %@", [request responseString]);
    }

  20. mike Post author

    The Content-Type header needs to be added as a header, not a value:

    [request addRequestHeader:@"Content-Type" value:@"audio/x-flac; rate=16000"];

    Mike
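
To make the header-versus-form-value distinction concrete, here is a small Python sketch using only the standard library. It builds the request without sending it; build_request is a hypothetical name, and the URL and Content-Type string are the ones quoted in the post.

```python
import urllib.request

URL = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"


def build_request(audio_bytes, rate=16000):
    """Build (but do not send) a POST whose body is the raw FLAC data."""
    return urllib.request.Request(
        URL,
        data=audio_bytes,  # raw bytes as the body, not a form field
        headers={"Content-Type": "audio/x-flac; rate=%d" % rate},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(b"placeholder bytes, not real FLAC data")
    print(req.get_method(), req.get_header("Content-type"))
    # Sending is left out on purpose; with a real FLAC file it would be:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

The point of the sketch is that the audio travels as the request body and the media type travels as an HTTP header; putting either one into a form field produces a different request.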

  21. Manish

    Thanks for the quick reply, Mike, but I still get the same error. Here is exactly what I am getting on the console after making the changes:
    Error:

    Content-Type media type is not audio

    Content-Type media type is not audio
    Error 400

  22. mike Post author

    Hey Manish,

    Anytime that error has come up, it’s been because the Content-Type header isn’t being set properly. There’s obviously no documentation on this, so there’s no way to be sure.

    Mike

  23. Manish

    Hi Mike,
    Here is a full working example for Objective-C:

    -(void)SpeechFromGoogle{

    NSString *homeDirectory = [NSHomeDirectory() stringByAppendingPathComponent:@"Documents"];
    NSString *filePath = [NSString stringWithFormat:@"%@/%@", homeDirectory, @"test.flac"];

    NSData *myData = [NSData dataWithContentsOfFile:filePath];

    NSMutableURLRequest *request = [[NSMutableURLRequest alloc]
    initWithURL:[NSURL
    URLWithString:@"https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"]];

    [request setHTTPMethod:@"POST"];

    // Set headers: the audio format goes in the Content-Type header
    [request addValue:@"audio/x-flac; rate=16000" forHTTPHeaderField:@"Content-Type"];

    // The raw FLAC data is the request body
    [request setHTTPBody:myData];

    [request setValue:[NSString stringWithFormat:@"%lu", (unsigned long)[myData length]] forHTTPHeaderField:@"Content-length"];

    NSHTTPURLResponse* urlResponse = nil;
    NSError *error = nil;
    NSData *responseData = [NSURLConnection sendSynchronousRequest:request returningResponse:&urlResponse error:&error];
    NSString *result = [[NSString alloc] initWithData:responseData encoding:NSUTF8StringEncoding];

    NSLog(@"The answer is: %@", result);
    }

  24. Pingback: Google Speech Api使用 | 自留地

  25. Nick

    Hi Mike,

    How can I connect the google speech api in vb?

    Do you know how to connect the google speech api in vb?

    thanks..

  26. MiBo

    I think I’m doing something wrong. The code you provided, Manish, works perfectly, but the response I get is never what I said in the audio file. I recorded a WAV and then converted it to FLAC. Could it be that I’m doing something wrong in the conversion? How did you guys record and convert the audio file?

  27. mike Post author

    Sandeep, Nick- You should look back in the comments- there have been many examples in many languages.

    Mike

  28. Manish

    Hi MiBo ,
    Try a short recording first. I use Audacity for recording.

  29. NIck

    hi mike,

    where is the previous comments? I cannot view the past comments..

    thanks,,

    =D

  30. Anon

    Has anyone successfully gotten the full-duplex url working, with streaming audio? If so, could you please provide a code sample?

  31. sandeep

    Thanks guys!

    I am trying to access it using JavaScript/jQuery, but I could not. Does anyone know how to call the Google API server using JS/jQuery?

  32. mike Post author

    Hey Sandeep,

    I’m not sure how you’d access it via AJAX: cross-domain restrictions aside, you need to be able to POST raw audio data.

    Mike

  33. Pingback: wget for Mac OSX; Google speech API | Big Old Geek

  34. Ramesh

    Hi Mike,
    Thanks for the tutorial. I have tried both methods, i.e. using curl on Windows, and using sox to convert a WAV file to FLAC and then wget to get a txt file on Ubuntu. It worked fine for me.
    But with both methods, if the audio is longer than 40 seconds, it doesn’t produce anything, just a blank txt file.
    Is this a drawback of this API, or is there some other reason?

    So I need either another method for converting speech to text,
    or a way to split audio files on Ubuntu using SoX or some other command, so that I can split the audio and give it as input to the Google API.

  35. James

    I am getting the 40 second time limit as well – is there an API that allows larger file lengths? I’d love to be able to transcribe my voicemails through this service.
