Accessing Google Speech API / Chrome 11

Like this article? Follow me on Twitter @mikepultz for more updates.

Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it, one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome the new feature.

If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s really hard to do without really good language models, which are not something you’d be able to build into a browser.

I found the files I was looking for in the Chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic, and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, it looks like the audio can be either FLAC or Speex, but the Speex appears to be some sort of specially modified version. I’m not sure what it is, but it just didn’t look quite right.

If that’s the case, there should be no reason why I can’t just POST something to it myself, right?

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So, a quick few lines of Perl (or PHP, or just use wget on the command line):

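#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

# Endpoint from speech_recognition_request.cc, with the query parameters used here.
my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

# Read the FLAC file named on the command line as raw bytes.
my $file = shift @ARGV or die "usage: $0 <audio.flac>\n";
open(my $fh, "<", $file) or die "cannot open $file: $!\n";
binmode($fh);
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# The rate in the Content-Type header should match the sample rate of the audio.
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    die "request failed: " . $response->status_line . "\n";
}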

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).
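Since the script hard-codes rate=16000 in the Content-Type header, one way to keep the header in step with the audio is to read the sample rate out of the FLAC file itself. The following is only a sketch: the helper names flac_sample_rate and flac_content_type are made up for illustration, and it assumes a plain FLAC stream whose first metadata block is STREAMINFO (which the FLAC format requires), with nothing such as an ID3 tag in front of it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the sample rate (in Hz) stored in the STREAMINFO
# block of a FLAC stream, given at least the first 21 bytes of the file.
sub flac_sample_rate {
    my ($header) = @_;

    die "header too short\n" if length($header) < 21;

    # A FLAC stream starts with the 4-byte marker "fLaC".
    die "not a FLAC stream\n" unless substr($header, 0, 4) eq "fLaC";

    # Byte 4 opens the first metadata block header; its low 7 bits are the
    # block type, and STREAMINFO is type 0.
    my $type = ord(substr($header, 4, 1)) & 0x7F;
    die "first metadata block is not STREAMINFO\n" unless $type == 0;

    # The 20-bit sample rate starts at byte 18: marker (4) + block header (4)
    # + min/max block size (4) + min/max frame size (6).
    my ($b0, $b1, $b2) = unpack("C3", substr($header, 18, 3));
    return ($b0 << 12) | ($b1 << 4) | ($b2 >> 4);
}

# Hypothetical helper: build the Content-Type value used in the POST.
sub flac_content_type {
    my ($header) = @_;
    return "audio/x-flac; rate=" . flac_sample_rate($header);
}

# When a file name is given, print the header value for that file.
if (@ARGV) {
    open(my $fh, "<", $ARGV[0]) or die "cannot open $ARGV[0]: $!\n";
    binmode($fh);
    my $header = "";
    read($fh, $header, 21);
    close($fh);
    print flac_content_type($header), "\n";
}
```

The returned string could then be passed as the Content_Type argument in place of the fixed value.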

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
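To use the result from a script, the JSON can be decoded and the first hypothesis picked out. This is a minimal sketch using the core JSON::PP module; best_hypothesis is a hypothetical helper name, and treating the first entry as the best guess is an assumption based only on the sample response above, where it is the entry that carries a confidence value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: return (utterance, confidence) for the first hypothesis
# in a response body, or an empty list when the hypotheses array is empty.
sub best_hypothesis {
    my ($body) = @_;
    my $result = decode_json($body);

    my $hypotheses = $result->{hypotheses} || [];
    return unless @$hypotheses;

    # Assumption: the first entry is the service's best guess.
    my $top = $hypotheses->[0];
    return ($top->{utterance}, $top->{confidence});
}

# The sample response from above, collapsed onto one line.
my $sample = '{"status":0,"id":"b3447b5d98c5653e0067f35b32c0a8ca-1",'
           . '"hypotheses":[{"utterance":"i like pickles","confidence":0.9012539},'
           . '{"utterance":"i like pickle"}]}';

my ($utterance, $confidence) = best_hypothesis($sample);
print "$utterance ($confidence)\n";
```

A response with an empty hypotheses array makes the helper return nothing, so the caller can check for that before printing.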

I’m not sure if Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

215 thoughts on “Accessing Google Speech API / Chrome 11”

  1. mike Post author

    Yeah, new versions of Chrome are using a streaming service that lets you pass chunked data longer than 40 seconds, but it’s a lot more involved than just making a single HTTP POST.

    I’m working on an example to use it directly, but haven’t finished it yet.

    Mike

  2. Ramesh

    Thanks for replying,

    OK, I am waiting for your example, Mike, of passing chunked data longer than 40 seconds.

  3. Johannes

    Note to @pozy

    # curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US" -F myfile="@C:\input.flac" -k -o"C:\output.txt"

    Works great! Just some notes:

    1) copy & paste: make sure the quote signs are plain straight quotes

    2) make sure rate=16000 corresponds to your sample rate (Audacity: set it before recording!)

    3) I had slightly better results recording mono

    Does anybody know anything about:

    * Done waiting for 100-continue

    I’d like to avoid the milliseconds spent waiting for that…

  4. mike Post author

    Hey Manish,

    I believe this is using their new streaming server, not the single-POST server. I’m working on an example to use this new service, but it’s much more complicated than the old single-POST system.

    Mike

  5. eman

    After the audio is collected from the mic, does it get stored in a buffer or in a file? And is it possible to access this file or get hold of it?

  6. Parikshit

    Hi guys, I am trying to run the above script with my input file, but I am not able to do so…
    I converted a WAV file to FLAC using the online converter: http://www.convertfiles.com

    I am getting the response below:

    {"status":5,"id":"89834e12f70d8f9e4d42cbf8f524bf8b-1","hypotheses":[]}

    My file is a simple hello world file.

  7. mike Post author

    Usually when I’ve seen this, it’s because the sample rate in the Content-Type header doesn’t match the audio.

    Mike

  8. Pingback: 使用Google语音识别引擎(Google Speech API) | 长叶子的树

  9. JJ

    Hi,

    When I tried to send files 10 seconds long, the server returned error 500: Internal Server Error.

    Can I do something about it?

    Thanks

  10. MaxZhang

    Can it not recognize digits?
    I uploaded a file that contains characters and digits, and only the characters got recognized. Why?

  11. Hee-dong Yoon

    Hello :)

    I have a problem, and I need your help to solve it.

    The problem is as follows:

    I am using C++ and made the same ‘pear’ as in the exercise, but the result is as follows:

    {"status":5,"id":"a8be65bfad9baf7f1717f3de46e3926e-1","hypotheses":[]}

    The dictation result is empty and the status value is 5.

    I want to know what this status means, and I would like a reference (URL) that describes the status values.

    Thank you for reading.
