Accessing Google Speech API / Chrome 11

I’ve posted an updated version of this article here, using the new full-duplex streaming API.

Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it came one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome the new feature.

If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s really hard to do without really good language models, which are not something you’d be able to build into a browser.

I found the files I was looking for in the chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic, and then passed via an HTTPS POST to a Google web service, which responds with a JSON object with the results. Looking through their audio encoder code, it looks like the audio can be either FLAC or Speex, but the Speex appears to be some sort of specially modified version. I’m not sure what it is, but it just didn’t look quite right.

If that’s the case, there should be no reason why I can’t just POST something to it myself?

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So a quick few lines of Perl (or PHP, or just use wget on the command line):

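#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

# The first argument is the path to the FLAC file
my $file = shift or die "Usage: $0 file.flac\n";

# Read the whole file as raw bytes (binmode matters on Windows/cygwin)
open(my $fh, "<", $file) or die "Cannot open $file: $!";
binmode($fh);
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# POST the audio; the rate in Content-Type must match the file's sample rate
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    die "Request failed: " . $response->status_line . "\n";
}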

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file, and then converted it to FLAC on the command line (see SoX for more info).

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
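
To use a response like this from a program, the JSON needs to be decoded and the first hypothesis pulled out. Here is a minimal Python sketch of that step. The function name `best_hypothesis` is made up for illustration, and it assumes that a `status` of 0 means success and that the hypotheses are ordered best first, which is what the sample above suggests but is not documented anywhere I know of.

```python
import json


def best_hypothesis(response_text):
    """Return (utterance, confidence) for the first hypothesis, or None.

    Assumes status 0 means success and that hypotheses are ordered
    best-first, as in the sample response.
    """
    data = json.loads(response_text)
    hypotheses = data.get("hypotheses", [])
    if data.get("status") != 0 or not hypotheses:
        return None
    top = hypotheses[0]
    # In the sample, only the first hypothesis carries a confidence value,
    # so fall back to None when the field is missing.
    return top["utterance"], top.get("confidence")


sample = """{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
        {"utterance": "i like pickles", "confidence": 0.9012539},
        {"utterance": "i like pickle"}
    ]
}"""

print(best_hypothesis(sample))
```

Run against the sample response, this prints `('i like pickles', 0.9012539)`.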

I’m not sure if Google is intending this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

265 thoughts on “Accessing Google Speech API / Chrome 11”

  1. Pingback: I Hope Apple Improves Voice Recognition... » Gadgets, Software » Russell Heimlich

  2. slm32006

    [code]
    <?php
    $filename = '/path/to/flac/file/my_soundfile.flac';

    // read the FLAC file as raw bytes
    $handle = fopen($filename, "rb");
    $XPost = fread($handle, filesize($filename));
    fclose($handle);

    $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

    $ch = curl_init(); // initialize curl handle
    curl_setopt($ch, CURLOPT_URL, $url); // set url to post to
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
    curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: audio/x-flac; rate=16000"));
    curl_setopt($ch, CURLOPT_TIMEOUT, 300); // times out after 300s
    curl_setopt($ch, CURLOPT_POSTFIELDS, $XPost); // raw audio as the POST body
    curl_setopt($ch, CURLOPT_POST, 1);
    $str = curl_exec($ch); // run the whole process
    curl_close($ch);

    // decode returned json to associative array
    $objs = json_decode($str, true);

    // extract the data we need
    $converted_text = $objs["hypotheses"][0]["utterance"];
    $score = round($objs["hypotheses"][0]["confidence"], 2);

    ?>
    [/code]

  3. hidabe

    I am trying to use “https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=es” for Spanish recognition and it doesn’t work. Any suggestions?

    Regards!

  4. joshi

    Hi,
    found this by pure chance – great thread !
    Unfortunately, I do not get a result. wget returns: “Authority of issuer of the certificate cannot checked
    locally.”
    This surprises me, since I have called wget with option --no-check-certificate
    Any ideas ?
    Thanks a lot in advance for your help

    Joshi

  5. Adam

    I want to support multiple languages with this link, and I know it’s possible, because lang=es-mx for example, recognizes spanish.
    My questions are:
    * How do I know which languages are supported?
    * Can I use a flag to indicate “auto-recognize”?

  6. Zibri

    Did anyone try with WAV or g711 files? Without conversion… I mean with something like:
    Content-Type: audio/x-wav; rate=16000 ?

  7. fps

    Zibri, I tried giving it a wav but the web service rejects it. Flac is the way to go for now it seems

  8. victor

    I recorded a wav file test.wav, and then used sox to convert it to test.flac. I copied the posted Perl script into file test.pl, and created a test.txt which contains the test.flac file name. When I run: perl test.pl test.txt, I get no recognition result. {"status":5,"id":"915a84d84d46e13fed2f52a44b652bfc-1","hypotheses":[]}
    BTW, I run it in cygwin
    Could you please tell me where i made mistake?

    thanks,

    Victor

  9. mike Post author

    Hey Victor,

    Why are you passing in a txt file?

    The Perl script takes the name of an audio file as the first argument, and not a txt file that contains the name of the audio file.

    Also make sure you’ve encoded it at either 8 kHz or 16 kHz, and adjust the Content-Type header accordingly. The example I posted uses 16.

    Mike

  10. Javier

    Very interesting feature. Could someone share PHP code that reads a FLAC file stored locally on the server and does this?

  11. vic kumar

    Hey, I’ve written a bit of Perl code, and created sort of my own version of Siri / Iris. It’s still pretty rough around the edges, but I wanted to post it on this blog since it’s where I got some of the code. You can use sox to normalize the sound to -5 dB, which helps improve accuracy.

    So, how can I post a file on here?

  12. mike Post author

    Hey Vic,

    I don’t really have a way to post files, but you could just include a link somewhere to download.

    Mike

  13. Carlitos

    Hi Mike,
    I wanted to say that you are one of the only references online for posting data to the google speech recognition engine. Thanks for sharing the info.

    I’m trying to use the google speech recognition with my robot and I would like to write a bash script to do so. I am not familiar with Perl and I’m having a hard time posting the right data to the server using wget or curl.

    Can you give me any tip on how to use wget (as you mention in the post) to post the flac file?

    Right now, if I try this:

    wget --post-file=out.flac https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US

    it gives me ERROR 400: Content-Type media type is not audio.

    Any help will be very appreciated. Thanks!

  14. mike Post author

    Carlitos-

    It looks like you’re missing the Content-Type header- if you look back in some of the old comments, I’m pretty sure there was somebody that posted an example using wget.

    Mike
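
The same point applies in any language: the audio goes in the POST body and the media type goes in a Content-Type header. As a rough illustration, here is a Python sketch that builds such a request with the standard library without sending it. The helper name `build_request` and its default arguments are made up for this example; the URL and header value are the ones from the article.

```python
import urllib.request


def build_request(audio_bytes, rate=16000, lang="en-US"):
    """Build (but do not send) a POST request carrying FLAC audio.

    build_request is a hypothetical helper; the URL and the
    Content-Type value are taken from the article above.
    """
    url = ("https://www.google.com/speech-api/v1/recognize"
           "?xjerr=1&client=chromium&lang=" + lang)
    # Without this header the service answers
    # "ERROR 400: Content-Type media type is not audio".
    headers = {"Content-Type": "audio/x-flac; rate=%d" % rate}
    return urllib.request.Request(url, data=audio_bytes,
                                  headers=headers, method="POST")


# Placeholder bytes; in practice this would be the contents of out.flac.
req = build_request(b"fLaC")
print(req.get_method(), req.get_header("Content-type"))
```

Sending it would then be a matter of `urllib.request.urlopen(req).read()`, assuming the endpoint accepts the request.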

  15. Carlitos

    Oh! I’m an idiot for forgetting the headers and for not realizing there were more comments!

    Here is my result. This will record, encode to FLAC, send the request to Google and save the output to speech.txt:

    arecord -f cd -t wav -d 5 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac; wget --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" -O speech.txt http://www.google.com/speech-api/v1/recognize?lang=en

    Thanks a lot for your code and help Mike!

  16. Tha Narie

    FYI:

    status: 0 – correct
    status: 4 – missing audio file
    status: 5 – incorrect audio file

    Sample rate can be anything between 8000 and 44000 (not 44100), and doesn’t have to be exactly 8000 or 16000. If it is out of bounds you get a 400 HTML error page returned.
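
These observations are easy to turn into a small lookup and a pre-flight check. The Python sketch below simply encodes what this comment reports; the names `STATUS_MEANINGS`, `describe_status` and `rate_in_bounds` are invented for illustration, and neither the status meanings nor the rate bounds come from any official documentation.

```python
# Status codes as reported in the comment above (not an official list).
STATUS_MEANINGS = {
    0: "correct",
    4: "missing audio file",
    5: "incorrect audio file",
}


def describe_status(status):
    """Translate a status code from the JSON response into text."""
    return STATUS_MEANINGS.get(status, "unknown status %d" % status)


def rate_in_bounds(rate):
    """True if the sample rate falls in the 8000..44000 range
    that the comment above says the service accepts."""
    return 8000 <= rate <= 44000


print(describe_status(5))
print(rate_in_bounds(44100))
```

Checking the rate before uploading avoids a round trip that would only come back as a 400 error page.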

  17. framillo

    Very interesting, thank you a lot for sharing. Does anybody have an idea how to send audio from the mic in real time instead of posting the FLAC file? I would like to reuse Chrome for speech recognition, but my system would not have a monitor or TV, so clicking the mic picture to start recognition is not possible.
    thank you all

  18. Pingback: Hack into the Google Chrome beta speech recognition api | Robert Buzink

  19. Pingback: Nao 1337 uses Google Speech-to-Text Service | Carlitos' Contraptions

  20. Travis W

    Hiyassat, I would imagine you can change where it says “lang=en-us” to lang= . Haven’t tested that though.

  21. Pingback: Building My Own Siri / Jarvis « cranklin.com

  22. AnthonyCE

    Mike,
    any examples of iOS use of this API? I have an Xcode project where I need to convert the voice recorded .wav file in real-time to .flac and then would like to POST to the google API. Any thoughts or examples would be great.

  23. DogWang

    Interesting! It is very easy to use.

    The Speech Input API Specification can be found at http://www.w3.org/2005/Incubator/htmlspeech/2010/10/google-api-draft.html ( and I found it in the repo )
    I think anyone who reads this article and wants to use Google Speech API should have a look at the code in the
    source repository first.

    BTW, this is the URL I use: ( lang=zh-CN is for Chinese and it works fine 😀 )
    http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1
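
Since several people in this thread are swapping the `lang` value by hand, here is a small Python sketch that assembles the query string from parameters. The helper name `recognize_url` is made up; `xjerr`, `client`, `lang` and `maxresults` are the parameters that appear in the URLs in this thread, and which language codes the service accepts is not something this sketch knows.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/speech-api/v1/recognize"


def recognize_url(lang="en-US", maxresults=None):
    """Build the recognize URL for a language code such as 'zh-CN'.

    recognize_url is a hypothetical helper; the parameter names are
    copied from URLs posted in this thread.
    """
    params = [("xjerr", "1"), ("client", "chromium"), ("lang", lang)]
    if maxresults is not None:
        params.append(("maxresults", str(maxresults)))
    return BASE_URL + "?" + urlencode(params)


print(recognize_url("zh-CN", maxresults=1))
```

This prints `https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1`.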

  24. Raveesh Sharma

    Is there a way that google speech api can detect pauses in a statement? as in
    If I say “Java is a programming language. and so is C.”
    can I force Google to translate the statement up to the pause and then translate the rest of the statement?

  25. TrkHefner_

    I guess I’m almost there, just can’t seem to get the wget example working, my result file keeps returning empty and I’m not getting any errors either.

    Could anyone please post a quick example as to how to go about posting to the API using cURL?

    many thanks!

  26. TrkHefner_

    Okay, I got the wget call working in the end, but I’m still not there.

    I am calling the API from a php file with an exec command with the api call.
    Strangely, the address bar changes from “speech.php” (my file) to “main?url=out.flac&tid=0&w=1440&h=809” and my result file contains some weird HTML containing an iframe (with main?url=out.flac&tid=0&w=1440&h=809 as its src).

    Here’s my code:

    $cmd = 'wget --post-file out.flac -header="Content-Type: audio/x-flac; rate=16000" -O resultaat.html https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US';
    exec($cmd);
    echo file_get_contents("resultaat.html");

    It seems like instead of returning the JSON object I desire, the API tries to redirect the call.
    I highly doubt the Google people have just now restricted the use of their speech-to-text, so I figure it’s me who fails.

    Any pointers would be greatly appreciated.

  27. TrkHefner_

    Okay, I finally got it working.
    I’m sorry for the blob comments I placed in the process.
    Since I figure I won’t be the last person to walk in on the troubles I experienced, I’ll try to make up my blobbing by explaining how I resolved the matter. I hope that by doing so I can help others googling into this thread.

    First of all, I must point out that I have been trying to make this work on a windows machine. So in order to successfully make a wget call I had to install GnuWin Wget (http://sourceforge.net/projects/gnuwin32/files/wget/1.11.4-1/). The problems I experienced earlier were due to the fact that I was using the wrong release. Make sure you download and run the setup program with all dependency files included.

    Second of all, the win32 version of wget seems to accept its parameters slightly differently than explained in this article and the comments underneath.
    The command I issue is now: wget --post-file="out.flac" --header="Content-Type: audio/x-flac; rate=16000" --output-file="result.txt" --no-check-certificate https://www.google.com/speech-api/v1/recognize?lang=en
    Contrary to what I expected, this didn’t write the actual Google JSON response to result.txt.
    Instead it writes the wget log to the file. This log contained “HTTP request sent, awaiting response… 200 OK Length: unspecified [application/json] Saving to: ‘recognize@lang=en’”. Why on earth it writes the JSON object to a file called ‘recognize@lang=en’ is a riddle to me, but sure enough, the file (saved in the wget executable dir) contains the text I desired.

    Strangely, the accuracy is pretty low and thus the result/recognition level is quite bad. The api recognizes only half of the words the example at html5rocks and google translate do. I figure they use a higher bitrate or something or there’s something else I am not taking into account.

    Anyway, thanks for this article and the comments that helped my on my way, I hope I was able to contribute.

  28. Pingback: Asiri: Let Siri Speak Your Language | Abdulrahman Alotaiba's Blog

  29. Rob

    I have no idea what I’m doing wrong or what this output means when I do wget. I’ve tried everyone’s statements here for wget and none seem to work for me.

    wget http://www.google.com/speech-api/v1/recognize?lang=en-us -header “Content-Type: audio/x-flac; rate=16000? -post-file=rec1.flac -output-file=output.txt

    SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
    syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc
    –2012-02-05 17:34:11– http://www.google.com/speech-api/v1/recognize?lang=en-us
    Resolving http://www.google.com... 74.125.113.104, 74.125.113.105, 74.125.113.106, …
    Connecting to http://www.google.com|74.125.113.104|:80… connected.
    HTTP request sent, awaiting response… 405 HTTP method GET is not supported by this URL
    2012-02-05 17:34:12 ERROR 405: HTTP method GET is not supported by this URL.

    –2012-02-05 17:34:12– http://%96header/
    Resolving \226header… failed: No data record of requested type.
    wget: unable to resolve host address `-header’
    –2012-02-05 17:34:14– ftp://%93content-type/
    => `.listing’
    Resolving \223content-type… failed: No data record of requested type.
    wget: unable to resolve host address `”content-type’
    unlink: No such file or directory
    –2012-02-05 17:34:16– http://audio/x-flac;
    Resolving audio… failed: No data record of requested type.
    wget: unable to resolve host address `audio’
    –2012-02-05 17:34:18– http://rate=16000/?
    Resolving rate=16000… failed: No data record of requested type.
    wget: unable to resolve host address `rate=16000′
    –2012-02-05 17:34:21– http://%96post-file=rec1.flac/
    Resolving \226post-file=rec1.flac… failed: No data record of requested type.
    wget: unable to resolve host address `-post-file=rec1.flac’
    –2012-02-05 17:34:21– http://%96output-file=output.txt/
    Resolving \226output-file=output.txt… failed: No data record of requested type.

    wget: unable to resolve host address `-output-file=output.txt’

  30. vfnik

    I have tried this code in PHP.

    <?php
    $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    $file = 'hello.flac';

    $audio = "";
    $file=fopen("hello.flac","r");
    while(!feof($file)) {
    $audio .= fgets($file). "";
    }
    fclose($file);
    $data = array('Content_Type' => 'audio/x-flac; rate=16000','Content' => @$audio);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);
    curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);

    ?>

    But it is giving me response as:

    Content-Type media type is not audio

    Content-Type media type is not audio
    Error 400

    Can anybody suggest where I am going wrong?

  31. stranger

    vfnik, the @ in a curl postfields array is a reference to a file, so, instead of reading the data in with the fopen/fread etc, just put the filename there so, instead of @$audio, use @$file

    untested but i think that will work

  32. Jon Schwartz

    I ported your code to Python (thanks!!):

    import sys
    import urllib.request

    url = 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US'

    # Read the FLAC file given on the command line as raw bytes
    with open(sys.argv[1], 'rb') as f:
        audio = f.read()

    print(sys.argv[1], 'read')

    # Passing data makes this a POST; Content-Length is set automatically
    req = urllib.request.Request(url, data=audio)
    req.add_header('Content-Type', 'audio/x-flac; rate=16000')

    print('Request built')

    response = urllib.request.urlopen(req)

    print('Response returned')

    print(response.read().decode('utf-8'))

  33. Doug

    Your code works fine without modification, thank you; however, the response from Google is somewhat confusing.

    "hypotheses":[{"utterance":"poke poke poke poke poke poke poke poke","confidence":0.96547246}]}

    This is from a voicemail test that I just recorded and I can assure you that the word “poke” was not included.

    Has anyone else seen this response from Google?

  34. Doug

    Doh! If your phone system records at an 8000 Hz sample rate, Google works better if you tell it to use 8000 instead of 16000!

    If anyone else gets odd responses from Google, check that the rate in your Content-Type header matches the sample rate of your recording.
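
One way to avoid this mismatch is to read the sample rate from the recording instead of hard-coding it. The Python sketch below does that for a WAV file using the standard `wave` module and builds the matching header value. The function name `content_type_for_wav` is made up, and it assumes the FLAC file you upload is converted from that WAV without resampling.

```python
import io
import wave


def content_type_for_wav(wav_file):
    """Return the FLAC Content-Type whose rate matches a WAV recording.

    wav_file may be a path or a binary file object. Assumes the FLAC
    upload is converted from this WAV at the same sample rate.
    """
    with wave.open(wav_file, "rb") as w:
        rate = w.getframerate()
    return "audio/x-flac; rate=%d" % rate


# Demonstration: build a tiny 8 kHz mono WAV in memory, like a
# phone-system recording, and derive the header value from it.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)      # 16-bit samples
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 80)
buf.seek(0)

print(content_type_for_wav(buf))
```

For the 8 kHz sample this prints `audio/x-flac; rate=8000`, which is the value to send instead of the 16000 used in the article.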

  35. criniera

    hello .. does anyone know if it-IT is the right configuration to set it to italian language?

  36. savez

    hello,
    I wrote this code, but it always reports this error:

    Content-Type media type is not sound

    Error 400

    how can I fix this?

    ‘audio/x-flac; rate=16000′,’Content’ => @$file);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);
    ?>

  37. Bob12321

    Hi, i am trying to get .net to work with this…

    Here is my code below– what am I doing wrong!?

    Dim rdr As New FileStream("c:\record.flac", FileMode.Open)
    Dim req As HttpWebRequest = DirectCast(WebRequest.Create("https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"), HttpWebRequest)
    req.Method = "POST"
    req.ContentType = "audio/x-flac; rate=16000"
    req.ContentLength = rdr.Length
    req.AllowWriteStreamBuffering = True
    Dim reqStream As Stream = req.GetRequestStream()
    Dim inData As Byte() = New Byte(rdr.Length - 1) {}
    Dim bytesRead As Integer = rdr.Read(inData, 0, rdr.Length)
    reqStream.Write(inData, 0, rdr.Length)
    rdr.Close()
    reqStream.Close()
    Dim response As HttpWebResponse = req.GetResponse()
    Using reader As StreamReader = New StreamReader(response.GetResponseStream())
    s = reader.ReadToEnd()
    End Using

    All fiddler says to me is Tunnel through http://www.google.com:443

    PLEASE HELP ME!

  39. mike Post author

    Savez,

    The code you provided is incomplete;

    where is $url set? or $data? or $file?

    and I don’t understand what this line does:

    ‘audio/x-flac; rate=16000′,’Content’ => @$file);

    that’s not even syntactically valid- I assume it was cut off when you posted it?

    Somewhere on that line, you need to actually set “Content-Type” to the audio/flac value-

    Without the rest of this code, it’s impossible to say why it’s not working.

    Mike

  40. savez

    Sorry, the code got deleted when I posted it.

    I put the code that gives me error.

    ‘audio/x-wav’,’Content’ => @$audio);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);
    ?>

  42. savez

    $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    $file = 'hello.flac'; $audio = ""; $files=fopen($file,"r");
    while(!feof($files)) { $audio .= fgets($files);}
    fclose($files); $data = array('Content_Type' => 'audio/x-wav','Content' => @$audio);
    $ch = curl_init(); curl_setopt($ch,CURLOPT_URL,$url); curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch); echo $result; curl_close($ch);
