don't_panic
personal and professional blog of mike pultz, technology specialist and serial entrepreneur.

23 Mar 2011

Accessing Google Speech API / Chrome 11


Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it came one really interesting new feature: support for the HTML5 speech input API. This means that you'll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome the addition.

If you're running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I've seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that's really hard to do without really good language models, which is not something you'd be able to build into a browser.

I found the files I was looking for in the Chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, the audio can be either FLAC or Speex, but the Speex looks like some sort of specially modified version. I'm not sure what it is, but it just didn't look quite right.

If that's the case, there should be no reason why I can't just POST something to it myself?

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So a quick few lines of Perl (or PHP, or just use wget on the command line):

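#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

# The first argument is the path to the FLAC audio file
my $file = shift or die "usage: $0 file.flac\n";

# Read the whole file as raw bytes (binmode matters on Windows/cygwin)
open(my $fh, "<", $file) or die "Cannot open $file: $!\n";
binmode($fh);
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# The rate in the Content-Type header must match the sample rate of the audio
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    die $response->status_line . "\n";
}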

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac
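For comparison, here is a rough sketch of how the same request could be put together in Python 3. It only builds the request object and does not send it. The function name `build_request` and its parameters are made up for illustration; the URL, query parameters and Content-Type value are the ones used in the Perl script above.

```python
import urllib.parse
import urllib.request

# Endpoint taken from speech_recognition_request.cc, as quoted in the post
ENDPOINT = "https://www.google.com/speech-api/v1/recognize"


def build_request(audio: bytes, rate: int = 16000, lang: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw FLAC audio.

    build_request is a hypothetical helper; rate must match the sample
    rate the audio was actually encoded at.
    """
    query = urllib.parse.urlencode({"xjerr": 1, "client": "chromium", "lang": lang})
    return urllib.request.Request(
        url=f"{ENDPOINT}?{query}",
        data=audio,  # the raw bytes are the request body, not a form field
        headers={"Content-Type": f"audio/x-flac; rate={rate}"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; whether the service answers is a separate question, since this is not a documented public API.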

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
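To pull the best guess out of a response like this, something along these lines should do. It is a minimal Python 3 sketch that assumes a status of 0 means success and that the first entry in "hypotheses" is the preferred one, which is what the sample suggests but is not documented anywhere I know of. The helper name `best_hypothesis` is made up.

```python
import json


def best_hypothesis(response_text: str):
    """Return (utterance, confidence) for the first hypothesis, or None.

    Assumes status 0 means success and that hypotheses are listed best
    first; both are guesses based on the sample response.
    """
    payload = json.loads(response_text)
    if payload.get("status") != 0:
        return None
    hypotheses = payload.get("hypotheses", [])
    if not hypotheses:
        return None
    top = hypotheses[0]
    # Only some hypotheses carry a confidence value, so use .get()
    return top["utterance"], top.get("confidence")


sample = (
    '{"status": 0, "id": "b3447b5d98c5653e0067f35b32c0a8ca-1", '
    '"hypotheses": [{"utterance": "i like pickles", "confidence": 0.9012539}, '
    '{"utterance": "i like pickle"}]}'
)
print(best_hypothesis(sample))
```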

I'm not sure if Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

Comments (89) Trackbacks (13)
  1. Zibri, I tried giving it a wav but the web service rejects it. Flac is the way to go for now it seems

  2. I recorded a wav file test.wav, and then used sox to convert it to test.flac. I copied the posted Perl script into a file test.pl, and created a test.txt which contains the test.flac file name. When I run: perl test.pl test.txt, I get no recognition result: {"status":5,"id":"915a84d84d46e13fed2f52a44b652bfc-1","hypotheses":[]}
    BTW, I run it in cygwin.
    Could you please tell me where I made a mistake?

    thanks,

    Victor

  3. Hey Victor,

    Why are you passing in a txt file?

    The Perl script takes the name of an audio file as the first argument, and not a txt file that contains the name of the audio file.

    Also make sure you’ve encoded it at either 8 kHz or 16 kHz, and adjust the Content-Type header accordingly; the example I posted uses 16.

    Mike
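One way to keep the Content-Type rate and the file in agreement is to read the sample rate straight out of the FLAC header rather than typing it by hand. The Python 3 sketch below relies on the FLAC layout (a "fLaC" marker followed by a STREAMINFO metadata block as the first block). The helper names `flac_sample_rate` and `content_type_for` are made up, and files with extra data in front of the marker (such as an ID3 tag) are simply rejected.

```python
def flac_sample_rate(header: bytes) -> int:
    """Read the sample rate from the STREAMINFO block of a FLAC stream.

    header must hold at least the first 21 bytes of the file.
    """
    if len(header) < 21 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Low 7 bits of byte 4 are the metadata block type; 0 is STREAMINFO
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 8-17 hold the block and frame sizes; the 20-bit sample rate
    # occupies bytes 18, 19 and the top half of byte 20.
    return (header[18] << 12) | (header[19] << 4) | (header[20] >> 4)


def content_type_for(path: str) -> str:
    """Build the Content-Type value for a FLAC file on disk."""
    with open(path, "rb") as fh:
        rate = flac_sample_rate(fh.read(42))
    return f"audio/x-flac; rate={rate}"
```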

  4. Very interesting feature. Could someone share PHP code that reads a FLAC file stored locally on the server and does this?

  5. Hey, I’ve written a bit of Perl code, and created sort of my own version of Siri / Iris. It’s still pretty rough around the edges, but I wanted to post it on this blog since it’s where I got some of the code. You can use sox to normalize the sound to -5 dB, which helps improve accuracy.

    So, how can I post a file on here?

  6. Hey Vic,

    I don’t really have a way to post files, but you could just include a link somewhere to download it.

    Mike

  7. Hi Mike,
    I wanted to say that you are one of the only references online for posting data to the google speech recognition engine. Thanks for sharing the info.

    I’m trying to use the google speech recognition with my robot and I would like to write a bash script to do so. I am not familiar with Perl and I’m having a hard time posting the right data to the server using wget or curl.

    Can you give me any tip on how to use wget (as you mention in the post) to post the flac file?

    Right now, if I try this:

    wget --post-file=out.flac https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US

    it gives me ERROR 400: Content-Type media type is not audio.

    Any help will be very appreciated. Thanks!

  8. Carlitos-

    It looks like you’re missing the Content-Type header- if you look back in some of the old comments, I’m pretty sure there was somebody that posted an example using wget.

    Mike

  9. Oh! I’m an idiot for forgetting the headers and for not realizing there were more comments!

    Here is my result. This will record, encode to FLAC, send the request to Google and save the output to speech.txt:

    arecord -f cd -t wav -d 5 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac; wget --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" -O speech.txt http://www.google.com/speech-api/v1/recognize?lang=en

    Thanks a lot for your code and help Mike!

  10. FYI:

    status: 0 – correct
    status: 4 – missing audio file
    status: 5 – incorrect audio file

    Sample rate can be anything between 8000 and 44000 (not 44100), and doesn’t have to be exactly 8000 or 16000. If it is out of bounds you get a 400 HTML error page returned.
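Those observations are easy to encode as a small lookup plus a range check before sending anything. This is a Python 3 sketch only: the status meanings and the 8000 to 44000 bounds come from the comment above, not from any official documentation, and the names `describe_status` and `rate_in_bounds` are made up.

```python
# Status codes as reported in the comment above (unofficial observations)
STATUS_MEANINGS = {
    0: "correct",
    4: "missing audio file",
    5: "incorrect audio file",
}


def describe_status(status: int) -> str:
    """Translate a status code into the meaning observed for it."""
    return STATUS_MEANINGS.get(status, f"unknown status {status}")


def rate_in_bounds(rate: int) -> bool:
    """Check a sample rate against the observed 8000-44000 range."""
    return 8000 <= rate <= 44000
```

For example, `rate_in_bounds(44100)` is False, which matches the note that CD-rate audio gets rejected.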

  11. Very interesting, thanks a lot for sharing. Does anybody have an idea how to send audio from the mic in real time instead of posting the FLAC file? I would like to reuse Chrome for speech recognition, but my system would not have a monitor or TV, so clicking the mic picture to start recognition is not possible.
    Thank you all

  12. Thanks for researching Chrome’s speech recognition API. Great work! I made a little video using the API from the Linux command line: http://www.youtube.com/watch?v=Sf3dMgooufc (use it if you like)

  13. This is for the English language; how do I set the language I need?

  14. Hiyassat, I would imagine you can change where it says “lang=en-us” to lang= . Haven’t tested that though.

  15. Mike,
    any examples of iOS use of this API? I have an Xcode project where I need to convert the voice recorded .wav file in real-time to .flac and then would like to POST to the google API. Any thoughts or examples would be great.

  16. Interesting! It is very easy to use.

    The Speech Input API Specification can be found at http://www.w3.org/2005/Incubator/htmlspeech/2010/10/google-api-draft.html ( and I found it in the repo )
    I think anyone who reads this article and wants to use Google Speech API should have a look at the code in the
    source repository first.

    BTW, this is the URL I use: ( lang=zh-CN is for Chinese and it works fine :D )
    http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1

  17. Is there a way that the Google speech API can detect pauses in a statement? For example,
    if I say “Java is a programming language. And so is C.”,
    can I force Google to translate the statement up until the pause and then translate the rest of the statement?

  18. I guess I’m almost there, just can’t seem to get the wget example working, my result file keeps returning empty and I’m not getting any errors either.

    Could anyone please post a quick example as to how to go about posting to the API using cURL?

    many thanks!

  19. Okay, I got the wget call working in the end, but I’m still not there.

    I am calling the API from a php file with an exec command with the api call.
    Strangely, the address bar changes from “speech.php” (my file) to “main?url=out.flac&tid=0&w=1440&h=809” and my result file contains some weird HTML containing an iframe (with main?url=out.flac&tid=0&w=1440&h=809 as its src).

    Here’s my code:

    $cmd = 'wget --post-file out.flac -header="Content-Type: audio/x-flac; rate=16000" -O resultaat.html https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US';
    exec($cmd);
    echo file_get_contents("resultaat.html");

    It seems like instead of returning the JSON object I desire, the API tries to redirect the call.
    I highly doubt the Google people have just now restricted the use of their speech-to-text, so I figure it’s me who fails.

    Any pointers would be greatly appreciated.

  20. Okay, I finally got it working.
    I’m sorry for the blob comments I placed in the process.
    Since I figure I won’t be the last person to walk in on the troubles I experienced, I’ll try to make up my blobbing by explaining how I resolved the matter. I hope that by doing so I can help others googling into this thread.

    First of all, I must point out that I have been trying to make this work on a windows machine. So in order to successfully make a wget call I had to install GnuWin Wget (http://sourceforge.net/projects/gnuwin32/files/wget/1.11.4-1/). The problems I experienced earlier were due to the fact that I was using the wrong release. Make sure you download and run the setup program with all dependency files included.

    Second of all, the win32 version of wget seems to accept its parameters slightly differently than explained in this article and the comments underneath.
    The command I issue is now: wget --post-file="out.flac" --header="Content-Type: audio/x-flac; rate=16000" --output-file="result.txt" --no-check-certificate https://www.google.com/speech-api/v1/recognize?lang=en
    Contrary to what I expected, this didn’t write the actual desired Google JSON response to result.txt.
    Instead it writes the wget log to the file. This log contained “HTTP request sent, awaiting response… 200 OK Length: unspecified [application/json] Saving to: ‘recognize@lang=en’”. Why on earth it writes the JSON obj to a file called ‘recognize@lang=en’ is a riddle to me, but sure enough, the file (saved in the wget executable dir) contains the text I desired.

    Strangely, the accuracy is pretty low and thus the result/recognition level is quite bad. The api recognizes only half of the words the example at html5rocks and google translate do. I figure they use a higher bitrate or something or there’s something else I am not taking into account.

    Anyway, thanks for this article and the comments that helped me on my way. I hope I was able to contribute.

  21. This is the command that worked for me on Ubuntu 11.10

    wget http://www.google.com/speech-api/v1/recognize?lang=en-us --header "Content-Type: audio/x-flac; rate=16000" --post-file=NameOfFile.flac --output-file=output.txt

    Hope this helped! :)

  22. I have no idea what I’m doing wrong or what this output means when I do wget. I’ve tried everyone’s statements here for wget and none seem to work for me.

    wget http://www.google.com/speech-api/v1/recognize?lang=en-us -header “Content-Type: audio/x-flac; rate=16000? -post-file=rec1.flac -output-file=output.txt

    SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
    syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc
    –2012-02-05 17:34:11– http://www.google.com/speech-api/v1/recognize?lang=en-us
    Resolving http://www.google.com... 74.125.113.104, 74.125.113.105, 74.125.113.106, …
    Connecting to http://www.google.com|74.125.113.104|:80… connected.
    HTTP request sent, awaiting response… 405 HTTP method GET is not supported by this URL
    2012-02-05 17:34:12 ERROR 405: HTTP method GET is not supported by this URL.

    –2012-02-05 17:34:12– http://%96header/
    Resolving \226header… failed: No data record of requested type.
    wget: unable to resolve host address `-header’
    –2012-02-05 17:34:14– ftp://%93content-type/
    => `.listing’
    Resolving \223content-type… failed: No data record of requested type.
    wget: unable to resolve host address `”content-type’
    unlink: No such file or directory
    –2012-02-05 17:34:16– http://audio/x-flac;
    Resolving audio… failed: No data record of requested type.
    wget: unable to resolve host address `audio’
    –2012-02-05 17:34:18– http://rate=16000/?
    Resolving rate=16000… failed: No data record of requested type.
    wget: unable to resolve host address `rate=16000′
    –2012-02-05 17:34:21– http://%96post-file=rec1.flac/
    Resolving \226post-file=rec1.flac… failed: No data record of requested type.
    wget: unable to resolve host address `-post-file=rec1.flac’
    –2012-02-05 17:34:21– http://%96output-file=output.txt/
    Resolving \226output-file=output.txt… failed: No data record of requested type.

    wget: unable to resolve host address `-output-file=output.txt’

  23. I have tried this code in PHP.

    <?php
    $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    $file = 'hello.flac';

    $audio = "";
    $file=fopen("hello.flac","r");
    while(!feof($file)) {
    $audio .= fgets($file) . "";
    }
    fclose($file);
    $data = array('Content_Type' => 'audio/x-flac; rate=16000','Content' => @$audio);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);
    curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);

    ?>

    But it is giving me response as:

    Content-Type media type is not audio

    Content-Type media type is not audio
    Error 400

    Can anybody suggest where I am going wrong?

  24. vfnik, the @ in a curl postfields array is a reference to a file. So instead of reading the data in with fopen/fread etc., just put the filename there: instead of @$audio, use @$file

    untested but i think that will work

  25. For those interested in using the Speex codec, there is a fork that implements the MIME type “x-speex-with-header-byte” and works perfectly against Google’s Speech Recognition APIs. It’s available at http://qxip.net/wiki or directly on GitHub: https://github.com/QXIP/Speex-with-header-bytes

  26. I ported your code to Python (thanks!!):

    # Python 2 (uses urllib2)
    import urllib2
    import os
    import sys

    # Read the FLAC file named on the command line as raw bytes
    audio = open(sys.argv[1], 'rb').read()
    filesize = os.path.getsize(sys.argv[1])

    print sys.argv[1], ' Read', "\n"

    req = urllib2.Request(url='https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US')
    req.add_header('Content-type', 'audio/x-flac; rate=16000')
    req.add_header('Content-length', str(filesize))
    req.add_data(audio)

    print 'Request built', "\n"

    response = urllib2.urlopen(req)

    print 'Response returned', "\n"

    print response.read()

  27. Your code works fine without modification, thank you; however, the response from Google is somewhat confusing.

    "hypotheses":[{"utterance":"poke poke poke poke poke poke poke poke","confidence":0.96547246}]}

    This is from a voicemail test that I just recorded and I can assure you that the word “poke” was not included.

    Has anyone else seen this response from Google?

  28. Doh! If your phone system records at an 8000 Hz sample rate, Google works better if you tell it to use 8000 instead of 16000!

    If anyone else gets odd responses from Google, check that the rate in your Content-Type header matches the sample rate of your recording.

  29. hello .. does anyone know if it-IT is the right configuration to set it to italian language?

  30. hello,
    I made this code, but I always reports this error

    Content-Type media type is not sound

    Error 400

    how can I fix this?

    ‘audio/x-flac; rate=16000′,’Content’ => @$file);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);
    ?>

  31. Hi, i am trying to get .net to work with this…

    Here is my code below– what am I doing wrong!?

    Dim rdr As New FileStream("c:\record.flac", FileMode.Open)
    Dim req As HttpWebRequest = DirectCast(WebRequest.Create("https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"), HttpWebRequest)
    req.Method = "POST"
    req.ContentType = "audio/x-flac; rate=16000"
    req.ContentLength = rdr.Length
    req.AllowWriteStreamBuffering = True
    Dim reqStream As Stream = req.GetRequestStream()
    Dim inData As Byte() = New Byte(rdr.Length - 1) {}
    Dim bytesRead As Integer = rdr.Read(inData, 0, rdr.Length)
    reqStream.Write(inData, 0, rdr.Length)
    rdr.Close()
    reqStream.Close()
    Dim response As HttpWebResponse = req.GetResponse()
    Using reader As StreamReader = New StreamReader(response.GetResponseStream())
    s = reader.ReadToEnd()
    End Using

    All fiddler says to me is Tunnel through http://www.google.com:443

    PLEASE HELP ME!

  32. HELP ME!!

    hello,
    I made this code, but I always reports this error
    Content-Type media type is not sound
    Error 400
    how can I fix this?
    ‘audio/x-flac; rate=16000′,’Content’ => @$file);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);
    ?>

  33. Savez,

    The code you provided is incomplete;

    where is $url set? or $data? or $file?

    and I don’t understand what this line does:

    ‘audio/x-flac; rate=16000′,’Content’ => @$file);

    that’s not even syntactically valid- I assume it was cut off when you posted it?

    Somewhere on that line, you need to actually set “Content-Type” to the audio/flac value-

    Without the rest of this code, it’s impossible to say why it’s not working.

    Mike

  34. Sorry, part of the code was deleted when I posted it.

    I put the code that gives me error.

    ‘audio/x-wav’,'Content’ => @$audio);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);
    ?>

  35. ‘audio/x-wav’,'Content’ => @$audio);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);
    ?>

  36. $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    $file = 'hello.flac'; $audio = ""; $files=fopen($file,"r");
    while(!feof($files)) { $audio .= fgets($files);}
    fclose($files); $data = array('Content_Type' => 'audio/x-wav','Content' => @$audio);
    $ch = curl_init(); curl_setopt($ch,CURLOPT_URL,$url); curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch); echo $result; curl_close($ch);

  37. Well, a few things

    1) you’re passing the content-type header as “audio/x-wav”- it should be flac if you’re passing in FLAC audio
    2) i’m not sure if ‘Content => @$audio is correct; if I’m not mistaken, putting an @ in front of the var in CURL assumes it’s a path to a file, while you’re passing in the raw audio content- but I’m not a big CURL user, so I’m not positive
    3) I’m not sure if you can pass both a header and the data in the same array in CURLOPT_POSTFIELDS; the PHP manual says it’s the full data, which isn’t headers- again, I don’t know much about CURL- but I think you have to use CURLOPT_HTTPHEADER to set a header
    4) the header is “Content-Type” not “Content_Type”

    I’m sure if you look back in the comments, there’s a functioning PHP example

    Mike

  38. Has anybody tried implementing this in Android? Looks like it should be doable, although I have no idea how one could record FLAC in Android. It’s easy to access the Google transcription service in Android if you want to record the audio at the same time, but I can’t find a way to run it with pre-recorded audio files.

