Accessing Google Speech API / Chrome 11

I’ve posted an updated version of this article here, using the new full-duplex streaming API.

Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome it in the browser.

If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s really hard to do without really good language models, which is not something you’d be able to build into a browser.

I found the files I was looking for in the chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, the audio can be either FLAC or Speex, but the Speex appears to be some sort of specially modified version. I’m not sure what it is, but it just didn’t look quite right.

If that’s the case, there should be no reason I can’t just POST something to it myself.

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So, a quick few lines of Perl (you could also use PHP, or just wget on the command line):

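#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
my $file = shift or die "usage: $0 file.flac\n";

# Slurp the whole FLAC file as raw bytes (no newline or encoding translation)
open(my $fh, "<:raw", $file) or die "Cannot open $file: $!\n";
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# The rate in the Content-Type header must match the FLAC file's sample rate
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    die "Request failed: " . $response->status_line . "\n";
}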

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac
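The script hard-codes rate=16000, and several of the comments below trace bad results to a mismatch between that value and the audio's real sample rate. FLAC stores the sample rate, channel count, bit depth, and total sample count in a STREAMINFO metadata block that the format requires to come first, right after the fLaC marker, so the header value can be read from the file instead of guessed. What follows is a minimal Java sketch under those assumptions: it expects the file to start directly with fLaC (no ID3 tag in front), and FlacInfo, parse, contentType and seconds are hypothetical names, not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reads the fixed-layout STREAMINFO block at the start of a FLAC file.
record FlacInfo(int sampleRate, int channels, int bitsPerSample, long totalSamples) {

    // Needs the first 42 bytes: "fLaC" (4) + metadata block header (4) + STREAMINFO (34).
    static FlacInfo parse(byte[] b) {
        if (b.length < 42 || b[0] != 'f' || b[1] != 'L' || b[2] != 'a' || b[3] != 'C') {
            throw new IllegalArgumentException("not a FLAC stream starting with fLaC");
        }
        if ((b[4] & 0x7F) != 0) { // low 7 bits of the block header hold the block type
            throw new IllegalArgumentException("first metadata block is not STREAMINFO");
        }
        int b20 = b[20] & 0xFF, b21 = b[21] & 0xFF;
        // 20-bit sample rate spans byte 18, byte 19 and the high nibble of byte 20.
        int rate = ((b[18] & 0xFF) << 12) | ((b[19] & 0xFF) << 4) | (b20 >> 4);
        int channels = ((b20 >> 1) & 0x07) + 1;            // 3 bits, stored minus one
        int bits = (((b20 & 0x01) << 4) | (b21 >> 4)) + 1; // 5 bits, stored minus one
        long total = ((long) (b21 & 0x0F) << 32)           // 36 bits; 0 means unknown
                | ((long) (b[22] & 0xFF) << 24) | ((b[23] & 0xFF) << 16)
                | ((b[24] & 0xFF) << 8) | (b[25] & 0xFF);
        return new FlacInfo(rate, channels, bits, total);
    }

    // Header value to send with the request.
    String contentType() {
        return "audio/x-flac; rate=" + sampleRate;
    }

    // Clip length in seconds (0 when the encoder left the sample count unknown).
    double seconds() {
        return (double) totalSamples / sampleRate;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            FlacInfo info = parse(in.readNBytes(42));
            System.out.println(info.contentType() + " (" + info.channels() + " ch, "
                    + info.bitsPerSample() + "-bit, " + info.seconds() + " s)");
        }
    }
}
```

A 44100 Hz stereo file like the one in comment 29 would report rate=44100, so the header would need that value, or the audio could first be converted to a lower rate with SoX. The seconds() value is also a quick way to check a clip against the length limit discussed in comments 26 and 27.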

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
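Judging only from this sample and from the status 5 reply with an empty list in comment 37, status 0 appears to mean success, the hypotheses appear to be ranked best-first, and only the first one carries a confidence. A minimal Java sketch of consuming the reply under those assumptions; Hypothesis and RecognitionResult are hypothetical types that you would fill from the JSON with a library of your choice:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical type mirroring one entry of "hypotheses"; confidence may be null.
record Hypothesis(String utterance, Double confidence) {}

// Hypothetical type mirroring the whole reply.
record RecognitionResult(int status, String id, List<Hypothesis> hypotheses) {

    // First hypothesis, assuming status 0 means success and the list is ranked best-first.
    Optional<Hypothesis> best() {
        if (status != 0 || hypotheses == null || hypotheses.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(hypotheses.get(0));
    }

    // True only when the top hypothesis has a confidence at or above the threshold.
    boolean confident(double threshold) {
        return best().map(Hypothesis::confidence) // empty if confidence is missing
                .map(c -> c >= threshold)
                .orElse(false);
    }
}
```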

I’m not sure whether Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!
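Anything that can send an HTTP POST should work the same way. For comparison, here is a minimal Java 11+ sketch using java.net.http.HttpClient. The endpoint and query parameters are copied from the Perl script (with lang made configurable), SpeechRequest and build are hypothetical names, and since this is the old v1 URL it may no longer accept requests. As in the Perl version, the Content-Type is a request header and the raw FLAC bytes are the entire body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SpeechRequest {
    // Endpoint as used by the Perl script above.
    static final String ENDPOINT = "https://www.google.com/speech-api/v1/recognize";

    static HttpRequest build(byte[] flac, int sampleRate, String lang) {
        String query = "?xjerr=1&client=chromium&lang="
                + URLEncoder.encode(lang, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(ENDPOINT + query))
                // The rate must match the FLAC file's sample rate.
                .header("Content-Type", "audio/x-flac; rate=" + sampleRate)
                // The body is the raw FLAC file and nothing else.
                .POST(HttpRequest.BodyPublishers.ofByteArray(flac))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        byte[] audio = Files.readAllBytes(Path.of(args[0]));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(build(audio, 16000, "en-US"), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        // ofString() decodes using the charset in the response's Content-Type (UTF-8 if absent).
        System.out.println(response.body());
    }
}
```

Usage, with a JDK 11 or later: java SpeechRequest.java i_like_pickles.flac. HttpClient also takes care of details such as chunked response bodies, which a hand-written socket client has to handle itself.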

265 thoughts on “Accessing Google Speech API / Chrome 11”

  1. mike Post author

    Well, a few things

    1) you’re passing the content-type header as “audio/x-wav”- it should be flac if you’re passing in FLAC audio
    2) I’m not sure if 'Content' => @$audio is correct; if I’m not mistaken, putting an @ in front of the var in cURL assumes it’s a path to a file, while you’re passing in the raw audio content. But I’m not a big cURL user, so I’m not positive.
    3) I’m not sure you can pass both a header and the data in the same array in CURLOPT_POSTFIELDS; the PHP manual says it’s the full data, which doesn’t include headers. Again, I don’t know much about cURL, but I think you have to use CURLOPT_HTTPHEADER to set a header.
    4) the header is “Content-Type” not “Content_Type”

    I’m sure if you look back in the comments, there’s a functioning PHP example

    Mike

  2. Phil

    Has anybody tried implementing this in Android? Looks like it should be doable, although I have no idea how one could record FLAC in Android. It’s easy to access the Google transcription service in Android if you want to record the audio at the same time, but I can’t find a way to run it with pre-recorded audio files.

  3. Pingback: django-transcription or maybe node.js | Saul Shanabrook

  4. sushant

    Hello Mike,
    I have made the changes in the PHP code as you told me, but it is still giving the following error:

    Content-Type media type is not sound
    Error 400.

    Awaiting your reply

  5. Sushant

    Hi Mike and savez,

    I tried to run that PHP code,
    but it still gives error 400 (content type is not audio).
    I have made the changes Mike suggested.
    What should I do?
    How can I solve that error?
    I uploaded that PHP file to a local server along with the .flac file.
    Please help…

    Thanks in Advance..

  6. sushant

    hello Mike,
    Thank you for replying.

    Here is my code,

    'audio/x-flac; rate=16000', 'Content' => @$audio);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);
    curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);
    ?>

  7. sushant

    take a look at this one,

    $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    $file = 'what_is_this.flac';

    $audio = "";

    $file = fopen("what_is_this.flac","r");

    while(!feof($file)) {
    $audio .= fgets($file). "";
    }

    fclose($file);

    $data = array('Content-Type' => 'audio/x-flac; rate=16000', 'Content' => @$audio);
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_POST,true);
    curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    echo $result;
    curl_close($ch);

  8. mike Post author

    The problem is, you’re passing the Content-Type header as data, and you should be passing it as a header.

    I’ll post a full example in a minute.

    Mike

  9. mike Post author

    A lot of people are asking for full examples of using PHP, so I thought I would post a full working example.

    Mike

    <?php

    $url = 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US';
    $audio = file_get_contents('tts_order_new_service.flac');

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, 1);
    // The Content-Type goes in a request header; the raw FLAC bytes are the whole body
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: audio/x-flac; rate=16000'));
    curl_setopt($ch, CURLOPT_POSTFIELDS, $audio);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

    $json = curl_exec($ch);
    if ($json === false) {
        die('cURL error: ' . curl_error($ch));
    }

    curl_close($ch);

    $data = json_decode($json, true);

    print_r($data);

    ?>

  10. sushant

    hey Mike,
    Thank you for helping.
    Can I ask you for a little more help?

    I am recording audio and sending it to php file. How can I save sent file to .flac format?

    Awaiting reply..

  11. mike Post author

    Hey Sushant,

    I’m not sure what you mean by recording it and sending it to PHP? If you just mean converting to flac, you should be able to find what you need here:

    http://flac.sourceforge.net/download.html

    I’m not a flac expert or anything- I was just using the command line flac utility for Linux.

    Mike

  12. sushant

    Hey Mike,
    I mean converting a WAV file to FLAC using PHP.
    Is there a script for that?

    If you are on Skype, let me know; we can discuss it there.

    Sushant

  14. Michael Palmer

    Are there any specific options for the FLAC files? I’m converting from WAV and getting only three words back from a 50-word voicemail, and they are not the correct words either. The sample is a 16-bit mono 8000 Hz WAV. The FLAC plays and sounds good.

  15. mike Post author

    The only thing to make sure of is that, if you’re sending 8 kHz audio, you set the Content-Type correctly:

    Content-Type: audio/x-flac; rate=8000

    I’m also not sure how long an audio clip you can send; I’ve noticed some errors when I tried to send too much.

    Mike

  16. Pingback: Record user voice from web browser • PHP Help

  17. Al

    How could this be done in PHP? I would really appreciate it if you could show me a snippet. Thank you,

  18. Al

    Hi mike,
    Thanks for your response. Here is my command
    wget –post-file=”test.flac” –header= “Content-Type: audio/x-flac; rate=16000” –output-file=”result.txt

    in my result.txt I am getting the following:

    –2012-05-07 19:11:54– ftp://content-type/%20audio/x-flac;%20rate=16000
    => `x-flac’
    Resolving content-type (content-type)… failed: Name or service not known.
    wget: unable to resolve host address `content-type’
    Do you have any idea why this is happening?

    Is it possible that Google has restricted it now?

  19. mike Post author

    I assume you want -O result.txt, not --output-file (which puts the logs, not the content, in result.txt).

    Also, did you actually include the URL to post to?

    This works fine for me:

    wget --post-file=filename.flac --header="Content-Type: audio/x-flac; rate=16000" -O result.txt "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"

    Note the quotes around the header and URL.

    Mike

  20. Pingback: Speech to Text in Action Script 3 | Nguyễn Xuân Hòa

  21. Al

    It looks like my encoding was not correct. I changed the Content-Type to audio/flac instead of audio/x-flac, and now I am getting 400 Unknown encoding. Thanks anyway.

  22. amit

    Hi there,

    Just wondering: following your process will let me access the Google Speech API from any browser, right? Or at least from modern browsers… I guess that’s the point of the post, to bypass the need for Chrome. Just making sure before I dive into coding everything.

    Thanks,
    ak

  23. mike Post author

    It will let you access the Google Speech service via a simple web request- so from anything that can POST an audio file to google (so from the command line or from server-side languages like PHP, PERL, etc).

    Mike

  24. Pingback: Is there an API for Google's speech recognition technology? | PHP Developer Resource

  25. Sajin

    Hey Mike,

    I am working on an audio transcription website. I want to run the Google API using cron on a Linux machine. Is it possible to run your PHP in the backend?

  26. Al

    So I have tested the Google API numerous times, and it seems that it only works for short audio files of less than about 15 seconds, like in Mike’s example. I am not sure what the exact limits are in terms of audio file size. I tried digging into the Chromium source code as well and could not tell. But if you try posting audio that is longer than about 15 seconds, the service returns an “Entity too large” response.

  27. mike Post author

    Hi Al,

    That was my experience as well- I’m not sure what the exact length is, but it makes sense as it’s meant for short in-browser commands.

    Mike

  28. Hanamiti

    This is awesome, thank you for sharing the knowledge! This might be perfect for a smart house system (where you shout ‘LET ME OUT! FIRE! FIRE!’ to open the door 🙂 )

  29. Dias

    Hi Mike

    Great Work! and many thanks.. I was trying to find out how to record using the speech api. But, now I can use this technique instead and send my recorded voice to the server.

    I have an issue with the format, though. I recorded “1”, and the speech API in my browser returns “1”, but when I encode it to FLAC and try it, it returns “new york” with a confidence of 60%. I have posted the SoX output for my recording below. Kindly let me know the formats you used.

    prompt3.flac:

    File Size: 29.7k Bit Rate: 418k
    Encoding: FLAC Info: Processed by SoX
    Channels: 2 @ 16-bit
    Samplerate: 44100Hz
    Replaygain: off
    Duration: 00:00:00.57

    In:100% 00:00:00.57 [00:00:00.00] Out:25.1k [ | ] Hd:3.6 Clip:0
    Done.

  30. Dias

    Hey, it’s working now! I changed the sample rate as mentioned in the comments. Thanks again, Mike!

  31. Cupidvogel

    Thanks Mike. Great info! I just want to know one thing, since I have no idea about audio codecs and such: if I have a 3MB FLAC audio file and I want to test only part of it, say the 200 bytes following the first 100 bytes, will that be regarded as a valid FLAC file if I let Perl extract those 200 bytes and send them?

  32. mike Post author

    No, unfortunately that wouldn’t work: a FLAC file starts with a header and metadata blocks, and the audio itself is stored in compressed frames, so an arbitrary slice of bytes isn’t a valid FLAC stream.

    What you would need to do is decode the FLAC file into raw linear PCM, then extract the section of audio you want (whose position you can calculate from the sample rate, the number of channels, and the bits per sample), then re-encode that chunk as FLAC before you send it to Google.

    You could use a command line tool like sox to decode/encode the FLAC file if you didn’t want to deal with using the libraries directly.

    In fact- you may be able to use sox to extract your audio clip directly- I know it has a “trim” function.

    Mike

  33. Cupidvogel

    Thanks for your quick reply. Please elaborate a bit further: is it possible to do what I want? I mean, if I specify that I want to read 200 bytes starting at the 100th byte of this FLAC file, is there software that will encode those 200 bytes of raw data into a FLAC file?

  34. Cupidvogel

    And secondly, while extracting a portion of the binary data, is it possible to determine at what time that portion plays, counting from the start, and for how long? That is, if I skip 300 bytes and extract 200 bytes, can I get info like ‘these 200 bytes are played back from 00:00:41 to 00:00:48’? As far as I know, a codec has lots of metadata, like bit rate and frame rate; can that help in this regard?

  35. Alex

    Hello Mike. I have implemented your method in Java. It works, but not as well as I’d like.
    I receive this response
    {"status":0,"id":"c0c1bd9ce960ff381cb6d3c41eb9cb12-1",
    "hypotheses":[{"utterance":"Ñ…Ñ?ллоу","confidence":0.7139321}]}
    when I send the request to the server.
    “Ñ…Ñ?ллоу” is always different, but I can’t translate it.
    The “confidence” is always above 0.7.
    Why do you think this happens?

  36. mike Post author

    I’m not sure what that is; could it be UTF-8 encoded data?

    Have you tried using utf8_decode() on it?

    Are you trying to process english data?

    Mike

  37. danny

    Hello Mike,

    I ran this command in Ubuntu:
    wget –post-file=ilv16k.flac –header=”Content-Type: audio/x-flac; rate=16000″ “https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US”

    -> result is: ERROR 405: HTTP method GET is not supported by this URL.

    And I wrote a sample Java program like this:
    ———-
    package test;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.Socket;
    import javax.net.ssl.SSLSocketFactory;

    public class test_https {
        public static final String TARGET_HTTPS_SERVER = "www.google.com";
        public static final int TARGET_HTTPS_PORT = 443;
        public static final String FILE_NAME = "ilv16k.flac";
        public static void main(String[] args) throws Exception {
            Socket socket = SSLSocketFactory.getDefault().createSocket(TARGET_HTTPS_SERVER, TARGET_HTTPS_PORT);

            try {
                int lenf=(int) new File(FILE_NAME).length();
                OutputStream os = socket.getOutputStream();
                String rq = "";
                rq+="POST /speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US HTTP/1.1\r\n";
                rq+="Host: " + TARGET_HTTPS_SERVER+ "\r\n";
                rq+="Content-Type: audio/x-flac; rate=16000\r\n";
                rq+="Content-Length:" + lenf + "\r\n";
                rq+="\r\n";

                System.out.println(lenf);

                byte [] bh = rq.getBytes("UTF-8");

                os.write(bh, 0, bh.length);
                //os.flush();
                InputStream ip = new FileInputStream(new File(FILE_NAME));

                byte [] buf= new byte[1024];
                int ll=0;
                int L=0;
                while ((ll=ip.read(buf))>0){
                    //System.out.println("ll="+ll);
                    os.write(buf, 0, ll);

                }
                os.flush();

                BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), "ISO-8859-1"));
                String line = null;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            } finally {
                socket.close();
            }
        }
    }
    ————
    here is my response:
    HTTP/1.1 200 OK
    Content-Type: application/json; charset=utf-8
    Content-Disposition: attachment
    Date: Thu, 19 Jul 2012 13:12:39 GMT
    Expires: Thu, 19 Jul 2012 13:12:39 GMT
    Cache-Control: private, max-age=0
    X-Content-Type-Options: nosniff
    X-Frame-Options: SAMEORIGIN
    X-XSS-Protection: 1; mode=block
    Server: GSE
    Transfer-Encoding: chunked

    47
    {"status":5,"id":"f90432629c25087d95ffd780f7d52838-1","hypotheses":[]}

    0
    ——-

    Status is 5 NOT 0

    Help me, thanks

  38. mike Post author

    Hey Danny,

    Well, the error message it’s giving you is pretty clear: “method GET is not supported by this URL.”

    The system only supports POST requests, not GET requests.

    I assume something is failing when you’re calling

    --post-file=

    as including that is supposed to make wget make a POST request.

    When you make the request, it should be “--post-file=ilv16k.flac” with two dashes, not the single en dash character (–).

    Maybe that got screwed up when you copied/pasted?

    Mike

  39. danny

    Hi,
    Thanks for your help, it works fine. I did have to correct the Content-Type, though; this works for my FLAC file:
    Content-Type: audio/x-flac; rate=41000

  40. mike Post author

    Yes, for sure: you’ll need to set the sample rate of your audio clip in the Content-Type header.

    My file was sampled at 16 kHz.

    Mike
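Comments 32 to 34 ask how to cut a section out of a longer recording. FLAC frames are compressed and vary in size, so byte positions in a .flac file do not correspond to playback times. After decoding to raw linear PCM, for example with SoX, every sample frame has the same size (channels times bytes per sample), and the mapping becomes simple arithmetic. A minimal Java sketch of that arithmetic, assuming headerless, interleaved PCM with a whole number of bytes per sample; PcmMath is a hypothetical helper name:

```java
// Hypothetical helper for byte/time arithmetic on headerless, interleaved PCM audio.
final class PcmMath {

    // Bytes in one sample frame: one sample for each channel.
    static int bytesPerFrame(int channels, int bitsPerSample) {
        if (bitsPerSample % 8 != 0) {
            throw new IllegalArgumentException("expected whole bytes per sample");
        }
        return channels * (bitsPerSample / 8);
    }

    // Byte offset of the frame that starts at (or just before) the given time.
    static long byteOffset(double seconds, int sampleRate, int channels, int bitsPerSample) {
        long frame = (long) Math.floor(seconds * sampleRate);
        return frame * bytesPerFrame(channels, bitsPerSample); // always frame-aligned
    }

    // Playback time of the frame that contains the given byte offset.
    static double secondsAt(long byteOffset, int sampleRate, int channels, int bitsPerSample) {
        long frame = byteOffset / bytesPerFrame(channels, bitsPerSample);
        return (double) frame / sampleRate;
    }
}
```

For 16 kHz, mono, 16-bit audio that works out to 32,000 bytes per second, so a point 1.5 s in starts at byte 48,000; the 44.1 kHz stereo file from comment 29 uses 176,400 bytes per second. SoX’s trim function, mentioned in comment 32, does the same cut without computing offsets by hand.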

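On comments 35 to 37: the reply shown in comment 37 is declared as application/json; charset=utf-8, so its bytes should be decoded as UTF-8. Output like Ñ… is what UTF-8 Cyrillic looks like when viewed through a single-byte charset: the letter х is encoded as the bytes D1 85, which Windows-1252 shows as Ñ…. The Java program in comment 37 reads the socket as ISO-8859-1, which garbles any non-ASCII reply in the same way (the 47 and 0 lines in its output are HTTP/1.1 chunk-size lines rather than JSON; 0x47 is 71 bytes). PHP’s utf8_decode() converts to ISO-8859-1, which has no Cyrillic letters, so decoding as UTF-8 in the first place is the safer fix. A minimal Java sketch, with ResponseText as a hypothetical name; its repair method only helps when text was mis-decoded specifically as ISO-8859-1, since a charset such as Windows-1252 may already have replaced some bytes with ? and lost them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for the text encoding of the recognizer's JSON reply.
final class ResponseText {

    // Decode raw body bytes as UTF-8, the charset the reply declares.
    static String decode(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    // Undo text that was wrongly decoded as ISO-8859-1: that charset maps each byte
    // to exactly one char, so re-encoding with it recovers the original bytes.
    static String repairLatin1(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
}
```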