Accessing Google Speech API / Chrome 11

I’ve posted an updated version of this article here, using the new full-duplex streaming API.

Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it came one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome the addition.

If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google for processing. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s really hard to do without really good language models, which is not something you’d be able to build into a browser.

I found the files I was looking for in the Chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, the audio can be either FLAC or Speex, although the Speex looks like some sort of specially modified version. I’m not sure what it is, but it just didn’t look quite right.

If that’s the case, there should be no reason why I can’t just POST something to it myself.

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So here are a quick few lines of Perl (you could also use PHP, or just wget on the command line):

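#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

# Read the whole FLAC file as raw bytes (binmode avoids newline translation)
my $file = shift @ARGV or die "usage: $0 file.flac\n";
open(my $fh, "<", $file) or die "Cannot open $file: $!\n";
binmode($fh);
my $audio = do { local $/; <$fh> };
close($fh);

my $ua = LWP::UserAgent->new;

# POST the audio; the rate here is the sample rate of the FLAC clip
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    # Report the HTTP status instead of failing silently
    die "Request failed: " . $response->status_line . "\n";
}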

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).
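The script hard-codes rate=16000 in the Content-Type header, so that number has to describe the clip you actually send. One way to avoid guessing is to read the sample rate out of the FLAC file itself. Here is a rough sketch of that idea, in Python rather than Perl. It assumes a standard FLAC file, where the "fLaC" marker is followed directly by the STREAMINFO metadata block; the names flac_sample_rate and content_type_for are made up for illustration and are not part of any library.

```python
def flac_sample_rate(header):
    """Extract the sample rate in Hz from the first 21 bytes of a FLAC file."""
    if len(header) < 21 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Byte 4 starts the metadata block header: the top bit is the
    # "last block" flag and the low 7 bits are the block type (0 = STREAMINFO).
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Inside STREAMINFO, the 20-bit sample rate follows the two block sizes
    # (2 + 2 bytes) and the two frame sizes (3 + 3 bytes): file offset 18.
    return (header[18] << 12) | (header[19] << 4) | (header[20] >> 4)


def content_type_for(path):
    """Build a Content-Type value whose rate matches the file (hypothetical helper)."""
    with open(path, "rb") as f:
        rate = flac_sample_rate(f.read(21))
    return "audio/x-flac; rate=%d" % rate
```

Calling content_type_for("i_like_pickles.flac") would then give a header value to pass in place of the hard-coded string.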

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
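Pulling the best guess out of a response like this takes only a few lines. The following is a minimal sketch in Python rather than Perl, based purely on the sample above. It assumes that a status of 0 means success and that a hypothesis without a confidence value should rank below one that has it; neither is documented anywhere I know of. The function name best_utterance is my own.

```python
import json


def best_utterance(body):
    """Return (utterance, confidence) for the top hypothesis, or None."""
    data = json.loads(body)
    # Assumption: any non-zero status means nothing usable was recognized.
    if data.get("status") != 0:
        return None
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    # Pick the hypothesis with the highest confidence; entries that carry
    # no confidence (like the second one in the sample) are treated as 0.0.
    top = max(hypotheses, key=lambda h: h.get("confidence", 0.0))
    return top["utterance"], top.get("confidence")
```

Fed the response shown above, this returns the "i like pickles" hypothesis together with its confidence; an empty hypotheses list gives None.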

I’m not sure whether Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

265 thoughts on “Accessing Google Speech API / Chrome 11”

  1. baael

    In my case the problem was that for some reason the temporary file was too big (somehow it reached 100 megabytes) and the request timed out. Now I’m clearing this file every time before recording. Also, the header and the file should have the same rate: “audio/x-flac; rate=YOUR_RATE;” otherwise Google will return empty results.

  2. Pingback: Making my own Speech to Text engine - Perception

  3. Chad Smith

    I noticed this is from 2011. Does it still work? Is there something more up to date?

  4. James

    Google now will not accept FLAC files over 70 KB; it returns a 500 error 🙁

  5. voicer

    A key for the Google API can be obtained for free in the Google APIs console.
    But the question is how to increase the maximum limit of 50 requests a day.
    I found no way to get more, and the link to order quote is dead.

    Google forces me and other developers to find some other way to solve this problem.
    I see a solution in JavaScript for the Chrome browser. But if anybody needs this API for, e.g., C#, they must switch the sound source from the microphone to a file stream in FLAC format.

    My question now is: how do I change the sound source from the microphone to a binary file on disk?

  6. Pingback: How does Google Keep do Speech Recognition while saving the audio recording at the same time?

  7. Pingback: Statisticians are evil | Aswin van Woudenberg

  8. Pingback: How does Google Keep do Speech Recognition while saving the audio recording at the same time? - Technology

  9. Pingback: » Speech Recognition API

  10. Pingback: Winning the Rails Rumble Retrospective | Imran's Coding Blog

  11. Sourabh Gune

    Hey guys, I am using the speech API to generate subtitles for videos, but as the API accepts only 15 seconds of audio, we want to divide the extracted audio into frames of 15 seconds. Can anyone please help me with how this can be done? Also, I don’t want to use any tool for dividing it into frames.

  12. Pingback: Google Speech API, 그리고 음성인식 기술 | Chance's Home
