Accessing Google Speech API / Chrome 11

I’ve posted an updated version of this article here, using the new full-duplex streaming API.

Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it, one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome its arrival in the browser.

If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:

http://slides.html5rocks.com/#speech-input

Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s really hard to do without really good language models, which are not something you’d be able to build into a browser.

I found the files I was looking for in the Chromium source repo:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It looks like the audio is collected from the mic and then passed via an HTTPS POST to a Google web service, which responds with a JSON object containing the results. Looking through their audio encoder code, the audio can be either FLAC or Speex, though the Speex appears to be some sort of specially modified version. I’m not sure what the modification is, but it just didn’t look quite right.

If that’s the case, there should be no reason why I can’t just POST something to it myself?

The URL listed in speech_recognition_request.cc is:

https://www.google.com/speech-api/v1/recognize

So a quick few lines of Perl (or PHP, or just use wget on the command line):

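#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
my $audio = "";

# Read the FLAC file given on the command line as raw bytes
die "usage: $0 file.flac\n" unless @ARGV;
open(FILE, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
binmode(FILE);
while(<FILE>)
{
    $audio .= $_;
}
close(FILE);

my $ua = LWP::UserAgent->new;

# The rate in the Content-Type header has to match the sample rate of the audio
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success)
{
    print $response->content;
}
else
{
    # Report the HTTP status instead of failing silently
    print STDERR $response->status_line, "\n";
}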

This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick WAV file and then converted it to FLAC on the command line (see SoX for more info).
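For anyone scripting the conversion step, here is a minimal Python sketch that builds a SoX command line for it. The function names and file names are made up for illustration, and the choice of mono output is an assumption; the 16000 Hz rate is taken from the Content-Type header used in the script above.

```python
import shlex
import subprocess


def sox_to_flac_command(wav_path, flac_path, rate=16000):
    """Build a SoX command that converts a WAV file to mono FLAC at `rate` Hz."""
    # SoX format options placed before the output file name apply to the output.
    # Mono (-c 1) is an assumption, not something the article specifies.
    return ["sox", wav_path, "-r", str(rate), "-c", "1", flac_path]


def convert(wav_path, flac_path, rate=16000):
    """Run the conversion; raises CalledProcessError if SoX reports a failure."""
    subprocess.run(sox_to_flac_command(wav_path, flac_path, rate), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without SoX installed.
    print(shlex.join(sox_to_flac_command("i_like_pickles.wav", "i_like_pickles.flac")))
```

Calling convert() runs the same command, provided SoX is installed and on the PATH.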

To run it, just do:

[root@prague mike]# ./speech i_like_pickles.flac

The response is pretty straightforward JSON:

{
    "status": 0,
    "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
    "hypotheses": [
    {
        "utterance": "i like pickles",
        "confidence": 0.9012539
    },
    {
        "utterance": "i like pickle"
    }]
}
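A response like this is easy to consume from a script. The following Python sketch pulls out the first hypothesis; the function name is hypothetical, and treating the first entry as the best guess is an assumption based on this one sample, where only the first entry carries a confidence value.

```python
import json


def best_hypothesis(response_text):
    """Return (utterance, confidence) for the first hypothesis, or None if there is none."""
    result = json.loads(response_text)
    hypotheses = result.get("hypotheses", [])
    if not hypotheses:
        return None
    top = hypotheses[0]
    # Later hypotheses in the sample response have no confidence field, so use .get().
    return top["utterance"], top.get("confidence")


sample = '''{"status": 0, "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
 "hypotheses": [{"utterance": "i like pickles", "confidence": 0.9012539},
                {"utterance": "i like pickle"}]}'''

print(best_hypothesis(sample))  # ('i like pickles', 0.9012539)
```

An empty "hypotheses" list gives None, so the caller can tell "nothing recognized" apart from a real transcription.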

I’m not sure if Google intends this to be a public, usable web service API, but it works, and it has all sorts of possibilities!

265 thoughts on “Accessing Google Speech API / Chrome 11”

todd 2011/03/24 at 8:38 am

And i was just starting to go down this path and sure enough you beat me to the punch! Thank you this is really amazing information!
Pingback: Introducing Speech 2 Text API by Google | Captico
Milton 2011/03/25 at 1:43 pm

Thanks! I’ve been watching Speechrecognition API thread for a while now and hoping that someone would add some new information. I saw the HTML5 speech input announcement for Chrome 11 and was hoping that it could be accessed somehow…you just proved it can…let’s hope it stays that way!
Wojtek 2011/03/25 at 11:37 pm

Well,
Thank you, you just made my day 😉
Wojtek
KaiK 2011/03/29 at 6:32 am

Hi!

I’ve also been “playing” with google STT engine, just with wget.
It works fine, but I’ve not been able to add a link to a grammar. Have you tried something similar?
Looking in the code, it’s supposed to expect the variable lm as the URL to a standard SRGS grammar (grxml), for example:
wget --post-file flacs/pieles_de_naranja.flac --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=es&lm=http://www.naradarobotics.com/testGrammar.grxml"

The response is the same I get without grammar (as free text).

Do you have any idea on how to deal with this issue?

Thanks in advance!
mike Post author 2011/03/29 at 10:59 pm

Hey Kaik,

I haven’t tried passing in grammar settings. I didn’t see much in the Chrome code about grammar, other than it simply passing the value through the URL.

Did you find anything that indicated that it expects the format to be an SRGS XML file?

Mike
hesperaux 2011/03/30 at 11:08 pm

I’ve been trying to get this to work. It fails for me. Has google found out we’re using it (what would be their problem with that?)?

What I did:
wget -U "Mozilla/5.0" --post-file=recording.flac --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium"

What I got:

HTTP request sent, awaiting response... 500 Internal Server Error
2011-03-30 17:11:54 ERROR 500: Internal Server Error.

Any thoughts?
mike Post author 2011/03/31 at 9:43 am

I don’t think so. I tried it again this morning and it worked fine. What I did notice, though, is that if you send it a large amount of audio, it will time out or throw an error. Unfortunately, it’s tough to tell from this side of the fence.

That being said, I don’t really see how they can block this service. If they want it to work for Chrome, then they’re going to have to leave it open; even if they force it to require certain HTTP headers or a key or something, it’s all visible via the Chrome source code, and therefore reproducible.

Mike
hesperaux 2011/04/01 at 3:39 pm

Hey Mike,

Thanks for the tip. You’re right: it’s very important not to make the sound file too large. I have been encoding to flac. Perhaps if I used the custom speex codec Android uses (assumedly) I could get more out of it. But I just tested it with a short clip and got a response (lol, incorrect though it may be).
As for blocking the service, you’re probably right. I think it’s designed to work this way. They’d have to change chrome to handshake somehow to avoid people using it out of the browser. Since it’s a free service for everyone in the world anyway, why would they bother to control that?

tl;dr: Don’t post long sound files to the service.
Oxygen 2011/04/04 at 10:52 am

Thank you very much. I hope Google will open STT API soon, because until they are official nobody can be sure that API will work. BTW, that JSON looks like Google Translate’s JSON reply. They’re definitely going to open STT API.

P.S. Speech to Text… F*ck yeah!
Oxygen 2011/04/04 at 10:54 am

P.P.S. What with headers’ fonts? They are not aliased
todd 2011/04/05 at 11:32 am

Hi Mike,

I’ve been working on the issue of the file size that Google will allow and have come up with this solution: https://github.com/taf2/audiosplit. It is very much a work in progress, but the idea is to detect minor silences in an audio stream and cut the stream into smaller ~10 second chunks. It also makes it easy to send a somewhat more arbitrary wave file, using ffmpeg, flac and some libsndfile code to chunk the wave files. Combining the results sort of works… Today I am working on merging smaller chunks…

-Todd
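To make the chunking idea concrete, here is a rough Python sketch of splitting on silence. It is not the audiosplit implementation: the function name, the 20 ms window and the amplitude threshold are all assumptions picked for illustration, and it works on a plain list of 16-bit PCM sample values rather than on a file.

```python
def split_on_silence(samples, rate, max_seconds=10.0, window_seconds=0.02, threshold=500):
    """Split PCM samples into chunks of at most max_seconds, cutting in quiet spots when possible.

    Returns a list of (start, end) sample index pairs covering the whole input.
    """
    window = max(1, int(rate * window_seconds))
    max_len = max(window, int(rate * max_seconds))
    chunks = []
    start = 0
    total = len(samples)
    while total - start > max_len:
        limit = start + max_len
        cut = limit  # hard cut at the length limit if no quiet window is found
        # Walk backwards from the limit looking for the latest quiet window.
        pos = limit - window
        while pos > start:
            if max(abs(s) for s in samples[pos:pos + window]) < threshold:
                cut = pos + window // 2  # cut in the middle of the quiet window
                break
            pos -= window
        chunks.append((start, cut))
        start = cut
    if start < total:
        chunks.append((start, total))
    return chunks
```

Each (start, end) pair can then be written out as its own file and posted separately, with the transcriptions joined afterwards.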
Ahmad 2011/04/05 at 10:20 pm

Great Article.
I tried curl, and it seemed to work (I got something back from the Google Server, but it was NOT close to being a good representation of the sound):

size=33036
date_time=Apr 06 03:10
file=recording3.flac

curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US" -F "myfile=@recording3.flac"

{"status":0,"id":"b4078d226d77e4593700df906c81fb34-1","hypotheses":[{"utterance":"what is this","confidence":0.4514247}]}

I used a free prog to ‘save’ the file as MP3 (recording3.mp3), and another (VLC) to save it as FLAC (recording3.flac).
mike Post author 2011/04/05 at 10:47 pm

There might be a sound quality issue converting first to mp3 then to flac?

Also, if your audio isn’t 16 kHz (as specified in the Content-Type header), then the results are going to be pretty far off too.
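One way to avoid a mismatch is to read the sample rate out of the FLAC file itself and build the header from that. The Python sketch below does this by parsing the STREAMINFO block at the start of the stream. The function names are hypothetical, the URL is the one from the article, and whether the service accepts rates other than 16000 is not something this sketch can tell you.

```python
import urllib.request

SPEECH_URL = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"


def flac_sample_rate(data):
    """Read the sample rate from the STREAMINFO block at the start of a FLAC stream."""
    if len(data) < 21 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Layout: 4-byte marker, 4-byte block header, 10 bytes of block and
    # frame sizes, then a 20-bit big-endian sample rate.
    b = data[18:21]
    return (b[0] << 12) | (b[1] << 4) | (b[2] >> 4)


def build_request(data, url=SPEECH_URL):
    """Build (but do not send) a POST request whose rate matches the audio."""
    rate = flac_sample_rate(data)
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "audio/x-flac; rate=%d" % rate},
    )
```

Passing the result to urllib.request.urlopen() would send it; the raw FLAC bytes go in the request body, not in a form field.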
ahmad 2011/04/05 at 11:41 pm

Mike,
You are correct! Hope this is of value (and use) by others…. file=recording4.flac
…and that is my way of ‘giving back’.

I have been desperate to find a way to make our videos section 508 compliant; this helps to that end.
What would be really valuable, is to have ‘longer’ audios be transcribed as well; have ‘text equivalents’ too.

PS: Google-Voice, allows recordings of up to 3 minutes, and makes its transcribed text available to you (for free). The past several times that I have tested it, I had to wait under 10 minutes to get the text back (with some editing, still needed).

WORKS as expected (now), using CURL on new file=recording4.flac:
$ curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US" -F "myfile=@recording4.flac"

Results are spit out (% is close to 97% now):
{"status":0,"id":"916c8a10b3c3f0e8e57573b2f98038ae-1","hypotheses":[{"utterance":"this is another test","confidence":0.96688336}]}

Anyhow,……. this time, I paid attention to my batch file, running on WinXP, using VLC
(Convert__mp3_to_FLAC__WORKS.bat)—– note: VLC is available for Mac, and Linux as well:

@ECHO OFF
SET VLC_EXE="c:\program files\videolan\vlc\vlc.exe"

SET file_name=recording4

SET file_path=c:\ahmad\music

SET SRC_File=%file_path%\%file_name%.mp3

SET DST_File=%file_path%\%file_name%.flac

SET transcode_options=vcodec=none,acodec=flac,ab=16,channels=1,samplerate=16000

::--- HIDE the VLC interface & WORK !!!
%VLC_EXE% --file-caching=300 "%SRC_File%" --sout #transcode{%transcode_options%}:file{dst='%DST_File%'} -I dummy vlc://quit

PAUSE
::exit
Anonymouse 2011/04/13 at 5:35 am

An example with PHP + FFMPEG receiving an MP3 or other audio file, converting it with FFMPEG to FLAC, posting it to the API and returning the response.

$url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
$target = "uploads/";
$target = $target.basename( $_FILES['file']['name']);

// move the temporary file
if(move_uploaded_file($_FILES['file']['tmp_name'], $target)) {
$file = $_FILES['file']['name'];
$fileflac = substr($file,0,-3).'flac';

// convert the audio file to flac
exec('ffmpeg -i uploads/'.$file.' -ab 96 -ar 44000 uploads/'.$fileflac);

// make the request
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('file'=>'@uploads/'.$fileflac));
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: audio/x-flac; rate=44000'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close($ch);

// return the response
print_r($response);

// delete the audio files
unlink('uploads/'.$file);
unlink('uploads/'.$fileflac);
}
Ahmad 2011/04/13 at 11:33 pm

To Anonymouse:
I like your approach; in fact you can do mp3-to-flac conversion in Windows as well
(if you have ffmpeg.exe from one of the Open Source downloads):

ffmpeg -i uploads/recording4.mp3 -ab 96 -ar 44000 uploads/recording4.flac

Curious, have you got the PHP code working then? I fiddled with it for a few minutes, with empty $response ! Would you have the code zipped up some place?

PS: Your code could probably be changed a little, to run from the CMD line as well (in case ‘upload’ is disallowed on a web server):
/path/to/php audio_to_flac.php uploads/recording22.mp3 > recording22.txt
Yuri 2011/04/15 at 8:25 am

Thank you for the topic!

Take a look at http://habrahabr.ru/blogs/macosx/117570/ (it is in Russian, but no matter). It is a Speech Translator for Mac OS X (based on the Google speech recognition API). Source: https://github.com/Kyrie1965/SpeechTranslator
Tim Panton 2011/05/03 at 4:51 pm

Looking at the code, the only change to the speex encoding is that they put a length byte in front of every speex frame. I guess this is so they can easily use speex in VBR mode where the frames are of variable length. It should be pretty easy to create an encoder for that.
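Going only by the description in this comment, the framing could look something like the Python sketch below: each encoded Speex frame gets a single length byte in front of it. The function names are made up, the Speex encoding itself is left out, and this has not been checked against what the service actually accepts.

```python
def prefix_frames(frames):
    """Concatenate encoded frames, putting a one-byte length in front of each."""
    out = bytearray()
    for frame in frames:
        if len(frame) > 255:
            raise ValueError("frame too long for a one-byte length prefix")
        out.append(len(frame))  # the length byte
        out.extend(frame)       # the encoded frame itself
    return bytes(out)


def split_frames(stream):
    """Inverse of prefix_frames: recover the list of frames from a framed stream."""
    frames = []
    i = 0
    while i < len(stream):
        length = stream[i]
        frames.append(stream[i + 1:i + 1 + length])
        i += 1 + length
    return frames
```

The prefix holds the size of the encoded frame in bytes, which in VBR mode varies from frame to frame, rather than the number of audio samples the frame represents.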
Tim Panton 2011/05/04 at 11:28 am

Thanks for the info – I’ve used it to hack up a replacement for non-chrome browsers – java applet based I’m afraid…
http://api.phonefromhere.com/stt/test.html
Horsekiller 2011/05/05 at 12:10 am

Hi guys
Please test my release

SpeechRecognition v. 1.0 Beta
http://programmer.uz/?action=comments&id=714
Florian 2011/05/05 at 10:35 am

Can you tell us more about your java example? Or release the code?
Horsekiller 2011/05/12 at 10:48 pm

To: Florian

It’s pretty simple.
The library http://javaflacencoder.sourceforge.net/ was used to encode the audio, and the gson library was used to deserialize Google’s answer.

Ask questions if you are interested in anything else.

I’m sorry for my english.
Q 2011/05/25 at 8:09 pm

To: Horsekiller
I just downloaded your SpeechRecognition.jar file. Is it possible for me to run this and send text to Processing?
If it works, could you briefly tell me the procedure?
Pingback: Playing around with speech-to-text « load,buffer,play
Ilya 2011/05/31 at 6:37 am

to Tim Panton,
I have tried to run recognition for Speex audio codec (http://jspeex.sourceforge.net/),
can you give more details about Google’s changes to this codec compared to the original?
As I understand it, the change is a frame length at the start of each frame? For 8 kHz Speex sound the frame size is 160 bytes,
so should I add 160 as an int (long) before each 160-byte buffer array?
Pingback: How to add a full-vocabulary-sized english language model? « Support Forums
Andre 2011/06/13 at 7:46 pm

Anyone figured out the grammar question ?
ShamblingMound 2011/06/17 at 11:12 am

Has anyone had much success with speex-with-header-byte encoding?

I get back different (terrible) results from the server from the same file, which is something I don’t see happening when posting a flac encoded file. It occasionally returns the correct transcription which leads me to believe I didn’t completely foobar the modified speex encoding.
Pingback: Accessing Google Speech API / Chrome 11 « don’t_panic « marcusjpotter
Anonymous 2011/07/01 at 3:29 pm

Anyone have a clue as to when this will be able to be used commercially?
Henry 2011/07/17 at 12:15 pm

I have been using SoX to record .flac files and this system to transcribe them to text. While the service is definitely working (returning valid JSON objects), the accuracy is atrocious, especially compared to using the service from a Google site. Could it be related to the audio format? Are there adjustments I could make to improve the accuracy?

Right now i use ‘sox -d file.flac silence -l 1 0 5% 1 2.5 1%’ to record
Henry 2011/07/17 at 12:31 pm

Worked it out – This API requires the audio to be uploaded at 16kHz, SoX defaults to 44.1kHz; so just add a rate conversion to the record command. Below are a few lines of code you can add just after “my $audio” to make this perl script handle voice commands (assuming SoX is installed).

my $file = '.flac';
my $record = "sox -d $file silence -l 1 0 1% 1 2.5 1% rate 16k";
print "Speak Now \n";
`$record`;
print "Processing \n";
open (FILE, "<".$file);
Luke 2011/07/19 at 3:40 am

Can somebody help regarding its implementation on iPhone?
Any code or tutorial using this API.
Luke 2011/07/21 at 8:20 am

I am facing this issue when I use the same call on iPhone. The code I am using is:

ASIFormDataRequest *request = [ASIFormDataRequest requestWithURL:url];
NSString *filePath = [[NSBundle mainBundle] pathForResource:@"can_you_keep_a_secret" ofType:@"flac"];
NSData *myData = [NSData dataWithContentsOfFile:filePath];

[request addPostValue:myData forKey:@"Content"];
[request addPostValue:@"audio/x-flac; rate=16000" forKey:@"Content-Type"];

The response I am getting is "Content-Type media type is not audio". The status code is 400.
Can anyone let me know the error in the code? Why this response? I am passing a .flac file which plays perfectly in my VLC player.
Ted Kim 2011/08/02 at 12:06 am

great article!

May I ask you a question?

the result shows only one word(or sentence).
Can I get nbest(multiple) results?
Arunn 2011/08/03 at 3:07 pm

Not sure if this is free for Web, but it’s free for mobile. The quality seems decent:
http://techcrunch.com/2011/08/03/ispeech-launches-free-mobile-sdk-to-bring-speech-recognition-to-ios-android-apps/
Florian Schulz 2011/08/07 at 11:08 am

Pretty late but I made a Java / Processing library some time ago using the techniques mentioned here: http://stt.getflourish.com
Samir Ahmed 2011/08/13 at 11:09 pm

I wrote a Java based program that exploits the Speech API and TTS API to make an interactive desktop assistant

check out the code here
https://www.github.com/samirahmed

or read about it here
http://www.samir-ahmed.com/iris.html
Marcelo Luiz Onhate 2011/08/16 at 6:14 am

Hey man! You saved my graduation final project!!!! I had almost given up, and I found this post!!! Really, thanks!!!!!!
Pingback: “free” Google Speech Recognition API | Intelligible Babble
Pingback: How to Add Speech Recognition To Website? HTML5 Tips | Globinch
er453r 2011/09/01 at 8:01 am

Hi. I’m trying to get it working w flash speex codec. If someone feels lucky try here: http://stackoverflow.com/questions/7270619/flash-speex-codec-coversion-for-google-speech-api-a-challenge 🙂
Pingback: Flash SPEEX codec coversion for Google Speech API – a challenge | Technical support, Computer, programming issue, issue tracking, quality assurance
Pingback: Speech Recognition for the Web
Fabio 2011/09/08 at 9:24 am

Thanks for the info, plan on using this myself.
juanmol 2011/09/20 at 7:43 am

Hello, i’m trying with the perl script and returns nothing, if i use:

$ curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.es/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en_US" -F "myfile=@hello.flac"
{"status":0,"id":"b86fd217bf576032ca03746da9c534f9-1","hypotheses":[{"utterance":"hello good morning","confidence":0.8036038}]}

works fine!! but i need to recognize spanish, then i record “hola, buenas tardes” in a flac file and:

$ curl -H "Content-Type: audio/x-flac; rate=16000" "https://www.google.es/speech-api/v1/recognize?xjerr=1&client=chromium&lang=es_ES" -F "myfile=@hola_es.flac"
{"status":0,"id":"2e1f7b23562ccb95e72af513e3f243a0-1","hypotheses":[{"utterance":"ull","confidence":0.20765519}]}

ull????? What’s ull?? If i use the html5 example, works fine too, but i need in command line. Any ideas?
Raza 2011/09/25 at 11:30 pm

I tried this snippet of c# code but getting back the line below. Any suggestion on what is going wrong would be greatly appreciated.

{"status":5,"id":"186c0611e33571e187bbd85c0bbd1f85-1","hypotheses":[]}

------------------------
string uploadUrl = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

FileStream rdr = new FileStream("C:\\test.wav", FileMode.Open);

WebRequest request = WebRequest.Create(uploadUrl);
request.Method = "POST";
request.ContentType = "audio/x-flac; rate=16000";
byte[] byteArray = new byte[rdr.Length];
int bytesRead = rdr.Read(byteArray, 0, byteArray.Length);
request.ContentLength = byteArray.Length;
using (Stream dataStream = request.GetRequestStream())
{
dataStream.Write(byteArray, 0, byteArray.Length);
}

HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream());
string strText = reader.ReadToEnd();
reader.Close();
Raza 2011/09/25 at 11:53 pm

Please ignore previous message. Forgot to convert file to flac from wav.
slm32006 2011/09/26 at 7:50 pm

for those interested, here is a php script i’m using currently that works quite well.

=====================

mike pultz

personal and professional blog of mike pultz, technology specialist and serial entrepreneur.
