I’ve posted an updated version of this article here, using the new full-duplex streaming API.
Just yesterday, Google pushed version 11 of their Chrome browser into beta, and along with it, one really interesting new feature: support for the HTML5 speech input API. This means that you’ll be able to talk to your computer, and Chrome will be able to interpret it. This feature has been available for a while on Android devices, so many of you will already be used to it and will welcome the new feature.
If you’re running Chrome version 11, you can test out the new speech capabilities by going to their simple test page on the html5rocks.com site:
http://slides.html5rocks.com/#speech-input
Genius! But how does it work? I started digging around in the Chromium source code to find out whether the speech recognition is implemented as a library built into Chrome, or whether it sends the audio back to Google to process. I know I’ve seen the Sphinx libraries in the Android build, but I was sure the latter was the case: the speech recognition was really good, and that’s really hard to do without really good language models, which are not something you’d be able to build into a browser.
I found the files I was looking for in the chromium source repo:
http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/
It looks like the audio is collected from the mic, and then passed via an HTTPS POST to a Google web service, which responds with a JSON object with the results. Looking through their audio encoder code, it looks like the audio can be either FLAC or Speex– but it looks like it’s some sort of specially modified version of Speex- I’m not sure what it is, but it just didn’t look quite right.
If that’s the case, there should be no reason why I can’t just POST something to it myself?
The URL listed in speech_recognition_request.cc is:
https://www.google.com/speech-api/v1/recognize
So a quick few lines of Perl (or PHP, or just use wget on the command line):
#!/usr/bin/perl
use strict;
use warnings;
require LWP::UserAgent;

my $url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";

# Read the whole FLAC file as raw bytes
my $audio = "";
open(FILE, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
binmode(FILE);
while (<FILE>) { $audio .= $_; }
close(FILE);

# POST the audio as the request body, with the Content-Type header set
my $ua = LWP::UserAgent->new;
my $response = $ua->post($url, Content_Type => "audio/x-flac; rate=16000", Content => $audio);
if ($response->is_success) {
    print $response->content;
}
1;
This quick Perl script uses LWP::UserAgent to POST the binary audio from my audio clip. I recorded a quick wav file, and then converted it to FLAC on the command line (see SoX for more info).
To run it, just do:
[root@prague mike]# ./speech i_like_pickles.flac
The response is pretty straightforward JSON:
{
  "status": 0,
  "id": "b3447b5d98c5653e0067f35b32c0a8ca-1",
  "hypotheses": [
    { "utterance": "i like pickles", "confidence": 0.9012539 },
    { "utterance": "i like pickle" }
  ]
}
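For anyone who wants to pick the result apart in code, here is a minimal Python sketch that pulls the top hypothesis out of a response shaped like the one above. The function name `best_hypothesis` is my own, and treating any non-zero `status` as "no result" is an assumption based only on this one sample response, not on any documented contract.

```python
import json

def best_hypothesis(response_text):
    """Return (utterance, confidence) for the first hypothesis, or None."""
    data = json.loads(response_text)
    # Assumption: status 0 means success, as in the sample response.
    if data.get("status") != 0:
        return None
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    top = hypotheses[0]
    # In the sample, only the first hypothesis carries a confidence value,
    # so confidence may be None for other entries.
    return top.get("utterance"), top.get("confidence")

sample = (
    '{"status": 0, "id": "b3447b5d98c5653e0067f35b32c0a8ca-1", '
    '"hypotheses": [{"utterance": "i like pickles", "confidence": 0.9012539}, '
    '{"utterance": "i like pickle"}]}'
)
print(best_hypothesis(sample))
```

Returning None rather than raising keeps the caller simple when the service sends back an empty hypotheses list.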
I’m not sure if Google is intending this to be a public, usable web service API, but it works- and has all sorts of possibilities!
Were you able to access the force alignment of speech and text as well?
not sure why the code didn’t post the first time.
but here it is again.
Pingback: I Hope Apple Improves Voice Recognition... » Gadgets, Software » Russell Heimlich
[code]
<?php
$filename = '/path/to/flac/file/my_soundfile.flac';
// read the FLAC file as raw bytes
$handle = fopen($filename, "rb");
$XPost = fread($handle, filesize($filename));
fclose($handle);
$url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL, $url); // set url to post to
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: audio/x-flac; rate=16000"));
curl_setopt($ch, CURLOPT_TIMEOUT, 300); // times out after 300s
curl_setopt($ch, CURLOPT_POSTFIELDS, $XPost); // raw audio as the POST body
curl_setopt($ch, CURLOPT_POST, 1);
$str = curl_exec($ch); // run the whole process
curl_close($ch);
// decode returned json to associative array
$objs = json_decode($str, true);
// extract the data we need
$converted_text = $objs["hypotheses"][0]["utterance"];
$score = round($objs["hypotheses"][0]["confidence"], 2);
?>
[/code]
I am trying to use “https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=es” for Spanish recognition and it doesn’t work. Any suggestions?
Regards!
Because you need, first, to change the language… take a look at the parameter lang=en-US; you will probably need to use something like es-ES, I think… or not??
Hi,
found this by pure chance – great thread !
Unfortunately, I do not get a result. wget returns: “Authority of issuer of the certificate cannot checked
locally.”
This surprises me, since I have called wget with option --no-check-certificate
Any ideas ?
Thanks a lot in advance for your help
Joshi
I want to support multiple languages with this link, and I know it’s possible, because lang=es-mx for example, recognizes spanish.
My questions are:
* How do I know which languages are supported?
* Can I use a flag to indicate “auto-recognize”?
Did anyone try with WAV or g711 files? Without conversion… I mean with something like:
Content-Type: audio/x-wav; rate=16000 ?
Zibri, I tried giving it a wav but the web service rejects it. Flac is the way to go for now it seems
I recorded a wav file, test.wav, and then used sox to convert it to test.flac. I copied the posted Perl script into the file test.pl, and created a test.txt which contains the test.flac file name. When I run: perl test.pl test.txt, I get no recognition result: {"status":5,"id":"915a84d84d46e13fed2f52a44b652bfc-1","hypotheses":[]}
BTW, I run it in cygwin
Could you please tell me where I made a mistake?
thanks,
Victor
Hey Victor,
Why are you passing in a txt file?
The Perl script takes the name of an audio file as the first argument, and not a txt file that contains the name of the audio file.
Also make sure you’ve encoded it at either 8 kHz or 16 kHz, and adjust the Content-Type header accordingly; the example I posted uses 16.
Mike
Very interesting feature. Could someone share PHP code that reads a FLAC file stored locally on the server and does this?
Hey, I’ve written a bit of Perl code, and created sort of my own version of Siri / Iris. It’s still pretty rough around the edges, but I wanted to post it on this blog since it’s where I got some of the code. You can use sox to normalize the sound to -5 dB, which helps improve accuracy.
So, how can I post a file on here?
Hey Vic,
I dont really have a way to post files- but you could just include a link somewhere to download-
Mike
Hi Mike,
I wanted to say that you are one of the only references online for posting data to the google speech recognition engine. Thanks for sharing the info.
I’m trying to use the google speech recognition with my robot and I would like to write a bash script to do so. I am not familiar with Perl and I’m having a hard time posting the right data to the server using wget or curl.
Can you give me any tip on how to use wget (as you mention in the post) to post the flac file?
Right now, if I try this:
wget --post-file=out.flac https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US
it gives me ERROR 400: Content-Type media type is not audio.
Any help will be very appreciated. Thanks!
Carlitos-
It looks like you’re missing the Content-Type header- if you look back in some of the old comments, I’m pretty sure there was somebody that posted an example using wget.
Mike
Oh! I’m an idiot for forgetting the headers and for not realizing there were more comments!
Here is my result. This will record, encode to flac, send the request to Google and save the output to speech.txt:
arecord -f cd -t wav -d 5 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac; wget --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" -O speech.txt http://www.google.com/speech-api/v1/recognize?lang=en
Thanks a lot for your code and help Mike!
FYI:
status: 0 – correct
status: 4 – missing audio file
status: 5 – incorrect audio file
Sample rate can be anything between 8000 and 44000 (not 44100), and doesn’t have to be exactly 8000 or 16000. If it is out of bounds you get a 400 HTML error page returned.
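Those observations can be captured in a small lookup, sketched here in Python. Everything in it comes from the comment above rather than from any official documentation: the three status meanings, and the 8000 to 44000 range (assumed here to include both endpoints). The names `STATUS_MEANINGS`, `describe_status` and `rate_in_reported_range` are made up for illustration.

```python
# Status codes as reported by a commenter; other values may well exist.
STATUS_MEANINGS = {
    0: "correct",
    4: "missing audio file",
    5: "incorrect audio file",
}

def describe_status(status):
    """Map a response status code to the meaning reported in the thread."""
    return STATUS_MEANINGS.get(status, "unknown status %d" % status)

def rate_in_reported_range(rate_hz):
    """True if the rate falls inside the range a commenter found to be accepted."""
    # Assumption: both bounds are inclusive; 44100 was reported as rejected.
    return 8000 <= rate_hz <= 44000

print(describe_status(5))
print(rate_in_reported_range(44100))
```

Checking the rate before posting saves a round trip that would otherwise come back as an HTML error page instead of JSON.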
Very interesting, thanks a lot for sharing. Does anybody have an idea how to send audio from the mic in real time instead of posting the flac file? I would like to reuse Chrome for speech recognition, but my system would not have a monitor or TV, so clicking the mic picture to start recognition is not possible.
Thank you all
Thanks for researching Chromes speech recognition API. Great work! I made a little video using the api from the Linux command line: http://www.youtube.com/watch?v=Sf3dMgooufc (use it if you like)
Pingback: Hack into the Google Chrome beta speech recognition api | Robert Buzink
This is for the English language. How do I set the required language?
Pingback: Nao 1337 uses Google Speech-to-Text Service | Carlitos' Contraptions
Hiyassat, I would imagine you can change where it says “lang=en-us” to lang= . Haven’t tested that though.
Pingback: Building My Own Siri / Jarvis « cranklin.com
Mike,
any examples of iOS use of this API? I have an Xcode project where I need to convert the voice recorded .wav file in real-time to .flac and then would like to POST to the google API. Any thoughts or examples would be great.
Interesting! It is very easy to use.
The Speech Input API Specification can be found at http://www.w3.org/2005/Incubator/htmlspeech/2010/10/google-api-draft.html ( and I found it in the repo )
I think anyone who reads this article and wants to use Google Speech API should have a look at the code in the source repository first.
BTW, this is the URL I use: ( lang=zh-CN is for Chinese and it works fine 😀 )
http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN&maxresults=1
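Since several commenters are hand-editing the query string, here is a minimal Python sketch that assembles it from parts. The parameter names (`xjerr`, `client`, `lang`, `maxresults`) are the ones that appear in the URLs in this thread; the helper name `build_recognize_url` is hypothetical, and I make no claim about which language codes the service accepts.

```python
from urllib.parse import urlencode

# Endpoint as listed in the article (the comment above uses plain http).
BASE_URL = "https://www.google.com/speech-api/v1/recognize"

def build_recognize_url(lang="en-US", max_results=None):
    """Build the recognize URL with the query parameters seen in this thread."""
    params = [("xjerr", "1"), ("client", "chromium"), ("lang", lang)]
    if max_results is not None:
        params.append(("maxresults", str(max_results)))
    return BASE_URL + "?" + urlencode(params)

print(build_recognize_url("zh-CN", max_results=1))
```

Building the URL this way also sidesteps the shell treating a bare `&` as a background operator, which is easy to trip over when pasting these URLs into wget commands.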
Is there a way that the Google speech API can detect pauses in a statement? As in,
if I say “Java is a programming language. And so is C.”
can I force Google to transcribe the statement up to the pause and then transcribe the rest of the statement?
I guess I’m almost there, just can’t seem to get the wget example working, my result file keeps returning empty and I’m not getting any errors either.
Could anyone please post a quick example as to how to go about posting to the API using cURL?
many thanks!
Okay, I got the wget call working in the end, but I’m still not there.
I am calling the API from a php file with an exec command with the api call.
Strangely, the addressbar changes from “speech.php” (my file) to “main?url=out.flac&tid=0&w=1440&h=809″ and my result file contains some weird html containing an iframe (with main?url=out.flac&tid=0&w=1440&h=809 as its src).
Here’s my code:
$cmd = 'wget --post-file out.flac -header="Content-Type: audio/x-flac; rate=16000" -O resultaat.html https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US';
exec($cmd);
echo file_get_contents("resultaat.html");
It seems like instead of returning the JSON object I desire, the API tries to redirect the call.
I highly doubt the Google people have just now restricted the use of their speech-to-text, so I figure it’s me who fails.
Any pointers would be greatly appreciated.
Okay, I finally got it working.
I’m sorry for the blob comments I placed in the process.
Since I figure I won’t be the last person to walk in on the troubles I experienced, I’ll try to make up my blobbing by explaining how I resolved the matter. I hope that by doing so I can help others googling into this thread.
First of all, I must point out that I have been trying to make this work on a windows machine. So in order to successfully make a wget call I had to install GnuWin Wget (http://sourceforge.net/projects/gnuwin32/files/wget/1.11.4-1/). The problems I experienced earlier were due to the fact that I was using the wrong release. Make sure you download and run the setup program with all dependency files included.
Second of all, the win32 version of wget seems to accept its parameters slightly differently than explained in this article and the comments underneath.
The command I issue is now “wget --post-file="out.flac" --header="Content-Type: audio/x-flac; rate=16000" --output-file="result.txt" --no-check-certificate https://www.google.com/speech-api/v1/recognize?lang=en”.
Contrary to what I expected, this didn’t write the actual desired Google JSON response to result.txt.
Instead it writes the wget log to the file. This log contained “HTTP request sent, awaiting response… 200 OK Length: unspecified [application/json] Saving to: ‘recognize@lang=en'”. Why on earth it writes the JSON obj to a file called ‘recognize@lang=en’ is a riddle to me, but sure enough, the file (saved in the wget executable dir) contains the text I desired.
Strangely, the accuracy is pretty low and thus the result/recognition level is quite bad. The api recognizes only half of the words the example at html5rocks and google translate do. I figure they use a higher bitrate or something or there’s something else I am not taking into account.
Anyway, thanks for this article and the comments that helped me on my way. I hope I was able to contribute.
This is the command that worked for me on Ubuntu 11.10
wget http://www.google.com/speech-api/v1/recognize?lang=en-us --header "Content-Type: audio/x-flac; rate=16000" --post-file=NameOfFile.flac --output-file=output.txt
Hope this helped! 🙂
Pingback: Asiri: Let Siri Speak Your Language | Abdulrahman Alotaiba's Blog
I have no idea what I’m doing wrong or what this output means when I do wget. I’ve tried everyone’s statements here for wget and none seem to work for me.
wget http://www.google.com/speech-api/v1/recognize?lang=en-us -header “Content-Type: audio/x-flac; rate=16000? -post-file=rec1.flac -output-file=output.txt
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files\GnuWin32/etc/wgetrc
–2012-02-05 17:34:11– http://www.google.com/speech-api/v1/recognize?lang=en-us
Resolving http://www.google.com... 74.125.113.104, 74.125.113.105, 74.125.113.106, …
Connecting to http://www.google.com|74.125.113.104|:80… connected.
HTTP request sent, awaiting response… 405 HTTP method GET is not supported by this URL
2012-02-05 17:34:12 ERROR 405: HTTP method GET is not supported by this URL.
–2012-02-05 17:34:12– http://%96header/
Resolving \226header… failed: No data record of requested type.
wget: unable to resolve host address `-header’
–2012-02-05 17:34:14– ftp://%93content-type/
=> `.listing’
Resolving \223content-type… failed: No data record of requested type.
wget: unable to resolve host address `”content-type’
unlink: No such file or directory
–2012-02-05 17:34:16– http://audio/x-flac;
Resolving audio… failed: No data record of requested type.
wget: unable to resolve host address `audio’
–2012-02-05 17:34:18– http://rate=16000/?
Resolving rate=16000… failed: No data record of requested type.
wget: unable to resolve host address `rate=16000′
–2012-02-05 17:34:21– http://%96post-file=rec1.flac/
Resolving \226post-file=rec1.flac… failed: No data record of requested type.
wget: unable to resolve host address `-post-file=rec1.flac’
–2012-02-05 17:34:21– http://%96output-file=output.txt/
Resolving \226output-file=output.txt… failed: No data record of requested type.
wget: unable to resolve host address `-output-file=output.txt’
I have tried this code in PHP.
<?php
$url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
$file = 'hello.flac';
$audio = "";
$file=fopen("hello.flac","r");
while(!feof($file)) {
$audio .= fgets($file). "";
}
fclose($file);
$data = array('Content_Type' => 'audio/x-flac; rate=16000','Content' => @$audio);
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_POST,true);
curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
echo $result;
curl_close($ch);
?>
But it is giving me response as:
Content-Type media type is not audio
Content-Type media type is not audio
Error 400
Can anybody tell me what I am doing wrong?
vfnik, the @ in a curl postfields array is a reference to a file, so, instead of reading the data in with the fopen/fread etc, just put the filename there so, instead of @$audio, use @$file
untested but i think that will work
it works!!!!
For those interested in using the Speex codec, there is a fork that implements the MIME type “x-speex-with-header-byte” and works perfectly against Google’s Speech Recognition APIs. It’s available at http://qxip.net/wiki or directly on GitHub: https://github.com/QXIP/Speex-with-header-bytes
I ported your code to Python (thanks!!):
import urllib2
import os
import sys
audio = open(sys.argv[1], 'rb')
filesize = os.path.getsize(sys.argv[1])
print sys.argv[1],' Read',"\n"
req = urllib2.Request(url='https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US')
req.add_header('Content-type','audio/x-flac; rate=16000')
req.add_header('Content-length', str(filesize))
req.add_data(audio)
print 'Request built',"\n"
response = urllib2.urlopen(req)
print 'Response returned',"\n"
print response.read()
Your code works fine without modification, thank you; however, the response from Google is somewhat confusing.
"hypotheses":[{"utterance":"poke poke poke poke poke poke poke poke","confidence":0.96547246}]}
This is from a voicemail test that I just recorded and I can assure you that the word “poke” was not included.
Has anyone else seen this response from Google?
Doh! If your phone system records at an 8000 Hz sample rate, Google works better if you tell it to use 8000 instead of 16000!
If anyone else gets odd responses from Google, check that the rate in your Content-Type header matches the sample rate of the recording.
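One way to avoid this kind of mismatch is to read the sample rate out of the FLAC file itself and build the header from it. The sketch below is a Python illustration under a few assumptions: the file starts directly with the `fLaC` marker (no ID3 tag in front), the first metadata block is STREAMINFO as the FLAC format requires, and the function names `flac_sample_rate` and `content_type_for` are my own.

```python
def flac_sample_rate(header):
    """Return the sample rate in Hz from the first bytes of a FLAC file.

    `header` must hold at least the first 21 bytes of the file.
    """
    if len(header) < 21 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing fLaC marker)")
    # Byte 4 starts the first metadata block header; its low 7 bits are
    # the block type, and type 0 is STREAMINFO.
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # STREAMINFO data begins at byte 8. After 10 bytes of block-size and
    # frame-size fields, the sample rate is a 20-bit big-endian value
    # occupying bytes 18, 19 and the high nibble of byte 20.
    return (header[18] << 12) | (header[19] << 4) | (header[20] >> 4)

def content_type_for(header):
    """Build the Content-Type header value used in this thread."""
    return "audio/x-flac; rate=%d" % flac_sample_rate(header)

# Typical use (path is a placeholder):
#   with open("voicemail.flac", "rb") as f:
#       print(content_type_for(f.read(42)))
```

This only inspects the header, so it costs almost nothing to run before every upload.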
hello .. does anyone know if it-IT is the right configuration to set it to italian language?
hello,
I made this code, but I always reports this error
Content-Type media type is not sound
Error 400
how can I fix this?
‘audio/x-flac; rate=16000′,’Content’ => @$file);
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
echo $result;
curl_close($ch);
?>
Hi, i am trying to get .net to work with this…
Here is my code below– what am I doing wrong!?
Dim rdr As New FileStream("c:\record.flac", FileMode.Open)
Dim req As HttpWebRequest = DirectCast(WebRequest.Create("https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US"), HttpWebRequest)
req.Method = "POST"
req.ContentType = "audio/x-flac; rate=16000"
req.ContentLength = rdr.Length
req.AllowWriteStreamBuffering = True
Dim reqStream As Stream = req.GetRequestStream()
Dim inData As Byte() = New Byte(rdr.Length - 1) {}
Dim bytesRead As Integer = rdr.Read(inData, 0, rdr.Length)
reqStream.Write(inData, 0, rdr.Length)
rdr.Close()
reqStream.Close()
Dim response As HttpWebResponse = req.GetResponse()
Using reader As StreamReader = New StreamReader(response.GetResponseStream())
s = reader.ReadToEnd()
End Using
All fiddler says to me is Tunnel through http://www.google.com:443
PLEASE HELP ME!
HELP ME!!
Savez,
The code you provided is incomplete;
where is $url set? or $data? or $file?
and I don’t understand what this line does:
‘audio/x-flac; rate=16000′,’Content’ => @$file);
that’s not even syntactically valid- I assume it was cut off when you posted it?
Somewhere on that line, you need to actually set “Content-Type” to the audio/flac value-
Without the rest of this code, it’s impossible to say why it’s not working.
Mike
Sorry, part of the code was deleted when I posted it.
Here is the code that gives me the error.
‘audio/x-wav’,’Content’ => @$audio);
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_POST,true);curl_setopt($ch,CURLOPT_POSTFIELDS,$data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
echo $result;
curl_close($ch);
?>
$url = "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
$file = 'hello.flac';
$audio = "";
$files = fopen($file, "r");
while (!feof($files)) { $audio .= fgets($files); }
fclose($files);
$data = array('Content_Type' => 'audio/x-wav', 'Content' => @$audio);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
echo $result;
curl_close($ch);
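As Mike pointed out earlier in the thread, the “Content-Type media type is not audio” error goes away once Content-Type is sent as a real HTTP header and the FLAC bytes are the raw request body, rather than both being packed into a form-field array. Here is a minimal Python 3 sketch of that request shape. It only builds the request and does not send it; the helper name `build_recognize_request` is hypothetical, and whether the endpoint still answers is not something this sketch checks.

```python
import urllib.request

def build_recognize_request(audio_bytes, rate_hz=16000, lang="en-US"):
    """Build (but do not send) a POST carrying raw FLAC audio."""
    url = ("https://www.google.com/speech-api/v1/recognize"
           "?xjerr=1&client=chromium&lang=" + lang)
    return urllib.request.Request(
        url,
        data=audio_bytes,  # raw FLAC bytes, not form fields
        # The media type goes in a header; the rate must match the recording.
        headers={"Content-Type": "audio/x-flac; rate=%d" % rate_hz},
        method="POST",
    )

# Placeholder bytes stand in for a real FLAC file read with open(path, "rb").
req = build_recognize_request(b"fLaC-placeholder")
print(req.get_method(), req.get_header("Content-type"))

# To actually send it, one would call urllib.request.urlopen(req) and read
# the JSON from the response.
```

The same idea applies to the PHP version: set the header with CURLOPT_HTTPHEADER and pass the audio string itself to CURLOPT_POSTFIELDS, as in the PHP example further up the thread.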