API Endpoint : http://audiotravels.co.in/api/asr
Audio Format Supported - wav/mp3
Audio Sampling Rate - 8kHz
Number of Channels - 1
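The format requirements above can be checked locally before uploading. This is a minimal sketch using Python's standard wave module; it covers WAV files only (not mp3), and the function name wav_problems is a hypothetical helper, not part of the API.

```python
import wave

EXPECTED_RATE_HZ = 8000  # "Audio Sampling Rate - 8kHz"
EXPECTED_CHANNELS = 1    # "Number of Channels - 1"


def wav_problems(path):
    """Return a list of mismatches against the documented format; empty means OK."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sampling rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```

An empty list means the file matches the 8 kHz, single-channel requirement; anything else should be resampled or downmixed before submission.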
#1) Async File API
A) Submit File
To submit an audio file for transcription :
curl -F "audio=@test_hindi.wav" "http://0.0.0.0:0/recognize?model=model_english&ID=123&type=wave&mode=full"
Parameters :
-F - Audio file to upload, sent as the form field "audio" (audio=@<file path>)
model - Model name
ID - Unique ID
type - Audio format (wave)
mode - full/transcript
full : Transcript with word-level segmentation
transcript : Transcript only
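The same request can be sketched in Python with the standard library only. The host and port are the 0.0.0.0:0 placeholder from the curl example, the helper names are hypothetical, and the sketch assumes the server replies with valid JSON.

```python
import json
import os
import uuid
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://0.0.0.0:0"  # placeholder host:port copied from the curl example


def recognize_url(model, request_id, audio_type="wave", mode="full", base_url=BASE_URL):
    """Build the /recognize URL with the four documented query parameters."""
    query = urlencode({"model": model, "ID": request_id, "type": audio_type, "mode": mode})
    return f"{base_url}/recognize?{query}"


def multipart_body(field, filename, payload):
    """Encode one file as multipart/form-data, which is what curl -F sends."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def submit_file(path, model, request_id, mode="full"):
    """POST the audio file and return the decoded response (assumed to be JSON)."""
    with open(path, "rb") as fh:
        body, content_type = multipart_body("audio", os.path.basename(path), fh.read())
    request = Request(
        recognize_url(model, request_id, mode=mode),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urlopen(request) as response:
        return json.load(response)
```

Calling submit_file("test_hindi.wav", "model_english", 123) would mirror the curl command once BASE_URL points at a real server.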
You’ll receive a response like this:
mode = full
{
  "status": "success",
  "ID": 123,
  "execution_time": < execution time >,
  "timestamp": < timestamp >,
  "fullResult": {
    "result": [
      {
        "word": < word >,
        "start": < start time >,
        "end": < end time >,
        "conf": < confidence >,
        "gender": < male/female >
      }
    ],
    "text": < response text >
  }
}
mode = transcript
{
  "status": "success",
  "ID": "123",
  "execution_time": < execution time >,
  "timestamp": < timestamp >,
  "transcript": < response text >
}
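Because the two modes put the text under different keys, client code has to handle both shapes. The following is a rough sketch that assumes the response has already been decoded into a Python dict; transcript_text and word_timings are hypothetical helper names.

```python
def transcript_text(response):
    """Return the recognized text from a mode=full or mode=transcript response."""
    if response.get("status") != "success":
        raise ValueError(f"request failed: {response!r}")
    if "fullResult" in response:           # mode = full
        return response["fullResult"]["text"]
    return response["transcript"]          # mode = transcript


def word_timings(response):
    """Return (word, start, end) tuples; empty when no segmentation is present."""
    words = response.get("fullResult", {}).get("result", [])
    return [(w["word"], w["start"], w["end"]) for w in words]
```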
B) Webhook API
Method:
PUT/POST/GET
Data Params – All params are required
S3_URL: [string] – Audio file URL. Only wav and mp3 audio formats are supported.
modelID: [string] – Model ID()
command: [string] – Command name – transcribe/segment
ID: [string] – Unique Request ID
userID: [string] – User ID()
mode: [string] – Environment - prod/dev
webhook: [string] – Webhook URL; the ASR response is sent to this URL via POST
API -
POST
curl -d "S3_URL={Audio URL}&modelID=1&ID=XXX&userID=15&mode=prod&command=transcribe&webhook={Webhook URL}" -X POST http://audiotravels.co.in/api/asr
PUT
curl -d "S3_URL={Audio URL}&modelID=1&ID=XXX&userID=15&mode=prod&command=transcribe&webhook={Webhook URL}" -X PUT http://audiotravels.co.in/api/asr
GET
http://audiotravels.co.in/api/asr?S3_URL={Audio URL}&modelID=1&ID=XXX&userID=15&mode=prod&command=transcribe&webhook={Webhook URL}
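The audio and webhook URLs contain characters such as ":" and "/" that need percent-encoding when placed in a query string. Below is a minimal sketch of building the GET form of the request; it assumes all seven listed parameters are mandatory, and webhook_request_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

ENDPOINT = "http://audiotravels.co.in/api/asr"
# Assumption: every parameter in the Data Params list is mandatory.
REQUIRED = ("S3_URL", "modelID", "command", "ID", "userID", "mode", "webhook")


def webhook_request_url(params, endpoint=ENDPOINT):
    """Return the GET URL, with every value percent-encoded by urlencode."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    return f"{endpoint}?{urlencode(params)}"
```

The same urlencode(params) string can serve as the request body for the POST and PUT variants.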
Acknowledgement Status:
1) Request Verification
1.1) Success: Send success acknowledgement
{
  status: 'success',
  message: 'Request Received Successfully',
  params: {
    S3_URL: < Audio File URL >,
    modelID: < Model ID >,
    ID: < Unique Request ID >,
    userID: < User ID >,
    webhook: < Webhook URL >,
    mode: < Environment >,
    command: < Command Name >
  },
  timestamp: < Timestamp >
}
1.2) Params Error: Send acknowledgement with parameters error message
{
  status: 'fail',
  message: < Invalid Params Error >,
  params: {
    S3_URL: < Audio File URL >,
    modelID: < Model ID >,
    ID: < Unique Request ID >,
    userID: < User ID >,
    webhook: < Webhook URL >,
    mode: < Environment >,
    command: < Command Name >
  },
  timestamp: < Timestamp >
}
2) Audio Verification
2.1) Success: Send success acknowledgement
{
  status: 'success',
  message: 'Audio File Is Valid',
  params: {
    S3_URL: < Audio File URL >,
    modelID: < Model ID >,
    ID: < Unique Request ID >,
    userID: < User ID >,
    webhook: < Webhook URL >,
    mode: < Environment >,
    command: < Command Name >
  },
  timestamp: < Timestamp >
}
2.2) Error: Send acknowledgement with error message
{
  status: 'fail',
  message: 'Invalid Audio File',
  params: {
    S3_URL: < Audio File URL >,
    modelID: < Model ID >,
    ID: < Unique Request ID >,
    userID: < User ID >,
    webhook: < Webhook URL >,
    mode: < Environment >,
    command: < Command Name >
  },
  timestamp: < Timestamp >
}
3) Response
3.1) Success: Send success transcription
{
  status: 'success',
  message: 'Audio Transcript Generated',
  ID: < Unique Request ID >,
  S3_URL: < Audio File URL >,
  processing_time: < Processing Time >,
  timestamp: < Timestamp >,
  version: < Version >,
  alternatives: [
    {
      transcript: < Transcript >,
      words: [
        {
          word: < Word >,
          start: < Start Time >,
          end: < End Time >,
          conf: < Confidence >,
          gender: < male/female >
        }
      ],
      score: < Score >
    }
  ]
}
3.2) Error: Send acknowledgement with status
{
  status: 'fail',
  message: 'Audio Transcript Not Generated',
  ID: < Unique Request ID >,
  S3_URL: < Audio File URL >,
  processing_time: < Processing Time >,
  timestamp: < Timestamp >,
  version: < Version >,
  alternatives: [
    {
      transcript: < Transcript >,
      words: [
        {
          word: < Word >,
          start: < Start Time >,
          end: < End Time >,
          conf: < Confidence >,
          gender: < male/female >
        }
      ],
      score: < Score >
    }
  ]
}
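A webhook receiver has to tell the acknowledgements apart from the final response. The sketch below relies only on the payload shapes shown above (acknowledgements carry "params", final responses carry "alternatives") and assumes the payload has been decoded into a dict. It also assumes that a higher score marks a better alternative, which this document does not state; both function names are hypothetical.

```python
def callback_stage(payload):
    """Label a payload by its documented shape."""
    if "alternatives" in payload:
        return "response"          # stage 3: transcription result
    if "params" in payload:
        return "acknowledgement"   # stage 1 or 2: request/audio verification
    return "unknown"


def best_transcript(payload):
    """Return the transcript of the top-scoring alternative, or None on failure."""
    alternatives = payload.get("alternatives") or []
    if payload.get("status") != "success" or not alternatives:
        return None
    # Assumption: a larger score means a more likely hypothesis.
    best = max(alternatives, key=lambda alt: alt["score"])
    return best["transcript"]
```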
#2) Streaming API
The Vspeech.ai streaming speech-to-text API (Streaming API) uses the WebSocket protocol,
which provides two-way communication over a single TCP connection.
API URL : ws://0.0.0.0:0
Content Types Supported - audio/x-raw, audio/x-flac, audio/x-wav
Audio Sampling Rate - 8kHz
A client session log looks like this (most recent message first):
msg: END OF SESSION:
msg: 8:
msg: 7: Send tag: EOS
msg: 5: Send: blob: audio/x-raw, 1486
msg: 8: {"partial":"hello one two three"}
msg: 5: Send: blob: audio/x-raw, 4460
msg: 8: {"partial":"hello one two"}
msg: 5: Send: blob: audio/x-raw, 2972
msg: 8: {"partial":"hello one two"}
msg: 5: Send: blob: audio/x-raw, 2972
msg: 8: {"partial":"hello one"}
msg: 8: {"partial":""}
msg: 5: Send: blob: audio/x-raw, 4458
msg: 9: [object Event]
msg: READY FOR SPEECH:
msg: 13: Audio context resumed
msg: 3: Recorder initialized
msg: 2: Media stream created
Method :
Check websocket connection -
ws.readyState;
Send blob -
ws.send(item);
Receive responses in -
ws.onmessage
Stop streaming -
ws.send('{"eof" : 1}');
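The send/receive cycle above can be sketched independently of any particular WebSocket library by passing the connection's send function in as a callable. The chunk size of 2972 bytes is only borrowed from the blob sizes in the sample log, and stream_audio and partial_text are hypothetical helper names.

```python
import json

EOF_MESSAGE = '{"eof" : 1}'  # stop signal from "Stop streaming" above


def iter_chunks(audio, chunk_size=2972):
    """Split raw audio bytes into consecutive chunks of at most chunk_size bytes."""
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]


def stream_audio(send, audio, chunk_size=2972):
    """Send every chunk through send(), then the EOF message; return the chunk count."""
    count = 0
    for chunk in iter_chunks(audio, chunk_size):
        send(chunk)
        count += 1
    send(EOF_MESSAGE)
    return count


def partial_text(message):
    """Extract the "partial" hypothesis from a server message, or None if absent."""
    if not message or not message.strip():
        return None  # the sample log includes an empty message
    return json.loads(message).get("partial")
```

With a real client, send would be the connection's send method and partial_text would run inside the message handler.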