WebSocket API for Streaming Transcription
Connecting
To start a new stream, the connection must first be set up. A WebSocket
connection starts with a HTTP GET
request with header fields Upgrade: websocket
and Connection: Upgrade
as per
RFC6455.
GET /asr/v0.1/stream HTTP/1.1
Host: api.myrtle.ai
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Protocol: stream.asr.api.myrtle.ai
Sec-WebSocket-Version: 13
If all is well, the server will respond in the affirmative.
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: stream.asr.api.myrtle.ai
The server will return HTTP/1.1 400 Bad Request
if the request is invalid.
Request Parameters
Parameters are query-encoded in the request URL.
Content Type
Parameter | Required | Default |
---|---|---|
content_type | Yes | - |
Requests can specify the audio format with the content_type
parameter. If the content type is not
specified then the server will attempt to infer it. Currently only audio/x-raw
is supported.
Supported content types are:
audio/x-raw
: Unstructured and uncompressed raw audio data. If raw audio is used then additional parameters must be provided by adding:format
: The format of audio samples. Only S16LE is currently supportedrate
: The sample rate of the audio. Only 16000 is currently supportedchannels
: The number of channels. Only 1 channel is currently supported
As a query parameter, this would look like:
content_type=audio/x-raw;format=S16LE;channels=1;rate=16000
Model Identifier
Parameter | Required | Default |
---|---|---|
model | No | "general" |
Requests can specify a transcription model identifier.
Model Version
Parameter | Required | Default |
---|---|---|
version | No | "latest" |
Requests can specify the transcription model version. Can be "latest"
or a specific version id.
Model Language
Parameter | Required | Default |
---|---|---|
lang | No | "en" |
The BCP47 language tag for the speech in the audio.
Max Number of Alternatives
Parameter | Required | Default |
---|---|---|
alternatives | No | 1 |
The maximum number of alternative transcriptions to provide.
Supported Models
Model id | Version | Supported Languages |
---|---|---|
general | v1 | en |
Request Frames
For audio/x-raw
audio, raw audio samples in the format specified in the
format
parameter should be sent in WebSocket Binary frames without padding.
Frames can be any length greater than zero.
A WebSocket Binary frame of length zero is treated as an end-of-stream (EOS) message.
Response Frames
Response frames are sent as WebSocket Text frames containing JSON.
{
"start": 0.0,
"end": 2.0,
"is_provisional": false,
"alternatives": [
{
"transcript": "hello world",
"confidence": 1.0
}
]
}
Closing the Connection
The client should not close the WebSocket connection, it should send an EOS message and wait for a WebSocket Close frame from the server. Closing the connection before receivng a WebSocket Close frame from the server may cause transcription results to be dropped.
An end-of-stream (EOS) message can be sent by sending a zero-length binary frame.
Errors
If an error occurs, the server will send a WebSocket Close frame, with error details in the body.
Error Code | Details |
---|---|
400 | Invalid parameters passed. |
503 | Maximum number of simultaneous connections reached. |