WebSocket API for Streaming Transcription

Connecting

To start a new stream, the connection must first be set up. A WebSocket connection starts with a HTTP GET request with header fields Upgrade: websocket and Connection: Upgrade as per RFC6455.

GET /asr/v0.1/stream HTTP/1.1
Host: api.myrtle.ai
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Protocol: stream.asr.api.myrtle.ai
Sec-WebSocket-Version: 13

If all is well, the server will respond in the affirmative.

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: stream.asr.api.myrtle.ai

The server will return HTTP/1.1 400 Bad Request if the request is invalid.

Request Parameters

Parameters are query-encoded in the request URL.

Content Type

ParameterRequiredDefault
content_typeYes-

Requests can specify the audio format with the content_type parameter. If the content type is not specified then the server will attempt to infer it. Currently only audio/x-raw is supported.

Supported content types are:

  • audio/x-raw: Unstructured and uncompressed raw audio data. If raw audio is used then additional parameters must be provided by adding:
    • format: The format of audio samples. Only S16LE is currently supported
    • rate: The sample rate of the audio. Only 16000 is currently supported
    • channels: The number of channels. Only 1 channel is currently supported

As a query parameter, this would look like:

content_type=audio/x-raw;format=S16LE;channels=1;rate=16000

Model Identifier

ParameterRequiredDefault
modelNo"general"

Requests can specify a transcription model identifier.

Model Version

ParameterRequiredDefault
versionNo"latest"

Requests can specify the transcription model version. Can be "latest" or a specific version id.

Model Language

ParameterRequiredDefault
langNo"en"

The BCP47 language tag for the speech in the audio.

Max Number of Alternatives

ParameterRequiredDefault
alternativesNo1

The maximum number of alternative transcriptions to provide.

Supported Models

Model idVersionSupported Languages
generalv1en

Request Frames

For audio/x-raw audio, raw audio samples in the format specified in the format parameter should be sent in WebSocket Binary frames without padding. Frames can be any length greater than zero.

A WebSocket Binary frame of length zero is treated as an end-of-stream (EOS) message.

Response Frames

Response frames are sent as WebSocket Text frames containing JSON.

{
  "start": 0.0,
  "end": 2.0,
  "is_provisional": false,
  "alternatives": [
    {
      "transcript": "hello world",
      "confidence": 1.0
    }
  ]
}

Closing the Connection

The client should not close the WebSocket connection, it should send an EOS message and wait for a WebSocket Close frame from the server. Closing the connection before receivng a WebSocket Close frame from the server may cause transcription results to be dropped.

An end-of-stream (EOS) message can be sent by sending a zero-length binary frame.

Errors

If an error occurs, the server will send a WebSocket Close frame, with error details in the body.

Error CodeDetails
400Invalid parameters passed.
503Maximum number of simultaneous connections reached.