Synthesis - Sasha AI

A single endpoint handles everything:

POST https://api.trysasha.ru/v1/speech

Request body

text

string

required

The text to synthesize. Russian and English are supported. Digits, punctuation and symbols are allowed.

voice

string

default:"sasha-lite"

Voice id. See Voices.

model

string

default:"flash"

Model id. Currently a single low-latency model is available.

format

string

default:"mp3"

Audio format: mp3, mp3_high, mp3_low, or pcm.

stream

boolean

default:false

Stream the audio as it is generated. See Streaming.

previous_text

string

The text that immediately precedes the value of text in the source material. Improves prosody across chunks. Not billed. See Long texts.

next_text

string

The text that immediately follows the value of text in the source material. Improves prosody across chunks. Not billed.
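
As a rough illustration of how the context fields might be used, the sketch below builds one request body per chunk and passes the neighbouring chunks as previous_text and next_text. The helper name build_chunk_payloads is hypothetical, and how the text is divided into chunks is left to the caller; only the field names and defaults are taken from the parameters above.

```python
import json


def build_chunk_payloads(chunks, voice="sasha-lite", fmt="mp3"):
    """Build one /v1/speech request body per chunk (hypothetical helper).

    Each body carries the neighbouring chunks as previous_text / next_text,
    which the docs describe as free context hints for prosody.
    """
    payloads = []
    for i, chunk in enumerate(chunks):
        body = {"text": chunk, "voice": voice, "format": fmt}
        if i > 0:
            # Everything except the first chunk has a predecessor.
            body["previous_text"] = chunks[i - 1]
        if i + 1 < len(chunks):
            # Everything except the last chunk has a successor.
            body["next_text"] = chunks[i + 1]
        payloads.append(body)
    return payloads


if __name__ == "__main__":
    demo = build_chunk_payloads(
        ["First sentence.", "Second sentence.", "Third sentence."]
    )
    print(json.dumps(demo, ensure_ascii=False, indent=2))
```

Passing only the adjacent chunk is a design choice of this sketch; the documentation does not say how much context is useful.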

Response

On success you receive the raw audio bytes with Content-Type: audio/mpeg (or audio/wav for pcm), plus these headers:

X-Request-Id

string

Unique id of the request (also visible in your history).

X-Characters-Billed

integer

Number of characters charged for this request.

X-Cost-Rub

string

Cost of this request, in rubles.

X-Balance-Rub

string

Your remaining balance after the charge.
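
A client may want to record these headers alongside the audio. The following is a minimal sketch, assuming the two ruble amounts arrive as plain decimal strings (the reference only says they are strings); SpeechBilling and parse_billing_headers are hypothetical names, not part of the API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class SpeechBilling:
    request_id: str
    characters_billed: int
    cost_rub: Decimal
    balance_rub: Decimal


def parse_billing_headers(headers):
    """Extract the billing headers from a mapping of response headers.

    Assumption: X-Cost-Rub and X-Balance-Rub parse as decimal numbers.
    Decimal is used instead of float to avoid rounding errors in money.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    return SpeechBilling(
        request_id=h["x-request-id"],
        characters_billed=int(h["x-characters-billed"]),
        cost_rub=Decimal(h["x-cost-rub"]),
        balance_rub=Decimal(h["x-balance-rub"]),
    )
```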

Example

curl -X POST https://api.trysasha.ru/v1/speech \
  -H "Authorization: Bearer $SASHA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Сегодня отличная погода.",
    "voice": "sasha-lite",
    "format": "mp3_high"
  }' \
  --output speech.mp3
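
For readers who prefer Python, here is a rough equivalent of the curl call using only the standard library. The function names build_speech_request and synthesize_to_file are illustrative, not part of any official SDK; the URL, headers, and body fields mirror the curl example.

```python
import json
import urllib.request

API_URL = "https://api.trysasha.ru/v1/speech"


def build_speech_request(text, api_key, voice="sasha-lite", fmt="mp3"):
    """Build (but do not send) a POST request for /v1/speech."""
    body = json.dumps({"text": text, "voice": voice, "format": fmt})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, api_key, path, **options):
    """Send the request and write the returned audio bytes to `path`.

    Returns the X-Request-Id response header. Non-2xx responses raise
    urllib.error.HTTPError, which this sketch does not handle.
    """
    request = build_speech_request(text, api_key, **options)
    with urllib.request.urlopen(request) as response:
        with open(path, "wb") as out:
            out.write(response.read())
        return response.headers.get("X-Request-Id")
```

A call such as synthesize_to_file("Сегодня отличная погода.", os.environ["SASHA_API_KEY"], "speech.mp3", fmt="mp3_high") would then correspond to the curl command above (with os imported).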

Only the text field is billed. previous_text and next_text are free context hints. Text is limited to 10,000 characters per request.
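
Input longer than the limit has to be split on the client side. The sketch below shows one possible approach: pack whole sentences into chunks of at most 10,000 characters and hard-cut any single sentence that is too long. It assumes the limit applies to the text field and counts characters the way Python's len does; neither detail is confirmed here, and split_text is a hypothetical helper.

```python
import re

MAX_CHARS = 10_000  # per-request limit stated in this reference


def split_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters.

    Prefers sentence boundaries (., !, ? or the ellipsis character followed
    by whitespace). A single sentence longer than `limit` is cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit in one request on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit; close the chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk can be sent as text in its own request, with the neighbouring chunks supplied as previous_text and next_text.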
