A single endpoint handles everything:
POST https://api.trysasha.ru/v1/speech

Request body

text (string, required)
    The text to synthesize. Russian and English are supported. Digits, punctuation and symbols are allowed.
voice (string, default: "sasha-lite")
    Voice id. See Voices.
model (string, default: "flash")
    Model id. Currently a single low-latency model is available.
format (string, default: "mp3")
    Audio format: mp3, mp3_high, mp3_low, or pcm.
stream (boolean, default: false)
    Stream the audio as it is generated. See Streaming.
previous_text (string, optional)
    Text that precedes the content of text. Improves prosody across chunks. Not billed. See Long texts.
next_text (string, optional)
    Text that follows the content of text. Improves prosody across chunks. Not billed.
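
As an illustration of how the fields above fit together, here is a minimal Python sketch that assembles the request without sending it. The function name build_speech_request and the audio_format argument are hypothetical. The Bearer authorization header is taken from the curl example in this section. Rejecting unknown format values on the client side is an assumption of this sketch, not documented server behavior.

```python
import json
import urllib.request

API_URL = "https://api.trysasha.ru/v1/speech"
ALLOWED_FORMATS = {"mp3", "mp3_high", "mp3_low", "pcm"}


def build_speech_request(api_key, text, voice="sasha-lite", model="flash",
                         audio_format="mp3", stream=False,
                         previous_text=None, next_text=None):
    """Build (but do not send) a POST request for the speech endpoint."""
    if not text:
        raise ValueError("text is required")
    # Client-side check only (an assumption of this sketch).
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")

    body = {"text": text, "voice": voice, "model": model,
            "format": audio_format, "stream": stream}
    # The optional context hints are included only when provided.
    if previous_text is not None:
        body["previous_text"] = previous_text
    if next_text is not None:
        body["next_text"] = next_text

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would perform the call. The sketch stops short of that so it can be inspected without network access.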

Response

On success you receive the raw audio bytes with Content-Type: audio/mpeg (or audio/wav for pcm), plus these headers:
X-Request-Id (string)
    Unique id of the request (also visible in your history).
X-Characters-Billed (integer)
    Number of characters charged for this request.
X-Cost-Rub (string)
    Cost of this request, in rubles.
X-Balance-Rub (string)
    Your remaining balance, in rubles, after the charge.
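
The billing headers can be collected into a small structure on the client side. The sketch below is one possible way to do it. The names SpeechBilling and parse_billing_headers are hypothetical, and it assumes the ruble amounts arrive as plain decimal strings (for example "0.36"), which this section does not confirm.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class SpeechBilling:
    request_id: str
    characters_billed: int
    cost_rub: Decimal
    balance_rub: Decimal


def parse_billing_headers(headers):
    """Extract billing metadata from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return SpeechBilling(
        request_id=lowered["x-request-id"],
        characters_billed=int(lowered["x-characters-billed"]),
        # Assumption: amounts are plain decimal strings; Decimal avoids
        # the rounding surprises of float for money.
        cost_rub=Decimal(lowered["x-cost-rub"]),
        balance_rub=Decimal(lowered["x-balance-rub"]),
    )
```

The function accepts any dict-like object with an items() method, so it can be exercised with a hand-written dictionary before wiring it to a real response.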

Example

curl -X POST https://api.trysasha.ru/v1/speech \
  -H "Authorization: Bearer $SASHA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Сегодня отличная погода.",
    "voice": "sasha-lite",
    "format": "mp3_high"
  }' \
  --output speech.mp3

Only the text field is billed; previous_text and next_text are free context hints. The text field is limited to 10,000 characters per request.
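
For inputs longer than the per-request limit, one possible approach is to split the text at sentence boundaries and pass the neighboring chunks as previous_text and next_text. The sketch below is illustrative only and is not necessarily the method described in Long texts. The helper names are hypothetical, the sentence splitter is deliberately naive, and sending an entire neighboring chunk as context is an assumption, since this section states no size limit for the context fields.

```python
import re

MAX_CHARS = 10_000  # per-request limit on the text field


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_payloads(text, max_chars=MAX_CHARS):
    """Group sentences into chunks of at most max_chars and attach context."""
    chunks, current = [], ""
    for sentence in split_sentences(text):
        if len(sentence) > max_chars:
            raise ValueError("a single sentence exceeds max_chars")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The chunk is full: store it and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)

    payloads = []
    for i, chunk in enumerate(chunks):
        payload = {"text": chunk}
        # Neighboring chunks serve as unbilled context hints.
        if i > 0:
            payload["previous_text"] = chunks[i - 1]
        if i + 1 < len(chunks):
            payload["next_text"] = chunks[i + 1]
        payloads.append(payload)
    return payloads
```

Each returned dictionary can be merged with the voice, model, and format fields and sent as a separate request. The resulting audio segments would then be concatenated in order.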