Long texts - Sasha AI

For long content (articles, chapters, books), split the text into parts and synthesize them one by one. To keep the intonation natural across the seams, pass the neighbouring parts in previous_text and next_text. The model uses these two fields for context only: it speaks, and you are billed for, the text field alone.

async function synthesizeChunk(chunk, prev, next) {
  const res = await fetch("https://api.trysasha.ru/v1/speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.SASHA_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text: chunk,
      voice: "sasha-lite",
      previous_text: prev, // tail of the previous chunk
      next_text: next,     // head of the next chunk
    }),
  });
  if (!res.ok) throw new Error(`Error ${res.status}: ${await res.text()}`);
  return Buffer.from(await res.arrayBuffer());
}

// `chunks` is an array of strings, e.g. split by sentence.
const parts = [];
for (let i = 0; i < chunks.length; i++) {
  parts.push(
    await synthesizeChunk(chunks[i], chunks[i - 1] ?? "", chunks[i + 1] ?? ""),
  );
}
const fullAudio = Buffer.concat(parts); // concatenate the mp3 parts
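
The loop above passes each neighbouring chunk in full. Since a few sentences of context are enough, a trimmed variant could look like the sketch below. The helper names (tailSentences, headSentences, contextFor) are hypothetical and not part of the Sasha API, and the sentence rule (whitespace after ., !, ? or …) is a simplifying assumption.

```javascript
// Hypothetical helpers, not part of the Sasha API.
// Assumption: a sentence ends at ., !, ? or … followed by whitespace.
const SENTENCE_BREAK = /(?<=[.!?…])\s+/;

// Last `count` sentences of a chunk (count should be >= 1).
// A missing chunk (undefined) falls back to "" and yields "".
function tailSentences(chunk = "", count = 2) {
  const sentences = chunk.trim().split(SENTENCE_BREAK).filter(Boolean);
  return sentences.slice(-count).join(" ");
}

// First `count` sentences of a chunk.
function headSentences(chunk = "", count = 2) {
  const sentences = chunk.trim().split(SENTENCE_BREAK).filter(Boolean);
  return sentences.slice(0, count).join(" ");
}

// Builds the previous_text / next_text pair for chunks[i].
function contextFor(chunks, i, count = 2) {
  return {
    previous_text: tailSentences(chunks[i - 1], count),
    next_text: headSentences(chunks[i + 1], count),
  };
}
```

Inside the loop, this would replace the whole-chunk arguments: take `const { previous_text, next_text } = contextFor(chunks, i);` and call `synthesizeChunk(chunks[i], previous_text, next_text)`.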

Tips

Split on sentence boundaries, not mid-word.
Keep each chunk around 1–2 thousand characters.
A few sentences of context in previous_text / next_text is enough.
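
One way to produce the chunks array along these lines is to pack whole sentences greedily up to a character limit. This is a minimal sketch: splitIntoChunks is a hypothetical helper, the 1500-character default is just a value inside the suggested range, and the sentence rule (whitespace after ., !, ? or …) is a naive assumption that will misfire on abbreviations such as "e.g. this".

```javascript
// Hypothetical helper, not part of the Sasha API.
// Greedily packs whole sentences into chunks of at most `maxChars` characters.
function splitIntoChunks(text, maxChars = 1500) {
  // Assumption: a sentence ends at ., !, ? or … followed by whitespace.
  const sentences = text
    .trim()
    .split(/(?<=[.!?…])\s+/)
    .filter((s) => s.length > 0);

  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current && candidate.length > maxChars) {
      // Adding this sentence would exceed the limit: close the current chunk.
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  // Note: a single sentence longer than maxChars is kept whole, never cut mid-word.
  return chunks;
}
```

For example, `const chunks = splitIntoChunks(articleText);` yields an array that can be fed to the synthesis loop, where articleText stands for your own input string.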

previous_text and next_text are not billed — only text counts toward your character usage.
