
Sending Telegram Voice Notes Programmatically: The Complete Guide

Jonathan Lis
telegram, tutorial, api

Telegram is one of the few messaging platforms that makes voice notes a first-class citizen in its Bot API. Unlike LinkedIn (which has no public API for voice messages) or WhatsApp (which requires Business API approval), Telegram lets any bot send voice notes to any chat it has access to.

But there are specific format requirements and gotchas that catch most developers. Here's everything you need to know.

The sendVoice Method

Telegram's Bot API provides a dedicated sendVoice method. The endpoint is straightforward:

POST https://api.telegram.org/bot{token}/sendVoice

The key parameters:

| Parameter | Required | Description |
|-----------|----------|-------------|
| chat_id | Yes | Target chat ID or @username |
| voice | Yes | Audio file (OGG/OPUS format) |
| caption | No | Voice note caption, 0-1024 characters |
| duration | No | Duration in seconds |
| parse_mode | No | Markdown or HTML for caption |
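As a rough illustration of the table above, the non-file fields can be assembled and checked before the request goes out. This is a minimal sketch: `build_send_voice_params` is a hypothetical helper name, not part of any library, and using `len()` for the caption limit is an approximation of however Telegram counts characters.

```python
from typing import Optional, Union

MAX_CAPTION_LEN = 1024  # caption limit taken from the parameter table


def build_send_voice_params(
    chat_id: Union[int, str],
    caption: Optional[str] = None,
    duration: Optional[int] = None,
    parse_mode: Optional[str] = None,
) -> dict:
    """Build the form fields for sendVoice (the voice file is attached separately)."""
    params = {"chat_id": str(chat_id)}
    if caption is not None:
        # Assumption: len() is a close-enough proxy for Telegram's own count.
        if len(caption) > MAX_CAPTION_LEN:
            raise ValueError(f"caption exceeds {MAX_CAPTION_LEN} characters")
        params["caption"] = caption
    if duration is not None:
        if duration < 0:
            raise ValueError("duration must be non-negative")
        params["duration"] = str(duration)
    if parse_mode is not None:
        params["parse_mode"] = parse_mode
    return params
```

The returned dict can be passed as the `data` argument of a multipart POST, with the audio supplied under the `voice` field.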

Simple enough. The complexity is in the audio format.

OGG/OPUS: The Only Format That Works

Telegram voice notes must be in the OGG container format with OPUS audio encoding. This isn't optional: if you send an MP3, WAV, or M4A file via sendVoice, Telegram will either reject it or deliver it as a regular audio file, as though it had been sent via sendAudio, which renders completely differently in the chat.

The difference matters:

  • Voice notes (OGG/OPUS via sendVoice): Show as a waveform player with a play button. Single tap to listen. They auto-play sequentially if multiple are queued.
  • Audio files (any format via sendAudio): Show as a file with title and performer metadata. Rendered like a music track, not a personal message.

For programmatic voice notes, you need to ensure your audio pipeline outputs the correct format.
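One way to catch a wrong format before uploading is to look at the first bytes of the file. The sketch below assumes the standard Ogg page layout (a 27-byte page header starting with `OggS`, followed by a segment table) and that an Opus stream's first packet begins with `OpusHead`. `sniff_ogg_codec` is a hypothetical helper, and it only inspects the first page rather than validating the whole file.

```python
def sniff_ogg_codec(header: bytes) -> str:
    """Guess the codec from the first bytes of a file.

    Returns "opus", "vorbis", "unknown" (Ogg with another codec), or "not-ogg".
    """
    # Every Ogg page starts with the capture pattern "OggS" and a 27-byte header.
    if len(header) < 27 or header[:4] != b"OggS":
        return "not-ogg"
    # Byte 26 holds the number of entries in the segment table that follows.
    n_segments = header[26]
    payload = header[27 + n_segments:]
    if payload.startswith(b"OpusHead"):
        return "opus"
    if payload.startswith(b"\x01vorbis"):
        return "vorbis"
    return "unknown"
```

Reading the first 64 bytes is enough for a typical file, for example `sniff_ogg_codec(open("message.ogg", "rb").read(64))`.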

Encoding with FFmpeg

The most reliable way to create OGG/OPUS files is with FFmpeg:

ffmpeg -i input.mp3 -c:a libopus -b:a 64k -ar 48000 -ac 1 output.ogg

Key flags:

  • -c:a libopus — Use the OPUS codec
  • -b:a 64k — 64kbps bitrate (plenty for voice)
  • -ar 48000 — 48kHz sample rate (OPUS standard)
  • -ac 1 — Mono channel
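If the conversion has to happen inside an application rather than a shell, the same command can be driven from Python. This is a sketch that assumes `ffmpeg` is installed and on the PATH; the function names are hypothetical, and the `-y` flag (overwrite the output without prompting) is an addition not present in the command above.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: str, dst: str, bitrate: str = "64k") -> list:
    """Return the argument list for the OGG/OPUS conversion shown above."""
    return [
        "ffmpeg", "-y",      # -y: overwrite dst if it exists (added here)
        "-i", str(src),
        "-c:a", "libopus",   # OPUS codec
        "-b:a", bitrate,     # target bitrate
        "-ar", "48000",      # 48kHz sample rate
        "-ac", "1",          # mono
        str(dst),
    ]


def convert_to_voice_note(src: str, dst: str) -> Path:
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.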

Node.js Example

Here's how to send a voice note using Node.js with the node-telegram-bot-api package:

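const TelegramBot = require("node-telegram-bot-api");
const fs = require("fs");

// No polling option is passed, so this instance is only used for sending.
const bot = new TelegramBot("YOUR_BOT_TOKEN");

async function sendVoiceNote(chatId, filePath) {
  // Stream the OGG/OPUS file from disk rather than buffering it in memory.
  const voice = fs.createReadStream(filePath);

  return bot.sendVoice(chatId, voice, {
    caption: "Here's a quick update",
  });
}

// Catch failures so a rejected send is logged instead of going unhandled.
sendVoiceNote("123456789", "./message.ogg").catch(console.error);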

Or using the raw API with fetch:

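async function sendVoice(token, chatId, oggBuffer) {
  const form = new FormData();
  form.append("chat_id", String(chatId));
  // Label the upload as OGG audio and give it a filename.
  form.append("voice", new Blob([oggBuffer], { type: "audio/ogg" }), "voice.ogg");

  const res = await fetch(
    `https://api.telegram.org/bot${token}/sendVoice`,
    { method: "POST", body: form }
  );

  // Telegram replies with JSON; check the `ok` field to see whether the send succeeded.
  return res.json();
}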

Python Example

Using the python-telegram-bot library:

from telegram import Bot
import asyncio

async def send_voice_note(chat_id: str, file_path: str):
    bot = Bot(token="YOUR_BOT_TOKEN")

    # The async context manager initializes the bot and shuts it down cleanly.
    async with bot:
        with open(file_path, "rb") as voice:
            await bot.send_voice(
                chat_id=chat_id,
                voice=voice,
                caption="Quick voice update"
            )

asyncio.run(send_voice_note("123456789", "./message.ogg"))

Or with raw requests:

import requests

def send_voice(token, chat_id, file_path):
    url = f"https://api.telegram.org/bot{token}/sendVoice"

    with open(file_path, "rb") as f:
        # Multipart upload: form fields in `data`, the OGG/OPUS file in `files`.
        resp = requests.post(url, data={
            "chat_id": chat_id,
        }, files={
            "voice": ("voice.ogg", f, "audio/ogg"),
        }, timeout=30)  # avoid hanging indefinitely on a stalled upload

    return resp.json()
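The raw examples return Telegram's JSON as-is, so the caller still has to check it. A minimal sketch, assuming the usual Bot API envelope (`ok` plus `result` on success, `ok` plus `error_code` and `description` on failure) and that a message delivered as a voice note carries a `voice` field; `TelegramAPIError`, `unwrap_result`, and `sent_as_voice` are hypothetical names.

```python
class TelegramAPIError(RuntimeError):
    """Raised when the Bot API reports ok == false."""


def unwrap_result(payload: dict) -> dict:
    """Return the sent message, or raise with Telegram's error details."""
    if payload.get("ok"):
        return payload["result"]
    raise TelegramAPIError(
        f"{payload.get('error_code')}: {payload.get('description')}"
    )


def sent_as_voice(message: dict) -> bool:
    """True if the message was delivered as a voice note rather than audio or a document."""
    return "voice" in message
```

For example, `sent_as_voice(unwrap_result(send_voice(token, chat_id, path)))` gives a quick signal that the file landed as a voice note and not as a regular audio file.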

Common Pitfalls

1. Wrong file format. The number one issue. If your TTS provider outputs MP3 or WAV, you must convert to OGG/OPUS before sending. Sending non-OGG audio via sendVoice results in an error or the message being treated as a regular audio file.

2. Missing OPUS codec. OGG is just the container — the audio codec inside must be OPUS, not Vorbis. OGG/Vorbis files (common from some encoders) won't render as voice notes.

3. File size limits. Telegram allows voice notes up to 50MB, but practical voice notes should be much smaller. A 60-second mono OPUS file at 64kbps is roughly 480KB.

4. Bot permissions. Your bot must be a member of the chat (for groups) or the user must have initiated a conversation with the bot (for direct messages). You can't send unsolicited voice notes to arbitrary users.
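The size figure in pitfall 3 follows directly from bitrate times duration. The sketch below reproduces that arithmetic; it ignores container overhead and encoder bitrate variation, and it assumes the 50MB limit means 50 million bytes. The function names are hypothetical.

```python
MAX_VOICE_BYTES = 50 * 1000 * 1000  # assumption: 50MB read as decimal megabytes


def estimate_size_bytes(duration_s: float, bitrate_kbps: int = 64) -> int:
    """Approximate encoded size: kilobits per second * seconds / 8 bits per byte."""
    return int(duration_s * bitrate_kbps * 1000 / 8)


def fits_voice_limit(duration_s: float, bitrate_kbps: int = 64) -> bool:
    """Check the estimate against the upload limit."""
    return estimate_size_bytes(duration_s, bitrate_kbps) <= MAX_VOICE_BYTES


print(estimate_size_bytes(60))  # 480000 bytes, i.e. roughly 480KB
```

At 64kbps the limit works out to well over an hour of audio, so in practice size only becomes a concern at much higher bitrates or very long recordings.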

The Svara Way

With Svara, you skip the format conversion, the Bot API setup, and the OGG/OPUS encoding entirely:

curl -X POST https://svarapi.io/api/v1/voice-notes/send \
  -H "Authorization: Bearer svara_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "telegram",
    "recipient": "123456789",
    "text": "Hey, wanted to share a quick update on the project.",
    "voice": "professional-female-1",
    "credentials": { "bot_token": "..." }
  }'

Svara generates the audio from your text using AI voices, encodes it to the correct format for the target platform (OGG/OPUS for Telegram, M4A for LinkedIn), and delivers it natively through the platform's API. One endpoint, correct format, native delivery.

No FFmpeg. No format conversion. No platform-specific code. Just the message you want to send and where you want it to go.

Get your API key and send your first Telegram voice note in under a minute.
