How to Translate Any Video Into Any Language for Free With AI (Step-by-Step)

Most creators and businesses are sitting on a back catalog of video that only reaches people who already speak their language. The audience on the other side of a language barrier never even knows the content exists. The old fix — hiring translators, voice actors, and an editor — runs hundreds or thousands of dollars per video, so the catalog just stays in one language.

It doesn’t have to. Video translation — transcribing a video, translating the script, and voicing it over in a new language — can now be done end to end with free, open tools. We recently translated a full series of hour-long videos into Russian this way: accurate translation, natural voiceover, original picture untouched underneath, and the whole toolchain cost nothing.

This is the complete, copy-and-paste guide: the exact tools, the exact commands, and how to run all of it from one terminal with Claude Code — in any language pair, without being a developer.

Why Free Video Translation Changes the Math

When translation is expensive, you ration it. You localize one flagship video, measure, and hesitate. When it’s free, the calculation flips: you can translate your entire library and let new-language audiences find you.

The upside is real. A single teaching video, course module, sermon, or product explainer that you already produced becomes a second, third, or tenth asset working for a completely new market. For a creator reaching a global audience, a business expanding into a new region, or an organization serving immigrant communities, this is the difference between “someday” and “this week.” The only things that ever made it hard were cost and complexity, and AI removes both.

The Free Toolchain (One-Time Setup)

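Every tool here is free. Downloading, transcription, and all audio and video processing run on your own machine; the translation model and the edge-tts voices are online services, so only the transcript text leaves your computer. On a Mac with Homebrew and Python already installed, the whole stack installs in two commands: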

brew install yt-dlp ffmpeg whisper-cpp
pip install edge-tts

Here’s what each piece does:

  • yt-dlp — downloads the source video (skip it if the file is already yours).
  • whisper.cpp — runs OpenAI’s Whisper speech-recognition model locally to transcribe the audio with timestamps. Free, offline, no upload.
  • A language model — translates the transcript and checks it for accuracy. We drive this through Claude Code.
  • edge-tts — generates natural neural speech using Microsoft’s free voices, in dozens of languages. No API key, no cost.
  • FFmpeg — extracts the audio, fits the new voice track to the original timing, and muxes everything into a finished video.

Notice what’s *not* on the list: no paid translation platform, no per-minute transcription bill, no voice-cloning subscription. The media pipeline is 100% free.

The Step-by-Step Process

Below is the full process with the actual commands. Run them in a working folder with video/, audio/, srt/, and out/ subfolders (FFmpeg will not create a missing output folder for you). And remember that the whole point, which we’ll get to, is that you can have Claude Code run every one of these for you.

Step 1: Get the Video and Pull the Audio

Download the source (or use your own file), then convert its audio to the 16 kHz mono WAV that whisper.cpp expects:

yt-dlp -f "bestvideo[height<=720][ext=mp4]+bestaudio/best" --merge-output-format mp4 -o "video/01.%(ext)s" "PASTE_VIDEO_URL"
ffmpeg -y -i video/01.mp4 -ar 16000 -ac 1 audio/01.wav
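
If a transcription ever comes back empty or garbled, the first thing worth ruling out is the audio format. Here is a minimal sketch of that check using only Python's standard library; the function name check_whisper_wav is our own illustration, not part of any tool above.

```python
import wave


def check_whisper_wav(path):
    """Return a list of format problems; an empty list means the file looks right."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

Calling check_whisper_wav("audio/01.wav") on a file produced by the command above should return an empty list.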

Step 2: Transcribe the Original

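Run Whisper over the audio to produce a timestamped subtitle file. The large-v3-turbo model file is a separate download; the command below assumes you have saved it as models/ggml-large-v3-turbo.bin. On a recent laptop with GPU acceleration, it can transcribe an hour of speech in a few minutes with excellent accuracy: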

whisper-cli -m models/ggml-large-v3-turbo.bin -f audio/01.wav -l en -osrt -of srt/01.en

You now have srt/01.en.srt — the full spoken script, broken into time-stamped lines.
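
Every later step reads this file, so it helps to see how little structure it has: a number, a timing line, and the text. The sketch below parses it into records, assuming the plain SRT layout that whisper-cli writes. The names Cue, to_seconds, and parse_srt are illustrative.

```python
import re
from dataclasses import dataclass

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the video
    end: float
    text: str


def to_seconds(stamp):
    """Convert 'HH:MM:SS,mmm' into seconds."""
    match = TIMESTAMP.match(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content):
    """Split SRT text into Cue records; blocks are separated by blank lines."""
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", content):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that is not a subtitle block
        start, end = lines[1].split("-->")
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0].strip()), to_seconds(start), to_seconds(end), text))
    return cues
```

Reading the file with open("srt/01.en.srt", encoding="utf-8").read() and passing the result to parse_srt gives one Cue per spoken line.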

Step 3: Translate — With the Right Instructions

This is where quality is won or lost. Generic machine translation flattens nuance. Instead, you give the model clear rules: the tone, the formality, the terminology, and any specialized vocabulary your content depends on. A strong prompt looks like this:

Translate this SRT from English to Spanish. Keep every timestamp and line
number identical. Use a warm, conversational, formal-address tone. Keep these
terms consistent: [your brand names, product terms, jargon]. Where the script
quotes a source, match the standard published translation word-for-word.
Return the same SRT structure, only the text translated.

The translation comes back lined up one-to-one with the original, ready to voice.
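
Models occasionally merge two lines or nudge a timestamp, so it is worth verifying the one-to-one claim rather than trusting it. A minimal sketch of that check follows, assuming both files use plain SRT blocks separated by blank lines. The function names are our own.

```python
import re


def srt_blocks(srt_text):
    """Return (index, timing_line, text) for each subtitle block."""
    cleaned = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    blocks = []
    for block in re.split(r"\n\s*\n", cleaned):
        lines = [line.strip() for line in block.split("\n")]
        if len(lines) >= 2 and "-->" in lines[1]:
            blocks.append((lines[0], lines[1], " ".join(lines[2:])))
    return blocks


def structure_mismatches(source_srt, translated_srt):
    """List every place where the translation no longer lines up with the source."""
    source = srt_blocks(source_srt)
    translated = srt_blocks(translated_srt)
    problems = []
    if len(source) != len(translated):
        problems.append(
            f"block count differs: {len(source)} source vs {len(translated)} translated"
        )
    for (s_index, s_time, _), (t_index, t_time, t_text) in zip(source, translated):
        if s_index != t_index:
            problems.append(f"index {s_index} became {t_index}")
        elif s_time != t_time:
            problems.append(f"timestamp changed in block {s_index}")
        elif not t_text:
            problems.append(f"block {s_index} has no translated text")
    return problems
```

An empty list means the structure survived. Anything else is a reason to ask for the translation again before moving on.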

Step 4: Review Until It’s Actually Right

Never ship a first-draft translation. Run a review loop: ask the model to compare each translated line against the source, flag any mistranslation, terminology slip, or tonal drift, propose a fix, and re-check — repeating until a full pass comes back clean. On a 1,000-line video this catches real errors a single pass misses, including sentences where the meaning was accidentally inverted. Only after the translation is clean does any audio get generated.
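
The loop itself is simple enough to sketch. In the version below, review is a hypothetical stand-in for whatever asks the model to compare the lines; we assume it returns a dictionary of corrections keyed by line position, and an empty dictionary when it finds nothing. The cap on passes is our own safeguard, not something the tools require.

```python
def review_until_clean(pairs, review, max_passes=5):
    """Apply reviewer corrections until one full pass comes back clean.

    pairs:  list of (source_text, translated_text)
    review: callable taking the current pairs and returning
            {line_position: corrected_text}; empty when nothing is wrong
    Returns (final_translations, number_of_passes_run).
    """
    sources = [source for source, _ in pairs]
    translated = [target for _, target in pairs]
    for passes in range(1, max_passes + 1):
        fixes = review(list(zip(sources, translated)))
        if not fixes:
            return translated, passes  # a clean pass: safe to generate audio
        for position, corrected in fixes.items():
            translated[position] = corrected
    raise RuntimeError(f"still finding issues after {max_passes} passes")
```

The important property is that the loop only exits on a pass with zero corrections, so a fix that introduces a new problem gets reviewed again.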

Step 5: Generate the Voiceover (in Any Language)

Feed each translated line to edge-tts. First, find a voice in your target language:

edge-tts --list-voices | grep es-      # Spanish voices (or fr-, de-, ru-, ko-, ar-…)

Then synthesize a clip for a line:

edge-tts --voice es-ES-AlvaroNeural --text "Tu frase traducida aquí" --write-media seg_0001.mp3

Loop that over every line in the translated SRT and you have one natural-sounding clip per subtitle line, at no cost.
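
That loop might look like the sketch below, which simply runs the same edge-tts command once per line. The tts/ folder, the file naming, and the function names are assumptions for illustration. The run parameter exists only so the function can be tried without calling the voice service.

```python
import subprocess
from pathlib import Path


def tts_command(text, voice, out_path):
    """Build the edge-tts command for one line of translated text."""
    # --text=... keeps a line that starts with a dash from being read as an option.
    return ["edge-tts", "--voice", voice, f"--text={text}", "--write-media", str(out_path)]


def synthesize_all(lines, voice, out_dir="tts", run=subprocess.run):
    """Create one MP3 per non-empty line; returns {line_number: clip_path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    clips = {}
    for number, text in enumerate(lines, start=1):
        if not text.strip():
            continue  # nothing to say for this line
        clip = out_dir / f"seg_{number:04d}.mp3"
        if not clip.exists():  # lets an interrupted run pick up where it stopped
            run(tts_command(text, voice, clip), check=True)
        clips[number] = clip
    return clips
```

Because finished clips are skipped, rerunning the function after a network hiccup only regenerates what is missing.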

Step 6: Rebuild the Timeline So It Stays in Sync

This is the step amateurs get wrong. Translated speech is rarely the same length as the original, so if you just play the clips back-to-back, the voiceover drifts further and further out of sync until the ending is half a minute off.

The fix: anchor every clip to its true timestamp from the subtitle file, and gently speed up any clip that runs long so it fits its slot — FFmpeg’s atempo filter changes speed without changing pitch:

ffmpeg -y -i seg_0001.mp3 -filter:a "atempo=1.15" -ar 24000 -ac 1 seg_0001_fit.wav
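
The 1.15 in that command is only an example; each clip needs its own factor. One reasonable way to pick it, sketched below, is to divide the clip's length by the time available before the next line starts, never slow a clip down, and cap the speed-up so the voice stays natural. The 1.5 cap and both function names are our own assumptions.

```python
def slot_lengths(starts, video_seconds):
    """Time available to each line: from its start to the next start (or the video's end)."""
    ends = list(starts[1:]) + [video_seconds]
    return [end - start for start, end in zip(starts, ends)]


def fit_tempo(clip_seconds, slot_seconds, max_tempo=1.5):
    """Return the atempo factor that squeezes a clip into its slot."""
    if clip_seconds <= 0 or slot_seconds <= 0:
        raise ValueError("durations must be positive")
    tempo = clip_seconds / slot_seconds
    # Clips that already fit keep their natural speed; long ones are capped.
    return round(min(max(tempo, 1.0), max_tempo), 3)
```

A 4.6-second clip in a 4-second slot gets a factor of 1.15. A clip that still overruns at the cap will spill into the next slot, which is usually a sign the translated line should be tightened instead.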

Then place each fitted clip onto a silent timeline the exact length of the video, at its real start time. The result is a voiceover track exactly as long as the video, locked to the original timing from first word to last. This bit of bookkeeping is where a short script earns its keep — and as you’ll see next, you don’t have to write it.
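
For the curious, here is roughly what such a script does, as a sketch using only Python's standard library. It assumes every fitted clip is a mono 16-bit WAV at 24 kHz, as produced by the FFmpeg command above, and that you already know the video's length in seconds (ffprobe, which ships with FFmpeg, can report it). The function name build_timeline is illustrative.

```python
import wave


def build_timeline(clips, video_seconds, out_path, rate=24000):
    """Write a WAV as long as the video with each clip placed at its start time.

    clips: iterable of (start_seconds, wav_path)
    """
    bytes_per_sample = 2  # 16-bit audio
    timeline = bytearray(bytes_per_sample * int(round(video_seconds * rate)))  # silence
    for start, path in clips:
        with wave.open(str(path), "rb") as wav:
            if (wav.getframerate(), wav.getnchannels(), wav.getsampwidth()) != (rate, 1, 2):
                raise ValueError(f"{path}: expected mono 16-bit audio at {rate} Hz")
            frames = wav.readframes(wav.getnframes())
        offset = bytes_per_sample * int(round(start * rate))
        # Drop anything that would run past the end of the video.
        frames = frames[: max(0, len(timeline) - offset)]
        # A clip that overruns its slot is overwritten by the next one, not mixed.
        timeline[offset : offset + len(frames)] = frames
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(bytes_per_sample)
        out.setframerate(rate)
        out.writeframes(bytes(timeline))
```

Called with "dub.wav" as out_path, this produces the voice track that the mix in Step 7 expects.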

Step 7: Mix and Export

FFmpeg lays the new voice over the original, turning the source audio down to 12% of its level so it stays faintly audible underneath, and copies the video stream through unchanged:

ffmpeg -y -i video/01.mp4 -i dub.wav -filter_complex "[0:a]volume=0.12[bg];[1:a]volume=1.0[dub];[bg][dub]amix=inputs=2:duration=first[a]" -map 0:v -map "[a]" -c:v copy -c:a aac -b:a 192k -shortest out/01.translated.mp4
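
The volume=0.12 in that filter is a linear multiplier, not a decibel value, which makes it hard to compare with the levels an audio editor shows. The conversion is a one-line formula in each direction; the helper names below are our own.

```python
import math


def gain_to_db(gain):
    """Convert a linear volume multiplier (e.g. 0.12) to decibels."""
    return 20 * math.log10(gain)


def db_to_gain(decibels):
    """Convert a decibel change (e.g. -20) to a linear volume multiplier."""
    return 10 ** (decibels / 20)
```

By this formula 0.12 is roughly 18 dB below the original level. If the original speaker still competes with the new voice, try a smaller multiplier such as db_to_gain(-24).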

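Finally, if the downloaded video is not already H.264 (yt-dlp can return other codecs, such as AV1, inside an MP4 file), re-encode to H.264 for maximum compatibility, since it plays everywhere and uploads cleanly to any platform. This is the only step that re-encodes the picture, so skip it when the source is already H.264: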

ffmpeg -y -i out/01.translated.mp4 -c:v libx264 -crf 21 -c:a copy out/01.final.mp4

You now have a finished, shareable, translated video.

It Works for Any Language Pair

Nothing in this process is hardwired to one language. Whisper transcribes 90+ languages. The translation step is just instructions to a model, which handles virtually any pair. And edge-tts ships natural neural voices across dozens of languages and accents — Spanish, French, Arabic, Mandarin, Korean, and more.

To translate into a different language, you change exactly two things: the target language in your translation prompt (Step 3) and the --voice in Step 5. If the source video is not in English, also change -l en in Step 2 to the source language code. Same commands, same zero cost. Once the workflow exists, your marginal cost to localize into a new language is essentially your time.
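
In a script, those per-language settings fit in a small lookup table. The sketch below is illustrative only: the Target record and prompt_header helper are our own names, and the Russian voice name is an assumption you should confirm with edge-tts --list-voices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    language: str  # goes into the Step 3 translation prompt
    voice: str     # goes into the Step 5 --voice flag


TARGETS = {
    "es": Target("Spanish", "es-ES-AlvaroNeural"),
    # Assumed voice name; check it against edge-tts --list-voices before use.
    "ru": Target("Russian", "ru-RU-DmitryNeural"),
}


def prompt_header(target, source_language="English"):
    """First sentence of the translation prompt for a given target language."""
    return f"Translate this SRT from {source_language} to {target.language}."
```

Adding a language then means adding one line to TARGETS; nothing else in the pipeline changes.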

You Don’t Have to Run a Single Command Yourself

Here’s the part that surprises people. Every step above involves a command, a script, or a file format — and you don’t have to touch any of them.

We ran this entire project inside Claude Code in the terminal by giving instructions in plain English: “Download these videos, transcribe them, translate to Russian with these terminology rules, review the translation until it’s clean, then voice it over and keep the audio in sync.” Claude Code wrote the scripts, ran the tools, caught its own bugs, and produced the finished files. When something broke mid-run, it diagnosed and fixed it without us opening the documentation.

That’s the real unlock. The free toolchain has existed for a while, but stitching it together used to require genuine technical comfort. With an AI agent in the terminal, the technical layer becomes a conversation: you describe the outcome, it handles the how. If you can write an email describing what you want, you can run this.

Bringing It Into Your Marketing

For the businesses and creators we work with, free video translation isn’t a novelty — it’s leverage. Your existing video becomes multilingual inventory. Your reach stops being capped by language. And because the cost is essentially zero, you can test new-language audiences without a budget conversation.

If you’d like help building this into a repeatable localization workflow for your content — or you’d rather hand off your whole catalog and get translated videos back — let’s talk. This is exactly the kind of high-leverage, low-cost system we love putting to work.

The tools are free. The barrier was never money — it was knowing how. Now you do.