Audio Sync & Replace
Replace or mix audio in a video with a clean recording. 100% browser-based.
Last updated: 2026-03-18
How It Works
- Upload Your Files — Select the video with the noisy or unwanted audio track, then select your clean audio file. Supported video formats include MP4, WebM, MOV, and AVI. Audio files can be MP3, WAV, FLAC, AAC, OGG, or any format your browser supports.
- Load Waveforms — FFmpeg (running entirely in your browser via WebAssembly) extracts the audio from both files, decodes them into raw PCM data, and renders visual waveforms. The blue waveform represents the original video audio; the green waveform represents the clean replacement audio.
- Align the Audio — This is the key step. The recordings may have started at different times, so you need to line them up:
  - Auto Align uses FFT-based cross-correlation to detect the optimal offset automatically. It analyzes the audio patterns (peaks, transients) in both tracks and finds the point of maximum similarity.
  - Drag to Align — grab the green waveform and drag it left or right for visual alignment. Line up recognizable features like claps, speech starts, or music beats.
  - Fine-Tune — use the +/-10ms and +/-100ms buttons or type an exact offset in milliseconds for frame-accurate alignment.
  - Search Window — adjust the slider to control how far the auto-align looks for a match. Increase it if the recordings have a large time gap between them.
- Choose Audio Mode
  - Replace — completely removes the original audio and replaces it with the clean track. The video stream is copied as-is (no re-encoding), so there is zero video quality loss.
  - Mix — blends both tracks together at adjustable volume levels. Set the original to 10-20% to retain background ambience while overlaying the clean voice at 100%.
- Process & Download — FFmpeg muxes the aligned audio into the video container. The video stream is stream-copied rather than re-encoded, so processing is fast and the video quality is unchanged. Audio is encoded as AAC at 192 kbps. Download the result as a single file.
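The final mux step can be pictured as an FFmpeg argument list. The sketch below is an illustration only, not the tool's actual command: the function names, the option object, and the sign convention (a positive offset means the clean track starts later) are assumptions. It uses standard FFmpeg options and filters (`-filter_complex`, `-map`, `-c:v copy`, `adelay`, `atrim`, `asetpts`, `volume`, `amix`).

```typescript
type AudioMode = "replace" | "mix";

interface MuxOptions {
  mode: AudioMode;
  offsetMs: number;       // assumed convention: positive = clean audio starts later
  originalVolume: number; // 0..1, only used in mix mode
  cleanVolume: number;    // 0..1, only used in mix mode
}

// Filter chain that shifts the clean track by offsetMs (hypothetical helper).
function offsetFilter(offsetMs: number): string {
  const ms = Math.round(Math.abs(offsetMs));
  if (ms === 0) return "anull";                  // pass-through, no shift
  if (offsetMs > 0) return `adelay=${ms}|${ms}`; // pad silence on the first two channels
  // Negative offset: cut the head of the clean track and reset timestamps.
  return `atrim=start=${ms / 1000},asetpts=PTS-STARTPTS`;
}

// Builds FFmpeg arguments; input 0 is the video, input 1 is the clean audio.
function buildFfmpegArgs(
  video: string,
  clean: string,
  output: string,
  opts: MuxOptions
): string[] {
  const shifted = `[1:a]${offsetFilter(opts.offsetMs)}`;
  const graph =
    opts.mode === "replace"
      ? `${shifted}[aout]`
      : `[0:a]volume=${opts.originalVolume}[orig];` +
        `${shifted},volume=${opts.cleanVolume}[clean];` +
        `[orig][clean]amix=inputs=2:duration=first[aout]`;
  return [
    "-i", video,
    "-i", clean,
    "-filter_complex", graph,
    "-map", "0:v:0",   // video from the original file
    "-map", "[aout]",  // audio from the filter graph
    "-c:v", "copy",    // stream copy: video is not re-encoded
    "-c:a", "aac",
    "-b:a", "192k",
    output,
  ];
}
```

For example, `buildFfmpegArgs("in.mp4", "clean.wav", "out.mp4", { mode: "mix", offsetMs: 0, originalVolume: 0.15, cleanVolume: 1 })` yields a graph that keeps the original at 15% under the clean track. Note that `amix` may rescale its inputs depending on the FFmpeg build, so effective levels can differ from the requested ones.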
Technical Details
- FFT Cross-Correlation — the auto-align algorithm downsamples both tracks to 4 kHz, computes the Fast Fourier Transform of each, multiplies one spectrum by the complex conjugate of the other, and inverse-transforms the product. The lag with the highest correlation value is the detected offset. This runs in a dedicated Web Worker to keep the UI responsive.
- FFmpeg WebAssembly — a full build of FFmpeg compiled to WebAssembly runs in your browser. No server involved. Files are read into an in-memory virtual filesystem, processed, and the output is returned as a downloadable blob.
- Stream Copy — the video track is never decoded or re-encoded. FFmpeg uses -c:v copy to transfer the video bitstream directly, preserving every frame at original quality. Only the audio track is re-encoded (to AAC).
- Privacy — everything happens locally. Your video and audio files are never uploaded to any server. The FFmpeg engine and all processing run entirely in your browser.
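The FFT cross-correlation step described above can be sketched in a few dozen lines. This is a minimal illustration under assumptions, not the tool's code: `fft`, `findLag`, and `lagToMs` are hypothetical names, the inputs are assumed to be mono samples already downsampled to the analysis rate, and `maxLag` stands in for the Search Window setting. A positive lag here means the clean track must be delayed to line up with the video audio.

```typescript
// In-place iterative radix-2 FFT. Array length must be a power of two.
function fft(re: Float64Array, im: Float64Array, inverse: boolean): void {
  const n = re.length;
  // Bit-reversal permutation.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; (j & bit) !== 0; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      const tr = re[i]; re[i] = re[j]; re[j] = tr;
      const ti = im[i]; im[i] = im[j]; im[j] = ti;
    }
  }
  // Butterfly passes.
  for (let len = 2; len <= n; len <<= 1) {
    const ang = ((inverse ? 2 : -2) * Math.PI) / len;
    const half = len >> 1;
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < half; k++) {
        const wr = Math.cos(ang * k);
        const wi = Math.sin(ang * k);
        const p = start + k;
        const q = p + half;
        const tr = re[q] * wr - im[q] * wi;
        const ti = re[q] * wi + im[q] * wr;
        re[q] = re[p] - tr; im[q] = im[p] - ti;
        re[p] += tr; im[p] += ti;
      }
    }
  }
  if (inverse) {
    for (let i = 0; i < n; i++) { re[i] /= n; im[i] /= n; }
  }
}

// Returns the lag d (in samples) that maximizes sum over n of a[n + d] * b[n].
function findLag(a: Float64Array, b: Float64Array, maxLag = Infinity): number {
  let size = 1;
  while (size < a.length + b.length) size <<= 1; // zero-pad to avoid wrap-around
  const ar = new Float64Array(size), ai = new Float64Array(size);
  const br = new Float64Array(size), bi = new Float64Array(size);
  ar.set(a);
  br.set(b);
  fft(ar, ai, false);
  fft(br, bi, false);
  for (let k = 0; k < size; k++) {
    // Multiply A by the complex conjugate of B.
    const real = ar[k] * br[k] + ai[k] * bi[k];
    const imag = ai[k] * br[k] - ar[k] * bi[k];
    ar[k] = real; ai[k] = imag;
  }
  fft(ar, ai, true);
  let bestLag = 0;
  let bestVal = -Infinity;
  for (let k = 0; k < size; k++) {
    const lag = k < a.length ? k : k - size; // upper indices hold negative lags
    if (Math.abs(lag) > maxLag) continue;
    if (ar[k] > bestVal) { bestVal = ar[k]; bestLag = lag; }
  }
  return bestLag;
}

// Converts a lag in samples to milliseconds at the given sample rate.
function lagToMs(lagSamples: number, sampleRateHz: number): number {
  return (lagSamples * 1000) / sampleRateHz;
}

// Usage: the same two-sample "clap" appears at index 20 in one track and 5 in the other.
const videoAudio = new Float64Array(64); videoAudio[20] = 1; videoAudio[21] = 0.5;
const cleanAudio = new Float64Array(32); cleanAudio[5] = 1; cleanAudio[6] = 0.5;
const lagSamples = findLag(videoAudio, cleanAudio); // 15
const offsetMs = lagToMs(lagSamples, 4000);         // 3.75
```

In a browser this kind of function would typically be called from a Web Worker so the main thread stays free for waveform rendering.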
Use Cases
Content Creators & YouTubers
You record a vlog or tutorial with your camera but the built-in microphone picks up too much background noise. You also record your voice separately on a lapel mic or phone. Use Replace mode to swap the camera audio with the clean mic recording — auto-align handles the sync so your lips match perfectly.
Podcasters & Interviewers
Record a video interview where the guest's audio comes from a separate USB mic while the camera captures room audio. Replace the camera audio with the USB recording for crystal-clear dialogue without echo or room reverb.
Music Videos & Performances
Sync a studio-recorded track to a live performance video. The auto-align detects common audio features (beat transients, vocal onsets) and aligns the studio audio to the video within milliseconds. Use Replace mode for a pure studio sound, or Mix mode to keep a touch of live audience atmosphere.
Presentations & Screencasts
Screen recordings often capture system audio artifacts, fan noise, or keyboard clicks. Record your narration separately with a quality microphone, then replace the screencast audio. The video stream is copied without re-encoding, so screen text stays sharp.
Wedding & Event Videos
Event videographers often use a separate audio recorder placed near the speakers or musicians. Use this tool to replace the camera's distant, echoey audio with the close-up recording. Mix mode at 10-15% original volume lets you keep the crowd reactions and ambient atmosphere while featuring the main audio clearly.
Film & Short Film ADR
Automated Dialogue Replacement (ADR) is standard in filmmaking — actors re-record lines in a studio after filming. This tool lets indie filmmakers align and replace dialogue tracks without expensive NLE software. The millisecond-precision offset controls ensure lip-sync accuracy.
Fixing Wind Noise in Outdoor Footage
Outdoor recordings often suffer from wind noise that no filter can fully remove. If you had a secondary recorder (phone in a pocket, lavalier mic under clothing), use it as the clean source. The auto-align handles the time offset between devices automatically.
Multi-Camera Shoots
When shooting with multiple cameras that each have their own audio, you often want one consistent audio source across all angles. Pick the best audio recording, then replace each camera's audio with it. Auto-align compensates for the different start times of each camera.
Frequently Asked Questions
What video formats are supported?
Any format your browser can decode — typically MP4 (H.264/H.265), WebM (VP8/VP9/AV1), MOV, and AVI. The output keeps the same container format as your input video.
What audio formats can I use as the clean source?
MP3, WAV, FLAC, AAC, OGG, Opus, M4A, WMA, AIFF, and WebM audio. The tool converts everything internally using FFmpeg.
Is there a file size limit?
There is no hard limit. Processing happens in your browser using your device's RAM. Most modern devices handle files up to several hundred megabytes. Very large files (1 GB+) may be slow or cause the browser tab to run out of memory.
Does the video get re-encoded? Will I lose quality?
No. The video stream is copied directly without re-encoding (-c:v copy). Every frame is preserved at its original quality and resolution. Only the audio track is re-encoded to AAC at 192 kbps.
How accurate is the auto-align?
Auto-align runs FFT cross-correlation at a 4 kHz sample rate, so the detected lag has a resolution of about 0.25 ms (one sample). It works best when both tracks share recognizable audio events (speech, claps, music). The confidence percentage tells you how strong the match is; above 60% is usually reliable.
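The 0.25 ms figure follows from the sample period at 4 kHz. The short sketch below works through that arithmetic and compares it with the length of one video frame; the function names and the 25 fps frame rate are assumptions chosen for illustration.

```typescript
// Time covered by one sample at the analysis rate, in milliseconds.
function lagResolutionMs(sampleRateHz: number): number {
  return 1000 / sampleRateHz;
}

// Duration of a single video frame, in milliseconds.
function frameDurationMs(fps: number): number {
  return 1000 / fps;
}

const resolutionMs = lagResolutionMs(4000);        // 0.25
const frameMs = frameDurationMs(25);               // 40
const stepsPerFrame = frameMs / resolutionMs;      // 160
```

At 25 fps, one frame spans 160 lag steps, so the step size itself is far finer than a frame. How close the detected offset lands to the true one still depends on the audio content.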
Can I use this offline?
Yes. After the first visit (which downloads the FFmpeg WebAssembly engine), the tool works entirely offline. No internet connection is needed for processing.
Is my data private?
Completely. Your video and audio files never leave your browser. There are no uploads, no server-side processing, and no data collection. Everything runs locally using WebAssembly.
What does the confidence percentage mean?
It measures how similar the two audio tracks are at the detected offset. Higher values mean the tracks share more common audio features at that alignment point. Low confidence (<30%) may indicate the tracks don't overlap or have very different content — try manual alignment in that case.
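One common way to express this kind of similarity is a normalized cross-correlation over the overlapping samples at the chosen lag. The sketch below is an assumption for illustration: the page does not state the tool's actual formula, and `confidencePercent` is a hypothetical name. The lag follows the convention that sample `n` of the clean track lines up with sample `n + lag` of the video audio.

```typescript
// Normalized correlation of the overlapping region, as a 0..100 score.
function confidencePercent(
  videoTrack: Float64Array,
  cleanTrack: Float64Array,
  lag: number
): number {
  let dot = 0;
  let energyVideo = 0;
  let energyClean = 0;
  for (let n = 0; n < cleanTrack.length; n++) {
    const m = n + lag;
    if (m < 0 || m >= videoTrack.length) continue; // outside the overlap
    dot += videoTrack[m] * cleanTrack[n];
    energyVideo += videoTrack[m] * videoTrack[m];
    energyClean += cleanTrack[n] * cleanTrack[n];
  }
  if (energyVideo === 0 || energyClean === 0) return 0; // silence or no overlap
  // Negative correlation is treated as no match.
  return Math.max(0, dot / Math.sqrt(energyVideo * energyClean)) * 100;
}
```

With this definition, two tracks that are identical in the overlap score 100, and tracks with no shared energy at that lag score 0.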
