What is audio cross-correlation and how does it work?

Cross-correlation is a signal processing technique that measures the similarity between two audio signals at different time offsets. By sliding one signal along the other and computing a similarity score at each position, the tool finds the offset at which they match best. That offset is the time shift needed to align the clean audio with the video.
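The sliding comparison can be sketched in a few lines of Python. This is an illustrative brute-force version, not the tool's actual code: the function name `best_lag` and the toy signals are made up for the example, and the similarity score is assumed to be a plain sum of products.

```python
def best_lag(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive lag means `other` must be delayed by that many samples.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(other):
            j = i + lag  # position in `reference` that lines up with other[i]
            if 0 <= j < len(reference):
                score += reference[j] * x
        if score > best_score:
            best, best_score = lag, score
    return best


# Toy signals: the same "clap" appears 3 samples later in the reference.
reference = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
other = [0, 1, 3, 1, 0, 0, 0, 0, 0, 0]
print(best_lag(reference, other, max_lag=5))  # 3
```

The cost of this direct approach grows with the signal length times the number of lags tried, which is why longer recordings are usually handled with an FFT-based method.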

Does the video quality get affected when replacing audio?

No. The video stream is copied as-is without any re-encoding (using FFmpeg -c:v copy). Only the audio track is replaced. This means zero quality loss for the video and very fast processing.

What if the clean audio was recorded at a different sample rate?

The tool handles different sample rates automatically. During analysis, both audio tracks are resampled to a common rate for comparison. During muxing, FFmpeg transcodes the audio to AAC at 192 kbps for broad compatibility.
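To show what "resampled to a common rate" means, here is a minimal sketch using linear interpolation. It is an assumption for illustration only: the helper name `resample_linear` is hypothetical, and a production resampler would also apply a low-pass filter before reducing the rate to avoid aliasing.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no anti-alias filter; sketch only)."""
    if not samples:
        return []
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate  # fractional index into the source
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


# Two tracks at different rates brought to the same rate before comparison.
track_a = resample_linear([0, 1, 2, 3, 4, 5, 6, 7], src_rate=8, dst_rate=4)
track_b = resample_linear([0, 4], src_rate=1, dst_rate=4)
print(len(track_a), track_a)
print(len(track_b), track_b)
```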

How large can the time offset be?

You can search for offsets up to 60 seconds using the sync search window slider. For larger offsets, you may want to manually trim one of the files first. Larger search windows take longer to process.
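A rough way to see why larger windows cost more is to count the candidate offsets. The sketch below assumes the 4 kHz analysis rate quoted under Technical Details and a window that extends the same distance in both directions; the function names are hypothetical.

```python
ANALYSIS_RATE_HZ = 4000  # analysis rate quoted under Technical Details


def lag_range(window_seconds, rate_hz=ANALYSIS_RATE_HZ):
    """Smallest and largest lag (in samples) inside the search window."""
    max_lag = int(round(window_seconds * rate_hz))
    return -max_lag, max_lag


def candidate_count(window_seconds, rate_hz=ANALYSIS_RATE_HZ):
    """Number of distinct offsets the search has to consider."""
    lo, hi = lag_range(window_seconds, rate_hz)
    return hi - lo + 1


print(candidate_count(10))  # 80001
print(candidate_count(60))  # 480001
```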

Is my data private and secure?

Absolutely. All processing happens inside your browser using WebAssembly (FFmpeg.wasm) and the Web Audio API. Your video and audio files are never uploaded to any server. This makes it safe for sensitive content like legal recordings, medical consultations, or private conversations.

What does the confidence score mean?

The confidence score (0-100%) indicates how well the two audio tracks match at the detected offset. Scores above 60% typically indicate reliable alignment. Lower scores may suggest the recordings contain different content or have very different acoustic characteristics.

Audio Sync & Replace

Name: Audio Sync & Replace — How It Works
Uploaded: 2026-03-18
Description: See how Audio Sync & Replace automatically aligns and replaces noisy video audio with a clean recording using cross-correlation, entirely in the browser.

Replace or mix audio in a video with a clean recording. 100% browser-based.

Last updated: 2026-03-18

How It Works

Upload Your Files: Select the video with the noisy or unwanted audio track, then select your clean audio file. Supported video formats include MP4, WebM, MOV, and AVI. Audio files can be MP3, WAV, FLAC, AAC, OGG, or any other format your browser supports.
Load Waveforms: FFmpeg (running entirely in your browser via WebAssembly) extracts the audio from both files and decodes it into raw PCM data, which the tool renders as waveforms. The blue waveform represents the original video audio; the green waveform represents the clean replacement audio.
Align the Audio: This is the key step. The recordings may have started at different times, so you need to line them up:
- Auto Align: uses FFT-based cross-correlation to detect the optimal offset automatically. It analyzes the audio patterns (peaks, transients) in both tracks and finds the point of maximum similarity.
- Drag to Align: grab the green waveform and drag it left or right for visual alignment. Line up recognizable features like claps, speech starts, or music beats.
- Fine-Tune: use the ±10 ms and ±100 ms buttons, or type an exact offset in milliseconds, for frame-accurate alignment.
- Search Window: adjust the slider to control how far the auto-align looks for a match. Increase it if the recordings have a large time gap between them.
Choose Audio Mode:
- Replace: completely removes the original audio and replaces it with the clean track. The video stream is copied as-is (no re-encoding), so there is zero video quality loss.
- Mix: blends both tracks at adjustable volume levels. For example, set the original to 10-20% to retain background ambience while overlaying the clean voice at 100%.
Process & Download: FFmpeg muxes the aligned audio into the video container. The video stream is never re-encoded (stream copy), so processing is fast and the video is unchanged. Audio is encoded as AAC at 192 kbps. Download the result as a single file.
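For Mix mode, one plausible way to express the blend is an FFmpeg filter graph. The sketch below only builds the filter string; it assumes input 0 is the video and input 1 is the clean recording, and the function name, stream labels, and default volumes are illustrative rather than taken from the tool. The filters used (`volume`, `adelay`, `atrim`, `asetpts`, `amix`) are standard FFmpeg audio filters.

```python
def mix_filter(offset_ms, original_volume=0.15, clean_volume=1.0):
    """Build an FFmpeg -filter_complex string that mixes two audio inputs.

    A positive offset delays the clean track; a negative one trims its start.
    """
    ms = int(round(offset_ms))
    if ms >= 0:
        # One delay value per channel; extra values are ignored for mono input.
        shift = f"adelay={ms}|{ms}"
    else:
        # Drop the first -ms milliseconds and restart timestamps at zero.
        shift = f"atrim=start={-ms / 1000:.3f},asetpts=PTS-STARTPTS"
    return (
        f"[0:a]volume={original_volume}[orig];"
        f"[1:a]{shift},volume={clean_volume}[clean];"
        "[orig][clean]amix=inputs=2:duration=first[aout]"
    )


print(mix_filter(250))
print(mix_filter(-500))
```

The resulting string would be passed with `-filter_complex`, mapping the video with `-map 0:v` and the mixed audio with `-map "[aout]"`, alongside `-c:v copy -c:a aac -b:a 192k`. Note that `amix` rescales its inputs by default, so the perceived levels may differ from the raw volume values.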

Technical Details

FFT Cross-Correlation: the auto-align algorithm downsamples both tracks to 4 kHz, computes the Fast Fourier Transform of each, multiplies one spectrum by the complex conjugate of the other, and inverse-transforms the product. The lag with the highest correlation value is the detected offset. This runs in a dedicated Web Worker to keep the UI responsive.
FFmpeg WebAssembly: a full build of FFmpeg compiled to WebAssembly runs in your browser, with no server involved. Files are read into an in-memory virtual filesystem and processed there, and the output is returned as a downloadable blob.
Stream Copy: the video track is never decoded or re-encoded. FFmpeg uses -c:v copy to transfer the video bitstream directly, preserving every frame at original quality. Only the audio track is re-encoded (to AAC).
Privacy: everything happens locally. Your video and audio files are never uploaded to any server. The FFmpeg engine and all processing run entirely in your browser.
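The FFT cross-correlation steps described above can be sketched with the Python standard library alone. This is a teaching version under stated assumptions, not the tool's implementation: it uses a simple recursive radix-2 FFT, skips the 4 kHz downsampling step, and the function names are made up for the example.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out  # the inverse is left unscaled; scaling does not move the peak


def fft_best_lag(reference, other):
    """Lag in samples by which `other` must be delayed to match `reference`."""
    size = 1
    while size < len(reference) + len(other):
        size *= 2  # zero-pad so the circular correlation does not wrap around
    ref_spec = fft(list(reference) + [0.0] * (size - len(reference)))
    oth_spec = fft(list(other) + [0.0] * (size - len(other)))
    # Multiply one spectrum by the complex conjugate of the other.
    product = [r * o.conjugate() for r, o in zip(ref_spec, oth_spec)]
    corr = [c.real for c in fft(product, inverse=True)]
    peak = max(range(size), key=corr.__getitem__)
    # Indices past the reference length hold the negative lags.
    return peak if peak < len(reference) else peak - size


reference = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
other = [0, 1, 3, 1, 0, 0, 0, 0, 0, 0]
print(fft_best_lag(reference, other))  # 3
```

Multiplying the lag by 0.25 ms (one sample at 4 kHz) would convert it to a time offset.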

Use Cases

Content Creators & YouTubers

You record a vlog or tutorial with your camera but the built-in microphone picks up too much background noise. You also record your voice separately on a lapel mic or phone. Use Replace mode to swap the camera audio with the clean mic recording — auto-align handles the sync so your lips match perfectly.

Podcasters & Interviewers

Record a video interview where the guest's audio comes from a separate USB mic while the camera captures room audio. Replace the camera audio with the USB recording for crystal-clear dialogue without echo or room reverb.

Music Videos & Performances

Sync a studio-recorded track to a live performance video. The auto-align detects common audio features (beat transients, vocal onsets) and aligns the studio audio to the video within milliseconds. Use Replace mode for a pure studio sound, or Mix mode to keep a touch of live audience atmosphere.

Presentations & Screencasts

Screen recordings often capture system audio artifacts, fan noise, or keyboard clicks. Record your narration separately with a quality microphone, then replace the screencast audio. The video stream is copied without re-encoding, so screen text stays sharp.

Wedding & Event Videos

Event videographers often use a separate audio recorder placed near the speakers or musicians. Use this tool to replace the camera's distant, echoey audio with the close-up recording. Mix mode at 10-15% original volume lets you keep the crowd reactions and ambient atmosphere while featuring the main audio clearly.

Film & Short Film ADR

Automated Dialogue Replacement (ADR) is standard in filmmaking — actors re-record lines in a studio after filming. This tool lets indie filmmakers align and replace dialogue tracks without expensive NLE software. The millisecond-precision offset controls ensure lip-sync accuracy.

Fixing Wind Noise in Outdoor Footage

Outdoor recordings often suffer from wind noise that no filter can fully remove. If you had a secondary recorder (phone in a pocket, lavalier mic under clothing), use it as the clean source. The auto-align handles the time offset between devices automatically.

Multi-Camera Shoots

When shooting with multiple cameras that each have their own audio, you often want one consistent audio source across all angles. Pick the best audio recording, then replace each camera's audio with it. Auto-align compensates for the different start times of each camera.

Frequently Asked Questions

What video formats are supported?

Any format your browser can decode — typically MP4 (H.264/H.265), WebM (VP8/VP9/AV1), MOV, and AVI. The output keeps the same container format as your input video.

What audio formats can I use as the clean source?

MP3, WAV, FLAC, AAC, OGG, Opus, M4A, WMA, AIFF, and WebM audio. The tool converts everything internally using FFmpeg.

Is there a file size limit?

There is no hard limit. Processing happens in your browser using your device's RAM. Most modern devices handle files up to several hundred megabytes. Very large files (1 GB+) may be slow or cause the browser tab to run out of memory.

Does the video get re-encoded? Will I lose quality?

No. The video stream is copied directly without re-encoding (-c:v copy). Every frame is preserved at its original quality and resolution. Only the audio track is re-encoded to AAC at 192 kbps.
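As an illustration, a Replace-mode run could use an argument list along these lines. The options shown (`-map`, `-c:v copy`, `-c:a aac`, `-b:a 192k`, `-itsoffset`, `-ss`) are standard FFmpeg options, but how the tool applies the offset is an assumption here, and the function and file names are hypothetical.

```python
def replace_audio_args(video, clean_audio, output, offset_ms=0):
    """Argument list for an FFmpeg run that swaps in the clean audio track."""
    args = ["-i", video]
    if offset_ms >= 0:
        # Assumed approach: shift the clean audio's timestamps later.
        args += ["-itsoffset", f"{offset_ms / 1000:.3f}", "-i", clean_audio]
    else:
        # Assumed approach: skip the start of the clean audio instead.
        args += ["-ss", f"{-offset_ms / 1000:.3f}", "-i", clean_audio]
    args += [
        "-map", "0:v:0",  # video from the first input
        "-map", "1:a:0",  # audio from the second input
        "-c:v", "copy",   # stream copy: the video is not re-encoded
        "-c:a", "aac",
        "-b:a", "192k",
        output,
    ]
    return args


print(" ".join(["ffmpeg"] + replace_audio_args("in.mp4", "clean.wav", "out.mp4", 250)))
```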

How accurate is the auto-align?

Auto-align runs FFT cross-correlation at a 4 kHz sample rate, so the detected offset is resolved in steps of 1/4000 s, or 0.25 ms. It works best when both tracks share recognizable audio events (speech, claps, music). The confidence percentage tells you how strong the match is; above 60% is usually reliable.
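The precision figure follows directly from the sample rate. A small sketch of the conversion, with a hypothetical helper name:

```python
def lag_to_ms(lag_samples, rate_hz=4000):
    """Convert a lag measured in samples to milliseconds."""
    return 1000.0 * lag_samples / rate_hz


print(lag_to_ms(1))      # 0.25
print(lag_to_ms(-1234))  # -308.5
```

For comparison, one video frame at 30 fps lasts about 33 ms, so a 0.25 ms step is far finer than a single frame.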

Can I use this offline?

Yes. After the first visit (which downloads the FFmpeg WebAssembly engine), the tool works entirely offline. No internet connection is needed for processing.

Is my data private?

Completely. Your video and audio files never leave your browser. There are no uploads, no server-side processing, and no data collection. Everything runs locally using WebAssembly.

What does the confidence percentage mean?

It measures how similar the two audio tracks are at the detected offset. Higher values mean the tracks share more common audio features at that alignment point. Low confidence (<30%) may indicate the tracks don't overlap or have very different content — try manual alignment in that case.
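The page does not say how the percentage is computed. One common choice, assumed here purely for illustration, is the normalized correlation of the overlapping samples at the detected lag, clamped to the range 0 to 100. The function name is hypothetical.

```python
import math


def confidence_percent(reference, other, lag):
    """Normalized correlation of the overlapping samples at `lag`, as 0-100.

    A positive lag means `other` is delayed by that many samples.
    """
    dot = ref_energy = other_energy = 0.0
    for i, x in enumerate(other):
        j = i + lag
        if 0 <= j < len(reference):
            dot += reference[j] * x
            ref_energy += reference[j] ** 2
            other_energy += x ** 2
    if ref_energy == 0.0 or other_energy == 0.0:
        return 0.0  # silence or no overlap: nothing to compare
    score = dot / math.sqrt(ref_energy * other_energy)
    return 100.0 * max(0.0, min(1.0, score))


clap = [0, 1, 3, 1, 0]
print(confidence_percent(clap, clap, 0))                     # 100.0
print(confidence_percent([1, 0, 1, 0], [0, 1, 0, 1], 0))     # 0.0
```

With a measure like this, identical content scores near 100% regardless of volume differences, while unrelated content scores near 0%.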
