AI Audio Editor — Edit Audio Just by Chatting
Studio work that used to mean ffmpeg commands, a wall of plugins, and a timeline now takes one sentence. Clean up, cut, mix, split stems, generate voiceovers, even dub into another language. The AI runs the whole chain; you approve every step.
Drop your audio to start
MP3, WAV, M4A, FLAC, OGG, MP4…
Edit by talking
No timeline, no menus, no plugins to chain. Say what you want — “remove the hum”, “cut the part about pricing” — and the AI figures out which tools to use and runs them.
A whole studio, packed in
A server-side ffmpeg engine and neural models cover cleanup, cutting, EQ, dynamics, pitch, stem separation, voiceovers, and voice-preserving translation — reached just by asking.
Non-destructive, always
Every edit is a new version you can play, A/B, download, or roll back to. Your original is never overwritten, so you can experiment without fear.
The hard, manual way, reduced to one sentence
Every one of these is a real operation people do by hand. The editor packs them all in and runs them for you — you just describe the result.
ffmpeg -i in.wav -af loudnorm=I=-16:TP=-1.5:LRA=11 out.wav
ffmpeg -i in.wav -af silenceremove=start_periods=1:stop_periods=-1:stop_duration=0.4:stop_threshold=-40dB out.wav
ffmpeg -i in.wav -af "asetrate=44100*0.891,aresample=44100,atempo=1.122" out.wav
pip install demucs
demucs --two-stems vocals song.mp3 # …then a GPU, the model weights, the paths
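The 0.891 and 1.122 in the pitch command above are not arbitrary: they are the frequency ratio for a shift of two semitones down and its reciprocal. Here is a minimal Python sketch of that arithmetic, assuming a 44.1 kHz source. The function name `pitch_shift_filter` is made up for illustration; only the filter names (asetrate, aresample, atempo) come from ffmpeg.

```python
def pitch_shift_filter(semitones: float, sample_rate: int = 44100) -> str:
    """Build an ffmpeg -af string that shifts pitch and keeps duration.

    Hypothetical helper for illustration, not part of ffmpeg or the editor.
    """
    # Each semitone multiplies frequency by 2**(1/12).
    ratio = 2 ** (semitones / 12)
    # asetrate changes pitch and speed together; atempo undoes the speed change.
    tempo = 1 / ratio
    return (
        f"asetrate={sample_rate}*{ratio:.3f},"
        f"aresample={sample_rate},"
        f"atempo={tempo:.3f}"
    )


print(pitch_shift_filter(-2))
# asetrate=44100*0.891,aresample=44100,atempo=1.122
```

Within the ±12 semitone range the tempo factor stays between 0.5 and 2.0, so a single atempo stage is enough.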
Just say it — here’s what happens
Type it like you’d ask a person. The AI maps your words onto the right tools and shows you the result as a version.
“Remove the background hum and hiss”
Neural cleanup + de-hum
“Make it podcast-ready”
Cleanup → EQ → loudness to −16 LUFS
“Tighten the pauses and cut the filler words”
Silence + filler trim
“Cut the part about pricing”
Finds it in the transcript, ripple-cuts it
“Split this song into stems”
Vocals, drums, bass, guitar, piano, other
“Take the vocals out for karaoke”
Vocal removal
“Dub this into Spanish but keep my voice”
Voice-preserving translation
“Add an intro that says ‘Welcome to episode 12’”
AI voiceover, added at the start
“Speed it up 1.2× without the chipmunk voice”
Tempo stretch
“Boost the bass and add a little warmth”
Bass boost + EQ
Everything packed in
The full toolset of a pro studio and a stack of AI models — all reachable in one conversation.
Clean up the noise
- Neural noise removal
- De-hum & de-rumble
- De-ess & de-click
- De-plosive & noise gate
- Declip & restore
Cut & arrange
- Trim and split clips
- Ripple-cut a section
- Fade in / out
- Move & combine clips
- Cut by transcript
Tone & dynamics
- Parametric & voice EQ
- Compressor & limiter
- Loudness normalize (LUFS)
- Bass & treble shaping
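The loudness item in the list above comes down to a gain offset at its simplest. As a rough sketch, the static gain needed to move a measured integrated loudness to a target such as −16 LUFS can be computed as below. This ignores the dynamic and true-peak handling that ffmpeg's loudnorm filter performs, and the function names and the −23 LUFS input are assumptions made for the example.

```python
def gain_db(measured_lufs: float, target_lufs: float = -16.0) -> float:
    """Static gain in dB that moves the measured loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(db: float) -> float:
    """Convert a dB gain to the factor applied to sample amplitudes."""
    return 10 ** (db / 20)


g = gain_db(-23.0)                   # a recording measured at -23 LUFS
print(g, round(db_to_linear(g), 3))  # 7.0 2.239
```

A +7 dB gain multiplies amplitudes by roughly 2.24, which is why a limiter or true-peak ceiling usually accompanies loudness normalization.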
Time & pitch
- Speed up / slow down
- Tempo stretch — no chipmunk
- Pitch shift ±12 semitones
- Reverse, echo & reverb
Separate stems · AI
- Split into 6 stems
- Isolate vocals, drums, bass…
- Extract or remove one instrument
- Vocal removal for karaoke
Generate & translate · AI
- Text-to-speech voiceovers
- Spoken intros & outros
- Dub into another language
- Keep the original voice
Transcribe and translate — the two big ones
It can read your audio and speak it in another language. That’s what turns an editor into a publishing tool.
Transcribe → edit by meaning
It writes down every word, so you can edit by content instead of hunting timestamps. Say “cut the part about pricing” and it finds it and removes it — and you keep the transcript.
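To make the idea of cutting by transcript concrete, here is a small Python sketch. It assumes a transcript made of timed segments and uses plain keyword matching as a stand-in for whatever content matching the editor actually performs, which the page does not describe. The `Segment` type and `keep_ranges` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    start: float  # seconds
    end: float    # seconds


def keep_ranges(segments: list[Segment], keyword: str) -> list[tuple[float, float]]:
    """Return the time ranges that remain after dropping matching segments.

    Back-to-back kept segments are merged. Playing the ranges in order
    gives a ripple cut: audio after a removed span slides earlier.
    """
    ranges: list[tuple[float, float]] = []
    for seg in segments:
        if keyword.lower() in seg.text.lower():
            continue  # this segment is cut
        if ranges and ranges[-1][1] == seg.start:
            ranges[-1] = (ranges[-1][0], seg.end)  # extend the previous range
        else:
            ranges.append((seg.start, seg.end))
    return ranges


transcript = [
    Segment("Welcome to the show.", 0.0, 2.5),
    Segment("Let's talk about pricing first.", 2.5, 9.0),
    Segment("Now, on to the roadmap.", 9.0, 14.0),
]
print(keep_ranges(transcript, "pricing"))  # [(0.0, 2.5), (9.0, 14.0)]
```

The kept ranges could then be handed to a trim-and-concatenate step, while the transcript itself stays available for further edits.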
Translate → reach everyone
Dub your recording into another language while keeping your own voice. One podcast, ad, or lesson — every audience, no re-recording, no new talent.
Built for the work you actually do
Whatever you’re making, you describe the result — the AI handles the audio.
Podcasters & creators
- Clean up noise & hum
- Tighten pauses, cut tangents
- Add AI intros & voiceovers
- Localize the whole episode
Localization & marketing
- Dub a VO into another language
- Keep the original voice
- One recording, many markets
Course creators
- Clean up lecture audio
- Trim down to tight lessons
- Dub courses into new languages
Journalists & interviewers
- Transcribe interviews
- Cut to the quote that matters
- Clean up field recordings
Musicians & remixers
- Split songs into stems
- Pull or remove one instrument
- Karaoke & acapella versions
- Pitch & tempo, no artifacts
Teams & business
- Clean up meeting & webinar audio
- Localize announcements & training
- Normalize loudness for delivery
Real workflows, start to finish
Each step is something you say; the editor does the rest and saves a version you can roll back to.
Raw recording → ready to publish
1. “Make it podcast-ready”
   Denoise → EQ → loudness
2. “Tighten the pauses and cut my coughs”
   Silence trim + content cuts
3. “Add an intro that says ‘Episode 12: …’”
   AI voiceover at the start
4. “Export as MP3”
   Download
One voiceover → another language
1. “Transcribe it so I can proof the script”
   Full transcript
2. “Dub it into Spanish and keep my voice”
   Voice-preserving dub
3. “Now do French”
   A second dubbed version
Interview → the clip that matters
1. “Transcribe the interview”
   Searchable transcript
2. “Pull the part where she talks about funding”
   Finds it, cuts it to a clip
3. “Clean up the room noise”
   Neural enhancement
Song → karaoke + stems
1. “Split this into stems”
   Vocals, drums, bass, guitar, piano, other
2. “Make a karaoke version”
   Vocal removal
3. “Download the vocals on their own”
   Per-stem export
How it works
Step 1: Drop your audio
The editor listens, tells you what it is (podcast, voiceover, or song), and flags length, levels, and any noise.
Step 2: Say what you want
Describe the edit in plain words, or tap a suggested action. The AI plans the whole ffmpeg + neural chain and previews each step before it runs.
Step 3: Play, compare, export
Every change is its own version with a waveform: A/B it against the last, roll it back, then download MP3 or WAV.
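One way to picture the versioning described above is an append-only history in which an edit never replaces a file and a rollback only moves a pointer. The Python sketch below is purely illustrative: the class, method names, and file names are assumptions, not the editor's actual storage model.

```python
class VersionHistory:
    """Append-only list of versions; the original is always index 0."""

    def __init__(self, original: str):
        self._versions = [original]
        self._current = 0

    def commit(self, new_file: str) -> int:
        """Record an edit as a new version and make it current."""
        self._versions.append(new_file)
        self._current = len(self._versions) - 1
        return self._current

    def roll_back(self, index: int) -> str:
        """Point 'current' at an earlier version without deleting anything."""
        if not 0 <= index < len(self._versions):
            raise IndexError("no such version")
        self._current = index
        return self._versions[index]

    @property
    def current(self) -> str:
        return self._versions[self._current]

    @property
    def original(self) -> str:
        return self._versions[0]


history = VersionHistory("raw.wav")
history.commit("v1_denoised.wav")
history.commit("v2_loudness.wav")
history.roll_back(1)
print(history.current, history.original)  # v1_denoised.wav raw.wav
```

Because nothing is ever removed from the list, rolling back to version 1 still leaves version 2 available for an A/B comparison.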
Frequently asked
What can the AI audio editor do?
A lot — all by describing it. Clean up audio (neural noise removal, de-hum, de-ess, de-click, de-plosive, noise gate), cut sections and trim silence or filler words, shape tone and loudness (EQ, compression, limiter, bass and treble), add fades, change speed, tempo or pitch, reverse, add echo or reverb, split a song into stems, remove or isolate a single instrument, strip vocals for karaoke, generate spoken intros and voiceovers, and dub a recording into another language.
How is this different from a normal audio editor?
A normal editor gives you the timeline and the plugins and leaves the work to you. This one does the work. You describe the result; the AI plans the chain of operations, runs it, and shows you a version to approve. It covers what you would otherwise do in a manual DAW or with a wall of ffmpeg commands, without you driving the tools.
What powers the editing under the hood?
A server-side ffmpeg engine handles the classic operations — cuts, fades, EQ, loudness, pitch, format conversion — and neural models handle the AI work: speech enhancement, stem separation, and voice-preserving translation. The AI agent decides which to run and in what order; you just say what you want.
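For the classic operations, an ordered plan can be collapsed into a single ffmpeg filter chain. The Python sketch below shows the general shape under stated assumptions: the step names are invented, and `afftdn` (ffmpeg's FFT denoiser) stands in for the neural cleanup, which would run as a separate model rather than as an ffmpeg filter. It only builds the command and does not run it.

```python
# Hypothetical step names mapped to real ffmpeg audio filters.
FILTERS = {
    "de_rumble": "highpass=f=80",
    "denoise": "afftdn",  # stand-in for the neural cleanup
    "loudness": "loudnorm=I=-16:TP=-1.5:LRA=11",
    "fade_in": "afade=t=in:d=1",
}


def build_command(src: str, dst: str, steps: list[str]) -> list[str]:
    """Turn an ordered plan into one ffmpeg invocation (as an argv list)."""
    if not steps:
        raise ValueError("plan is empty")
    unknown = [s for s in steps if s not in FILTERS]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    # ffmpeg applies comma-separated filters left to right.
    chain = ",".join(FILTERS[s] for s in steps)
    return ["ffmpeg", "-i", src, "-af", chain, dst]


cmd = build_command("in.wav", "out.wav", ["de_rumble", "denoise", "loudness"])
print(" ".join(cmd))
# ffmpeg -i in.wav -af highpass=f=80,afftdn,loudnorm=I=-16:TP=-1.5:LRA=11 out.wav
```

With ffmpeg installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Writing to a new output path each time keeps the source file intact.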
Do I need editing skills?
No. There is no timeline and there are no controls to learn; you just chat. The AI decides which tools to use, runs the whole workflow, and shows you the result as a new version you can play.
Can it remove background noise?
Yes. It uses neural speech enhancement to lift voice out of hiss, hum, and room noise, plus targeted fixes like de-hum, de-ess, de-click, de-plosive, and a noise gate — just ask it to “clean it up” or “make it podcast-ready”.
Can it split a song into stems?
Yes. Ask it to split a track and it separates vocals, drums, bass, guitar, piano, and other into individual stems — each with its own player and download. You can also extract or remove a single instrument, or strip the vocals for a karaoke version.
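Once a track is split, a karaoke version is conceptually the sum of every stem except the vocals. The Python sketch below shows that idea on tiny lists of samples. It is an illustration, not the editor's method: real audio would use arrays and guard against clipping, and `mix_without` is a hypothetical name.

```python
def mix_without(stems: dict[str, list[float]], drop: str) -> list[float]:
    """Sum every stem except `drop`, sample by sample."""
    kept = [samples for name, samples in stems.items() if name != drop]
    if not kept:
        return []
    # zip(*kept) yields one tuple per sample position across the kept stems.
    return [sum(frame) for frame in zip(*kept)]


stems = {
    "vocals": [0.5, 0.5, 0.5],
    "drums": [0.25, 0.0, -0.25],
    "bass": [0.125, 0.125, 0.125],
}
print(mix_without(stems, "vocals"))  # [0.375, 0.125, -0.125]
```

The same function with a different `drop` argument covers removing a single instrument, and keeping only one stem gives the isolated version.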
Can it translate or dub my audio?
Yes. Point it at a voice recording and ask for another language, and it produces a dubbed version that keeps the original speaker’s voice — translation and voice preserved together.
Can I undo a change?
Always. Every edit is a separate version you can play, A/B against the previous one, or roll back to. Editing is fully non-destructive, so your original is always intact.
What files can I use?
Drop MP3, WAV, M4A, FLAC, OGG, AAC, or audio from an MP4 — you can also add several clips into one project. When you are done, download the result as MP3 or WAV.
Is it free?
You can start editing right away. AI processing is metered with credits, like the rest of Notevibes’ AI features.
Stop driving the tools. Just say it.
Drop a file and describe the edit. The AI does the hard part — and every change is a version you can undo.
Non-destructive — your original is never touched.