StepFun AI Launches Open-Source Audio Editing Model Step-Audio-EditX for a New Audio Editing Experience

AIbase

Published in AI News · 4 min read · Nov 10, 2025

StepFun AI recently released Step-Audio-EditX, an open-source 3B-parameter audio editing model that aims to make audio editing as direct and controllable as text editing. By recasting audio editing as token-level operations, Step-Audio-EditX makes expressive voice editing much simpler.

Most current zero-shot text-to-speech (TTS) systems offer only limited control over emotion, style, accent, and tone. Although they can generate natural speech, they often fail to match precisely what the user asked for. Previous research has tried to disentangle these factors with additional encoders and complex architectures, whereas Step-Audio-EditX achieves control by adjusting the data and training objectives.

Step-Audio-EditX uses a dual-codebook tokenizer that maps speech into two token streams: a linguistic stream at 16.7 Hz and a semantic stream at 25 Hz. The model is trained on a mixed corpus of text and audio tokens, so it can process both kinds of token in the same sequence.
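The two stream rates stand in a 2:3 ratio, which a short calculation makes concrete. The sketch below is illustrative only: it treats 16.7 Hz as exactly 50/3 Hz, and the `interleave` layout (two linguistic tokens followed by three semantic tokens) is an assumption about how such streams could be merged, not something this article states. All function names are hypothetical.

```python
from fractions import Fraction

# Rates from the article; 16.7 Hz is treated as exactly 50/3 Hz (assumption).
LINGUISTIC_HZ = Fraction(50, 3)
SEMANTIC_HZ = Fraction(25)


def tokens_for(duration_s: int) -> tuple[int, int]:
    """Return (linguistic, semantic) token counts for a clip of duration_s seconds."""
    return int(LINGUISTIC_HZ * duration_s), int(SEMANTIC_HZ * duration_s)


def interleave(linguistic: list[int], semantic: list[int]) -> list[tuple[str, int]]:
    """Merge the streams in 2:3 chunks so they stay time-aligned (hypothetical layout)."""
    merged: list[tuple[str, int]] = []
    li, si = 0, 0
    while li < len(linguistic) or si < len(semantic):
        # Two linguistic tokens and three semantic tokens cover the same time span.
        merged += [("L", t) for t in linguistic[li:li + 2]]
        merged += [("S", t) for t in semantic[si:si + 3]]
        li += 2
        si += 3
    return merged


if __name__ == "__main__":
    print(SEMANTIC_HZ / LINGUISTIC_HZ)  # ratio of the two rates
    print(tokens_for(6))                # token counts for a 6-second clip
    print(interleave([1, 2, 3, 4], [10, 11, 12, 13, 14, 15]))
```

Under these assumptions a 6-second clip yields 100 linguistic tokens and 150 semantic tokens, so each group of five merged tokens covers the same 0.12-second window.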

The key to the model is large-margin learning: a later training phase improves the model using synthesized large-margin triplets and quadruplets. Trained on high-quality data from approximately 60,000 speakers, the model performs well on emotion and style editing. The model also uses human ratings and preference data for reinforcement learning to improve the naturalness and accuracy of the generated speech.
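The article does not describe how the large-margin samples are built, so the following is only a rough sketch of one plausible reading: a triplet pairs a text with two renditions that differ strongly in a single attribute, and only pairs whose difference exceeds a threshold are kept. The `Triplet` class, its score fields, and the threshold value are all hypothetical and are not taken from the model's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Triplet:
    text: str
    source_score: float  # hypothetical attribute-intensity score of the neutral clip
    target_score: float  # hypothetical score of the edited (e.g. emotional) clip

    @property
    def margin(self) -> float:
        """Gap between the target and source renditions on the edited attribute."""
        return self.target_score - self.source_score


def keep_large_margin(triplets: list[Triplet], min_margin: float) -> list[Triplet]:
    """Keep only triplets whose two renditions differ by at least min_margin."""
    return [t for t in triplets if t.margin >= min_margin]


if __name__ == "__main__":
    candidates = [
        Triplet("I can't believe it.", source_score=2.0, target_score=9.0),
        Triplet("See you tomorrow.", source_score=4.0, target_score=5.0),
    ]
    # The threshold of 6.0 is an arbitrary illustrative value.
    for t in keep_large_margin(candidates, min_margin=6.0):
        print(t.text, t.margin)
```

The intuition behind filtering this way is that training pairs with a clear, unambiguous contrast give the model a stronger signal about what an edit should change.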

To evaluate the model, the research team introduced the Step-Audio-Edit-Test benchmark, which uses Gemini 2.5 Pro as the evaluator. The results showed that accuracy on emotion and speaking-style editing improved significantly over multiple rounds of editing. Step-Audio-EditX can also improve the audio output of closed-source TTS systems, opening new possibilities for audio editing research.

Paper: https://arxiv.org/abs/2511.03601

Key Points:
🎤 **StepFun AI launches the Step-Audio-EditX model, making audio editing easier.**
📈 **The model uses large-margin learning to improve the accuracy of emotional and stylistic editing.**
🔍 **The team introduces the Step-Audio-Edit-Test benchmark, judged by Gemini 2.5 Pro, to evaluate editing accuracy.**

This article is from AIbase Daily
