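This page archives the technical talks and slide decks I have accumulated over time. They cover speech synthesis, speech recognition, audio generation, and related areas, and are collected here for easy reference.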
2025
2025-11-03 · Recent Advances in Autoregressive Diffusion Models for Speech Generation · Tags: Diffusion Models, Autoregressive, Speech Generation
Key works: MELA-TTS, VoxCPM, VibeVoice, Ming-UniAudio
2025-08-21 · DiffRO: Differentiable Reward Optimization for LLM based TTS system · Tags: LLM, Reinforcement Learning, TTS
Key works: CosyVoice, CosyVoice2
2025-06-26 · Rethinking Tortoise TTS Paradigm · Tags: Zero-Shot, Speech Synthesis
Key works: Tortoise-TTS, XTTS, Index-TTS
2025-04-14 · MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis · Tags: Diffusion Models, Transformer, Zero-Shot
Key works: MegaTTS 3, InspireMusic, RALL-E, VALL-E R, VoiceLDM, PeRFlow
2024
2024-09-25 · FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications · Tags: Foundation Models, Industrial Applications, Semantic Encoding
Key works: FireRedTTS, Semantic-Aware Speech Tokenizer
2024-09-23 · Moshi: A Speech-Text Foundation Model for Real-Time Dialogue · Tags: Multimodal, Real-Time Dialogue, Foundation Models
Key works: Helium, Mimi, Moshi
2024-07-09 · CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens · Tags: Multilingual, Zero-Shot, Semantic Tokens
Key works: CosyVoice
2024-04-01 · NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models · Tags: Diffusion Models, Codec, Discrete Diffusion
Key works: FACodec, NaturalSpeech 3, MaskGIT, Discrete Diffusion
2024-03-06 · MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech · Tags: Mobile, Efficient Inference, Zero-Shot
Key works: MobileSpeech, SoundStorm, Pheme
2023
2023-11-20 · Recent Advances on Unified Model for Voice and Audio Generation (Part 2) · Tags: Audio Generation, Diffusion Models, Unified Models
Key works: AudioLDM 2, Stable Diffusion
2023-11-13 · Recent Advances on Unified Model for Voice and Audio Generation (Part 1) · Tags: Audio Generation, Unified Models, Multimodal
Key works: Speech-X, Make-A-Voice / MVoice, UniAudio
2023-06-01 · Speak, Read and Prompt High-Fidelity Text-to-Speech with Minimal Supervision · Tags: Weak Supervision, High Fidelity, TTS
Key works: w2v-BERT, SoundStream, Spear-TTS
2023-01-12 · Text-Based Speech Editing From a Perspective of Mask-Based Generative Models · Tags: Speech Editing, Masked Generation, Text-Driven
Key works: SpeechPainter, A3T, MUSE (text-to-image)
2022
2022-11-13 · Voice Cloning C: Advanced Methods · Tags: Voice Cloning, Retrieval Augmentation, Adapters
Key works: RetrieverTTS, TTS Adapters
2022-09-15 · Voice Cloning B: Overview on Transfer Learning, Representation Learning and Meta-Learning · Tags: Voice Cloning, Transfer Learning, Meta-Learning
Key works: ProsodySpeech, StyleSpeech, GST, Meta-TTS
2022-05-26 · Voice Cloning A: Recent Advances in Adaptive Text-to-Speech · Tags: Voice Cloning, Adaptive TTS, Few-Shot
Key works: AdaSpeech, AdaSpeech2, AdaSpeech3, AdaSpeech4
2021
2021-07-01 · Lattice: Concepts, Methods and Applications · Tags: Lattice, Language Models, WFST
Key works: WFST, Lattice, Lattice-Transformer
2020
2020-09-17 · An Overview of Hybrid ASR: Acoustic Models, Language Models, and Post-Processing · Tags: Speech Recognition, Acoustic Models, Language Models
Key works: LF-MMI, TDNN-F, Lattice Rescoring
2020-07-20 · Personal Work on Speech Recognition (Bachelor/Master) · Tags: Speech Recognition, Speaker Adaptation, Transfer Learning
Key works: Speaker Adaptation, Transfer Learning