Yang Ai
- Special Associate Researcher
- Supervisor of Master's Candidates
- Name (English): Yang Ai
- Name (Pinyin): aiyang
- Education Level: Postgraduate (Doctoral)
- Degree: Doctor
- Professional Title: Special Associate Researcher
- Alma Mater: 图书馆VIP
- Teacher College: School of Information Science and Technology
- Discipline: Information and Communication Engineering
Profile
Yang Ai is a Special Associate Researcher in the Department of Electronic Engineering and Information Science, School of Information Science and Technology, 图书馆VIP. His research interests include speech synthesis, speech enhancement, speech separation, audio coding, and audio quality assessment. He has published more than 40 papers in leading venues in the speech field, including the journal IEEE/ACM TASLP and the conferences ICASSP and Interspeech.
Education
Sep. 2012 – Jun. 2016: B.Eng. in Communication Engineering, Xiamen University
Sep. 2016 – Jun. 2021: Ph.D. in Information and Communication Engineering, 图书馆VIP (advisor: Prof. Zhen-Hua Ling)
Research and Academic Experience
Feb. 2020 – Aug. 2020: Visiting Ph.D. student (joint training), National Institute of Informatics, Japan
Jul. 2021 – Mar. 2022: Lecturer, College of Electronic Countermeasures, National University of Defense Technology
Apr. 2022 – Dec. 2023: Postdoctoral Researcher, 图书馆VIP
Jan. 2024 – present: Special Associate Researcher, 图书馆VIP
Research Projects
As Principal Investigator
National Natural Science Foundation of China (NSFC), Young Scientists Fund, "Anti-wrapping phase spectrum prediction for speech generation", Jan. 2024 – Dec. 2026, CNY 300,000
Anhui Provincial Department of Science and Technology, Anhui Provincial Natural Science Foundation Youth Project, "High-quality and high-efficiency auxiliary speech enhancement combining phase prediction", Sep. 2023 – Aug. 2025, CNY 80,000
图书馆VIP, Youth Innovation Fund, "High-efficiency and highly robust neural vocoders", Jan. 2023 – Dec. 2024, CNY 90,000
As Participant
Ministry of Science and Technology (MOST), sub-task of a MOST key technologies project, "Intelligent speech porting models and algorithm toolkit", Jan. 2022 – Dec. 2024, CNY 5,000,000 (ranked 2/34)
NSFC, Joint Fund Project, "Perception-driven fine-grained speech representation disentanglement and cross-modal controllable speech synthesis", Jan. 2024 – Dec. 2027, CNY 2,600,000 (ranked 7/21)
Chinese Academy of Sciences, Strategic Priority Research Program (Category C) project, "Key technologies for multilingual speech synthesis", Jan. 2020 – Dec. 2022, CNY 16,320,000 (ranked 2/35)
MOST, National Key R&D Program project, "Key technologies for multilingual speech processing in Winter Olympics scenarios", Oct. 2019 – Jun. 2022, CNY 3,380,000 (ranked 3/31)
NSFC, General Program, "Neural vocoders for speech synthesis", Jan. 2019 – Dec. 2022, CNY 630,000 (ranked 7/8)
Publications
2022 and Later
First-Author and Corresponding-Author Papers
Yang Ai, Xiao-Hang Jiang, Ye-Xin Lu, Hui-Peng Du, and Zhen-Hua Ling*, “APCodec: A neural audio codec with parallel amplitude and phase spectrum encoding and decoding,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3256–3269, 2024.
Yang Ai and Zhen-Hua Ling*, “Low-latency neural speech phase prediction based on parallel estimation architecture and anti-wrapping losses for speech generation tasks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2283–2296, 2024.
Yang Ai and Zhen-Hua Ling*, “APNet: An all-frame-level neural vocoder incorporating direct prediction of amplitude and phase spectra,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2145–2157, 2023.
Yang Ai*, Zhen-Hua Ling, Wei-Lu Wu, and Ang Li, “Denoising-and-dereverberation hierarchical neural vocoder for statistical parametric speech synthesis,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2036–2048, 2022.
Yang Ai, Ye-Xin Lu and Zhen-Hua Ling*, “Long-frame-shift neural speech phase prediction with spectral continuity enhancement and interpolation error compensation,” IEEE Signal Processing Letters, vol. 30, pp. 1097-1101, 2023.
Yang Ai, Ye-Xin Lu, Xiao-Hang Jiang, Zheng-Yan Sheng, Rui-Chen Zheng, and Zhen-Hua Ling*, “A low-bitrate neural audio codec framework with bandwidth reduction and recovery for high-sampling-rate waveforms,” in Proc. Interspeech, 2024, pp. 1765-1769.
Yang Ai and Zhen-Hua Ling*, “Neural speech phase prediction based on parallel estimation architecture and anti-wrapping losses,” in Proc. ICASSP, 2023, pp. 1-5.
Rui-Chen Zheng, Yang Ai*, and Zhen-Hua Ling, “Speech reconstruction from silent lip and tongue articulation by diffusion models and text-guided pseudo target generation,” in Proc. ACM MM, 2024, pp. 6559-6568.
Ye-Xin Lu, Yang Ai*, Zheng-Yan Sheng, and Zhen-Hua Ling, “Multi-stage speech bandwidth extension with flexible sampling rates control,” in Proc. Interspeech, 2024, pp. 2270-2274.
Hui-Peng Du, Ye-Xin Lu, Yang Ai*, and Zhen-Hua Ling, “BiVocoder: A bidirectional neural vocoder integrating feature extraction and waveform generation,” in Proc. Interspeech, 2024, pp. 3894-3898.
Fei Liu, Yang Ai*, Hui-Peng Du, Ye-Xin Lu, Rui-Chen Zheng, and Zhen-Hua Ling, “Stage-wise and prior-aware neural speech phase prediction,” in Proc. SLT, 2024, pp. 648-654.
Xiao-Hang Jiang, Yang Ai*, Rui-Chen Zheng, Hui-Peng Du, Ye-Xin Lu, and Zhen-Hua Ling, “MDCTCodec: A lightweight MDCT-based neural audio codec towards high sampling rate and low bitrate scenarios,” in Proc. SLT, 2024, pp. 550-557.
Yu-Fei Shi, Yang Ai*, Ye-Xin Lu, Hui-Peng Du, and Zhen-Hua Ling, “Pitch-and-spectrum-aware singing quality assessment with bias correction and model fusion,” in Proc. SLT, 2024, pp. 821-827.
Hui-Peng Du, Yang Ai*, Rui-Chen Zheng, and Zhen-Hua Ling, “APCodec+: A spectrum-coding-based high-fidelity and high-compression-rate neural audio codec with staged training paradigm,” accepted by ISCSLP, 2024.
Yu-Fei Shi, Ye-Xin Lu, Yang Ai*, Hui-Peng Du, and Zhen-Hua Ling, “SAMOS: A neural MOS prediction model leveraging semantic representations and acoustic features,” accepted by ISCSLP, 2024.
Xiao-Hang Jiang, Hui-Peng Du, Yang Ai*, Ye-Xin Lu, and Zhen-Hua Ling, “ESTVocoder: An excitation-spectral-transformed neural vocoder conditioned on mel spectrogram,” accepted by NCMMSC, 2024.
Hui-Peng Du, Ye-Xin Lu, Yang Ai*, and Zhen-Hua Ling, “A neural denoising vocoder for clean waveform generation from noisy mel-spectrogram based on amplitude and phase predictions,” accepted by NCMMSC, 2024.
Rui-Chen Zheng, Yang Ai*, and Zhen-Hua Ling, “Speech reconstruction from silent tongue and lip articulation by pseudo target generation and domain adversarial training,” in Proc. ICASSP, 2023, pp. 1-5.
Ye-Xin Lu, Yang Ai*, and Zhen-Hua Ling, “MP-SENet: A speech enhancement model with parallel denoising of magnitude and phase spectra,” in Proc. Interspeech, 2023, pp. 3834-3838.
Hui-Peng Du, Ye-Xin Lu, Yang Ai*, and Zhen-Hua Ling, “APNet2: High-quality and high-efficiency neural vocoder with direct prediction of amplitude and phase spectra,” in Proc. NCMMSC, 2023, pp. 66-80.
Ye-Xin Lu, Yang Ai*, and Zhen-Hua Ling, “Source-filter-based generative adversarial neural vocoder for high fidelity speech synthesis,” in Proc. NCMMSC, 2022, pp. 68-80.
Other Papers
Rui-Chen Zheng, Yang Ai, and Zhen-Hua Ling, “Incorporating ultrasound tongue images for audio-visual speech enhancement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1430–1444, 2024.
Kang-Di Mei, Zhao-Ci Liu, Hui-Peng Du, Heng-Yu Li, Yang Ai, Li-Ping Chen, Zhen-Hua Ling, “Considering temporal connection between turns for conversational speech synthesis,” in Proc. ICASSP, 2024, pp. 11426-11430.
Heng-Yu Li, Kang-Di Mei, Zhao-Ci Liu, Yang Ai, Li-Ping Chen, Zhen-Hua Ling, “Refining self-supervised learnt speech representation using brain activations,” in Proc. Interspeech, 2024, pp. 1480-1484.
Yuan Jiang, Shun Bao, Ya-Jun Hu, Li-Juan Liu, Guo-Ping Hu, Yang Ai, and Zhen-Hua Ling, “Online speaker adaptation for WaveNet-based neural vocoders,” in Proc. ICDSP, 2024, pp. 112-117.
Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, and Zhen-Hua Ling, “Face-driven zero-shot voice conversion with memory-based face-voice alignment,” in Proc. ACM MM, 2023, pp. 8443-8452.
Rui-Chen Zheng, Yang Ai, and Zhen-Hua Ling, “Incorporating ultrasound tongue images for audio-visual speech enhancement through knowledge distillation,” in Proc. Interspeech, 2023, pp. 844-848.
Zheng-Yan Sheng, Yang Ai, and Zhen-Hua Ling, “Zero-shot personalized lip-to-speech synthesis with face image based voice control,” in Proc. ICASSP, 2023, pp. 1-5.
Hao-Chen Wu, Zhu-Hai Li, Lu-Zhen Xu, Zhen-Tao Zhang, Wen-Ting Zhao, Bin Gu, Yang Ai, Ye-Xin Lu, Jie Zhang, Zhen-Hua Ling, and Wu Guo, “The USTC-NERCSLIP system for the track 1.2 of audio deepfake detection (ADD 2023) challenge,” in Proc. IJCAI 2023 Workshop on Deepfake Audio Detection and Analysis, 2023, pp. 119-124.
Hao-Jian Lin, Yang Ai, and Zhen-Hua Ling, “A light CNN with split batch normalization for spoofed speech detection using data augmentation,” in Proc. APSIPA, 2022, pp. 1684-1689.
Before 2022
Papers
Yang Ai and Zhen-Hua Ling, “A neural vocoder with hierarchical generation of amplitude and phase spectra for statistical parametric speech synthesis,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 839–851, 2020.
Yang Ai, Hong-Chuan Wu, and Zhen-Hua Ling, “SampleRNN-based neural vocoder for statistical parametric speech synthesis,” in Proc. ICASSP, 2018, pp. 5659-5663.
Yang Ai, Jing-Xuan Zhang, Liang Chen, and Zhen-Hua Ling, “DNN-based spectral enhancement for neural waveform generators with low-bit quantization,” in Proc. ICASSP, 2019, pp. 7025-7029.
Yang Ai and Zhen-Hua Ling, “Knowledge-and-data-driven amplitude spectrum prediction for hierarchical neural vocoders,” in Proc. Interspeech, 2020, pp. 190-194.
Yang Ai, Xin Wang, Junichi Yamagishi and Zhen-Hua Ling, “Reverberation modeling for source-filter-based neural vocoder,” in Proc. Interspeech, 2020, pp.3560-3564.
Yang Ai, Hao-Yu Li, Xin Wang, Junichi Yamagishi and Zhen-Hua Ling, “Denoising-and-dereverberation hierarchical neural vocoder for robust waveform generation,” in Proc. SLT, 2021, pp. 477-484.
Zhen-Hua Ling, Yang Ai, Yu Gu, and Li-Rong Dai, “Waveform modeling and generation using hierarchical recurrent neural networks for speech bandwidth extension,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 5, pp. 883–894, 2018.
Yuan Jiang, Ya-Jun Hu, Li-Juan Liu, Hong-Chuan Wu, Zhi-Kun Wang, Yang Ai, Zhen-Hua Ling, and Li-Rong Dai, “The USTC system for Blizzard Challenge 2019,” in Blizzard Challenge Workshop, 2019.
Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, and Li-Rong Dai, “Singing voice synthesis using deep autoregressive neural networks for acoustic modeling,” in Proc. Interspeech, 2019, pp. 2593–2597.
Qiu-Chen Huang, Yang Ai, and Zhen-Hua Ling, “Online speaker adaptation for WaveNet-based neural vocoders,” in Proc. APSIPA, 2020, pp. 815-820.
Hao-Yu Li, Yang Ai, and Junichi Yamagishi, “Enhancing low-quality voice recordings using disentangled channel factor and neural waveform model,” in Proc. SLT, 2021, pp. 2452-2456.
Chang Liu, Yang Ai, and Zhen-Hua Ling, “Phase spectrum recovery for enhancing low-quality speech captured by laser microphones,” in Proc. ISCSLP, 2021, pp. 1-5.
Kun Shao, Jun-An Yang, Yang Ai, Hui Liu, and Yu Zhang, “BDDR: An effective defense against textual backdoor attacks,” Computers & Security, vol. 110, Article 102433, 2021.
Patents
Granted Patents
Yang Ai; Zhen-Hua Ling; Neural network vocoder training method based on short-time spectral consistency, 2024-03-29, China, ZL 2020 1 1482467.6
Pending Patents
Yang Ai; Zhen-Hua Ling; Method for predicting phase with a parallel estimation architecture network trained using anti-wrapping losses, 2022-11-25, China, 202211489291.6
Yang Ai; Zhen-Hua Ling; Vocoder construction method, speech synthesis method, and related apparatus, 2023-01-16, China, 202310081092.X
Yang Ai; Ye-Xin Lu; Zhen-Hua Ling; Long-frame-shift speech phase spectrum prediction method and apparatus, 2023-06-19, China, 202310737506.X
Yang Ai; Zheng-Yan Sheng; Rui-Chen Zheng; Ye-Xin Lu; Xiao-Hang Jiang; Zhen-Hua Ling; Speech communication system and method, 2023-11-13, China, 202311498981.2
Yang Ai; Xiao-Hang Jiang; Rui-Chen Zheng; Ye-Xin Lu; Zhen-Hua Ling; Audio processing method, apparatus, storage medium, and electronic device, 2024-04-11, China, 202410438079X
Ye-Xin Lu; Yang Ai; Zhen-Hua Ling; Speech enhancement method and apparatus, 2023-05-17, China, 2023105730480
Ye-Xin Lu; Yang Ai; Hui-Peng Du; Zhen-Hua Ling; Speech waveform extension method, apparatus, device, and storage medium, 2024-01-10, China, 2024100399941
Honors and Awards
First place, high-sampling-rate vocoder track, Interspeech 2024 Discrete Speech Challenge (first contributor)
First place, Track 2, VoiceMOS Challenge 2024 (advisor and corresponding author)
Best Paper Award, the 18th National Conference on Man-Machine Speech Communication (NCMMSC 2023) (advisor and corresponding author)
First place, Track 1.2, Audio Deepfake Detection (ADD) Challenge 2023
Second Prize, 2022 Industry-University-Research Cooperation Innovation Achievement Award