Speech enhancement - spectrum mapping
2022-06-28 06:24:00 【Salute=】
1. Introduction
The main goal of speech enhancement is to extract the clean speech signal from a noisy speech signal. It is widely used in automatic speech recognition and hearing aids. Deep-learning-based speech enhancement methods fall into two categories: 1) mapping-based methods; 2) mask-based methods.
2. Mapping-based speech enhancement
Depending on the domain in which they operate (time domain or frequency domain), mapping-based speech enhancement methods can be divided into two categories:
1) Spectrum-mapping speech enhancement: a neural network learns the mapping from the spectrum of the noisy speech signal to the spectrum of the clean speech signal.
2) End-to-end speech enhancement: a neural network learns the mapping from the time-domain waveform of the noisy speech signal to the time-domain waveform of the clean speech signal.
2.1 Spectrum mapping system model
The spectrum mapping system model is shown in the figure below.
The specific processes of speech feature extraction and time-domain reconstruction are as follows.
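Feature extraction and time-domain reconstruction can be sketched in NumPy as follows. This is a minimal illustration rather than the author's code: the function names are hypothetical, the STFT settings follow the values listed in Section 3.1, and reusing the phase of the noisy input for reconstruction is an assumption (a common choice in spectrum mapping that the text does not state).

```python
import numpy as np

N_FFT, HOP = 512, 128          # STFT settings from Section 3.1
WIN = np.hamming(N_FFT)        # Hamming analysis/synthesis window
EPS = 1e-8                     # avoids log(0); an illustrative choice


def extract_features(x):
    """Return (log amplitude spectrum, phase), each of shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spec) + EPS), np.angle(spec)


def reconstruct(log_mag, phase):
    """Rebuild a waveform from a log amplitude spectrum and a phase (weighted overlap-add)."""
    spec = np.exp(log_mag) * np.exp(1j * phase)
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WIN
    out = np.zeros(N_FFT + HOP * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2   # window energy for normalization
    return out / norm
```

At enhancement time, `log_mag` would be replaced by the network's prediction before calling `reconstruct` with the phase taken from the noisy input.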
Training phase:
1) Input: the input feature used in this experiment is the log amplitude spectrum of the noisy speech signal. Note that the frame expansion technique of [1] is adopted: for example, when 5 frames of log amplitude spectrum are input, the network predicts the log amplitude spectrum of the 3rd (center) frame, as shown in the figure below.
2) Label: the log amplitude spectrum of the clean speech signal. For example, when 5 frames of log amplitude spectrum are input, the label is the clean log amplitude spectrum corresponding to the predicted 3rd frame.
3) Loss function: MSE loss, $L_{\text{Loss}} = \|\hat{\mathbf{L}} - \mathbf{L}\|_2^2$, where $\hat{\mathbf{L}}$ is the predicted log amplitude spectrum and $\mathbf{L}$ is the clean log amplitude spectrum.
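The frame expansion and the loss can be illustrated with a short NumPy sketch. The function names are hypothetical, and padding the first and last frames by repetition is an assumption, since the text does not say how utterance edges are handled.

```python
import numpy as np


def expand_frames(log_mag, n_expand):
    """Stack each frame with n_expand neighbours on both sides.

    log_mag: (T, F) array. Returns (T, (2 * n_expand + 1) * F).
    Edge frames are padded by repetition (an assumption).
    """
    n_total = log_mag.shape[0]
    padded = np.pad(log_mag, ((n_expand, n_expand), (0, 0)), mode="edge")
    shifted = [padded[i:i + n_total] for i in range(2 * n_expand + 1)]
    return np.concatenate(shifted, axis=1)


def squared_l2_loss(pred, target):
    """Squared L2 norm of the prediction error, as in the loss formula."""
    return float(np.sum((pred - target) ** 2))


noisy = np.arange(12, dtype=float).reshape(6, 2)   # 6 frames, 2 bins (toy data)
inputs = expand_frames(noisy, n_expand=2)          # 5-frame context per row
print(inputs.shape)                                # (6, 10)
```

Each row of `inputs` holds 5 consecutive frames, and the middle block of that row is the frame whose clean counterpart serves as the label. Averaging the squared error instead of summing it changes the loss only by a constant factor.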
Remark: normalizing the input log amplitude spectrum accelerates network convergence. In this experiment, a BN (batch normalization) layer normalizes the input features.
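As a rough illustration of what such a normalization does, the sketch below standardizes each feature dimension over a batch. It mirrors the computation of a batch normalization layer in training mode without the learnable scale and shift; the function name and the toy data are hypothetical.

```python
import numpy as np


def normalize_features(batch, eps=1e-5):
    """Standardize each feature dimension over the batch axis.

    Mirrors what a batch normalization layer computes in training mode,
    without the learnable scale and shift.
    """
    mean = batch.mean(axis=0, keepdims=True)
    var = batch.var(axis=0, keepdims=True)      # biased variance, as BN uses
    return (batch - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
batch = rng.normal(loc=-7.0, scale=3.0, size=(16, 257))   # toy log-spectrum batch
normed = normalize_features(batch)
```

After this step every frequency bin has approximately zero mean and unit variance within the batch, which keeps the input scale uniform across bins.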
3. Experimental analysis
3.1 Experimental dataset and parameter settings
Clean speech for training: all clean utterances in DR1 of TIMIT-TRAIN. Clean speech for testing: the first 10 clean utterances in DR1 of TIMIT-TEST. SNRs of the synthesized noisy speech (dB): [-5, 0, 5, 10]. Noise sources used to synthesize the noisy speech: 3 noise types from NoiseX-92, ['babble', 'destroyerengine', 'factory1'].
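Synthesizing noisy speech at a target SNR amounts to scaling the noise before adding it. The following is a minimal sketch under stated assumptions: the function name is hypothetical, random arrays stand in for the TIMIT and NoiseX-92 recordings, and the noise is assumed to be at least as long as the utterance.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech so that the mixture has the requested SNR in dB."""
    noise = noise[:len(clean)]                      # assumes noise is at least as long
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                  # stand-in for a TIMIT utterance
noise = rng.standard_normal(20000)                  # stand-in for a NoiseX-92 recording
mixtures = {snr: mix_at_snr(clean, noise, snr) for snr in [-5, 0, 5, 10]}
```

The scale factor follows from requiring $10 \log_{10}(P_{\text{clean}} / (\text{scale}^2 P_{\text{noise}}))$ to equal the target SNR.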
Parameter settings: short-time Fourier transform length N_fft=512; window length win_length=512; hop size hop_length=128; window function 'hamming'. Training parameters: epoch=30, lr=1e-4, batch_size=16.
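A few quantities follow directly from these settings. The sketch below derives them, assuming a 16 kHz sampling rate (the rate of the TIMIT corpus, not stated in the text) and assuming that the 2*n_expand+1 stacked frames are flattened into a single input vector; `input_dim` is a hypothetical helper.

```python
SAMPLE_RATE = 16000        # TIMIT sampling rate in Hz (assumed; not stated above)
N_FFT = 512
WIN_LENGTH = 512
HOP_LENGTH = 128

n_bins = N_FFT // 2 + 1                          # one-sided spectrum size
window_ms = 1000 * WIN_LENGTH / SAMPLE_RATE      # window duration
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE         # frame shift
overlap = 1 - HOP_LENGTH / WIN_LENGTH            # fraction shared by adjacent frames


def input_dim(n_expand):
    """Network input size when 2 * n_expand + 1 frames are stacked."""
    return (2 * n_expand + 1) * n_bins


print(n_bins, window_ms, hop_ms, overlap)   # 257 32.0 8.0 0.75
print(input_dim(3))                         # 1799
```

Each frame therefore covers 32 ms of audio with 75% overlap, and a larger n_expand widens the temporal context at the cost of a proportionally larger input layer.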
3.2 Experimental results
3.2.1 Frame expansion parameter (n_expand=3)
With the frame expansion parameter $n\_expand=3$, the number of frames input to the network is $2 \times n\_expand + 1 = 7$. The PESQ scores and STOI values for $n\_expand=3$ are as follows.


3.2.2 Different frame expansion parameters (n_expand=1, 3, 5, 7)
The influence of the frame expansion parameter on the performance of spectrum-mapping speech enhancement is examined below:
(1) The PESQ and STOI values at each SNR for n_expand=1, 3, 5, 7 are shown in the figures below.




Conclusion: under the current experimental conditions, n_expand=3 gives the best speech enhancement performance.
4. References
[1] An Experimental Study on Speech Enhancement Based on Deep Neural Networks
[2] Lan Tian, Peng Chuan, Li Sen, Qian Yuxin, Chen Cong, Liu Qiao. End-to-end speech enhancement method based on RefineNet [J]. Acta Automatica Sinica, 2022, 48(02): 554-563.
[3] Single channel speech enhancement based on deep learning
[4] Speech enhancement course by Yu Hong, Ludong University
[5] Reference code