Technical practice and development trends of all-in-one video conferencing devices
2022-06-25 10:59:00 【Advanced audio and video development】
Author | Weilong, algorithm expert, DingTalk Hummingbird Audio Lab
As hybrid work becomes the norm, efficient remote communication and collaboration matter a great deal. Yet many problems still hamper communication in remote meetings: meeting rooms lack pickup and playback equipment, software and hardware are incompatible, and far-field pickup leaves speech hard to hear. These problems wear down participants' patience, undermine the meeting, and gradually drain the team's enthusiasm for discussion.
That is why Microsoft and Zoom abroad, and DingTalk and Tencent Meeting in China, are all building their own hardware ecosystems. They hope to solve the pickup problem of hybrid online and offline work through hardware such as microphones, audio-video all-in-one devices, and conference boards. Even so, the most common complaint when part of the team sits together in a room is still that speech cannot be heard clearly, or at all. The key to solving it is far-field pickup.
In fact, since the 1980s far-field pickup has been a pain point for industry and a hard problem for academia. The difficulty comes mainly from three audio problems: reverberation, noise, and echo. Removing reverberation has even been listed as “one of the ten unsolved engineering problems of our time”.
At present the industry has no mature, mass-producible solution. Against this background, the DingTalk Hummingbird Audio Lab developed a differential microphone array algorithm and was the first to achieve 10-meter far-field pickup from a single device, in the F2 all-in-one video conferencing device. The solution can also be split into modules and shared with hardware manufacturers to improve the audio and video capture of their devices.
1 What technical difficulties must far-field pickup overcome?
The audio and video industry has a saying: “no video, we talk; no audio, we walk”. In other words, audio matters more than video in a conference, yet audio has always been the weak point.
In medium and large meetings, such as business meetings and reporting sessions, the physical distances in the room attenuate the sound energy that reaches the device.
To deal with this, mainstream products used to be split systems that deploy multiple microphones around the conference table. An all-in-one device has to achieve far-field pickup from a single unit, overcoming long-distance propagation, reverberation, noise, and echo, so that participants can hear and be heard, express themselves, and communicate fully in every meeting.
1、 Long-distance propagation
In a large conference room, people often cannot hear the other side clearly and resort to repeating “Hello, hello?”. Sometimes someone has to walk up to the device to check whether the connection is working.
In this scenario the communication link is usually fine. The problem lies in the device's pickup quality and the distance between the talker and the device. Sound energy is inversely proportional to the square of the propagation distance: relative to the energy picked up at 1 meter, it drops to 1/16 at 4 meters and to 1/100 at 10 meters. This physical attenuation makes some components of the target speech vanish from the spectrum. Once the talker is far away, the target signal in the raw microphone signal is therefore masked by noise from nearer sources.
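The attenuation figures in this section can be checked with a few lines of Python. This is a minimal sketch that assumes free-field propagation from a point source (no room reflections); the function names are illustrative and do not come from the original article.

```python
import math


def relative_energy(distance_m: float, reference_m: float = 1.0) -> float:
    """Energy at distance_m relative to reference_m (inverse-square law, free field assumed)."""
    return (reference_m / distance_m) ** 2


def attenuation_db(distance_m: float, reference_m: float = 1.0) -> float:
    """The same ratio expressed as a positive attenuation in decibels."""
    return -10.0 * math.log10(relative_energy(distance_m, reference_m))


for d in (1.0, 4.0, 10.0):
    print(f"{d:5.1f} m: energy x{relative_energy(d):.4f}, attenuation {attenuation_db(d):.1f} dB")
```

In decibel terms, a talker at 10 meters arrives about 20 dB weaker than the same talker at 1 meter, which is why a nearby noise source can easily mask the target speech.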
2、 Reverberation
Sometimes the other side's voice in a meeting sounds muddy, as if it came from a distant valley. This is caused by reverberation.
Reverberation occurs in enclosed spaces. The sound arriving at the receiver has travelled along multiple paths created by reflections off the walls. These reflections divide into low-order and high-order reflections, which form the early and the late reverberation respectively. Reverberation has two obvious perceptual effects on listeners:
Box effect: the sound seems to come from all directions, as if the listener were inside a box. It sounds muddy and uncomfortable.
Distant talker effect: the sound seems to come from far away, even farther than the actual distance.
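The multipath picture of reverberation can be illustrated by convolving a dry signal with a room impulse response (RIR). The sketch below uses a synthetic RIR (a direct-path impulse followed by exponentially decaying noise) rather than a measured one. The 0.6 s reverberation time, the 50 ms early/late boundary, and all function names are assumptions made for illustration only.

```python
import numpy as np


def synthetic_rir(fs: int = 16000, rt60: float = 0.6, seed: int = 0) -> np.ndarray:
    """Toy room impulse response: direct path plus decaying random reflections."""
    rng = np.random.default_rng(seed)
    n = int(fs * rt60)
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude falls by 60 dB over rt60 seconds
    rir = 0.1 * rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct path
    return rir


def split_early_late(rir: np.ndarray, fs: int, boundary_ms: float = 50.0):
    """Split an RIR into early reflections and late reverberation at boundary_ms."""
    k = int(fs * boundary_ms / 1000.0)
    early = rir.copy()
    early[k:] = 0.0
    late = rir.copy()
    late[:k] = 0.0
    return early, late


fs = 16000
rir = synthetic_rir(fs)
early, late = split_early_late(rir, fs)

# A short noise burst stands in for dry speech; the room smears it in time.
dry = np.random.default_rng(1).standard_normal(fs // 10)
wet = np.convolve(dry, rir)

# Early-to-late energy ratio of the RIR (a clarity-style measure), in dB.
early_late_db = 10.0 * np.log10(np.sum(early**2) / np.sum(late**2))
print(f"dry: {len(dry)} samples, reverberant: {len(wet)} samples")
print(f"early-to-late energy ratio: {early_late_db:.1f} dB")
```

The reverberant signal is longer than the dry one by the length of the RIR tail. That tail is the late reverberation a dereverberation algorithm tries to remove, while the early reflections are usually left in place.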
2 Exploration and application of far-field technology at the DingTalk Hummingbird Lab
Microphone array technology has played an indispensable role in far-field pickup and far-field voice interaction in recent years.
The microphone array technology developed by the lab is the first in the industry to combine the acoustic characteristics of the microphones with the advantages of differential beamforming theory. It substantially raises the white noise gain of the differential beam in the low-frequency band, which makes low-frequency speech pickup much more robust and markedly improves the F2's far-field speech quality.
The F2's microphone array technology consists mainly of differential beamforming and a multi-channel dereverberation algorithm.
1、 Beamforming with a differential directional microphone array
Beamforming originated in sensor arrays for radar antennas. In communications, beamforming gives a base station wider signal coverage. Similarly, beamforming with a microphone array forms a spatial filter: a pickup beam is formed in the direction of the target talker, and the target speech, buried in interference, is recovered from the other signals without distortion.
Differential microphone arrays (DMA), or differential beamforming, have physical properties that suit speech signal processing particularly well. They have become a research hotspot in signal processing in recent years and are widely used in industry.
For differential microphone arrays, the DingTalk Hummingbird Lab was the first in the industry to jointly optimize the microphones' acoustic characteristics and differential beamforming theory. Its self-developed differential directional microphone array clearly improves a long-standing pain point in the field, the robustness of low-frequency speech pickup: the white noise gain of the differential beam in the low-frequency band improves by 20 dB.
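To see why low-frequency white noise gain is the pain point, consider the textbook first-order differential beamformer built from two omnidirectional microphones. The sketch below is that generic design, not the lab's directional-microphone method. The 2 cm spacing, the 343 m/s speed of sound, and all function names are assumptions chosen for illustration.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def steering(freq_hz: float, theta: float, spacing_m: float) -> np.ndarray:
    """Steering vector of a two-microphone endfire array for a plane wave from angle theta."""
    phi = 2.0 * np.pi * freq_hz * spacing_m / C
    return np.array([1.0, np.exp(-1j * phi * np.cos(theta))])


def first_order_dma(freq_hz: float, spacing_m: float, null_theta: float = np.pi) -> np.ndarray:
    """Weights h with unit response at 0 rad and a null at null_theta (pi gives a cardioid)."""
    constraints = np.vstack([
        steering(freq_hz, 0.0, spacing_m).conj(),
        steering(freq_hz, null_theta, spacing_m).conj(),
    ])
    return np.linalg.solve(constraints, np.array([1.0, 0.0], dtype=complex))


def response(h: np.ndarray, freq_hz: float, theta: float, spacing_m: float) -> complex:
    """Beampattern value d(theta)^H h."""
    return np.vdot(steering(freq_hz, theta, spacing_m), h)


def white_noise_gain_db(h: np.ndarray, freq_hz: float, spacing_m: float) -> float:
    """White noise gain |d(0)^H h|^2 / (h^H h) in dB; negative values mean sensor noise is amplified."""
    num = abs(response(h, freq_hz, 0.0, spacing_m)) ** 2
    return 10.0 * np.log10(num / np.vdot(h, h).real)


spacing = 0.02  # 2 cm between microphones (assumed)
for f in (100.0, 500.0, 1000.0, 4000.0):
    h = first_order_dma(f, spacing)
    print(f"{f:6.0f} Hz: WNG = {white_noise_gain_db(h, f, spacing):6.1f} dB")
```

For this two-microphone design the white noise gain works out to 2·sin²(ωδ/c), where δ is the spacing, so it collapses toward zero as the frequency drops. The differential beam keeps its directivity at low frequencies only by amplifying microphone self-noise, which is the robustness problem a 20 dB improvement in low-band white noise gain addresses.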
The lab's research has been published as a series of papers at top international speech conferences such as INTERSPEECH and ICASSP and recognized through peer review (see the paper list at the end of this article). Independent tests show that its far-field pickup performance leads the industry in both objective tests (speech recognition accuracy) and subjective tests (sound quality evaluation):
Far-field speech recognition accuracy is 7 to 9 percentage points higher than that of industry benchmark competitors, and sound quality and clarity surpass those of all the well-known international brands available on the market. The DingTalk F2 audio-video all-in-one device is another product built on this theory.
2、 Multi-channel dereverberation
Most speech dereverberation algorithms today fall into three categories: spectral enhancement, indirect inverse filtering, and direct inverse filtering.
Spectral enhancement typically applies a real- or complex-valued mask to the reverberant speech spectrum, treating reverberation as noise and suppressing it. Its performance is limited and it introduces some distortion, because reverberation is not additive noise.
Indirect inverse filtering usually requires the transfer functions between the sound source and the receivers. In theory it can remove reverberation perfectly, but in practical applications these transfer functions are not available.
Direct inverse filtering predicts the reverberation from the microphone array signals themselves rather than from the transfer functions, which makes it suitable for practical use. The most widely used direct inverse filtering method in industry is based on multi-channel linear prediction (MCLP).
The lab has continued to study MCLP-based algorithms, reproducing the latest published results, and in the F2 it has solved many of MCLP's practical problems: high computational complexity with many microphones, degraded performance with few microphones, and numerical precision blow-up in the filter. The result is the lab's own robust multi-channel dereverberation algorithm with low complexity and high performance.
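The core of MCLP is delayed linear prediction: the late reverberation in a reference channel is predicted from past frames of all channels and subtracted. The sketch below shows that idea for a single STFT frequency bin on simulated data. It is not the lab's algorithm, and it makes several simplifying assumptions that do not come from the original article: the source is white across frames, the room is modeled as a short per-bin convolution, and plain unweighted least squares is used (WPE-style methods additionally weight each frame by an estimated speech variance and iterate). All names and parameter values are hypothetical.

```python
import numpy as np


def delayed_taps(x: np.ndarray, delay: int, order: int) -> np.ndarray:
    """Stack frames x[:, t-delay-j], j = 0..order-1, into a (frames, channels*order) matrix."""
    channels, frames = x.shape
    blocks = []
    for j in range(order):
        shift = delay + j
        shifted = np.zeros_like(x)
        shifted[:, shift:] = x[:, : frames - shift]
        blocks.append(shifted)
    return np.concatenate(blocks, axis=0).T


def mclp_dereverb(x: np.ndarray, delay: int = 2, order: int = 10, ref: int = 0) -> np.ndarray:
    """Subtract the late reverberation predicted from delayed multi-channel frames."""
    past = delayed_taps(x, delay, order)
    target = x[ref]
    g, *_ = np.linalg.lstsq(past, target, rcond=None)  # unweighted least squares
    return target - past @ g


# Simulated single-bin data: white source, two channels, 8-tap decaying filters.
rng = np.random.default_rng(0)
frames, channels, taps, delay = 4000, 2, 8, 2
s = (rng.standard_normal(frames) + 1j * rng.standard_normal(frames)) / np.sqrt(2.0)
h = (0.7 ** np.arange(taps)) * np.exp(2j * np.pi * rng.random((channels, taps)))
x = np.stack([np.convolve(s, h[m])[:frames] for m in range(channels)])
x = x + 1e-3 * (rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape))

# The first `delay` taps are the direct path and early reflections we want to keep.
early = np.convolve(s, h[0, :delay])[:frames]
d = mclp_dereverb(x, delay=delay, order=10)

err_in = np.mean(np.abs(x[0] - early) ** 2)
err_out = np.mean(np.abs(d - early) ** 2)
print(f"late-reverb energy before: {err_in:.4f}, after: {err_out:.4f}")
```

The prediction delay keeps the filter from cancelling the direct sound and early reflections. The least-squares problem grows with the number of channels times the filter order, which hints at the complexity and numerical stability issues mentioned above.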
3 How will the video conferencing hardware industry develop?
What is the essence of video conferencing hardware? Making collaboration more efficient for multiple people who share the same time but not the same space. In the beginning, email and the telephone were enough for remote interaction. As technology advanced, people began to pursue a more immersive real-time audio and video experience: hardware provides more professional sound pickup, HD cameras, and rich interfaces, and integrated software and hardware solutions give meetings a stronger quality guarantee.
We believe video conferencing hardware will develop in two directions as the industry matures: high integration and intelligence. High integration balances performance, aesthetics, and ease of use, and will become an important criterion for future enterprise products. Intelligence is the general trend of the software and hardware industry: technology makes pickup more accurate and noise reduction smarter, so that audio and video hardware better serves all kinds of work and everyday scenarios.
The DingTalk F2 is the first all-in-one video conferencing device in China to deliver a 10-meter audio and video experience from a single unit. Building on breakthroughs in software and hardware algorithms, AI, and engineering design, it offers clear single-device pickup at 10 meters, smart directing (close-ups of the speaker), a two-person split-screen layout, 4K HD image quality, and other features. It meets the needs of hybrid online and offline meetings and greatly improves efficiency and immersion in medium and large meeting scenarios.
Before a product goes to market it must go through a certain amount of trial use and testing, and the DingTalk F2 is no exception. The DingTalk Conference Rooms product team took our audio scientists through conference rooms all over Alibaba Group to record test data in rooms of different sizes and structures and so improve the product's robustness.
Alibaba has a culture of co-creating new products with enterprise customers. To further verify how well the F2 fits user needs and scenarios, the team often asked to sit in on customers' meetings and observe whether users operated the device as the design intended, whether they ran into problems, and whether new needs emerged.
On the technical side, for challenging scenarios we may next consider adding features such as directional pickup and an intelligent sound screen. For example, when the device is used in a noisy environment, turning on the intelligent sound screen lets the voice of a target talker in a specific zone be picked up more clearly, so participants can communicate more easily in a complex acoustic environment.
In an enterprise, perhaps 80% of meetings are offline and 20% are online. We have been exploring how to digitize offline meetings, for example with role-based meeting minutes, which rely on technologies such as sound source localization and voiceprint recognition.
The F2 is positioned as a hardware carrier, a container. Through cooperation models such as audio modules, audio-video modules, board modules, and whole-device integration, we will open DingTalk's audio and video products, technologies, and algorithms to hardware manufacturers, and help partners build a conference experience that combines software with hardware and online with offline.
With breakthroughs in audio and video technologies such as far-field pickup and intelligent noise reduction, integrated hardware and software products, and an open digital platform, DingTalk can help users better digitize their online and offline meetings and turn them into enterprise assets.
Appendix: papers related to the self-developed microphone array published at international conferences by the DingTalk Hummingbird Lab:
1. Weilong Huang, Jinwei Feng, “Minimum-Norm Differential Beamforming for Linear Array with Directional Microphones”, Interspeech 2021
2. Weilong Huang, Jinwei Feng, “Differential Beamforming for Uniform Circular Array with Directional Microphones”, Interspeech 2020
3. Cheng Xue, Weilong Huang, Weiguang Chen, Jinwei Feng, “Real-time Multi-channel Speech Enhancement Based on Neural Network Masking with Attention Model”, Interspeech 2021
4. ShiLiang Zhang, Siqi Zheng, Weilong Huang, Ming Lei, Hongbin Suo, Jinwei Feng, Zhijie Yan, “Investigation of Spatial-Acoustic Features for Overlapping Speech Detection in Multiparty Meetings”, Interspeech 2021
5. Siqi Zheng, Weilong Huang, Xianliang Wang, Hongbin Suo, Jinwei Feng, Zhijie Yan, “A Real-time Speaker Diarization System Based on Spatial Spectrum”, ICASSP 2021
6. Weiguang Chen (intern), Cheng Xue (intern), Xionghu Zhong, “Cramer-Rao Lower Bound for DOA Estimation with an Array of Directional Microphones in Reverberant Environments”, Interspeech 2021
7. Fan Yu, ..., Weilong Huang, et al., “M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge”, ICASSP 2022
8. Fan Yu, ..., Weilong Huang, et al., “Summary on the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge”, ICASSP 2022
9. Pengyu Wang, Feifei Xiong, Zhongfu Ye, Jinwei Feng, “Joint Estimation of Direction-Of-Arrival and Distance for Arrays with Directional Sensors Based on Sparse Bayesian Learning”, accepted for publication at Interspeech 2022
10. Feifei Xiong, Weiguang Chen, Pengyu Wang, Xiaofei Li, Jinwei Feng, “Spectro-Temporal SubNet for Real-Time Monaural Speech Denoising and Dereverberation”, accepted for publication at Interspeech 2022