Audio knowledge (I)
2022-06-24 16:41:00 【languageX】
I have worked on many audio projects, and each time I have to review what I learned before. This is a systematic summary of those knowledge points.
This article summarizes the basics of audio, the common terms, and some of the mathematics needed for later feature extraction.
To learn about audio, first understand sound: sound is a wave produced by the vibration of an object.
Audio Basics
1. Three elements of sound
Loudness: the human ear's subjective perception of sound intensity. Loudness is related to the amplitude of the sound wave.
Pitch: the human ear's perception of how high or low a sound is.
Pitch depends mainly on the frequency of the sound wave, but it is not simply proportional to frequency; it is also affected by the intensity and waveform of the sound.
Timbre: the ear's combined response to sound waves of various frequencies and intensities. It is the characteristic quality of a sound and is related to the material and structure of the sounding object.
2. Analog-to-digital conversion
The sounds heard by the human ear are continuous; such a continuous, smooth signal is called an analog signal. The audio data processed in a computer is discrete; such a discontinuous signal is called a digital signal. Converting an analog signal into a digital signal is called analog-to-digital conversion, and it takes three steps: sampling, quantization, and encoding.
Sampling: take values of the analog signal at fixed time intervals. For example, 16 kHz audio means that 16000 points are sampled per second.
Quantization: represent each sampled amplitude with a finite set of numbers, usually measured in bits. For example, 16-bit audio has a quantization depth of 16 bits, with values ranging from -32768 to 32767, 65536 values in total.
Encoding: record the sampled and quantized data in a given format. Raw audio data generally means pulse code modulation (PCM) data. The encoded binary data is the digital signal.
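The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "analog" signal is a hypothetical 440 Hz sine wave, the function names are made up for this example, and the encoding is little-endian signed 16-bit PCM packed with the standard `struct` module.

```python
import math
import struct


def sample_and_quantize(freq_hz=440.0, sample_rate=16000, duration_s=0.01, bits=16):
    """Sample a sine wave at sample_rate and quantize it to signed integers."""
    max_amp = 2 ** (bits - 1) - 1                   # 32767 for 16-bit audio
    n_samples = int(round(sample_rate * duration_s))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                         # sampling: one value every 1/sample_rate s
        x = math.sin(2 * math.pi * freq_hz * t)     # "analog" amplitude in [-1, 1]
        samples.append(int(round(x * max_amp)))     # quantization: nearest integer level
    return samples


def encode_pcm16(samples):
    """Encoding: pack the quantized samples as little-endian signed 16-bit PCM."""
    return struct.pack("<%dh" % len(samples), *samples)


samples = sample_and_quantize()
pcm = encode_pcm16(samples)
print(len(samples), "samples ->", len(pcm), "bytes of PCM")
```

Each 16-bit sample occupies two bytes, so the byte count is twice the sample count.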
3. Terminology
Sampling rate: the sampling frequency, i.e. how many points are sampled per second. Common values are 16 kHz and 44.1 kHz.
Sample size: the number of bits per sample point. Common values are 16 bit and 24 bit.
Number of channels: how many sets of sound data the signal produces at a time. Common values are mono and stereo (two channels).
Bit rate: the number of bits transmitted per second, in bps (bits per second). The higher the bit rate, the more data is transferred per second and, in general, the better the sound quality.
Bit rate formula: bit rate = sampling rate * sample size * number of channels
PCM file size (MB), with the bit rate in kbps: bit rate * 1000 * number of seconds / 8 / 1024 / 1024
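The two formulas above can be checked with a small calculation. The function names below are hypothetical, and note one assumption: here the bit rate is kept in bps rather than kbps, so the factor of 1000 from the size formula is not needed.

```python
def bit_rate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate = sampling rate * sample size * number of channels."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_size_mb(bit_rate, seconds):
    """Size of raw PCM data in MB (1 MB = 1024 * 1024 bytes), bit rate in bps."""
    return bit_rate * seconds / 8 / 1024 / 1024


rate = bit_rate_bps(16000, 16, 1)   # 16 kHz, 16-bit, mono: 256000 bps
size = pcm_size_mb(rate, 60)        # one minute of audio, about 1.83 MB
print(rate, round(size, 2))
```

For comparison, CD-quality stereo (44.1 kHz, 16 bit, two channels) gives 1411200 bps by the same formula.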
Signal Basics
1. Continuous signal , Discrete signal
Sampling a continuous signal x(t) uniformly at interval T gives the discrete signal x(nT); quantizing it into bits then gives the digital signal.
Signals are also divided into periodic and aperiodic signals, which gives four combinations: aperiodic continuous, periodic continuous, aperiodic discrete, and periodic discrete.
2. Fourier analysis
Fourier showed that any continuous periodic signal can be composed of a suitable set of sinusoids. So why sinusoids? Because the sine wave is how the frequency domain is described: it is the only waveform that exists in the frequency domain.
Fourier analysis is a method for transforming signals between the time domain and the frequency domain.
(This article explains Fourier analysis very well: https://zhuanlan.zhihu.com/p/19759362)
2.1 Fourier series (Fourier Series)
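According to Fourier, a periodic signal f(t) can be expressed as a sum of sine functions:
f(t)=A_0+\sum_{n=1}^{\infty}A_n\sin(n\omega t+\psi_n)\tag{1}
Here t is time, A_n is the amplitude, \omega=2\pi/T is the angular frequency, and \psi_n is the initial phase.
Using the trigonometric identity \sin(\alpha \pm \beta) = \sin\alpha\cos\beta \pm \cos\alpha\sin\beta, each term can be expanded as
A_n\sin(n\omega t+\psi_n)=A_n\sin\psi_n\cos(n\omega t)+A_n\cos\psi_n\sin(n\omega t)\tag{2}
Let a_n = A_n\sin\psi_n and b_n=A_n\cos\psi_n, so formula (1) becomes
f(t)=A_0+\sum_{n=1}^{\infty}\left[a_n\cos(n\omega t)+b_n\sin(n\omega t)\right]\tag{3}
Because the integral of a sine or cosine over a full period is 0, and because trigonometric functions are orthogonal, we can solve for A_0, a_n, b_n:
A_0=\frac{1}{T}\int_{0}^{T}f(t)dt,\quad a_n=\frac{2}{T}\int_{0}^{T}f(t)\cos(n\omega t)dt,\quad b_n=\frac{2}{T}\int_{0}^{T}f(t)\sin(n\omega t)dt\tag{4}
To unify the form, write a_0=2A_0 and take the period T=2\pi (so \omega=1). Formula 4 then gives
f(t)=\frac{a_0}{2}+\sum_{n=1}^{\infty}\left[a_n\cos(nt)+b_n\sin(nt)\right],\quad a_n=\frac{1}{\pi}\int_{-\pi}^{\pi}f(t)\cos(nt)dt,\quad b_n=\frac{1}{\pi}\int_{-\pi}^{\pi}f(t)\sin(nt)dt\tag{5}
Formula 5 is the Fourier series.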
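To make the series concrete, the coefficient integrals for period 2\pi can be approximated numerically. The sketch below is an illustration only: the square wave test signal, the function names, and the midpoint-rule integration are all assumptions of this example. For this square wave the known result is a_n = 0, b_n = 4/(n\pi) for odd n, and b_n = 0 for even n.

```python
import math


def fourier_coefficients(f, n_max, steps=4000):
    """Approximate a_n and b_n (n = 0..n_max) of a 2*pi-periodic f with the midpoint rule."""
    dt = 2 * math.pi / steps
    ts = [-math.pi + (k + 0.5) * dt for k in range(steps)]
    a = [sum(f(t) * math.cos(n * t) for t in ts) * dt / math.pi for n in range(n_max + 1)]
    b = [sum(f(t) * math.sin(n * t) for t in ts) * dt / math.pi for n in range(n_max + 1)]
    return a, b


def partial_sum(a, b, t):
    """Evaluate a_0/2 + sum of a_n*cos(nt) + b_n*sin(nt) for the coefficients given."""
    return a[0] / 2 + sum(
        a[n] * math.cos(n * t) + b[n] * math.sin(n * t) for n in range(1, len(a))
    )


def square(t):
    """Square wave with period 2*pi: +1 on (0, pi), -1 on (-pi, 0)."""
    return 1.0 if math.sin(t) >= 0 else -1.0


a, b = fourier_coefficients(square, 9)
print([round(v, 4) for v in b])
print(round(partial_sum(a, b, math.pi / 2), 4))
```

Adding more harmonics brings the partial sum closer to the square wave away from its jumps.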
2.2 Fourier transform (Fourier Transform)
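The Fourier series is in trigonometric form; let's change it to exponential form.
From Euler's formula e^{i\theta} = \cos\theta + i\sin\theta we get
\cos\theta=\frac{e^{i\theta}+e^{-i\theta}}{2},\quad \sin\theta=\frac{e^{i\theta}-e^{-i\theta}}{2i}\tag{6}
Substituting formula 6 into f(t) in formula 5:
f(t)=\frac{a_0}{2}+\sum_{n=1}^{\infty}\left[\frac{a_n-ib_n}{2}e^{int}+\frac{a_n+ib_n}{2}e^{-int}\right]\tag{7}
Then substitute the expressions for a_n, b_n, a_0 from formula 5. For a general period N, with \omega=\frac{2\pi}{N} and \omega_x=\frac{2\pi}{N}n=\omega n, this gives
f(t)=\sum_{n=-\infty}^{+\infty}\left[\frac{1}{N}\int^{N/2}_{-N/2}f(\tau)e^{-i\omega_x\tau}d\tau\right]e^{i\omega_xt}\tag{8}
Now let N tend to \infty and define F(\omega_x) as the Fourier transform of f(t):
F(\omega_x)=\int^{+\infty}_{-\infty}f(t)e^{-i\omega_xt}dt\tag{9}
Formula 8 can then be written as
f(t)=\frac{1}{N}\sum_{n=-\infty}^{+\infty}F(\omega_x)e^{i\omega_xt}\tag{10}
By the definition above, the step is \omega=\frac{2\pi}{N}, so \frac{1}{N}=\frac{\omega}{2\pi}. Using the Riemann sum expression of an integral (integration can be seen as dividing a curve into very small intervals and then summing them), the formula becomes
f(t)=\frac{1}{2\pi}\sum_{n=-\infty}^{+\infty}F(\omega_x)e^{i\omega_xt}\omega\tag{11}
Finally, as N\rightarrow\infty the step \omega\rightarrow 0 and the sum becomes an integral:
f(t)=\frac{1}{2\pi}\int^{+\infty}_{-\infty}F(\omega_x)e^{i\omega_xt} d\omega_x\tag{12}
Formulas 12 and 9 are the Fourier transform pair.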
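The key step in moving from the trigonometric form to the exponential form is that the coefficient of e^{int} equals (a_n - i b_n)/2. This can be checked numerically. The sketch below is an illustration only: the sawtooth test signal f(t) = t on (-\pi, \pi), the function names, and the midpoint-rule integration are all assumptions of this example.

```python
import cmath
import math


def midpoints(steps):
    """Midpoints of `steps` equal sub-intervals of [-pi, pi], plus the step size."""
    dt = 2 * math.pi / steps
    return dt, [-math.pi + (k + 0.5) * dt for k in range(steps)]


def trig_coefficients(f, n, steps=4000):
    """a_n and b_n of the trigonometric series (period 2*pi)."""
    dt, ts = midpoints(steps)
    a_n = sum(f(t) * math.cos(n * t) for t in ts) * dt / math.pi
    b_n = sum(f(t) * math.sin(n * t) for t in ts) * dt / math.pi
    return a_n, b_n


def complex_coefficient(f, n, steps=4000):
    """c_n = 1/(2*pi) * integral of f(t) * exp(-i*n*t) over one period."""
    dt, ts = midpoints(steps)
    return sum(f(t) * cmath.exp(-1j * n * t) for t in ts) * dt / (2 * math.pi)


def sawtooth(t):
    return t


a1, b1 = trig_coefficients(sawtooth, 1)
c1 = complex_coefficient(sawtooth, 1)
print(c1, (a1 - 1j * b1) / 2)
```

For this sawtooth, a_1 = 0 and b_1 = 2, so both printed values should be close to -i.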
2.3 Discrete Fourier transform (Discrete Fourier Transform)
The Fourier transform is an integral over a continuous signal. In a computer we only have discrete signals, so we use the DFT.
The DFT turns the integrals of the FT into sums. The FT lets the step \omega\rightarrow 0; here we instead keep N finite and substitute \omega_x=n\frac{2\pi}{N} into formula 10:
f(t)=\frac{1}{N}\sum_{n}F(n)e^{i\frac{2\pi}{N}nt}\tag{13}
Taking N sample points, with t=0,\dots,N-1 and n=0,\dots,N-1, formulas 13 and 9 become the DFT pair:
F(n)=\sum_{t=0}^{N-1}f(t)e^{-i\frac{2\pi}{N}nt}
f(t)=\frac{1}{N}\sum_{n=0}^{N-1}F(n)e^{i\frac{2\pi}{N}nt}
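The DFT and its inverse can be written directly from their summation definitions, F(n) = \sum_t f(t) e^{-i 2\pi nt/N} and f(t) = \frac{1}{N}\sum_n F(n) e^{i 2\pi nt/N}. The following is a minimal pure-Python sketch for illustration (the function names are made up for this example); a real project would normally use an FFT library instead.

```python
import cmath
import math


def dft(x):
    """Direct DFT: F(n) = sum over t of f(t) * exp(-i*2*pi*n*t/N)."""
    N = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * n * t / N) for t in range(N))
        for n in range(N)
    ]


def idft(X):
    """Inverse DFT: f(t) = 1/N * sum over n of F(n) * exp(i*2*pi*n*t/N)."""
    N = len(X)
    return [
        sum(X[n] * cmath.exp(2j * cmath.pi * n * t / N) for n in range(N)) / N
        for t in range(N)
    ]


# One full cycle of a cosine over 8 samples: energy shows up in bins 1 and 7.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(signal)
print([round(abs(v), 3) for v in spectrum])
restored = idft(spectrum)
```

Applying `idft` to the spectrum recovers the original samples up to floating-point error.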
2.4 Fast Fourier transform (FFT)
The DFT and the FFT compute the same thing; the FFT is simply a fast algorithm for the DFT.
Computing the DFT directly evaluates a sum of N terms for each of the N values F(n), so the time complexity is O(N^2), whereas the FFT needs only O(N\log_2 N).
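One common way to reach O(N\log_2 N) is the recursive radix-2 Cooley-Tukey algorithm, which splits the input into even-indexed and odd-indexed halves. The article does not say which FFT variant it has in mind, so the sketch below is just one possible illustration; it assumes the input length is a power of two, and the direct `dft` is included only for comparison.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used here as the reference result."""
    N = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * n * t / N) for t in range(N))
        for n in range(N)
    ]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])                 # DFT of the even-indexed samples
    odd = fft(x[1::2])                  # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out


data = [1, 2, 3, 4, 0, 0, 0, 0]
fast = fft(data)
slow = dft(data)
print(max(abs(f - s) for f, s in zip(fast, slow)))
```

Each level of recursion does O(N) work and there are \log_2 N levels, which is where the O(N\log_2 N) cost comes from.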
2.5 Discrete cosine transform (DCT)
If the function being expanded in a Fourier series is a real even function, the series contains only cosine terms. Discretizing this case (as with the DFT) yields the cosine transform, hence the name discrete cosine transform (DCT). The DCT can be seen as a special case of the DFT.
The discrete cosine transform is in effect a discrete Fourier transform applied to a new signal obtained by processing the original signal.
The original signal is first mirrored symmetrically and then shifted by 1/2 unit to give the new signal. If the original signal is f(x), the new signal is g(x)=f(x-\frac{1}{2})+f(-x-\frac{1}{2})
The DCT formula (the commonly used orthonormal DCT-II form):
F(u)=c(u)\sum_{x=0}^{N-1}f(x)\cos\left[\frac{(x+\frac{1}{2})\pi}{N}u\right],\quad c(u)=\sqrt{\frac{1}{N}}\ (u=0),\quad c(u)=\sqrt{\frac{2}{N}}\ (u\neq 0)
Inverse transform:
f(x)=\sum_{u=0}^{N-1}c(u)F(u)\cos\left[\frac{(x+\frac{1}{2})\pi}{N}u\right]
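The relationship between the DCT and the DFT of a mirrored signal can be checked numerically. The sketch below assumes the orthonormal DCT-II convention, F(u) = c(u) \sum_x f(x) \cos[(x+\frac{1}{2})\pi u/N] with c(0)=\sqrt{1/N} and c(u)=\sqrt{2/N} otherwise; the function names are made up for this example. It computes the DCT once from the cosine sum and once from the DFT of the signal followed by its mirror image, with a half-sample phase correction.

```python
import cmath
import math


def scale(u, N):
    """Orthonormal scale factor c(u)."""
    return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)


def dct2(x):
    """DCT-II straight from the cosine-sum definition."""
    N = len(x)
    return [
        scale(u, N) * sum(x[k] * math.cos((k + 0.5) * math.pi * u / N) for k in range(N))
        for u in range(N)
    ]


def idct2(X):
    """Inverse of dct2 (a DCT-III with the same scale factors)."""
    N = len(X)
    return [
        sum(scale(u, N) * X[u] * math.cos((k + 0.5) * math.pi * u / N) for u in range(N))
        for k in range(N)
    ]


def dct2_via_dft(x):
    """Same DCT-II, computed from the DFT of the signal followed by its mirror image."""
    N = len(x)
    y = list(x) + list(reversed(x))          # symmetric extension, length 2N
    out = []
    for u in range(N):
        Y_u = sum(y[m] * cmath.exp(-2j * cmath.pi * u * m / (2 * N)) for m in range(2 * N))
        shifted = cmath.exp(-1j * cmath.pi * u / (2 * N)) * Y_u   # half-sample shift
        out.append(scale(u, N) * 0.5 * shifted.real)
    return out


x = [1.0, 2.0, 3.0, 4.0]
direct = dct2(x)
from_dft = dct2_via_dft(x)
print([round(v, 4) for v in direct])
print([round(v, 4) for v in from_dft])
```

The two results agree up to floating-point error, and `idct2(dct2(x))` returns the original samples.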
That's all for today. Next, we will cover MFCC feature extraction for audio and its code implementation.
Reference article :
https://zhuanlan.zhihu.com/p/75521342
https://blog.csdn.net/qq_39546227/article/details/99686160