Audio knowledge (I)

2022-06-24 16:41:00 【languageX】

I have worked on many audio projects, and each time I have to review what I learned before. This post is a systematic summary of those knowledge points.

It covers the basics of audio, common terminology, and some of the mathematics needed for later feature extraction.

To learn about audio, first understand sound: sound is a wave produced by the vibration of an object.

Audio Basics

1. Three elements of sound

Loudness: the human ear's subjective perception of sound intensity. Loudness is related to the amplitude of the sound wave's vibration.

Pitch: the human ear's perception of how high or low a sound is.

Pitch is mainly determined by the frequency of the sound wave, but it is not proportional to frequency; it also depends on the intensity and waveform of the sound.

Timbre: the human ear's combined response to sound waves of various frequencies and intensities. It is the characteristic quality of a sound and is related to the material and structure of the object producing it.

2. Analog-to-digital conversion

The sounds heard by the human ear are continuous; such a continuous, smooth signal is called an analog signal. The audio data processed by a computer is discrete; such a signal is called a digital signal. Converting an analog signal into a digital signal is called analog-to-digital conversion, and it takes three steps: sampling, quantization, and encoding.

Sampling: take values of the analog signal at a fixed time interval. For example, the commonly mentioned 16 kHz audio means that 16000 samples are taken per second.

Quantization: represent each sampled amplitude with a number from a finite set. The unit is usually the bit. For example, 16-bit audio has a quantization depth of 16 bits, with values ranging from -32768 to 32767, 65536 values in total.

Encoding: record the sampled and quantized data in a certain format. Raw audio data generally refers to pulse code modulation (PCM) data. The encoded binary data is the digital signal.
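The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a pure sine tone stands in for the analog signal, and the function names `sample_and_quantize` and `encode_pcm16` are made up for this example.

```python
import math
import struct

def sample_and_quantize(freq_hz, duration_s, sample_rate=16000, bits=16):
    """Sample a sine tone and quantize each sample to a signed integer."""
    max_amp = 2 ** (bits - 1) - 1                   # 32767 for 16-bit audio
    n_samples = round(sample_rate * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                         # sampling instant t = n * T
        x = math.sin(2 * math.pi * freq_hz * t)     # stand-in analog value in [-1, 1]
        samples.append(round(x * max_amp))          # quantization to an integer level
    return samples

def encode_pcm16(samples):
    """Encode integer samples as little-endian signed 16-bit PCM bytes."""
    return struct.pack("<%dh" % len(samples), *samples)

# 10 ms of a 440 Hz tone at 16 kHz, 16 bit, mono
pcm_samples = sample_and_quantize(440.0, 0.01)
pcm_bytes = encode_pcm16(pcm_samples)
print(len(pcm_samples), "samples,", len(pcm_bytes), "bytes")
```

Each 16-bit sample takes two bytes, which is why the byte count is twice the sample count.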

3. Terminology

Sampling rate: the sampling frequency, that is, how many samples are taken per second. 16 kHz and 44.1 kHz are common.

Sample size: the number of bits per sample. 16 bit and 24 bit are common.

Number of channels: how many sets of sound-wave data the signal produces at a time. Mono and stereo (two channels) are common.

Bit rate: the number of bits transmitted per second, in bps (bits per second). The higher the bit rate, the more data is transferred per second and, generally, the better the sound quality.

Bit rate formula: bit rate = sampling rate * sample size * number of channels

PCM file size (MB) = bit rate (in kbps) * 1000 * duration in seconds / 1024 / 1024 / 8
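As a quick worked example of the two formulas, here is a small sketch. The function names are hypothetical, and the size formula assumes the bit rate is given in kbps, which is what the factor of 1000 implies.

```python
def bit_rate_bps(sample_rate_hz, sample_size_bits, channels):
    """Bit rate = sampling rate * sample size * number of channels."""
    return sample_rate_hz * sample_size_bits * channels

def pcm_size_mb(bit_rate_kbps, seconds):
    """PCM size in MB = bit rate (kbps) * 1000 * seconds / 1024 / 1024 / 8."""
    return bit_rate_kbps * 1000 * seconds / 1024 / 1024 / 8

rate = bit_rate_bps(16000, 16, 1)      # 256000 bps for 16 kHz, 16 bit, mono
size = pcm_size_mb(rate / 1000, 60)    # one minute of audio, about 1.83 MB
print(rate, round(size, 2))
```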

Signal Basics

1. Continuous and discrete signals

Sampling a continuous signal x(t) uniformly at interval T gives the discrete signal x(nT); quantizing it into bits then gives the digital signal.

Signals are also divided into periodic and aperiodic signals. The figure below shows an aperiodic continuous signal, a periodic continuous signal, an aperiodic discrete signal, and a periodic discrete signal.

[Figure: the four signal types]

2. Fourier analysis

Fourier's claim: any continuous periodic signal can be composed of a suitable set of sinusoids. So why sinusoids? Because the sinusoid is how the frequency domain describes a signal: it is the only waveform that exists in the frequency domain. Fourier analysis is a method for transforming a signal between the time domain and the frequency domain.

(https://zhuanlan.zhihu.com/p/19759362 explains the Fourier transform very well.)

2.1 Fourier series (Fourier Series)

According to Fourier, a periodic signal f(t) can be expressed as a series of sine functions, where t is time, A_n is the amplitude, \omega=2\pi/T is the angular frequency, and \psi_n is the initial phase:

f(t)=A_0 + \sum^{\infty}_{n=1}{A_n\sin(n\omega t+\psi_n)}\tag{1}

Applying the trigonometric identity \sin(\alpha \pm \beta) = \sin\alpha \cos\beta \pm \cos\alpha \sin\beta converts the formula to

f(t)=A_0 + \sum^{\infty}_{n=1}{\left[A_n\sin\psi_n \cos(n\omega t)+A_n\cos\psi_n \sin(n\omega t)\right]}\tag{2}

Let a_n = A_n\sin\psi_n and b_n=A_n\cos\psi_n, so that formula (1) becomes

f(t)=A_0 + \sum^{\infty}_{n=1}{\left[a_n \cos(n\omega t)+ b_n \sin(n\omega t)\right]}\tag{3}

Using the fact that the integral of a trigonometric function over one period is 0, together with the orthogonality of trigonometric functions, we can solve for A_0, a_n, b_n (here for a period of 2\pi):

\begin{split} &A_0 = \frac{1}{2\pi}\int^{\pi}_{-\pi}{f(t)dt} \\ &a_n = \frac{1}{\pi}\int^{\pi}_{-\pi}{f(t)\cos(n\omega t)dt} \\ &b_n = \frac{1}{\pi}\int^{\pi}_{-\pi}{f(t)\sin(n\omega t)dt} \\ \end{split} \tag{4}

To unify the form, let a_0=2A_0 and write the period as T in place of 2\pi. Formula 4 then becomes

\begin{split} &a_0 = \frac{2}{T}\int^{t_0+T}_{t_0}{f(t)dt} \\ &a_n = \frac{2}{T}\int^{t_0+T}_{t_0}{f(t)\cos(n\omega t)dt} \\ &b_n = \frac{2}{T}\int^{t_0+T}_{t_0}{f(t)\sin(n\omega t)dt} \\ &f(t)=\frac{a_0}{2}+ \sum^{\infty}_{n=1}{\left[a_n \cos(n\omega t)+ b_n \sin(n\omega t)\right]}\\ \end{split} \tag{5}

Formula 5 is the Fourier series.
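To make the coefficient formulas concrete, the sketch below approximates a_0, a_n, and b_n numerically. The assumptions are mine: the integrals are replaced by a midpoint Riemann sum with t_0 = 0, the test signal is a square wave of period 2\pi, and the names `fourier_coefficients` and `square_wave` are made up for this example.

```python
import math

def fourier_coefficients(f, period, n_max, steps=2000):
    """Approximate a_0, a_n, b_n over [0, period) with a midpoint Riemann sum."""
    omega = 2 * math.pi / period
    dt = period / steps
    ts = [(k + 0.5) * dt for k in range(steps)]    # midpoints of the sub-intervals
    scale = 2 / period * dt                        # the 2/T factor times the step dt
    a0 = scale * sum(f(t) for t in ts)
    a = [scale * sum(f(t) * math.cos(n * omega * t) for t in ts)
         for n in range(1, n_max + 1)]
    b = [scale * sum(f(t) * math.sin(n * omega * t) for t in ts)
         for n in range(1, n_max + 1)]
    return a0, a, b

def square_wave(t):
    """+1 on the first half of each 2*pi period, -1 on the second half."""
    return 1.0 if (t % (2 * math.pi)) < math.pi else -1.0

a0, a, b = fourier_coefficients(square_wave, 2 * math.pi, 5)
print([round(v, 4) for v in b])
```

For this square wave the known closed form is b_n = 4/(n\pi) for odd n and 0 for even n, with every a_n equal to 0, so the printed values give a quick sanity check.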

2.2 Fourier transform (Fourier Transform)

The Fourier series above is in trigonometric form. Let's rewrite it in exponential form.

From Euler's formula e^{i\theta} = \cos(\theta) + i\sin(\theta) we get

\sin(\theta) = -i \frac{e^{i\theta}-e^{-i\theta}}{2}, \cos(\theta) = \frac{e^{i\theta}+e^{-i\theta}}{2} \tag{6}

Substituting formula 6 into f(t) in formula 5 gives

f(t)=\frac{a_0}{2}+ \sum^{\infty}_{n=1}(\frac{a_n-ib_n}{2}e^{in\omega t}+\frac{a_n+ib_n}{2}e^{-in\omega t}) \tag{7}

Then substituting a_n, b_n, a_0 from formula 5 gives

f(t)=\frac{1}{T}\sum^{+\infty}_{n=-\infty}\left[\int^{t_0+T}_{t_0}f(t)e^{-in\omega t}dt\right]e^{in\omega t}\tag{8}
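The step from formula 7 to formula 8 can be spelled out with a complex coefficient, written here as c_n (a symbol introduced only for this sketch, not part of the original derivation). Substituting a_n and b_n from formula 5 and applying Euler's formula:

```latex
\begin{split}
&c_n = \frac{a_n-ib_n}{2} = \frac{1}{T}\int^{t_0+T}_{t_0}f(t)\left[\cos(n\omega t)-i\sin(n\omega t)\right]dt = \frac{1}{T}\int^{t_0+T}_{t_0}f(t)e^{-in\omega t}dt \\
&c_{-n} = \frac{a_n+ib_n}{2} = \frac{1}{T}\int^{t_0+T}_{t_0}f(t)e^{in\omega t}dt, \quad c_0=\frac{a_0}{2}=\frac{1}{T}\int^{t_0+T}_{t_0}f(t)dt \\
&f(t)=\sum^{+\infty}_{n=-\infty}c_n e^{in\omega t}
\end{split}
```

Folding the e^{-in\omega t} terms into negative indices is what lets the sum run from -\infty to +\infty.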

Now let the period tend to \infty. Writing the period as N, the frequency step is \omega=\frac{2\pi}{N}; define \omega_x=\frac{2\pi}{N}n=n\omega, so that

f(t)=\frac{1}{T}\sum^{+\infty}_{n=-\infty}\left[\int^{t_0+T}_{t_0}f(t)e^{-i\omega_x t}dt\right]e^{i\omega_x t}

Define F(\omega_x) as the Fourier transform of f(t):

F(\omega_x)=\int^{t_0+T}_{t_0}f(t)e^{-i\omega_x t}dt \tag{9}

Formula 8 can then be written as

f(t)=\frac{1}{T}\sum^{+\infty}_{n=-\infty}F(\omega_x)e^{i\omega_x t} \tag{10}

By the definition above, the step is \omega=\frac{2\pi}{N}. Recall the Riemann-sum form of an integral (an integral can be seen as dividing the area under a curve into very narrow strips and summing them):

\int^{b}_{a}f(t)dt = \lim_{\delta\rightarrow 0}\sum^{(b-a)/\delta}_{n=0}f(a+n\delta)\delta

Then, as N \rightarrow \infty, formula 10 becomes

f(t)=\frac{N}{2\pi T}\sum^{+\infty}_{n=-\infty}F(\omega_x)e^{i\omega_x t} \frac{2\pi}{N}=\frac{N}{2\pi T}\int^{+\infty}_{-\infty}F(\omega_x)e^{i\omega_x t} d\omega_x\tag{11}

Finally, since T and N both denote the period, set T = N:

f(t)=\frac{1}{2\pi}\int^{+\infty}_{-\infty}F(\omega_x)e^{i\omega_x t} d\omega_x\tag{12}

Formulas 12 and 9 (with the limits of integration in formula 9 extending to \pm\infty as the period grows) are the Fourier transform pair.

2.3 Discrete Fourier transform (Discrete Fourier Transform)

The Fourier transform is an integral computed over a continuous signal. In a computer we only have a discrete signal, so we need the DFT.

The DFT turns the integral of the FT into a sum. In the FT the step \omega tends to 0; for the DFT we keep N finite and substitute \omega_x=n\frac{2\pi}{N} into formula 10:

f(t)=\frac{N}{2\pi T}\sum^{N-1}_{n=0}F(n)e^{i\frac{2\pi n}{N}t} \frac{2\pi}{N}=\frac{1}{T}\sum^{N-1}_{n=0}F(n)e^{i\frac{2\pi n}{N}t} \tag{13}

Setting T = N and discretizing formulas 13 and 9, with t = 0, 1, ..., N-1 now the sample index, gives the DFT pair:

\begin{split} &f(t)=\frac{1}{N}\sum^{N-1}_{n=0}F(n)e^{i\frac{2\pi n}{N}t} \\ &F(n) = \sum^{N-1}_{t=0}f(t)e^{-i\frac{2\pi n}{N}t} \\ \end{split} \tag{14}
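The DFT pair translates almost directly into code. The sketch below is a naive, unoptimized illustration: it assumes the sums run over indices 0 to N-1 for a signal of N samples, and the function names `dft` and `idft` are chosen for this example.

```python
import cmath

def dft(f):
    """Forward DFT: F(n) = sum over t of f(t) * exp(-i*2*pi*n*t/N)."""
    N = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * n * t / N) for t in range(N))
            for n in range(N)]

def idft(F):
    """Inverse DFT: f(t) = (1/N) * sum over n of F(n) * exp(i*2*pi*n*t/N)."""
    N = len(F)
    return [sum(F[n] * cmath.exp(2j * cmath.pi * n * t / N) for n in range(N)) / N
            for t in range(N)]

signal = [1.0, 0.0, -1.0, 0.0]      # one period of a cosine sampled at 4 points
spectrum = dft(signal)
restored = idft(spectrum)
print([round(abs(v), 6) for v in spectrum])
```

The energy of this cosine lands in bins 1 and 3 (the positive and negative frequency), and applying `idft` to the spectrum recovers the original samples up to rounding error.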

2.4 Fast Fourier transform (FFT)

The DFT and the FFT compute the same thing; the FFT is simply a fast algorithm for the DFT.

Computing the DFT directly requires a sum of N terms for each of the N outputs F(n), so the time complexity is O(N^2), whereas the FFT needs only O(N\log_2 N).
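One common way to reach O(N\log_2 N) is the recursive radix-2 Cooley-Tukey scheme, sketched below. The article does not say which FFT variant it has in mind, so this choice is an assumption; the sketch also assumes the input length is a power of two, and the naive `dft` is included only for comparison.

```python
import cmath

def fft(f):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    N = len(f)
    if N < 1 or N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return [complex(f[0])]
    even = fft(f[0::2])                  # DFT of the even-indexed samples
    odd = fft(f[1::2])                   # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out

def dft(f):
    """Naive O(N^2) DFT, used here only to check the FFT result."""
    N = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * n * t / N) for t in range(N))
            for n in range(N)]

signal = [float(i % 5) for i in range(16)]
max_diff = max(abs(x - y) for x, y in zip(fft(signal), dft(signal)))
print(max_diff < 1e-9)
```

Each level of recursion does O(N) work and there are \log_2 N levels, which is where the complexity comes from.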

2.5 Discrete cosine transform （DCT）

The DCT comes from the Fourier series expansion: if the expanded function is a real even function, its Fourier series contains only cosine terms. Discretizing this (as in the DFT) gives the cosine transform, hence the name discrete cosine transform (DCT). The DCT is a special case of the DFT.

The discrete cosine transform is in effect a discrete Fourier transform applied to a new signal obtained by preprocessing the original signal. The transformation from the original signal to the new signal is shown in the figure below.

The original signal is first mirrored symmetrically and then shifted by 1/2 unit to give the new signal. If the original signal is f(x), the new signal is g(x)=f(x-\frac{1}{2})+f(-x-\frac{1}{2})

The DCT formula is:

X(k)=\sqrt{\frac{2}{N}}\sum^{N-1}_{n=0}{x[n]a_k\cos[k\frac{\pi}{N}(n+\frac{1}{2})]}

where a_k=\begin{cases} \frac{1}{\sqrt{2}}& k=0\\ 1& k\neq 0 \end{cases}

The inverse transform is:

x[n]=\sqrt{\frac{2}{N}}\sum^{N-1}_{k=0}{X[k]a_k\cos[k\frac{\pi}{N}(n+\frac{1}{2})]}
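A direct implementation of the forward formula, together with its inverse, might look like the sketch below. It assumes the orthonormal scaling, under which the inverse carries the same \sqrt{2/N} factor and the same a_k weights as the forward transform; the function names `dct` and `idct` are chosen for this example.

```python
import math

def dct(x):
    """Forward DCT with the sqrt(2/N) and a_k scaling used in the formula above."""
    N = len(x)
    out = []
    for k in range(N):
        a_k = 1 / math.sqrt(2) if k == 0 else 1.0
        s = sum(x[n] * math.cos(k * math.pi / N * (n + 0.5)) for n in range(N))
        out.append(math.sqrt(2 / N) * a_k * s)
    return out

def idct(X):
    """Inverse transform under the same (orthonormal) scaling assumption."""
    N = len(X)
    out = []
    for n in range(N):
        s = sum((1 / math.sqrt(2) if k == 0 else 1.0) * X[k]
                * math.cos(k * math.pi / N * (n + 0.5)) for k in range(N))
        out.append(math.sqrt(2 / N) * s)
    return out

x = [1.0, 2.0, 3.0, 4.0]
X = dct(x)
x_back = idct(X)
print([round(v, 6) for v in X])
```

With this scaling the k = 0 coefficient equals the sum of the samples divided by \sqrt{N}, and `idct(dct(x))` returns the original samples up to rounding error.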

That is all for today. The next post will cover MFCC feature extraction for audio and its code implementation.

References:

https://zhuanlan.zhihu.com/p/75521342

https://blog.csdn.net/qq_39546227/article/details/99686160
