FFmpeg notes (I): fundamentals of audio and video
2022-07-25 05:42:00 [Hello,C++!]
1 Basic image concepts
1.1 Pixels
A pixel is the basic unit of an image. The word "pixel" (abbreviated px) combines "pix", short for "picture", with "el" from "element", so a pixel is literally a picture element. A 2500 × 2000 image has 2500 pixels horizontally and 2000 pixels vertically, 5,000,000 pixels in total, so it is called a 5-megapixel image.
1.2 Resolution
Resolution is the size of an image in pixels. For example, a 1920 x 1080 image is 1920 pixels wide and 1080 pixels high.
Common resolutions: 360P (640x360), 720P (1280x720), 1080P (1920x1080), 4K (3840x2160), 8K (7680x4320), and so on.
When people say 1080 or 720, they are referring to the number of vertical pixels. With a 16:9 aspect ratio, 720p has 720 ÷ 9 × 16 = 1280 horizontal pixels and 921,600 pixels in total, about 0.92 megapixels. 1080p has 1080 ÷ 9 × 16 = 1920 horizontal pixels and 2,073,600 pixels in total, about 2 megapixels, which is 2.25 times as many as 720p. All else being equal, more pixels make a clearer video, so 1080p video is clearer than 720p; in general, the higher an image's resolution, the clearer the image.


For a more detailed explanation of resolution, see this article:
https://segmentfault.com/a/1190000023769775
1.3 Bit depth
Bit depth, also called bit resolution, is the number of binary bits used to represent each pixel of an image.
A bit depth of 1 stores only one bit per pixel, so the image can contain only pure black and pure white. A bit depth of 8 (2 to the 8th power) gives 256 gray levels or colors. A bit depth of 16 (2 to the 16th power) can express 65,536 possible colors. A bit depth of 24 can express about 16.7 million different colors.
Because the average human eye can distinguish only about 12 to 14 million different colors and shades, 24-bit color is also called "photographic" color or true color. Usually, 24-bit color allocates 8 bits to each color channel, so red, green, and blue can each take 256 values. In other words, one 24-bit color is represented by 3 bytes (8 × 3 = 24).
When storing R, G, and B, computers use 8 bits for each component, so they can express 256 × 256 × 256 = 16,777,216, about 16.77 million colors, with each value representing a different color.
The greater the bit depth of each channel, the more color values can be represented. For example, high-end TVs now advertise 10-bit color, meaning each channel uses 10 bits and has 1024 levels. 1024 × 1024 × 1024 = 1,073,741,824, about 1.07 billion colors, which is 64 times as many as 8-bit color.
1.4 Frame rate
Frame rate is the number of frames (pictures) transmitted per second. It can also be understood as the number of times per second the graphics processor refreshes the picture; for example, 25 fps means 25 pictures per second.
The higher the frame rate (FPS), the smoother the video; the lower it is, the more it stutters.
Because an image persists briefly on the retina, video at about 24 frames per second or more is generally perceived as continuous motion.
A higher frame rate gives a smoother picture but demands more performance from the device.
1.5 Bit rate
Bit rate is the amount of data a video file uses per unit of time, for example 1 Mbps.
In most cases, the higher the bit rate and the resolution, the clearer the video. However, a blurry video can also have a large file size (a high bit rate), and a low-resolution video file can even look clearer than a high-resolution one.
For the same source material and the same encoding algorithm, the higher the bit rate, the less the image is distorted and the clearer the video.
1.6 Stride
Stride is the number of bytes that each row of pixels occupies in memory. Because of memory alignment, this is not necessarily equal to the number of bytes the row's pixels actually need (width × bytes per pixel).
Stride is also called pitch. If padding is added at the end of each row of pixels, the stride is larger than the row's pixel data, as shown in the figure below:
Alignment issues:
For example, a 638x480 RGB24 image needs 638 × 3 = 1914 bytes per row. For 16-byte alignment, 1914 / 16 = 119.625 is not an integer, so the rows cannot be 16-byte aligned as they are. We pad the end of each row with 6 bytes, which is equivalent to widening the row from 638 to 640 pixels: 640 × 3 / 16 = 120. The stride is then 1920 bytes.
For a 638x480 YUV420P image, if we want the rows to be 16-byte aligned, 638 is not divisible by 16, so we pad the end of each row with 2 bytes to reach 640. The Y stride is then 640 bytes.
2 YUV knowledge summary
2.1 YUV definition
In YUV, the "Y" component represents luminance (the grayscale value) and the "U" and "V" components represent chrominance: U carries the blue-difference hue and V the red-difference hue.
Benefits of representing luminance Y separately from chrominance UV:
1. The components do not interfere with each other. Y alone can display a complete black-and-white picture, which solved the compatibility problem between black-and-white and color TV.
2. Lowering the chroma (UV) sampling rate does not noticeably hurt image quality but reduces the bandwidth needed to transmit the video signal. Reducing the UV sampling frequency lowers bandwidth, saves network traffic, and indirectly reduces video latency.
2.2 YUV formats
YUV is a fairly general term; depending on how the components are arranged, it can be divided into many specific formats:
1. Packed format: the Y, U, and V components of each pixel are interleaved, and the pixels are stored contiguously in a single array. Usually several adjacent pixels form a macro pixel.
In plain terms, packed mode bundles several small units together into one larger unit.
2. Planar format: three arrays store the Y, U, and V components separately and contiguously, i.e., Y, U, and V are each kept in their own array.
In plain terms, planar mode lays out the Y, U, and V components each on its own plane.
2.3 YUV sampling
The main sampling schemes are:
1. YUV 4:4:4 sampling: the chroma channels are not downsampled, i.e., each Y component has its own U component and V component.

2. YUV 4:2:2 sampling: 2:1 horizontal downsampling and no vertical downsampling, i.e., every two Y components share one U component and one V component.

3. YUV 4:2:0 sampling: 2:1 horizontal downsampling and 2:1 vertical downsampling, i.e., every four Y components share one U component and one V component.

2.4 YUV data storage in FFmpeg
1. I444 (YUV444P) format: the corresponding FFmpeg pixel format is AV_PIX_FMT_YUV444P. This is a planar format.
2. I422 (YUV422P) format: the corresponding FFmpeg pixel format is AV_PIX_FMT_YUV422P. This is a planar format.
3. 4:2:0 format YUV420P: the corresponding FFmpeg pixel format is AV_PIX_FMT_YUV420P (also known as I420). This is a planar format, and each pixel takes (4 + 1 + 1) / 4 = 1.5 bytes on average.
4. 4:2:0 format NV12: the corresponding FFmpeg pixel format is AV_PIX_FMT_NV12. It is a semi-planar format: one plane holds Y, and a second plane holds U and V interleaved.
5. Comparing the memory layouts
Difference between YV12 and I420:
YYYYYYYY VV UU (YV12)
YYYYYYYY UU VV (I420)
The U and V planes are stored in opposite order.
Difference between NV12 and NV21:
YYYYYYYY UV UV (NV12)
YYYYYYYY VU VU (NV21)
The positions of U and V within each interleaved pair are swapped.
2.5 Conversion between RGB and YUV
In practice, RGB and YUV are usually converted by calling a library such as FFmpeg's swscale or libyuv.
YUV (256 levels) can be computed directly from 8-bit RGB (BT.601 coefficients; U and V are offset by 128 so that they fall in 0 to 255):

$$
\begin{aligned}
Y &= 0.299R + 0.587G + 0.114B \\
U &= -0.169R - 0.331G + 0.5B + 128 \\
V &= 0.5R - 0.419G - 0.081B + 128
\end{aligned}
$$

With 8-bit depth:
TV range is 16 to 235 for Y and 16 to 240 for UV, also called Limited Range.
PC range is 0 to 255, also called Full Range.
RGB has no such range distinction; it always uses the full 0 to 255.
Conversely, RGB can be computed directly from YUV (256 levels):

$$
\begin{aligned}
R &= Y + 1.402\,(V - 128) \\
G &= Y - 0.34414\,(U - 128) - 0.71414\,(V - 128) \\
B &= Y + 1.772\,(U - 128)
\end{aligned}
$$

When converting from YUV to RGB, clamp the results: any value below 0 becomes 0, and any value above 255 becomes 255.
Question: why is a green screen displayed when decoding fails?
Analysis: when decoding fails, the Y, U, and V components are all filled with 0. Substituting Y = U = V = 0 into the formulas gives:

$$
\begin{aligned}
R &= 0 + 1.402 \times (0 - 128) = -179.456 \\
G &= 0 - 0.34414 \times (0 - 128) - 0.71414 \times (0 - 128) = 44.04992 + 91.40992 = 135.45984 \\
B &= 0 + 1.772 \times (0 - 128) = -226.816
\end{aligned}
$$

RGB values are limited to [0, 255], so after clamping the result is R = 0, G ≈ 135.46, B = 0. Only the G component is nonzero, so the picture is green.
3 Audio and video basics
3.1 How audio and video recording works

3.2 How audio and video playback works

3.3 I, P, and B frames



3.4 Common video compression algorithms
MPEG-2: MPEG camp
H.264: MPEG camp
H.265: MPEG camp
AVS: Chinese camp
VP8: Google camp
VP9: Google camp
3.5 Container formats
A container format (also simply called a container) puts the encoded, compressed video stream, audio stream, and subtitles into a single file according to a defined scheme, so that player software can play them.
Generally speaking, a video file's extension indicates its container format.
Different container formats use different extensions.
For example, the same filling can be made into dumplings or steamed buns. Video works the same way: the same audio and video streams can be carried in different containers.

From the stream information shown above, stream 0 is video compressed with H.264, and stream 1 is audio compressed with MP3.
Common video container formats:
AVI, MKV, MPE, MPG, MPEG
MP4, WMV, MOV, 3GP
M2V, M1V, M4V, OGM
RM, RMS, RMM, RMVB, IFO
SWF, FLV, F4V
ASF, PMF, XMB, DIVX, PART
DAT, VOB, M2TS, TS, PS
Among these, H.264 + AAC packaged in FLV or MP4 is the most popular combination.
3.6 Audio and video synchronization
Concepts used in audio and video synchronization:
DTS (Decoding Time Stamp): tells the player when to decode this frame's data.
PTS (Presentation Time Stamp): tells the player when to display this frame's data.
Synchronization modes:
Audio Master: synchronize video to audio.
Video Master: synchronize audio to video.
External Clock Master: synchronize both audio and video to an external clock.
In general, the order of preference is Audio Master > External Clock Master > Video Master.
Finally, here is a website that offers test videos in various formats for download:
https://sample-videos.com/
It provides files in many audio and video formats, which is convenient for testing.