
ByteDance interviewer: explain the principle of audio/video synchronization. Can audio and video ever be perfectly synchronized?

2022-06-24 09:44:00 【Cattle within yards】

Analysis: Audio/video synchronization is hard to get right, so most projects leave it to a third-party player such as ijkplayer. Live streaming and video calls, however, may still require you to handle synchronization yourself. There are three ways to implement it: use the audio clock as the master, use the video clock as the master, or synchronize both streams to a custom (external) clock.

Job seekers: If you are asked this, stay calm and answer as much as you can. After reading this article you should be able to answer it.

A live audio/video streaming system is a complex piece of engineering. Achieving very low latency takes system-wide optimization and deep familiarity with every component involved. Here are some simple, common tuning techniques:

Audio/video synchronization as seen in ffplay

ffplay's main scheme is to synchronize video to audio: if the video is running ahead, the previous frame is shown again to wait for the audio; if the video is running behind, frames are dropped to catch up with the audio.

This logic is implemented in the video output function video_refresh. Before analyzing the code, let's first review the flowchart of this function:

In this flow, the step "compute the display duration of the previous frame" is the crucial one. Let's look at the code first:

static void video_refresh(void *opaque, double *remaining_time)
{
    //……
    // lastvp: the previous frame, vp: the current frame, nextvp: the next frame

    last_duration = vp_duration(is, lastvp, vp); // nominal duration of the previous frame
    delay = compute_target_delay(last_duration, is); // actual display duration of the previous frame, corrected against the audio clock

    time = av_gettime_relative() / 1000000.0; // current system time, in seconds
    if (time < is->frame_timer + delay) { // the previous frame has not been shown long enough yet: keep showing it
        *remaining_time = FFMIN(is->frame_timer + delay - time, *remaining_time);
        goto display;
    }

    is->frame_timer += delay; // frame_timer now marks the end of the previous frame, which is also the start of the current frame
    if (delay > 0 && time - is->frame_timer > AV_SYNC_THRESHOLD_MAX)
        is->frame_timer = time; // frame_timer lags the system time by too much: reset it to the system time

    // update the video clock
    // (compute_target_delay reads this clock to compute diff against the master clock)
    SDL_LockMutex(is->pictq.mutex);
    if (!isnan(vp->pts))
        update_video_pts(is, vp->pts, vp->pos, vp->serial);
    SDL_UnlockMutex(is->pictq.mutex);

    //……

    // frame-drop logic
    if (frame_queue_nb_remaining(&is->pictq) > 1) {
        Frame *nextvp = frame_queue_peek_next(&is->pictq);
        duration = vp_duration(is, vp, nextvp); // display duration of the current frame
        if (time > is->frame_timer + duration) { // the system time is already past the end of the current frame: drop it
            is->frame_drops_late++;
            frame_queue_next(&is->pictq);
            goto retry; // go back to the start of the function and try again (a plain while loop that drops frames is not enough, because the audio clock may have advanced in the meantime, so delay must be recomputed)
        }
    }
}

The logic of this code is the one shown in the flowchart above. The main idea is the one stated at the beginning: if the video is running ahead, repeat the previous frame to wait for the audio; if the video is running behind, drop frames to catch up with the audio. Concretely, the code uses the audio clock to work out how long the previous frame (the picture currently on screen) should stay displayed in total (including the frame's own duration), then compares that with the system time to decide whether it is time to display the next frame.

The comparison with the system time introduces another concept: frame_timer. It can be understood as the time at which a frame starts being displayed. Before the update it is the display time of the previous frame; after the update (is->frame_timer += delay) it is the display time of the current frame.
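The frame_timer update can be isolated into a small function. The following is a minimal sketch, not ffplay's actual code: the function name advance_frame_timer and the constant SYNC_THRESHOLD_MAX are made up for illustration, and the value 0.1 is assumed to match ffplay's AV_SYNC_THRESHOLD_MAX.

```c
#include <assert.h>

/* Assumed value: taken to match AV_SYNC_THRESHOLD_MAX in ffplay (seconds). */
#define SYNC_THRESHOLD_MAX 0.1

/*
 * Hypothetical helper mirroring the frame_timer update in video_refresh.
 * frame_timer: time at which the previous frame started being displayed
 * delay:       how long the previous frame should stay on screen
 * now:         current system time
 * All values are in seconds. Returns the new frame_timer.
 */
static double advance_frame_timer(double frame_timer, double delay, double now)
{
    assert(delay >= 0.0);

    /* Move to the end of the previous frame = start of the current frame. */
    frame_timer += delay;

    /* If the timer has fallen too far behind the system time,
       restart it from "now" instead of trying to catch up. */
    if (delay > 0.0 && now - frame_timer > SYNC_THRESHOLD_MAX)
        frame_timer = now;

    return frame_timer;
}
```

With a 40 ms frame, advance_frame_timer(10.0, 0.04, 10.05) simply moves the timer forward by one frame duration, while advance_frame_timer(10.0, 0.04, 10.5) resets it to 10.5 because the timer would otherwise lag the system time by far more than the threshold.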

The display time of the previous frame plus delay (how long it should stay on screen, including the frame's own duration) is the time at which the previous frame should stop being displayed. The following schematic illustrates the principle:

The diagram shows three cases:

time1: The system time is earlier than the time at which lastvp should stop being displayed (frame_timer + delay), i.e. the dashed-circle position. lastvp should continue to be displayed.
time2: The system time is later than the end of lastvp's display time but earlier than the end of vp's display time (vp's display time starts at the dashed circle and ends at the black circle). In this case lastvp is not repeated and vp is not dropped: vp should be displayed.
time3: The system time is later than the end of vp's display time (the black-circle position, which is also the expected start of nextvp's display time). vp should be dropped.
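The three cases can be written as a small decision function. This is a simplified sketch under stated assumptions: the names FrameAction and decide_frame_action are hypothetical, frame_timer is taken to be the start of lastvp's display time, and details of the real video_refresh (the frame_timer reset, and dropping only when more than one frame is queued) are left out.

```c
#include <assert.h>

/* Hypothetical result type: what to do with the frames at this instant. */
typedef enum { REPEAT_LAST, SHOW_CURRENT, DROP_CURRENT } FrameAction;

/*
 * frame_timer: time at which lastvp started being displayed
 * delay:       how long lastvp should stay on screen (already sync-corrected)
 * duration:    nominal display duration of vp
 * now:         current system time
 * All values are in seconds.
 */
static FrameAction decide_frame_action(double frame_timer, double delay,
                                       double duration, double now)
{
    assert(delay >= 0.0 && duration >= 0.0);

    double last_end = frame_timer + delay;  /* dashed circle in the diagram */
    double vp_end = last_end + duration;    /* black circle in the diagram */

    if (now < last_end)
        return REPEAT_LAST;   /* time1: lastvp has not finished yet */
    if (now > vp_end)
        return DROP_CURRENT;  /* time3: vp's whole slot has already passed */
    return SHOW_CURRENT;      /* time2: inside vp's slot */
}
```

For example, with frame_timer = 0 and 40 ms frames, a system time of 0.02 gives REPEAT_LAST, 0.05 gives SHOW_CURRENT, and 0.10 gives DROP_CURRENT.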

Calculating delay

Next, let's look at the most critical part: how delay, the display duration of lastvp, is calculated.

This is implemented in the function compute_target_delay:

static double compute_target_delay(double delay, VideoState *is)
{
    double sync_threshold, diff = 0;

    /* update delay to follow master synchronisation source */
    if (get_master_sync_type(is) != AV_SYNC_VIDEO_MASTER) {
        /* if video is slave, we try to correct big delays by
           duplicating or deleting a frame */
        diff = get_clock(&is->vidclk) - get_master_clock(is);

        /* skip or repeat frame. We take into account the
           delay to compute the threshold. I still don't know
           if it is the best guess */
        sync_threshold = FFMAX(AV_SYNC_THRESHOLD_MIN, FFMIN(AV_SYNC_THRESHOLD_MAX, delay));
        if (!isnan(diff) && fabs(diff) < is->max_frame_duration) {
            if (diff <= -sync_threshold)
                delay = FFMAX(0, delay + diff);
            else if (diff >= sync_threshold && delay > AV_SYNC_FRAMEDUP_THRESHOLD)
                delay = delay + diff;
            else if (diff >= sync_threshold)
                delay = 2 * delay;
        }
    }

    av_log(NULL, AV_LOG_TRACE, "video: delay=%0.3f A-V=%f\n",
            delay, -diff);

    return delay;
}

All the comments in the code above come from the original source. The function is short and comments make up nearly half of it, which shows how important this code is.

The hardest part of this code to understand is sync_threshold. A diagram helps:

The axis in the figure is the value of diff. diff = 0 means the video clock and the audio clock agree exactly, i.e. perfect sync. The colored blocks below the axis show the value that is returned. The delay inside the blocks is the parameter passed in, which, per the code in the previous section, is the display duration of lastvp.

The figure shows that sync_threshold defines a band around zero. Inside this band the display duration of lastvp needs no adjustment and delay is returned unchanged. In other words, anything inside the band is treated as "close enough" to synchronized.

If diff is less than -sync_threshold, the video is playing too slowly and some frames need to be dropped. Specifically, the function returns delay + diff, clamped so that it never goes below 0 (FFMAX(0, delay + diff)). According to the frame_timer diagram above, the picture should then at least be updated to vp.

If diff is greater than sync_threshold, the video is playing too fast and lastvp should be displayed again. When delay does not exceed AV_SYNC_FRAMEDUP_THRESHOLD, the function returns 2 * delay, i.e. twice the display duration of lastvp, which keeps lastvp on screen for one more frame.

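If diff is greater than sync_threshold and delay is also greater than AV_SYNC_FRAMEDUP_THRESHOLD, the function returns delay + diff, so diff itself decides how much longer lastvp is shown. (The intent of the code is not very clear to me here. The ffplay source only notes that a frame longer than this threshold is not duplicated to compensate for A/V sync. As I understand it, always returning 2 * delay, or always returning delay + diff, would also work without distinguishing the two cases.)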
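To try the branches with concrete numbers, the same logic can be restated as a standalone function that takes diff directly instead of reading the clocks. This is a sketch for experimentation, not a replacement for the ffplay code: the name target_delay and the helper macros are hypothetical, and the threshold values (0.04, 0.1, 0.1 seconds) are assumed to match the AV_SYNC_* constants in ffplay.

```c
#include <assert.h>

/* Assumed values, taken to match the AV_SYNC_* constants in ffplay (seconds). */
#define SYNC_THRESHOLD_MIN      0.04
#define SYNC_THRESHOLD_MAX      0.1
#define SYNC_FRAMEDUP_THRESHOLD 0.1

#define MAX2(a, b) ((a) > (b) ? (a) : (b))
#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/*
 * delay: nominal display duration of lastvp
 * diff:  video clock minus master (audio) clock; positive = video is ahead
 * max_frame_duration: differences at or beyond this are ignored
 */
static double target_delay(double delay, double diff, double max_frame_duration)
{
    assert(delay >= 0.0);

    double sync_threshold = MAX2(SYNC_THRESHOLD_MIN, MIN2(SYNC_THRESHOLD_MAX, delay));
    double magnitude = diff < 0.0 ? -diff : diff;

    /* diff != diff is true only for NaN: no usable clock, leave delay alone. */
    if (diff != diff || magnitude >= max_frame_duration)
        return delay;

    if (diff <= -sync_threshold)              /* video is behind: shorten, floor at 0 */
        return MAX2(0.0, delay + diff);
    if (diff >= sync_threshold && delay > SYNC_FRAMEDUP_THRESHOLD)
        return delay + diff;                  /* long frame: extend by exactly diff */
    if (diff >= sync_threshold)               /* video is ahead: show lastvp twice */
        return 2.0 * delay;
    return delay;                             /* inside the band: no adjustment */
}
```

With a 40 ms frame (delay = 0.04) the threshold is 0.04: a diff of 0.01 leaves delay unchanged, a diff of -0.1 yields 0, and a diff of 0.1 yields 2 * delay. With a 200 ms frame and a diff of 0.15, the threshold becomes 0.1 and the result is delay + diff.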

That completes the analysis of how video is synchronized to audio. To summarize briefly:

The basic strategy: if the video is playing too fast, repeat the previous frame to wait for the audio; if the video is playing too slowly, drop frames to catch up with the audio.
The implementation of this strategy: introduce the frame_timer concept to mark when a frame starts being displayed and when it should stop, then compare against the system time to decide whether to repeat or drop frames.
The time at which lastvp should stop being displayed depends not only on the frame's own display duration but also on the difference between the video clock and the audio clock.
Audio and video are not kept exactly synchronized at every moment; there is a band of differences that counts as "close enough" to synchronized.
