Decoding the AI behind sports: figure skating action recognition, multimodal video classification, and highlight clip editing
2022-06-26 05:05:00 【Paddlepaddle】
Recently, all the major video platforms have been updating highlights from the Winter Olympic Games in real time. Gu Ailing, Wu Dajing, Su Yiming, and other athletes have achieved outstanding results. Congratulations to them! While we are moved and delighted by the strength of Chinese sports, we are also paying attention to some of the industrial AI applications behind sports: for example, using action recognition to assist athletes' daily training and competition scoring, and using intelligent classification and automatic editing to greatly reduce the labor and time cost of processing sports video content.
To help everyone learn more about how these AI technologies are applied in industry, and to lower the threshold for putting AI into production, Baidu PaddlePaddle, Baidu AI Cloud, and Associate Professor Liu Shenglan of Dalian University of Technology have jointly released industrial practice examples. For three classic scenarios (figure skating action recognition, multimodal sports video classification, and football video clipping), they provide a full-process tutorial covering data preparation, solution design, and model optimization and deployment. The tutorials explain the industrial deployment approach in plain terms and walk users through the code step by step.
Project links
https://github.com/PaddlePaddle/awesome-DeepLearning
All source code and tutorials are open source. You are welcome to use them and to give the project a star.
Deep learning empowers sports events
Three typical examples
1. Figure skating action recognition
Figure skating movements follow complex trajectories, are fast, and fall into many categories, which makes the recognition task very challenging. This example brings ST-GCN (spatial temporal graph convolutional network), a human action recognition algorithm based on human skeleton keypoints, to figure skating action recognition for the first time. It can recognize the technical actions of figure skaters in a video in real time and label them by category, supporting auxiliary scoring and movement quality evaluation during competition and training.
Scenario challenges
In figure skating, it is difficult to determine the type of action from the skater's pose in one or a few frames;
Two figure skating actions that belong to the same category but different subcategories may differ only slightly in a few frames, which makes them extremely hard to tell apart. However, the features of the other frames must also be preserved so that they can be used for category recognition and for handling cases such as "polysemous frames".
For example, figure skating elements include jumps, spins, lifts, steps and turns, and spirals. Jumps are among the most important elements. Skaters use many different blade edges on take-off and landing and different numbers of rotations in the air, so many combinations are possible, which makes classification harder.
Given these problems, how was the technical solution chosen? This example selects ST-GCN and improves the network structure on the basis of the published paper. It offers a new approach to human action recognition based on skeleton keypoints and achieves a large performance improvement. The following figure shows the ST-GCN network structure built in this project.
Finally, by adjusting the batch_size and num_classes parameters, the model reaches 91% accuracy.
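To make the idea of graph convolution over skeleton keypoints more concrete, here is a minimal sketch in plain Python. It is not the project's network: it only shows the spatial step, in which each joint's features are mixed with those of its neighbors through a symmetrically normalized adjacency matrix and then projected by a weight matrix. The three-joint skeleton, the coordinates, and the function names are hypothetical, and the temporal convolution across frames that ST-GCN also uses is left out.

```python
import math


def normalized_adjacency(num_joints, edges):
    """Build D^(-1/2) (A + I) D^(-1/2) for an undirected skeleton graph."""
    # Start from the identity so every joint also keeps its own features.
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0
    degree = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(num_joints)]
            for i in range(num_joints)]


def spatial_graph_conv(frame, adj, weight):
    """Mix each joint with its neighbours, then project the channels.

    frame:  [num_joints][in_channels] keypoint features for one video frame
    adj:    [num_joints][num_joints] normalized adjacency
    weight: [in_channels][out_channels] projection (learned in a real model)
    """
    num_joints, in_ch, out_ch = len(frame), len(weight), len(weight[0])
    aggregated = [[sum(adj[i][k] * frame[k][c] for k in range(num_joints))
                   for c in range(in_ch)]
                  for i in range(num_joints)]
    return [[sum(aggregated[i][c] * weight[c][o] for c in range(in_ch))
             for o in range(out_ch)]
            for i in range(num_joints)]


# Hypothetical toy skeleton: hip(0) - knee(1) - ankle(2), with (x, y) coordinates.
EDGES = [(0, 1), (1, 2)]
ADJ = normalized_adjacency(3, EDGES)
sequence = [
    [[0.0, 1.0], [0.0, 0.5], [0.0, 0.0]],  # frame 0
    [[0.1, 1.0], [0.2, 0.5], [0.4, 0.1]],  # frame 1
]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weights
features = [spatial_graph_conv(frame, ADJ, identity) for frame in sequence]
print(features[1][2])  # ankle features after mixing in the knee joint
```

In a real model the weights are learned and the per-frame outputs are fed to a temporal convolution, so that subtle differences spread over a few frames can still influence the final class.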
2. Multimodal sports video classification
Recently, all kinds of ice and snow sports videos have attracted wide attention. To extract users' real points of interest and high-level semantic information, enterprises need to understand a video's text, audio, and image data from multiple angles. PaddlePaddle and Baidu AI Cloud jointly present a multimodal classification task that gives each video multiple labels describing its content. The labels can be used in recommendation system scenarios such as content selection and content delivery, a real boon for people working in culture, entertainment, and media.
Scenario challenges
Video labels carry high-level semantics that single-modality features struggle to express; high-quality video classification data is limited; and it is difficult to extract high-level semantic features from the corresponding images, audio, and text;
There is a semantic gap between modalities, interaction between modalities is challenging, and different modalities may interfere with each other;
Videos with mixed themes and long videos are hard to process, and a single modality may be noisy or missing, so the model must be highly robust.
To address these difficulties, the practice example extracts features from three modalities (text, video images, and audio), fuses the features, and finally performs multi-label classification. Compared with using video image features alone, this significantly improves results on high-level semantic labels.
This example summarizes several optimization lessons. ERNIE, a powerful pre-trained model enhanced with entity information, improves text representation. ERNIE's parameters are frozen and a TextCNN placed after it learns in-domain knowledge, which speeds up model training. Multimodal cross-attention improves the interaction between modalities. The final model reaches 85.59% accuracy.
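The fusion and multi-label steps described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the project's model: the attention has no learned projections, the feature vectors are two-dimensional toy values, and the function names, label names, and the 0.5 threshold are hypothetical. It shows text tokens attending over image frame features (cross-attention), followed by independent sigmoid scoring so that one video can receive several labels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[c] for w, v in zip(weights, values))
                      for c in range(len(values[0]))])
    return fused


def predict_labels(logits, label_names, threshold=0.5):
    """Multi-label decision: one independent sigmoid per label."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [name for name, p in zip(label_names, probs) if p > threshold]


# Toy features: 2 text tokens (queries) and 3 image frames (keys and values).
text_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
text_with_image = cross_attention(text_feats, image_feats, image_feats)

# Hypothetical logits from a classifier head on top of the fused features.
print(predict_labels([2.0, -1.0, 0.3], ["skiing", "football", "figure_skating"]))
```

Because each label gets its own sigmoid rather than sharing one softmax, several labels can pass the threshold for the same video, which is what distinguishes multi-label classification from ordinary single-label classification.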
3. Football video clipping
Producing sports highlight videos calls for fast, high-quality automatic editing tools. Professional sports training needs data support: teams study themselves and their opponents by replaying match or daily training videos and then run tactical drills. The media industry also needs tools that extract the required video content so that it can produce timely news material.
Scenario challenges
The action detection task is highly complex: the key to video clipping is to locate the start and end points of each action accurately. But sports videos often contain a lot of redundant background information, and the actions are diverse and relatively short, so the start point and the category of each action must both be judged accurately, which makes the task difficult;
The information in the video is diverse, which raises the question of how to use these features effectively.
To solve these problems, we chose TSN+BMN+LSTM as the base model scheme to ensure accurate clip extraction. Optimization strategies include using PaddlePaddle's featured model PP-TSM, as well as TSN and TSM, to extract video image features; augmenting the data; and expanding the temporal action proposals. The final accuracy is 91%, and the F1-score reaches 76.2%.
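Judging whether a predicted clip "found" an action usually comes down to how much its time span overlaps the annotated span. The sketch below, in plain Python, shows one common way to do this with temporal IoU and a clip-level F1-score. The article does not say how its F1-score is computed, so the matching rule, the 0.5 IoU threshold, the function names, and the sample clips are all assumptions made for illustration.

```python
def temporal_iou(span_a, span_b):
    """Intersection over union of two (start, end) spans in seconds."""
    intersection = max(0.0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))
    union = (span_a[1] - span_a[0]) + (span_b[1] - span_b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def clip_f1(predicted, ground_truth, iou_threshold=0.5):
    """F1 over clips given as (start, end, label) tuples.

    A prediction counts as a true positive when it has the same label as a
    not-yet-matched ground-truth clip and overlaps it with IoU >= threshold.
    """
    matched = set()
    true_positives = 0
    for p_start, p_end, p_label in predicted:
        for idx, (g_start, g_end, g_label) in enumerate(ground_truth):
            if idx in matched or p_label != g_label:
                continue
            if temporal_iou((p_start, p_end), (g_start, g_end)) >= iou_threshold:
                matched.add(idx)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical clips from one football match video.
predicted = [(10.0, 20.0, "goal"), (40.0, 45.0, "corner")]
ground_truth = [(11.0, 21.0, "goal"), (60.0, 70.0, "corner")]
print(temporal_iou((10.0, 20.0), (11.0, 21.0)))
print(clip_f1(predicted, ground_truth))
```

A stricter IoU threshold rewards precise start and end points, which is exactly the part of the task the scenario challenges above describe as hard.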
Industrial practice example courses
Helping enterprises cross the AI deployment gap
PaddlePaddle's industrial practice examples aim to speed up AI deployment in industry and narrow the gap between theory and industrial application. The examples come from real business scenarios. With complete code, they walk through the whole solution process from data preparation to model deployment, and can fairly be called an "autopilot" for industrial deployment.
Real industrial scenarios: built together with enterprises that actually apply AI, covering the AI scenarios enterprises need most often, such as safety helmet detection for smart cities and meter reading for smart manufacturing;
Complete code implementation: code that runs with one click; on the "AI Studio one-stop development platform" you can use free computing power to run it in a Notebook with one click;
Detailed process analysis: an in-depth walk through the whole AI deployment process, from data preparation and processing to model selection, model optimization, and deployment, sharing reusable model tuning and optimization experience;
Straight to project deployment: senior Baidu engineers guide users through the full code practice, making it easy to reach the project's POC stage.
Course preview
The three sports scenarios above have been built into industrial practice examples that you can quickly try out and apply. In addition, we have prepared matching course sessions. On February 17, 20:00-21:30, Professor Liu of Dalian University of Technology and senior Baidu engineers will give an in-depth walkthrough of the whole development process, from data preparation and solution design to model optimization and deployment, and guide you through the code step by step.
Scan the QR code to join the group for free links to the live course and replay videos, and for a chance to get the PaddlePaddle industrial practice example handbook covering smart cities, industrial manufacturing, finance, the Internet, and other industries. Interested enterprises and developers are also welcome to contact us to exchange technical ideas and discuss cooperation.
This article was shared from the blog "PaddlePaddle" (CSDN).