Robot Decision-Making System Based on Self-Learning (CloudMinds, Zhao Kaiyong)
2022-07-23 19:43:00 【Master Ma】
On September 25-26, 2020, the China Science and Technology Summit young scientists salon series welcomed a new session, "AI Academic Ecology and Industrial Innovation". The event was sponsored by the China Association for Science and Technology and organized by the Department of Computer Science at Tsinghua University, AI TIME, and Zhipu AI. For the complete video of the conference, follow "AI TIME 论道" on Bilibili, or click "Read the original" below.
On the morning of September 25, the conference invited Mr. Zhao Kaiyong, chief architect and vice president of R&D at CloudMinds, to deliver a keynote speech titled "Robot Decision-Making System Based on Self-Learning".
In his speech, Mr. Zhao Kaiyong mainly introduced how CloudMinds accelerates robot learning through its cloud platform, forming an approach that combines traditional methods, human experience, and reinforcement learning.
Zhao Kaiyong, Ph.D., is a senior practitioner in robotics, artificial intelligence, and high-performance computing, with many years of experience in technology development, team management, industry development, and M&A. He is currently chief architect and vice president of R&D at CloudMinds, leading the AI and Navigation departments. Previously, he was head of DJI's Internet business unit, responsible for the company's Internet services and the overall strategy for applying 3D surveying and mapping in industry. Dr. Zhao has long been engaged in high-performance computing.
1. Problems Faced by Robot Control
CloudMinds was founded in 2015 by Huang Xiaoqing, former president of the China Mobile Research Institute. It is a cloud intelligent robot operator, mainly engaged in research on operator-grade secure cloud computing networks for cloud intelligent robots, large-scale hybrid artificial intelligence and machine learning platforms, and secure intelligent terminals and robot controller technology.
Its main products include cloud-based service robots, cloud security robots, cleaning robots, companion robots, and cloud access control. These terminal devices connect to the cloud robot operating system HARIX through a secure, high-speed fiber network (VBN). Many practical problems have been encountered along the way, and during research a lot of methods and results from academia are brought into robot application development. This talk introduces the practical problems encountered in robot application and development, and how they were solved.

The figure above shows the topics of the report. The first part explains the problems robots face under traditional control methods. Our service robot is a humanoid robot; how does it learn to move? The traditional approach is to plan the trajectory of each robot action and code it up, which means the robot has to be reprogrammed every time a new action is added. The second part is about gradually improving the learning ability of the robot system: when the robot needs a new action, it does not have to be reprogrammed but can learn it through machine learning, gradually improving its ability to learn and make decisions. The third part is building a simulation platform, a digital twin platform. Twenty years ago, when I worked on robots, it was not convenient to do robot training or learning on a simulation platform. With the improvement in computing power in recent years, it is now easy to build complete robot kinematics, dynamics, and control systems in the cloud or in a virtual environment, so a great deal of training can be carried out on the simulation platform instead of having to build the robot hardware first and then develop on it. With such a simulation environment, the next step is to consider how to bring traditional control methods and existing biological experience into the simulation platform to form a self-learning system. I will give a few examples later: how a humanoid robot learns to dance, how a robot grasps, and gait learning for a quadruped robot dog.

In the two pictures above, the left one shows the robot learning to dance to "Jasmine Flower". For the choreography we initially asked a teacher from a dance academy for help, but making the robot's movements softer and more anthropomorphic is a very challenging problem. The right picture shows the robot grasping process. The grasping action of a service robot is quite different from that of an industrial robot in a factory: a service robot has to work in unstructured spaces such as everyday environments, so the type, size, weight, and location of the objects to be grasped cannot be determined in advance, and obstacles may appear during grasp planning and must be avoided, which makes grasping a relatively complex process.

The picture on the left is the quadruped robot dog. Gait planning for quadruped robots is still an unsolved problem: the gaits generated by current traditional methods differ greatly from those of real quadrupeds, and most traditional methods do not consider how gait should change in different situations. Even though different environments can now be simulated, traditional methods still cannot generate flexible gait plans. The right picture shows a robot avoiding obstacles in a residential community. The environment there is very complex; in the laboratory there are no children around the robot, but in the community children may cover the camera or even the lidar, or climb on the robot. These practical problems test the stability of the robot's planning and decision-making system. Broadly speaking, the robot dog's locomotion, grasping, and dancing share similar structure; we abstract these processes and define all such control processes as robot decision-making, and use bionic or reinforcement learning methods combined with traditional methods to realize robot decision control in the simulation environment.
2. Robot Decision-Making System

Traditional motor control includes a current loop, a speed loop, and a position loop. This is a mature process, and I will not go into it today. We define the control of each joint as base-layer control; on top of the base layer, multiple joints are combined into coordinated control. From the earlier videos we can see that traditional joint control, once combined, becomes multi-joint linkage, which is generally two- or three-dimensional path planning or gait planning. We abstract this combination process into a robot decision-making process: basic action decision-making. It is like the balance decisions made by our cerebellum, not merely a simple planning problem.
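To make the base layer concrete, here is a minimal sketch of the cascaded position and velocity loops for a single joint, assuming illustrative gains and folding the current loop into the returned torque/current command; it is a sketch of the general technique, not the servo firmware described in the talk.

```python
# Minimal sketch of cascaded joint control: an outer position loop feeding an
# inner velocity PI loop, whose output would be handed to the current loop.
# Gains and the time step are illustrative assumptions.
class CascadedJointController:
    def __init__(self, kp_pos=20.0, kp_vel=2.0, ki_vel=0.5, dt=0.001):
        self.kp_pos, self.kp_vel, self.ki_vel, self.dt = kp_pos, kp_vel, ki_vel, dt
        self.vel_integral = 0.0

    def step(self, pos_ref, pos, vel):
        """One control tick: return the torque/current command for the drive."""
        vel_ref = self.kp_pos * (pos_ref - pos)      # position loop -> velocity setpoint
        vel_err = vel_ref - vel
        self.vel_integral += vel_err * self.dt       # velocity loop PI
        return self.kp_vel * vel_err + self.ki_vel * self.vel_integral
```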
3. Digital Twin
Inside the company, with the help of high-performance hardware and growing computing power, we have built a robot training simulation platform that includes cloud management and storage as well as an AI training platform. Drawing on bionics principles and human and animal motion data, combined with imitation learning, reinforcement learning, and other AI algorithms, we have built a library of basic actions. In the simulation platform, each robot joint is modeled in a way that is close to the real physical model, so robot training can be carried out in the simulation environment. The platform also allows the parameters of hardware joints to be modified according to control requirements, and the resulting requirements can finally be handed to the team producing the real joints. This process provides real help for hardware design.
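As an illustration of what such a per-joint digital twin might look like, here is a sketch of a single simulated joint with inertia, damping, and gravity, stepped with explicit Euler; the physical parameters are assumptions for the example, not those of a real CloudMinds joint.

```python
import numpy as np

# A toy digital twin of one joint: second-order dynamics with inertia,
# damping, and a gravity torque, integrated with explicit Euler.
class SimJoint:
    def __init__(self, inertia=0.05, damping=0.02, gravity_torque=0.3, dt=0.001):
        self.inertia, self.damping, self.gravity_torque, self.dt = (
            inertia, damping, gravity_torque, dt)
        self.angle, self.velocity = 0.0, 0.0

    def step(self, torque):
        """Apply a torque for one time step and return (angle, velocity)."""
        accel = (torque - self.damping * self.velocity
                 - self.gravity_torque * np.sin(self.angle)) / self.inertia
        self.velocity += accel * self.dt
        self.angle += self.velocity * self.dt
        return self.angle, self.velocity
```

Thousands of such models can be rolled out in parallel in the cloud, for example to tune the joint controller above or to vary joint parameters before any hardware is built.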

This is the cloud intelligent robot open platform, which has already been used at some universities. On this platform, each physical robot has a corresponding digital twin system that is close to the real one and simulates it. On the cloud side, there is first a 3D semantic environment in which usage scenarios for robots are built, and the robot model is placed into that environment. Existing knowledge bases and traditional movement skills are also loaded into the system, and then movements are developed according to the training requirements. The collected data (steps 3 and 4 in the figure) are then used for large-scale AI training. This is equivalent to using traditional experience and human experience to construct a limited space, and then using AI methods such as bionic learning and reinforcement learning to search a higher-level space, divided into several layers for different kinds of cooperation and training. It is similar to the AlphaGo process: first train on human game records to build a foundation, and then use self-play to search a larger space.
4. Robot Control
Summarizing some past work: traditional control methods such as RRT and DMP define a control domain, and combining them with bionic learning and reinforcement learning is equivalent to searching for the optimal solution over a larger range or in a higher-dimensional space.
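For reference, here is a minimal sketch of one of the traditional methods mentioned, a discrete Dynamic Movement Primitive (DMP) for a single degree of freedom; the gains and the zero forcing term are illustrative assumptions, and a learned forcing term would shape the path within this control domain.

```python
import numpy as np

def dmp_rollout(y0, goal, duration=1.0, dt=0.01,
                alpha=25.0, beta=25.0 / 4.0, alpha_x=3.0,
                forcing=lambda x: 0.0):
    """Integrate one DMP transformation system from y0 toward goal."""
    y, v, x, tau = y0, 0.0, 1.0, duration   # position, scaled velocity, phase
    trajectory = [y]
    for _ in range(int(duration / dt)):
        f = forcing(x) * x * (goal - y0)              # phase-gated forcing term
        dv = (alpha * (beta * (goal - y) - v) + f) / tau
        dy = v / tau
        dx = -alpha_x * x / tau                       # canonical system decays to 0
        v, y, x = v + dv * dt, y + dy * dt, x + dx * dt
        trajectory.append(y)
    return np.array(trajectory)

# With a zero forcing term the DMP reduces to a critically damped spring that
# converges to the goal; learning the forcing term deforms the trajectory.
path = dmp_rollout(y0=0.0, goal=1.0)
```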
The figure above is a schematic of cooperative training between a real robot and its digital twin in a virtual environment. For example, world information perceived through the real robot's sensors is used for 3D reconstruction in the virtual environment; AI reasoning and decision-making then produce a behavior, which is tried out and evaluated in the virtual environment, and only once it is verified is it downloaded to the real robot for execution.
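A rough sketch of that sense, reconstruct, reason, evaluate, execute cycle might look as follows; the robot, twin, and policy objects and the score threshold are placeholder interfaces assumed for illustration, not the HARIX APIs.

```python
# Hypothetical sense -> reconstruct -> reason -> evaluate -> execute cycle.
# robot, twin, and policy are placeholder objects, not real HARIX interfaces.
def decision_cycle(robot, twin, policy, score_threshold=0.9):
    obs = robot.read_sensors()             # sense the real world
    twin.reconstruct(obs)                  # update the 3D twin environment
    action = policy.decide(twin.state())   # AI reasoning / decision-making
    score = twin.evaluate(action)          # rehearse the action in simulation
    if score >= score_threshold:
        robot.execute(action)              # only validated actions reach hardware
    return score
```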
To learn a new action, we start from a video in which a person performs the action, recognize each pose in real time, and generate the corresponding motion. We collected a large number of videos from TikTok, recovered 3D poses from the 2D video, and mapped these poses onto the robot's joints. Of course, the mapping is not a simple one: a naive mapping causes problems, because the robot's joints and range of motion may not match the dancer's, and collisions may occur. To make the actions generated by the robot more graceful and more anthropomorphic, we learn from these data and generate data-driven behavior; in the process, the robot produces actions that are as similar as possible to the demonstration while respecting its own structural characteristics and physical constraints, so that its dancing looks as natural as possible.
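As a toy illustration of why the mapping is not one-to-one, the sketch below clamps estimated human joint angles to assumed robot joint limits and smooths them frame to frame; the limits and smoothing factor are invented for the example, and collision checking is omitted.

```python
import numpy as np

# Hypothetical per-joint limits for three arm joints (radians).
JOINT_LIMITS = np.deg2rad(np.array([[-90.0, 90.0],    # shoulder pitch
                                    [-45.0, 120.0],   # shoulder roll
                                    [0.0, 135.0]]))   # elbow

def retarget(human_angles, prev_robot_angles, smooth=0.2):
    """Map one frame of estimated human angles to feasible robot angles."""
    clipped = np.clip(human_angles, JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1])
    # Blend with the previous frame so the retargeted motion stays smooth.
    return (1.0 - smooth) * clipped + smooth * prev_robot_angles
```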

The second scenario is grasping. On the left is the real scene; on the right is the virtual scene. To generate more anthropomorphic grasping actions, a person first wears a motion-capture device to record data, and large-scale AI training combined with the simulation platform produces a robot grasping knowledge base. This also avoids having to collect data all over again for every new grasping action.
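One simple way to turn such demonstrations into a reusable policy is behavior cloning; the sketch below fits a linear state-to-action map to synthetic demonstration data by least squares. The data shapes and the linear model are assumptions for illustration; the talk does not specify the model behind the grasping knowledge base.

```python
import numpy as np

# Synthetic "demonstrations": states (e.g. hand + object features) and the
# corresponding recorded actions (e.g. 7-DoF arm joint targets).
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 12))
actions = states @ rng.normal(size=(12, 7))

# Behavior cloning as a least-squares fit of a linear policy.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

def grasp_policy(state):
    """Predict joint targets for a new state from the cloned policy."""
    return state @ W
```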

The picture above shows the robot dog being controlled in the simulation environment using the MIT model, for example walking forward and backward; you may have seen it online.

This robot behavior is realized by combining the traditional control approach with deep learning and bionics: the existing traditional search space is kept, while machine learning methods such as reinforcement learning are used to search a larger space. Traditional methods usually require modeling, so the control quality is often limited by modeling simplifications; combining them with reinforcement learning yields a broader search space.

Comparing the two sides: bionic training is end-to-end and needs no complicated design, but it is not flexible enough and cannot be deployed yet. Traditional methods are more flexible and more robust, but the resulting actions are not optimal; for example, when walking, the planned gait of the quadruped robot differs considerably from that of a real animal. Combining the two reduces energy consumption and is more stable.

This is the energy consumption curve of quadrupeds while walking. In traditional robot control, each gait is a separate state and the robot can only switch instantly from one gait to another, whereas real animals behave quite differently. Google recently published a paper in which a large amount of data, captured online or collected externally, is used for this kind of AI training and search. We are also aware of this issue: people and nature already provide a lot of data, and we need to combine such data into a data-driven robot action training method so that an action does not have to be trained entirely from scratch, especially for quadruped gaits or robot grasping, where plenty of empirical priors already exist. Using these priors, we define constraints and boundary conditions on the data, search within a limited space, reach the desired effect faster, and keep the energy consumed by these actions as low as possible.
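A common way to express these goals in training, sketched below under assumed weights and terms, is a gait reward that tracks the commanded velocity, stays close to a reference (animal) motion prior, and penalizes mechanical power; this is an illustrative formulation, not the reward actually used in the talk.

```python
import numpy as np

def gait_reward(base_vel, target_vel, joints, ref_joints, torques, joint_vels,
                w_vel=1.0, w_imit=0.5, w_energy=0.001):
    """Illustrative reward: velocity tracking + imitation prior - energy."""
    vel_term = -w_vel * (base_vel - target_vel) ** 2          # track commanded speed
    imit_term = -w_imit * np.sum((joints - ref_joints) ** 2)  # stay near the prior motion
    energy_term = -w_energy * np.sum(np.abs(torques * joint_vels))  # mechanical power
    return vel_term + imit_term + energy_term
```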

This shows the robot dog being trained on different gaits: multiple robots are trained with different parameters, such as different forces and different conditions.

This is large-scale scene training, covering different states. Because this is a distributed platform, training can be made very fast. The key point is that we obtain one limited search space with the help of traditional methods and another from empirical values, and reinforcement-learning-based AI training then combines the two to search a wider range for the optimum. It is somewhat like AlphaGo first learning from human game records; of course, the learning here is guided more heavily by humans, with human experience built into it.
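One way to read "traditional prior plus learned search" is a residual policy, where the final action is a hand-designed baseline plus a small learned correction; the sketch below uses a placeholder baseline and an untrained weight matrix purely for illustration.

```python
import numpy as np

def prior_controller(state):
    """Placeholder hand-designed baseline (e.g. a model-based gait law)."""
    return -0.5 * state

def residual_policy(state, W, residual_scale=0.1):
    """Final action = traditional prior + bounded learned residual."""
    residual = np.tanh(W @ state)        # correction learned with RL (here untrained)
    return prior_controller(state) + residual_scale * residual

state = np.zeros(12)
W = np.zeros((12, 12))                   # would be trained in the distributed platform
action = residual_policy(state, W)
```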

This is CloudMinds' open platform, which is already used in colleges and universities. Once this training platform and the whole training process are online, more people will be able to use it: you can train your own robot on it, or even build your own robot system and put it on the platform to obtain the actual effect you want. When we design robots in-house now, the approach is already different from traditional robot design: we first design the robot's required characteristics on the simulation platform. Thanks to today's strong computing power, we are freed from the constraints of the physical robot, so training happens first on the simulation platform, and the requirements for the structure, each link, and each joint follow from that. For example, a quadruped robot can first go through gait training in simulation, and the hardware requirements are derived afterwards. This is also the purpose of our open platform.
