Robot Decision-Making System Based on Self-Learning (CloudMinds, Zhao Kaiyong)
2022-07-23 19:43:00 【Master Ma】
On September 25-26, 2020, the China Science and Technology Summit young scientists salon series welcomed a new session, "AI Academic Ecology and Industrial Innovation". The event was sponsored by the China Association for Science and Technology and organized by the Department of Computer Science at Tsinghua University, AI TIME, and Zhipu AI. For the complete video of the conference, follow "AI TIME 论道" on Bilibili, or click "Read the original" below.
On the morning of September 25, the conference invited Mr. Zhao Kaiyong, chief architect and vice president of R&D at CloudMinds, to deliver a keynote speech titled "Robot Decision-Making System Based on Self-Learning".
In his speech, Mr. Zhao Kaiyong mainly introduced how CloudMinds accelerates robot learning through its cloud platform, forming an approach that combines traditional methods, human experience, and reinforcement learning.
Zhao Kaiyong, Ph.D., is a senior practitioner in robotics, artificial intelligence, and high-performance computing, with many years of experience in technology development, team management, industry development, and M&A. He is currently chief architect and vice president of R&D at CloudMinds, leading the AI and Navigation departments. Previously, he was head of DJI's Internet business unit, responsible for the company's Internet services and the overall strategy for applying 3D surveying and mapping in industry. Dr. Zhao has long been engaged in high-performance computing.
1. Problems Faced by Robot Control
CloudMinds was founded in 2015 by Huang Xiaoqing, former president of the China Mobile Research Institute. It is a cloud intelligent robot operator, mainly engaged in research on operator-grade secure cloud computing networks for cloud intelligent robots, large-scale hybrid artificial intelligence and machine learning platforms, and secure intelligent terminals and robot controller technology.
Its main products include cloud-based service robots, cloud security robots, cleaning robots, companion robots, and cloud access control. These terminal devices connect to the cloud robot operating system HARIX through a secure, high-speed fiber network (VBN). Many practical problems have been encountered along the way, and during research a lot of methods and results from academia are brought into robot application development. This talk introduces the practical problems encountered in robot application and development, and how they were solved.

The figure above shows the topics of the report. The first part explains the problems robots face under traditional control methods. Our service robot is a humanoid robot; how does it learn to move? The traditional approach is to plan the trajectory of each robot action and code it up, which means the robot has to be reprogrammed every time a new action is added. The second part is about gradually improving the learning ability of the robot system: when the robot needs a new action, it does not have to be reprogrammed but can learn it through machine learning, gradually improving its ability to learn and make decisions. The third part is building a simulation platform, a digital twin platform. Twenty years ago, when I worked on robots, it was not convenient to do robot training or learning on a simulation platform. With the improvement in computing power in recent years, it is now easy to build complete robot kinematics, dynamics, and control systems in the cloud or in a virtual environment, so a great deal of training can be carried out on the simulation platform instead of having to build the robot hardware first and then develop on it. With such a simulation environment, the next step is to consider how to bring traditional control methods and existing biological experience into the simulation platform to form a self-learning system. I will give a few examples later: how a humanoid robot learns to dance, how a robot grasps, and gait learning for a quadruped robot dog.

In the two pictures above, the left one shows the robot learning to dance to "Jasmine Flower". For the choreography we initially asked a teacher from a dance academy for help, but making the robot's movements softer and more anthropomorphic is a very challenging problem. The right picture shows the robot grasping process. The grasping action of a service robot is quite different from that of an industrial robot in a factory: a service robot has to work in unstructured spaces such as everyday environments, so the type, size, weight, and location of the objects to be grasped cannot be determined in advance, and obstacles may appear during grasp planning and must be avoided, which makes grasping a relatively complex process.

The picture on the left is the quadruped robot dog. Gait planning for quadruped robots is still an unsolved problem: the gaits generated by current traditional methods differ greatly from those of real quadrupeds, and most traditional methods do not consider how gait should change in different situations. Even though different environments can now be simulated, traditional methods still cannot generate flexible gait plans. The right picture shows a robot avoiding obstacles in a residential community. The environment there is very complex; in the laboratory there are no children around the robot, but in the community children may cover the camera or even the lidar, or climb on the robot. These practical problems test the stability of the robot's planning and decision-making system. Broadly speaking, the robot dog's locomotion, grasping, and dancing share similar structure; we abstract these processes and define all such control processes as robot decision-making, and use bionic or reinforcement learning methods combined with traditional methods to realize robot decision control in the simulation environment.
2. Robot Decision-Making System

Traditional motor control includes a current loop, a speed loop, and a position loop. This is a mature process, and I will not go into it today. We define the control of each joint as base-layer control; on top of the base layer, multiple joints are combined into coordinated control. From the earlier videos we can see that traditional joint control, once combined, becomes multi-joint linkage, which is generally two- or three-dimensional path planning or gait planning. We abstract this combination process into a robot decision-making process: basic action decision-making. It is like the balance decisions made by our cerebellum, not merely a simple planning problem.
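To make the base layer concrete, here is a minimal sketch of the cascaded position and velocity loops for a single joint, assuming illustrative gains and folding the current loop into the returned torque/current command; it is a sketch of the general technique, not the servo firmware described in the talk.

```python
# Minimal sketch of cascaded joint control: an outer position loop feeding an
# inner velocity PI loop, whose output would be handed to the current loop.
# Gains and the time step are illustrative assumptions.
class CascadedJointController:
    def __init__(self, kp_pos=20.0, kp_vel=2.0, ki_vel=0.5, dt=0.001):
        self.kp_pos, self.kp_vel, self.ki_vel, self.dt = kp_pos, kp_vel, ki_vel, dt
        self.vel_integral = 0.0

    def step(self, pos_ref, pos, vel):
        """One control tick: return the torque/current command for the drive."""
        vel_ref = self.kp_pos * (pos_ref - pos)      # position loop -> velocity setpoint
        vel_err = vel_ref - vel
        self.vel_integral += vel_err * self.dt       # velocity loop PI
        return self.kp_vel * vel_err + self.ki_vel * self.vel_integral
```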
3. Digital Twin
Inside the company, with the help of high-performance hardware and growing computing power, we have built a robot training simulation platform that includes cloud management and storage as well as an AI training platform. Drawing on bionics principles and human and animal motion data, combined with imitation learning, reinforcement learning, and other AI algorithms, we have built a library of basic actions. In the simulation platform, each robot joint is modeled in a way that is close to the real physical model, so robot training can be carried out in the simulation environment. The platform also allows the parameters of hardware joints to be modified according to control requirements, and the resulting requirements can finally be handed to the team producing the real joints. This process provides real help for hardware design.
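As an illustration of what such a per-joint digital twin might look like, here is a sketch of a single simulated joint with inertia, damping, and gravity, stepped with explicit Euler; the physical parameters are assumptions for the example, not those of a real CloudMinds joint.

```python
import numpy as np

# A toy digital twin of one joint: second-order dynamics with inertia,
# damping, and a gravity torque, integrated with explicit Euler.
class SimJoint:
    def __init__(self, inertia=0.05, damping=0.02, gravity_torque=0.3, dt=0.001):
        self.inertia, self.damping, self.gravity_torque, self.dt = (
            inertia, damping, gravity_torque, dt)
        self.angle, self.velocity = 0.0, 0.0

    def step(self, torque):
        """Apply a torque for one time step and return (angle, velocity)."""
        accel = (torque - self.damping * self.velocity
                 - self.gravity_torque * np.sin(self.angle)) / self.inertia
        self.velocity += accel * self.dt
        self.angle += self.velocity * self.dt
        return self.angle, self.velocity
```

Thousands of such models can be rolled out in parallel in the cloud, for example to tune the joint controller above or to vary joint parameters before any hardware is built.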

This is the cloud intelligent robot open platform, which has already been used at some universities. On this platform, each physical robot has a corresponding digital twin system that is close to the real one and simulates it. On the cloud side, there is first a 3D semantic environment in which usage scenarios for robots are built, and the robot model is placed into that environment. Existing knowledge bases and traditional movement skills are also loaded into the system, and then movements are developed according to the training requirements. The collected data (steps 3 and 4 in the figure) are then used for large-scale AI training. This is equivalent to using traditional experience and human experience to construct a limited space, and then using AI methods such as bionic learning and reinforcement learning to search a higher-level space, divided into several layers for different kinds of cooperation and training. It is similar to the AlphaGo process: first train on human game records to build a foundation, and then use self-play to search a larger space.
4. Robot Control
Summarizing some past work: traditional control methods such as RRT and DMP define a control domain, and combining them with bionic learning and reinforcement learning is equivalent to searching for the optimal solution over a larger range or in a higher-dimensional space.
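For reference, here is a minimal sketch of one of the traditional methods mentioned, a discrete Dynamic Movement Primitive (DMP) for a single degree of freedom; the gains and the zero forcing term are illustrative assumptions, and a learned forcing term would shape the path within this control domain.

```python
import numpy as np

def dmp_rollout(y0, goal, duration=1.0, dt=0.01,
                alpha=25.0, beta=25.0 / 4.0, alpha_x=3.0,
                forcing=lambda x: 0.0):
    """Integrate one DMP transformation system from y0 toward goal."""
    y, v, x, tau = y0, 0.0, 1.0, duration   # position, scaled velocity, phase
    trajectory = [y]
    for _ in range(int(duration / dt)):
        f = forcing(x) * x * (goal - y0)              # phase-gated forcing term
        dv = (alpha * (beta * (goal - y) - v) + f) / tau
        dy = v / tau
        dx = -alpha_x * x / tau                       # canonical system decays to 0
        v, y, x = v + dv * dt, y + dy * dt, x + dx * dt
        trajectory.append(y)
    return np.array(trajectory)

# With a zero forcing term the DMP reduces to a critically damped spring that
# converges to the goal; learning the forcing term deforms the trajectory.
path = dmp_rollout(y0=0.0, goal=1.0)
```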
The figure above is a schematic of cooperative training between a real robot and its digital twin in a virtual environment. For example, world information perceived through the real robot's sensors is used for 3D reconstruction in the virtual environment; AI reasoning and decision-making then produce a behavior, which is tried out and evaluated in the virtual environment, and only once it is verified is it downloaded to the real robot for execution.
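A rough sketch of that sense, reconstruct, reason, evaluate, execute cycle might look as follows; the robot, twin, and policy objects and the score threshold are placeholder interfaces assumed for illustration, not the HARIX APIs.

```python
# Hypothetical sense -> reconstruct -> reason -> evaluate -> execute cycle.
# robot, twin, and policy are placeholder objects, not real HARIX interfaces.
def decision_cycle(robot, twin, policy, score_threshold=0.9):
    obs = robot.read_sensors()             # sense the real world
    twin.reconstruct(obs)                  # update the 3D twin environment
    action = policy.decide(twin.state())   # AI reasoning / decision-making
    score = twin.evaluate(action)          # rehearse the action in simulation
    if score >= score_threshold:
        robot.execute(action)              # only validated actions reach hardware
    return score
```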
To learn a new action, we start from a video in which a person performs the action, recognize each pose in real time, and generate the corresponding motion. We collected a large number of videos from TikTok, recovered 3D poses from the 2D video, and mapped these poses onto the robot's joints. Of course, the mapping is not a simple one: a naive mapping causes problems, because the robot's joints and range of motion may not match the dancer's, and collisions may occur. To make the actions generated by the robot more graceful and more anthropomorphic, we learn from these data and generate data-driven behavior; in the process, the robot produces actions that are as similar as possible to the demonstration while respecting its own structural characteristics and physical constraints, so that its dancing looks as natural as possible.
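As a toy illustration of why the mapping is not one-to-one, the sketch below clamps estimated human joint angles to assumed robot joint limits and smooths them frame to frame; the limits and smoothing factor are invented for the example, and collision checking is omitted.

```python
import numpy as np

# Hypothetical per-joint limits for three arm joints (radians).
JOINT_LIMITS = np.deg2rad(np.array([[-90.0, 90.0],    # shoulder pitch
                                    [-45.0, 120.0],   # shoulder roll
                                    [0.0, 135.0]]))   # elbow

def retarget(human_angles, prev_robot_angles, smooth=0.2):
    """Map one frame of estimated human angles to feasible robot angles."""
    clipped = np.clip(human_angles, JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1])
    # Blend with the previous frame so the retargeted motion stays smooth.
    return (1.0 - smooth) * clipped + smooth * prev_robot_angles
```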

The second scenario is grasping. On the left is the real scene; on the right is the virtual scene. To generate more anthropomorphic grasping actions, a person first wears a motion-capture device to record data, and large-scale AI training combined with the simulation platform produces a robot grasping knowledge base. This also avoids having to collect data all over again for every new grasping action.
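One simple way to turn such demonstrations into a reusable policy is behavior cloning; the sketch below fits a linear state-to-action map to synthetic demonstration data by least squares. The data shapes and the linear model are assumptions for illustration; the talk does not specify the model behind the grasping knowledge base.

```python
import numpy as np

# Synthetic "demonstrations": states (e.g. hand + object features) and the
# corresponding recorded actions (e.g. 7-DoF arm joint targets).
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 12))
actions = states @ rng.normal(size=(12, 7))

# Behavior cloning as a least-squares fit of a linear policy.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

def grasp_policy(state):
    """Predict joint targets for a new state from the cloned policy."""
    return state @ W
```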

The picture above shows the robot dog being controlled in the simulation environment using the MIT model, for example walking forward and backward; you may have seen it online.

This robot behavior is realized by combining the traditional control approach with deep learning and bionics: the existing traditional search space is kept, while machine learning methods such as reinforcement learning are used to search a larger space. Traditional methods usually require modeling, so the control quality is often limited by modeling simplifications; combining them with reinforcement learning yields a broader search space.

Comparing the two sides: bionic training is end-to-end and needs no complicated design, but it is not flexible enough and cannot be deployed yet. Traditional methods are more flexible and more robust, but the resulting actions are not optimal; for example, when walking, the planned gait of the quadruped robot differs considerably from that of a real animal. Combining the two reduces energy consumption and is more stable.

This is the energy consumption curve of quadrupeds while walking. In traditional robot control, each gait is a separate state and the robot can only switch instantly from one gait to another, whereas real animals behave quite differently. Google recently published a paper in which a large amount of data, captured online or collected externally, is used for this kind of AI training and search. We are also aware of this issue: people and nature already provide a lot of data, and we need to combine such data into a data-driven robot action training method so that an action does not have to be trained entirely from scratch, especially for quadruped gaits or robot grasping, where plenty of empirical priors already exist. Using these priors, we define constraints and boundary conditions on the data, search within a limited space, reach the desired effect faster, and keep the energy consumed by these actions as low as possible.
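A common way to express these goals in training, sketched below under assumed weights and terms, is a gait reward that tracks the commanded velocity, stays close to a reference (animal) motion prior, and penalizes mechanical power; this is an illustrative formulation, not the reward actually used in the talk.

```python
import numpy as np

def gait_reward(base_vel, target_vel, joints, ref_joints, torques, joint_vels,
                w_vel=1.0, w_imit=0.5, w_energy=0.001):
    """Illustrative reward: velocity tracking + imitation prior - energy."""
    vel_term = -w_vel * (base_vel - target_vel) ** 2          # track commanded speed
    imit_term = -w_imit * np.sum((joints - ref_joints) ** 2)  # stay near the prior motion
    energy_term = -w_energy * np.sum(np.abs(torques * joint_vels))  # mechanical power
    return vel_term + imit_term + energy_term
```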

This shows the robot dog being trained on different gaits: multiple robots are trained with different parameters, such as different forces and different conditions.

This is large-scale scene training, covering different states. Because this is a distributed platform, training can be made very fast. The key point is that we obtain one limited search space with the help of traditional methods and another from empirical values, and reinforcement-learning-based AI training then combines the two to search a wider range for the optimum. It is somewhat like AlphaGo first learning from human game records; of course, the learning here is guided more heavily by humans, with human experience built into it.
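One way to read "traditional prior plus learned search" is a residual policy, where the final action is a hand-designed baseline plus a small learned correction; the sketch below uses a placeholder baseline and an untrained weight matrix purely for illustration.

```python
import numpy as np

def prior_controller(state):
    """Placeholder hand-designed baseline (e.g. a model-based gait law)."""
    return -0.5 * state

def residual_policy(state, W, residual_scale=0.1):
    """Final action = traditional prior + bounded learned residual."""
    residual = np.tanh(W @ state)        # correction learned with RL (here untrained)
    return prior_controller(state) + residual_scale * residual

state = np.zeros(12)
W = np.zeros((12, 12))                   # would be trained in the distributed platform
action = residual_policy(state, W)
```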

This is CloudMinds' open platform, which is already used in colleges and universities. Once this training platform and the whole training process are online, more people will be able to use it: you can train your own robot on it, or even build your own robot system and put it on the platform to obtain the actual effect you want. When we design robots in-house now, the approach is already different from traditional robot design: we first design the robot's required characteristics on the simulation platform. Thanks to today's strong computing power, we are freed from the constraints of the physical robot, so training happens first on the simulation platform, and the requirements for the structure, each link, and each joint follow from that. For example, a quadruped robot can first go through gait training in simulation, and the hardware requirements are derived afterwards. This is also the purpose of our open platform.
