CN-122024240-A - Pseudo tag generation method for self-learning of an embodied robot, and robot system

CN122024240A

Abstract

The application relates to a pseudo tag generation method for self-learning of an embodied robot, and to a robot system. The method comprises: acquiring interactive multimedia data between the embodied robot and an object to be learned; determining, from the interactive multimedia data, state multimedia data that reflects state changes of the object to be learned during interaction with the embodied robot; performing motion decomposition on the state multimedia data to obtain component multimedia data of key components of the object to be learned; generating component pseudo tags for the object to be learned based on the component multimedia data; generating a state pseudo tag reflecting the state changes of the object to be learned based on the state multimedia data; and generating an object pseudo tag for the object to be learned based on the state pseudo tag and the component pseudo tags, the object pseudo tag being used by the embodied robot for learning and training on the object to be learned. The method improves the quality of the pseudo tags and thereby improves the task success rate of the embodied robot.

Inventors

  • Request for anonymity
  • Request for anonymity
  • LI ZHICHEN
  • PAN YANG
  • LI JIANGANG

Assignees

  • 卧安科技(深圳)有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-04-10

Claims (13)

  1. A pseudo tag generation method for self-learning of an embodied robot, the method comprising: acquiring interactive multimedia data between the embodied robot and an object to be learned; determining state multimedia data from the interactive multimedia data, wherein the state multimedia data reflects state changes of the object to be learned during interaction with the embodied robot; performing motion decomposition on the state multimedia data to obtain component multimedia data of key components of the object to be learned, and generating component pseudo tags for the object to be learned based on the component multimedia data; generating a state pseudo tag reflecting the state changes of the object to be learned based on the state multimedia data; and generating an object pseudo tag for the object to be learned based on the state pseudo tag and the component pseudo tags, wherein the object pseudo tag is used by the embodied robot for learning and training on the object to be learned.
  2. The method of claim 1, wherein the key components comprise moving components and stationary components; and wherein performing the motion decomposition on the state multimedia data to obtain the component multimedia data of the key components of the object to be learned comprises: performing motion decomposition on the state multimedia data to obtain multimedia data of a follow-up region and multimedia data of a static region, wherein the follow-up region corresponds to a moving component of the object to be learned; and generating the component multimedia data of the key components based on the multimedia data of the follow-up region, the multimedia data of the static region, and the state multimedia data.
  3. The method of claim 2, wherein the key components further comprise an operable component and a motion-mating component; and wherein generating the component multimedia data of the key components based on the multimedia data of the follow-up region, the multimedia data of the static region, and the state multimedia data comprises: determining multimedia data of an end-effector region of the embodied robot from the state multimedia data; determining multimedia data of the operable component based on an intersection between the multimedia data of the end-effector region and the multimedia data of the follow-up region; acquiring a motion-mating relationship between the follow-up region and the static region, and analyzing a motion-mating region of the state multimedia data based on the motion-mating relationship to obtain multimedia data of the motion-mating component; and determining the multimedia data of the follow-up region, the multimedia data of the static region, the multimedia data of the operable component, and the multimedia data of the motion-mating component as the component multimedia data of the key components.
  4. The method of claim 2, wherein generating the component pseudo tags for the object to be learned based on the component multimedia data comprises: performing mask processing on the multimedia data of the follow-up region to obtain a moving-component mask of the moving component, and determining the moving-component mask as a mask tag of the moving component; performing mask processing on the multimedia data of the static region to obtain a stationary-component mask of the stationary component, and determining the stationary-component mask as a mask tag of the stationary component; and generating the component pseudo tags for the object to be learned based on the mask tag of the moving component and the mask tag of the stationary component.
  5. The method of claim 3, wherein generating the component pseudo tags for the object to be learned based on the component multimedia data comprises: performing mask processing on the multimedia data of the operable component to obtain an operable-component mask, and determining the operable-component mask as a mask tag of the operable component; performing mask processing on the multimedia data of the motion-mating component to obtain a motion-mating-component mask, and determining the motion-mating-component mask as a mask tag of the motion-mating component; and determining one or more of the mask tag of the moving component, the mask tag of the stationary component, the mask tag of the operable component, and the mask tag of the motion-mating component as the component pseudo tags for the object to be learned.
  6. The method of claim 5, wherein the component multimedia data includes pixel position data, the component pseudo tags further include a position tag of each key component and relative-position-relationship tags between different key components, and the method further comprises: for any key component, determining position information of the key component based on the pixel position data of the key component, and determining the position tag of the key component based on the position information; and generating relative-position-relationship tags between the key component and other key components based on the position information of the key component and the position information of the other key components.
  7. The method of claim 3, wherein acquiring the motion-mating relationship between the follow-up region and the static region comprises: determining motion interaction data of the embodied robot from the interactive multimedia data, wherein the motion interaction data reflects joint states of the embodied robot during interaction with the object to be learned; performing inverse joint calculation on the object to be learned based on the follow-up region, the static region, and the motion interaction data to obtain a motion type and motion constraint data of the object to be learned, wherein the motion constraint data comprises one or more of a motion constraint direction and a motion constraint range; and determining the motion type and the motion constraint data of the object to be learned as the motion-mating relationship between the follow-up region and the static region.
  8. The method of claim 7, wherein analyzing the motion-mating region of the state multimedia data based on the motion-mating relationship to obtain the multimedia data of the motion-mating component comprises: when the motion type characterizes the object to be learned as a rotation type, taking the region corresponding to the motion constraint direction in the motion constraint data as the motion-mating region; determining a rotation axis of the object to be learned from the motion-mating region; and determining the multimedia data of the region where the rotation axis is located as the multimedia data of the motion-mating component.
  9. The method of claim 1, wherein the state multimedia data comprises M frames of multimedia data, M being a positive integer; and wherein generating the state pseudo tag reflecting the state change of the object to be learned based on the state multimedia data comprises: performing inverse joint calculation on the object to be learned based on the state multimedia data and motion interaction data of the embodied robot to obtain a motion type and motion constraint data of the object to be learned; for an i-th frame of the M frames of multimedia data, determining motion constraint data of the object to be learned in the i-th frame based on the motion type and motion constraint data of the object to be learned and state data of the object to be learned in the i-th frame; and generating the state pseudo tag reflecting the state change of the object to be learned based on the motion constraint data of the object to be learned and the motion constraint data respectively corresponding to the M frames of multimedia data.
  10. The method of claim 9, wherein the motion constraint data comprises a motion constraint direction and a motion constraint range; and wherein generating the state pseudo tag reflecting the state change of the object to be learned based on the motion constraint data of the object to be learned and the motion constraint data respectively corresponding to the M frames of multimedia data comprises: determining a movable direction and a movable range of the object to be learned based on the motion constraint direction and the motion constraint range; determining a motion constraint model of the object to be learned based on the motion constraint data respectively corresponding to the M frames of multimedia data and the movable direction and movable range of the object to be learned; and generating the state pseudo tag reflecting the state change of the object to be learned based on the motion constraint model of the object to be learned.
  11. The method of claim 1, wherein acquiring the interactive multimedia data between the embodied robot and the object to be learned comprises: acquiring task multimedia data of an interaction process between the embodied robot and a task object; determining object feature information of the task object based on the task multimedia data; determining a familiarity of the embodied robot with the task object based on the object feature information; and if the familiarity is less than or equal to a preset threshold, determining the task object as the object to be learned, and determining the task multimedia data as the interactive multimedia data between the embodied robot and the object to be learned.
  12. The method of claim 11, wherein the object feature information includes object type information and object mask information; and wherein determining the familiarity of the embodied robot with the task object based on the object feature information comprises: if the object type information of the task object is a target object type, matching the object mask information against each piece of candidate mask information in a memory bank of the embodied robot to obtain a mask matching result, wherein the target object type comprises one or more of an operable object type and a variable-state object type; and if the mask matching result indicates that no candidate mask information in the memory bank matches the object mask information, determining that the familiarity of the embodied robot with the task object is less than or equal to the preset threshold.
  13. An embodied robot system, comprising: a first acquisition module configured to acquire interactive multimedia data between the embodied robot and an object to be learned; a first determining module configured to determine state multimedia data from the interactive multimedia data, wherein the state multimedia data reflects state changes of the object to be learned during interaction with the embodied robot; a first generation module configured to perform motion decomposition on the state multimedia data to obtain component multimedia data of key components of the object to be learned, and to generate component pseudo tags for the object to be learned based on the component multimedia data; a second generation module configured to generate a state pseudo tag reflecting the state changes of the object to be learned based on the state multimedia data; and a third generation module configured to generate an object pseudo tag for the object to be learned based on the state pseudo tag and the component pseudo tags, wherein the object pseudo tag is used by the embodied robot for learning and training on the object to be learned.
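The pipeline described in claims 1–5 (motion decomposition into moving and static regions, mask tags per component, a state pseudo tag across frames, and a combined object pseudo tag) can be illustrated with a minimal sketch. Everything here is a hypothetical illustration, not the patent's implementation: frames are toy 2D integer grids, "motion decomposition" is approximated by naive per-pixel frame differencing, and the state pseudo tag is a crude fraction-of-changed-pixels proxy.

```python
# Hypothetical sketch of the pseudo-tag pipeline in claims 1-5.
# All function names and data formats are illustrative assumptions.

def motion_decompose(frames):
    """Split pixels into a follow-up (moving) mask and a static mask
    by naive frame differencing. frames: list of 2D lists of ints."""
    h, w = len(frames[0]), len(frames[0][0])
    moving = [[False] * w for _ in range(h)]
    for prev, cur in zip(frames, frames[1:]):
        for r in range(h):
            for c in range(w):
                if prev[r][c] != cur[r][c]:
                    moving[r][c] = True
    static = [[not moving[r][c] for c in range(w)] for r in range(h)]
    return moving, static

def component_pseudo_tags(moving, static):
    """Mask processing: here the region masks themselves serve as mask tags."""
    return {"moving_component": moving, "stationary_component": static}

def state_pseudo_tag(frames):
    """Per-frame fraction of pixels changed vs. the first frame, a crude
    proxy for the object's state trajectory across the M frames."""
    h, w = len(frames[0]), len(frames[0][0])
    ref = frames[0]
    return [sum(f[r][c] != ref[r][c] for r in range(h) for c in range(w)) / (h * w)
            for f in frames]

def object_pseudo_tag(frames):
    """Combine component and state pseudo tags into one object pseudo tag."""
    moving, static = motion_decompose(frames)
    return {"components": component_pseudo_tags(moving, static),
            "state": state_pseudo_tag(frames)}

# Toy example: a 2x3 grid whose rightmost column changes across 3 frames,
# standing in for a moving part (e.g. a door leaf) on a static body.
frames = [
    [[0, 0, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 1]],
    [[0, 0, 2], [0, 0, 2]],
]
tag = object_pseudo_tag(frames)
```

In this toy run the moving-component mask covers exactly the changing column, and the state tag rises from 0.0 as the "door" moves away from its initial state.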

Description

Pseudo tag generation method for self-learning of an embodied robot, and robot system

Technical Field

The application relates to the technical field of self-learning robots, and in particular to a pseudo tag generation method for self-learning of an embodied robot, and a robot system.

Background

With the development of intelligent technology, embodied robots are increasingly widely applied and ever more capable; for example, home service robots can execute tasks according to user instructions. Label generation for an embodied robot mainly depends on labels of the objects it interacts with while executing tasks. In the conventional art, label generation relies either on manual pixel-by-pixel annotation or on model prediction or teacher-network output. However, because the objects an embodied robot faces vary widely in shape and are numerous in category, manual pixel-by-pixel annotation is costly, while model prediction and teacher-network output are prone to distortion; task execution of the embodied robot then fails, affecting the task success rate.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a pseudo tag generation method for self-learning of an embodied robot, and a robot system, which can improve the quality of the pseudo tags and thereby improve the success rate of the embodied robot's interaction tasks.
In a first aspect, the present application provides a pseudo tag generation method for self-learning of an embodied robot, comprising: acquiring interactive multimedia data between the embodied robot and an object to be learned; determining state multimedia data from the interactive multimedia data, wherein the state multimedia data reflects state changes of the object to be learned during interaction with the embodied robot; performing motion decomposition on the state multimedia data to obtain component multimedia data of key components of the object to be learned, and generating component pseudo tags for the object to be learned based on the component multimedia data; generating a state pseudo tag reflecting the state changes of the object to be learned based on the state multimedia data; and generating an object pseudo tag for the object to be learned based on the state pseudo tag and the component pseudo tags, wherein the object pseudo tag is used by the embodied robot for learning and training on the object to be learned.
In a second aspect, the present application also provides an embodied robot system, comprising: an acquisition module configured to acquire interactive multimedia data between the embodied robot and an object to be learned; a determining module configured to determine state multimedia data from the interactive multimedia data, wherein the state multimedia data reflects state changes of the object to be learned during interaction with the embodied robot; a first generation module configured to perform motion decomposition on the state multimedia data to obtain component multimedia data of key components of the object to be learned, and to generate component pseudo tags for the object to be learned based on the component multimedia data; a second generation module configured to generate a state pseudo tag reflecting the state changes of the object to be learned based on the state multimedia data; and a third generation module configured to generate an object pseudo tag for the object to be learned based on the state pseudo tag and the component pseudo tags, wherein the object pseudo tag is used by the embodied robot for learning and training on the object to be learned.

In a third aspect, the present application also provides a computer device comprising a memory storing a computer program and a processor which, when executing the computer program, implements the steps of the above pseudo tag generation method for self-learning of an embodied robot.

In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above pseudo tag generation method for self-learning of an embodied robot.
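Claims 11–12 describe how objects enter this pipeline in the first place: the embodied robot's familiarity with a task object is checked against a memory bank of candidate masks, and an unmatched object becomes an object to be learned. A hedged sketch follows; the patent does not specify a matching metric or threshold, so mask matching is approximated here by set intersection-over-union (IoU) with an arbitrary threshold, and all names are illustrative.

```python
# Hypothetical familiarity check (cf. claims 11-12): match an object mask
# against a memory bank of candidate masks; if no candidate matches, the
# robot is unfamiliar with the object, which then becomes an object to learn.
# The IoU metric and the 0.5 threshold are illustrative assumptions.

def mask_iou(a, b):
    """Intersection-over-union of two masks given as sets of pixel indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def familiarity(object_mask, memory_bank):
    """Familiarity as the best IoU against any candidate mask in the bank."""
    return max((mask_iou(object_mask, m) for m in memory_bank), default=0.0)

def needs_learning(object_mask, memory_bank, threshold=0.5):
    """True if familiarity is at or below the preset threshold."""
    return familiarity(object_mask, memory_bank) <= threshold

memory_bank = [{(0, 0), (0, 1), (1, 0), (1, 1)}]   # one stored candidate mask
new_mask = {(5, 5), (5, 6), (6, 5)}                # disjoint from the bank
known_mask = {(0, 0), (0, 1), (1, 1)}              # overlaps the stored mask
```

Here `new_mask` matches nothing in the bank (familiarity 0.0), so the object would be flagged for learning, while `known_mask` overlaps a stored mask (IoU 0.75) and would not be.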
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the above pseudo tag generation method for self-learning of an embodied robot.

According to the pseudo tag generation method and the robot system for self-learning of an embodied robot, interactive multimedia data between the embodied robot and an object to be learned are acquired, and state multimedia data are determined from the interactive multimedia data