
WO-2026092842-A1 - METHOD AND DEVICE FOR DEMONSTRATION-BASED PROGRAMMING OF A ROBOT OPERABLE IN MULTIPLE CONTROL MODES


Abstract

A method of programming an industrial robot (100), which comprises a robot manipulator (110) and a robot controller (120), comprising: recording movements of a robot manipulator during a demonstration-based programming session, for thereby obtaining a robot trajectory; capturing speech data while recording the movements of the robot manipulator; decomposing the captured speech data into a plurality of phases, wherein the robot trajectory has corresponding phases; providing an annotated robot trajectory by performing the following for at least one phase of the speech data: (i) using a natural-language model, parsing the speech data into at least one robot parameter value, and (ii) annotating a corresponding phase of the robot trajectory with the at least one robot parameter value; and generating a robot program on the basis of the annotated robot trajectory. The robot parameter value indicates a motion template to be used for realizing the robot trajectory, selected from a predefined set including a motion template in force-control mode and a motion template in position-control mode.
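As a non-authoritative illustration of the claimed workflow, the following sketch shows how phase-wise speech parsing, trajectory annotation, and program generation could fit together. All identifiers, and the keyword-based stand-in for the natural-language model, are placeholder assumptions, not the publication's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Phase:
    waypoints: list                     # recorded manipulator poses for this phase
    speech: str                         # transcribed speech captured during the phase
    annotations: dict = field(default_factory=dict)

def parse_speech(speech: str) -> dict:
    """Placeholder for the natural-language model: maps an utterance to robot
    parameter values, including the motion template to be used."""
    if "press" in speech or "push" in speech:
        return {"motion_template": "force_control"}
    return {"motion_template": "position_control"}

def annotate(trajectory: list) -> list:
    # Annotate each phase of the robot trajectory with the parsed parameter values.
    for phase in trajectory:
        phase.annotations.update(parse_speech(phase.speech))
    return trajectory

def generate_program(trajectory: list) -> list:
    # Emit one command per phase, applying the selected motion template.
    return [f"MOVE {len(p.waypoints)} waypoints [{p.annotations['motion_template']}]"
            for p in trajectory]
```

A real implementation would of course delegate `parse_speech` to a language model and emit controller-specific commands; the sketch only mirrors the decomposition-annotation-generation sequence described above.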

Inventors

  • ZUDAIRE, Sebastian
  • WAHRBURG, Arne
  • ENAYATI, Nima

Assignees

  • ABB SCHWEIZ AG

Dates

Publication Date
2026-05-07
Application Date
2024-10-30

Claims (13)

  1. A method (200) of programming an industrial robot (100), which comprises a robot manipulator (110) and a robot controller (120), the method comprising: recording (210) movements of a robot manipulator during a demonstration-based programming session, for thereby obtaining a robot trajectory (307); capturing (211) speech data (302) while recording the movements of the robot manipulator; decomposing (213) the captured speech data into a plurality of phases, wherein the robot trajectory has corresponding phases; providing an annotated robot trajectory (308) by performing the following for at least one phase of the speech data: using a natural-language model (171), parsing (215) the speech data into at least one robot parameter value (306); annotating (216) a corresponding phase of the robot trajectory with the at least one robot parameter value; and generating (218) a robot program (309) on the basis of the annotated robot trajectory, characterized in that the robot parameter value indicates a motion template to be used for realizing the robot trajectory, wherein the motion template is selected from a predefined set including at least one motion template in which the robot is operated in force-control mode and at least one motion template in which the robot is operated in position-control mode.
  2. The method (200) of claim 1, wherein the robot parameter value further includes a template argument indicating a configurable property of the selected motion template.
  3. The method (200) of claim 2, wherein the template argument is one or more of: movement speed, degree of compliance with the robot trajectory, degree of movement precision, gripping force of a robot tool, status of a robot tool, handling force, geometric movement constraints, whether to operate the robot in force-control or position-control mode.
  4. The method (200) of any of the preceding claims, further comprising: capturing (212) a video (303) of the programming session; and generating (214), using a vision-enabled language model, VLM (172), a natural-language description (305) of phases of the video, which is utilized in providing the annotated robot trajectory.
  5. The method (200) of claim 4, further comprising: parsing the VLM-generated description (305), in addition to the speech data (302), into said at least one robot parameter value (306).
  6. The method (200) of claim 4, further comprising: parsing the VLM-generated description (305), in addition to the speech data (302), into a template argument indicating whether the robot is to be operated in force-control mode or position-control mode.
  7. The method (200) of any of the preceding claims, wherein said annotating (216) comprises deriving a robot parameter value, or a modification of a robot parameter value, from the robot trajectory (307) on the basis of predefined heuristics.
  8. The method (200) of claim 7, wherein the predefined heuristics are adapted to locate one or more missing template arguments in a motion template and to predict contextually suitable values of the template arguments.
  9. The method (200) of claim 7 or 8, wherein multiple predefined heuristics are selectively applied, each being specific to one or more motion templates in the predefined set.
  10. The method (200) of any of the preceding claims, wherein the natural-language model (171) and the VLM (172) are implemented as distinct models.
  11. The method (200) of any of claims 1 to 9, wherein the natural-language model and the VLM are implemented as a single multi-modal model.
  12. A programming device (160) for facilitating programming of an industrial robot (100), which comprises a robot manipulator (110) and a robot controller (120), the programming device comprising memory (162) and processing circuitry (161) configured to: record movements of a robot manipulator during a demonstration-based programming session, for thereby obtaining a robot trajectory (307); capture speech data (302) while recording the movements of the robot manipulator; decompose the captured speech data into a plurality of phases, wherein the robot trajectory has corresponding phases; provide an annotated robot trajectory (308) by performing the following for at least one phase of the speech data: parse, using a natural-language model (171), the speech data into at least one robot parameter value (306); annotate the phase of the robot trajectory with the at least one robot parameter value; and generate a robot program (309) on the basis of the annotated robot trajectory, characterized in that the robot parameter value indicates a motion template to be used for realizing the robot trajectory, wherein the motion template is selected from a predefined set including at least one motion template in which the robot is operated in force-control mode and at least one motion template in which the robot is operated in position-control mode.
  13. A computer program (163) comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method (200) of any of claims 1 to 11.
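Purely as an illustration (not part of the claims), a motion template with configurable arguments, as recited in claims 2 and 3, combined with a heuristic that predicts contextually suitable values for missing arguments, as recited in claims 7 and 8, might be sketched as follows. All field names and default values are assumptions introduced for the example:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MotionTemplate:
    control_mode: str                       # "force" or "position" (claim 1)
    movement_speed: Optional[float] = None  # m/s; a template argument (claim 3)
    gripping_force: Optional[float] = None  # N; a template argument (claim 3)
    compliance: Optional[float] = None      # degree of compliance with trajectory

def apply_heuristics(template: MotionTemplate) -> MotionTemplate:
    """Fill in missing template arguments with contextually plausible
    defaults; the concrete values here are illustrative assumptions."""
    if template.movement_speed is None:
        # e.g., move more slowly when contact forces are being regulated
        template.movement_speed = 0.05 if template.control_mode == "force" else 0.25
    if template.control_mode == "force" and template.compliance is None:
        template.compliance = 0.8  # compliant by default in force control
    return template
```

In line with claim 9, a real system would select among several such heuristics, each specific to one or more motion templates in the predefined set.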

Description

METHOD AND DEVICE FOR DEMONSTRATION-BASED PROGRAMMING OF A ROBOT OPERABLE IN MULTIPLE CONTROL MODES

TECHNICAL FIELD

[0001] The present disclosure generally relates to the field of robotic control, and specifically to demonstration-based programming of an industrial robot. More precisely, methods and devices are proposed herein which support decision-making on whether to operate the industrial robot in force-control mode or position-control mode. The decision may be expressed in terms of a motion template in which the robot is operated in either of these control modes.

BACKGROUND

[0002] Two important branches of the field of robot control are human-to-robot instruction and robot teaching. Human-to-robot instruction addresses the problem of automatically generating robot motion from a non-technical, high-level description of a task such as “Assemble the car” or “Make me coffee”. In robot teaching, a robot program realizing the robot motion is automatically generated from a demonstration by an operator, who shows the robot how the task is to be carried out. Demonstration-based robot programming may include kinesthetic programming and so-called passive observation. In kinesthetic robot programming the robot handles the workpiece during the programming session, while in passive observation (robotless demonstration) a human operator who carries out the robot task may handle the workpiece in the robot’s stead, and the robot learns to imitate whatever handling the workpiece is exposed to.

[0003] Once an area exclusive to seasoned programmers of high technical expertise, demonstration-based robot programming has lately gained a broader user base thanks to the advent of Large Language Models (LLMs), and Generative Pretrained Transformers (GPTs) in particular. LLMs are increasingly being applied to robot control and robot programming.
See for instance the applicant’s prior disclosure PCT/EP2023/087949, which proposed a speech-supplemented robot programming method based on the following workflow:

  • A demonstration-based (in particular, a kinesthetic) programming session, during which movements of the robot manipulator are captured by proprioceptive sensors. The movements are recorded and saved as a robot trajectory.
  • A recording of speech data, which takes place during the demonstration-based programming session.
  • By suitable prompting, an LLM is caused to parse the speech data into values of robot parameters, such as a tool state, a movement speed, a degree of compliance with the trajectory, a degree of movement precision, or a choice of reference frame in which to express the movements.
  • The robot trajectory is annotated with each robot parameter value at the point of the trajectory which corresponds to the time of utterance.
  • A robot program is generated as a sequence of commands which realizes the robot trajectory while applying the robot parameter values as modifiers.

The programming method according to PCT/EP2023/087949 offers the operator a convenient, hands-free way of adding explanatory remarks to the recorded movements (purpose of a step, aspects to pay particular attention to), or of requesting certain robot parameter settings which cannot be seen or sensed during the demonstration (speed, precision), with the ultimate aim of providing a usable robot program in a shorter time. The programming workflow as such appears to be proper to the applicant.

[0004] Configuring contact-force regulation is a known difficulty in the programming of robot applications that include interaction with workpieces and other objects in the working environment.
In particular, it may have to be configured whether the industrial robot is to be operated in position-control mode, which generally speaking offers better precision, or force-control mode, which may be preferable for applications that involve contact forces between the robot and the environment, and delicate workpieces in particular.

[0005] At the core of this difficulty is the fact that, since the static forces sum to zero at each external interface of the robot, a proprioceptive sensor cannot capture whatever attempts the operator makes to demonstrate a desired magnitude of a contact force. The gripper actuator will not feel that the operator pinches the gripper harder around a workpiece (e.g., to indicate an increased gripping force). Likewise, the operator’s intuitive attempt to ask for an increased downward force by pressing the end-effector against a table will not give rise to an increase in torque at the inner joints of the robot manipulator. In the absence of a dedicated external force sensor (contact-force sensor) arranged in or at the external interface concerned, the contact force can only be determined in exceptional situations.

[0006] It would be desirable to automate, or at least support, the decision-making on whether to operate the industrial robot in force-control mode or position-control mode.

SUMMARY

[0007] One objective of the present disclosure is to