CN-121995912-A - Equipment control method and device, electronic mobile equipment and storage medium

Abstract

The application discloses a device control method, a device control apparatus, an electronic mobile device and a storage medium. By deeply integrating the physical boundary constraints of the device and the extracted geometric features of the road edge into the reinforcement learning decision loop, the method significantly improves the edge-following ("welt") cleaning precision and the decision stability of the device in complex road edge environments, while ensuring that the mobile device does not collide with the road edge or with obstacles.

Inventors

  • ZHOU JUNYU
  • ZHANG LIANGLIANG
  • ZHAO ZHUO
  • XU XUEYI

Assignees

  • 广州星程智能科技有限公司

Dates

Publication Date
20260508
Application Date
20251230

Claims (10)

  1. A device control method, characterized by comprising: acquiring device outline bounding box size parameters, a number of lidar beams, and a detection range; determining, based on the device outline bounding box size parameters, the number of lidar beams, and the detection range, an environmental true distance observation value that characterizes the actual distance between the mobile device and a boundary obstacle; extracting road edge topology information of the boundary obstacle from the environmental true distance observation value, and converting the road edge topology information into a specific operation target point to be tracked by the mobile device; combining the environmental true distance observation value and the operation target point to quantify, along the two dimensions of physical collision risk and task achievement benefit, the safety performance and the guidance weight of each action of the mobile device in a discrete action space; and fusing the safety performance and the guidance weight to generate an action mask, correcting the decision probability distribution of a reinforcement learning model by using the action mask, outputting an execution action, and controlling the mobile device based on the execution action.
  2. The method of claim 1, wherein the step of determining, based on the device outline bounding box size parameters, the number of lidar beams, and the detection range, an environmental true distance observation value that characterizes the actual distance between the mobile device and the boundary obstacle comprises: calculating, based on the number of lidar beams, the geometric intersection point of each lidar beam with the device outline bounding box defined by the size parameters, to obtain radar base offset values; acquiring an original observation value of the lidar, and superposing the original observation value and the radar base offset value to obtain an initial environmental true distance observation value; and performing a validity check on the initial environmental true distance observation value using the detection range, and, when the check result meets a preset condition, determining the initial environmental true distance observation value as the corrected environmental true distance observation value.
  3. The method of claim 2, wherein the step of extracting the road edge topology information of the boundary obstacle from the environmental true distance observation value and converting the road edge topology information into a specific operation target point to be tracked by the mobile device comprises: determining a current position point of the mobile device and a road edge line segment of the boundary obstacle, projecting the current position point onto the road edge line segment, and determining the nearest projection point; searching, in the discrete point sequence of the road edge line segment, for the sampling points immediately before and after the nearest projection point, to obtain road edge neighborhood points; calculating a road edge direction vector from the road edge neighborhood points; and offsetting the nearest projection point along the normal of the road edge direction vector by a preset target distance to generate a welt target point, and determining the welt target point as the specific operation target point.
  4. The method of claim 3, further comprising, before the step of combining the environmental true distance observation value and the operation target point to quantify, along the two dimensions of physical collision risk and task achievement benefit, the safety performance and the guidance weight of each action of the mobile device in a discrete action space: simulating, with an Ackermann steering model and using the device outline bounding box size parameters, the number of lidar beams, and the detection range as constraints, the motion trajectory of each action of the mobile device in the discrete action space over a preset number of safety verification steps, and recording the spatial vertex coordinates occupied by the mobile device at each time step, to generate an action position mapping table.
  5. The method of claim 4, wherein the step of combining the environmental true distance observation value and the operation target point to quantify, along the two dimensions of physical collision risk and task achievement benefit, the safety performance and the guidance weight of each action of the mobile device in a discrete action space comprises: determining the distance improvement amount, the direction improvement amount, and the road edge progress value of the welt target point; traversing each action in the discrete action space, calculating the end position of the mobile device after the action is executed, and comparing the end position with the distance improvement amount, the direction improvement amount, and the road edge progress value, respectively, to assign a target weight to each action; invoking a collision detection algorithm, and calculating, within the spatial perception boundary determined by the detection range, the spatial intersection relationship between the trajectory vertex coordinates at each time step in the action position mapping table and the environmental true distance observation value, so as to determine the safe distance that each action can travel before a collision; comparing the safe distance with a preset safety threshold, selecting the actions that have no collision risk, and counting the number of consecutive time steps over which each such action remains collision-free, to generate a safety step number vector; and performing fusion calculation and normalization on the safety step number vector and the target weights to obtain combined weights.
  6. The method of claim 5, wherein the step of fusing the safety performance and the guidance weight to generate an action mask, correcting the decision probability distribution of the reinforcement learning model by using the action mask, and outputting an execution action comprises: smoothing the combined weights with a one-dimensional minimum filter and outputting the action mask; and applying the action mask to the probability distribution constructed from the action mean and standard deviation output by the reinforcement learning model, filtering out invalid actions that do not meet the welt requirement or that carry a collision risk, and randomly sampling the final execution action after Softmax normalization.
  7. The method of claim 1, wherein the step of acquiring the device outline bounding box size parameters, the number of lidar beams, and the detection range comprises: parsing a standardized description file preset in the mobile device, and extracting the geometric definition data of the device outline bounding box to obtain the device outline bounding box size parameters; calling a self-description protocol of the lidar through a communication interface, and reading the beam distribution data of the lidar in its current working mode to obtain the number of lidar beams; and acquiring the configuration parameters of the lidar, and dynamically extracting the effective range of the lidar according to the perception requirements of the current working environment to obtain the detection range.
  8. A device control apparatus, comprising: a data acquisition module, configured to acquire device outline bounding box size parameters, a number of lidar beams, and a detection range; an environmental true distance observation value determining module, configured to determine, based on the device outline bounding box size parameters, the number of lidar beams, and the detection range, an environmental true distance observation value that characterizes the actual distance between the mobile device and a boundary obstacle; a specific operation target point acquisition module, configured to extract road edge topology information of the boundary obstacle from the environmental true distance observation value and to convert the road edge topology information into a specific operation target point to be tracked by the mobile device; a safety performance and guidance weight quantification module, configured to combine the environmental true distance observation value and the operation target point to quantify, along the two dimensions of physical collision risk and task achievement benefit, the safety performance and the guidance weight of each action of the mobile device in a discrete action space; and a mobile device control module, configured to fuse the safety performance and the guidance weight to generate an action mask, correct the decision probability distribution of a reinforcement learning model by using the action mask, output an execution action, and control the mobile device based on the execution action.
  9. An electronic mobile device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the method of any one of claims 1-7.
  10. A readable storage medium, characterized in that a program or instructions are stored thereon, and the program or instructions, when executed by a processor, implement the method of any one of claims 1-7.
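
As an illustration only, the geometry recited in claim 3 (nearest projection point, road edge neighborhood points, road edge direction vector, and normal offset by a preset target distance) can be sketched in Python as follows. The function name `welt_target_point`, its parameters, and the rule for choosing which of the two normals to use are assumptions made for this sketch; the claims do not specify them.

```python
import math


def welt_target_point(position, edge_points, target_distance):
    """Return (target, nearest, direction) for a road edge given as a polyline.

    position: (x, y) of the device.
    edge_points: ordered (x, y) samples along the road edge.
    target_distance: desired lateral gap to the edge (hypothetical parameter name).
    """
    if len(edge_points) < 2:
        raise ValueError("need at least two road edge samples")
    px, py = position
    best = None  # (squared distance, segment index, projected point)
    for i in range(len(edge_points) - 1):
        (ax, ay), (bx, by) = edge_points[i], edge_points[i + 1]
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        if seg_len2 == 0.0:
            continue  # skip duplicated samples
        # Clamp the projection parameter so the point stays on the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        qx, qy = ax + t * dx, ay + t * dy
        d2 = (px - qx) ** 2 + (py - qy) ** 2
        if best is None or d2 < best[0]:
            best = (d2, i, (qx, qy))
    if best is None:
        raise ValueError("road edge has no non-degenerate segment")
    _, i, (qx, qy) = best
    # Neighborhood points: the samples just before and after the projection.
    (ax, ay), (bx, by) = edge_points[i], edge_points[i + 1]
    length = math.hypot(bx - ax, by - ay)
    ux, uy = (bx - ax) / length, (by - ay) / length  # unit direction vector
    # Assumption: of the two normals, use the one on the device's side.
    nx, ny = -uy, ux
    if (px - qx) * nx + (py - qy) * ny < 0.0:
        nx, ny = -nx, -ny
    target = (qx + target_distance * nx, qy + target_distance * ny)
    return target, (qx, qy), (ux, uy)


if __name__ == "__main__":
    edge = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    print(welt_target_point((1.5, 0.8), edge, 0.2))
```

In this sketch the target point always lies on the same side of the road edge as the device, so a device that drifts away from the edge is pulled back toward the preset distance rather than across the edge.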

Description

Equipment control method and device, electronic mobile equipment and storage medium

Technical Field

The present invention relates to the field of automatic driving control, and in particular to a device control method, a device control apparatus, an electronic mobile device, and a readable storage medium.

Background

With the rapid development of mobile robot technology (for example, sweeping robots and unmanned vehicles), automatic cleaning equipment has been widely used in indoor and outdoor scenes. Taking a sweeping robot as an example, edge-following ("welt") sweeping is a key index for evaluating the operation quality of the equipment during automatic sweeping. To achieve accurate welt control, the prior art often uses a reinforcement learning algorithm for path planning and action decisions.

Welt cleaning methods based on reinforcement learning in the related art still have the following technical problems in practical application:

First, the reinforcement learning model of the related art often treats the device as a point mass or a simplified geometry, and does not fully consider the physical dimensions of the device outline boundary or the real-time constraints of the sensor detection range when outputting motion commands. As a result, when facing complex, curved road edges or obstacles, the device is either prone to collision or too conservative to reach the desired welt distance.

Second, when processing lidar observation values, the welt cleaning schemes of the related art generally perform only simple obstacle avoidance logic, and it is difficult for them to accurately extract the topology information of the road edge from the scattered and noisy raw point cloud.
Because the road edge direction and the normal offset are not calculated accurately, the device cannot generate a stable operation target point, so the trajectory oscillates during welt operation and a constant welt distance is difficult to maintain.

Disclosure of Invention

Embodiments of the present invention provide a device control method, a device control apparatus, an electronic mobile device, and a readable storage medium to overcome or at least partially solve the above problems.

To solve the above technical problems, the application is implemented as follows:

In a first aspect, an embodiment of the present application provides a device control method, including: acquiring device outline bounding box size parameters, a number of lidar beams, and a detection range; determining, based on the device outline bounding box size parameters, the number of lidar beams, and the detection range, an environmental true distance observation value that characterizes the actual distance between the mobile device and a boundary obstacle; extracting road edge topology information of the boundary obstacle from the environmental true distance observation value, and converting the road edge topology information into a specific operation target point to be tracked by the mobile device; combining the environmental true distance observation value and the operation target point to quantify, along the two dimensions of physical collision risk and task achievement benefit, the safety performance and the guidance weight of each action of the mobile device in a discrete action space; and fusing the safety performance and the guidance weight to generate an action mask, correcting the decision probability distribution of a reinforcement learning model by using the action mask, outputting an execution action, and controlling the mobile device based on the execution action.
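
For illustration, the final step of the first aspect (fusing safety performance with guidance weight into an action mask and using it to correct the policy's probability distribution) might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the implementation of the application: the fusion rule (a product, zeroed for actions with too few safe steps), the filter radius, the multiplicative way the mask is applied to a discretized Gaussian, and all function and parameter names are hypothetical.

```python
import math
import random


def combined_weights(safe_steps, target_weights, min_safe_steps):
    """Fuse per-action safe step counts with target weights, scaled to [0, 1]."""
    # Assumption: fusion is a product; actions with too few safe steps get 0.
    fused = [w * s if s >= min_safe_steps else 0.0
             for s, w in zip(safe_steps, target_weights)]
    peak = max(fused, default=0.0)
    return [f / peak for f in fused] if peak > 0.0 else fused


def min_filter_1d(values, radius=1):
    """One-dimensional minimum filter over a window of 2 * radius + 1 actions."""
    n = len(values)
    return [min(values[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def masked_action_probs(action_values, mean, std, mask):
    """Discretize a Gaussian policy, apply the mask, and Softmax-normalize."""
    logits = []
    for a, m in zip(action_values, mask):
        if m <= 0.0:
            logits.append(-math.inf)  # invalid action: probability zero
        else:
            # Assumption: the mask scales the unnormalized Gaussian density.
            logits.append(-0.5 * ((a - mean) / std) ** 2 + math.log(m))
    finite = [x for x in logits if x != -math.inf]
    if not finite:
        raise ValueError("every action is masked out")
    top = max(finite)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) if x != -math.inf else 0.0 for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng=random):
    """Randomly draw an action index according to the masked probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    steering = [-0.4, -0.2, 0.0, 0.2, 0.4]  # hypothetical discrete actions
    weights = combined_weights([0, 5, 5, 5, 2], [0.9, 0.5, 1.0, 0.5, 0.8], 3)
    mask = min_filter_1d(weights)
    probs = masked_action_probs(steering, mean=0.1, std=0.2, mask=mask)
    print(mask, probs, sample_action(probs))
```

A minimum filter erodes the set of permitted actions: an action next to a blocked one is suppressed as well, which leaves a margin around actions that carry a collision risk.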
Optionally, the step of determining, based on the device outline bounding box size parameters, the number of lidar beams, and the detection range, an environmental true distance observation value that characterizes the actual distance between the mobile device and the boundary obstacle comprises: calculating, based on the number of lidar beams, the geometric intersection point of each lidar beam with the device outline bounding box defined by the size parameters, to obtain radar base offset values; acquiring an original observation value of the lidar, and superposing the original observation value and the radar base offset value to obtain an initial environmental true distance observation value; and performing a validity check on the initial environmental true distance observation value using the detection range, and, when the check result meets a preset condition, determining the initial environmental true distance observation value as the corrected environmental true distance observation value.

Optionally, the step of extracting the road edge topology information of the boundary obstacle from the environmental true distance observation value and converting the road edge topology information into a specific operation target point to be tracked by the mob