CN-122008190-A - Multi-scene mechanical arm sensing and controlling method and system based on layered decoupling architecture

CN122008190A

Abstract

The invention discloses a multi-scene mechanical arm sensing and control method and system based on a layered decoupling architecture, and relates to the technical field of mechanical arm sensing and control. By constructing a layered architecture in which the sensing layer, the planning layer, and the control layer are fully decoupled, the invention markedly improves the extensibility and compatibility of the system. A camera abstraction layer shields the driver differences of heterogeneous hardware, and standardized data streams realize modular communication among the layers. As a result, a developer can swap in a higher-precision visual model, connect different types of image input sources, or extend new task logic without modifying the core code of the control layer, greatly reducing the system's secondary development cost and maintenance difficulty, and solving the problems that existing mechanical arm systems are highly coupled in software and hardware and difficult to adapt to rapid-iteration requirements.

Inventors

  • XU JINYANG
  • HOU YUNLONG
  • WANG FEI
  • CAO DONGGANG

Assignees

  • 杭州希秀泛在计算技术有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-09

Claims (10)

  1. A multi-scene mechanical arm sensing and control method based on a layered decoupling architecture, characterized by comprising the following steps: S10, after the system starts, establishing communication channels among the layers; the sensing layer initializes a visual sensor through a camera abstraction layer, acquires environment image or video-stream data, packages it into a standardized sensing data stream, and transmits the stream to the planning layer; S20, multi-scene algorithm reasoning and decision: the planning layer receives the sensing data stream and invokes the corresponding vision and decision algorithm module according to the currently configured task scene, to obtain feature information or decision results of the target on the image plane; S30, spatial mapping and standardized pose resolving: the planning layer inputs the target feature information or decision result obtained in step S20 into a two-dimensional calibration adapter, performs coordinate transformation using a pre-calibrated homography matrix to map the pixel coordinate system of the image plane to the base physical coordinate system of the mechanical arm, generates standardized pose data comprising spatial position coordinates and end-effector attitude angles, and sends the data to the control layer; and S40, the control layer receives the standardized pose data and generates action instructions: a weighted inverse kinematics solver assigns position weights and attitude weights according to the current task's differing requirements on position accuracy and attitude accuracy, solves the joint-angle combination of the mechanical arm, and drives the mechanical arm to execute the action after trajectory smoothing interpolation.
  2. The multi-scene mechanical arm sensing and control method based on the layered decoupling architecture according to claim 1, wherein in step S10 the sensing layer is provided with a camera abstraction interface; the interface encapsulates the underlying drivers of different image acquisition devices, dynamically binds a specific driver implementation class by reading fields in a configuration file, and exposes a unified data acquisition method, thereby enabling switching between local camera input and network video stream input.
  3. The multi-scene mechanical arm sensing and control method based on the layered decoupling architecture according to claim 1, wherein in step S20, in an item classification or grasping scene, the visual algorithm module invoked by the planning layer is a rotated target detection model; the prediction output of the rotated target detection model comprises the target's center-point pixel coordinates, length and width dimensions, and rotation angle relative to the image coordinate system; and in step S30 the planning layer maps the rotation angle, through a coordinate-system transformation, into the rotation attitude angle of the mechanical arm's end effector, thereby realizing adaptive grasping of objects placed at arbitrary angles.
  4. The multi-scene mechanical arm sensing and control method based on the layered decoupling architecture according to claim 1, wherein the rotated target detection model employs a YOLOv-OBB network or a variant thereof.
  5. The multi-scene mechanical arm sensing and control method based on the layered decoupling architecture according to claim 1, wherein in step S20 the planning layer starts a large-model adapter in a semantic interaction scene; the large-model adapter receives a user's natural-language instruction, invokes a local or cloud multimodal large language model, performs referring-expression understanding or open-set target detection in combination with the sensing data stream, and outputs the bounding box or center coordinates of the target object in the image.
  6. The multi-scene mechanical arm sensing and control method based on the layered decoupling architecture according to claim 1, wherein in step S30 the homography matrix is a 3 x 3 matrix solved by the least-squares method from the pixel coordinates of a plurality of pairs of feature points in the image and the coordinates of the corresponding points in physical space; the coordinate transformation is computed by linearly transforming the pixel coordinates output by the visual algorithm into physical coordinates through matrix multiplication, with a preset height offset added in the Z-axis direction.
  7. The multi-scene mechanical arm sensing and control method based on the layered decoupling architecture according to claim 1, wherein in step S40 the weighted inverse kinematics solver employs a sequential least-squares programming algorithm; when the target object lies at the edge of the mechanical arm's workspace or there is a risk of kinematic singularities, the solver reduces the attitude weight and raises the position weight so as to compute joint angles that preferentially ensure the mechanical arm's end effector reaches the target position.
  8. A multi-scene mechanical arm sensing and control system based on a layered decoupling architecture for implementing the method of any one of claims 1-7, the system comprising: a perception subsystem for establishing a communication connection with an external vision sensor, shielding underlying hardware differences through a camera abstraction interface, and collecting and outputting standardized image or video-stream data in real time; a planning subsystem for receiving the data output by the perception subsystem, loading and running a vision and decision algorithm module according to a configuration file, and mapping the pixel features output by the algorithm into standardized pose data in the physical coordinate system using a two-dimensional calibration adapter, wherein the vision and decision algorithm module comprises a rotated target detection model, a large-model adapter, and a game decision model; and a control subsystem for receiving the standardized pose data, calculating joint angles using a weighted inverse kinematics solver, and generating stable motion instructions through a trajectory smoothing interpolation module to drive the mechanical arm to execute actions.
  9. The multi-scene mechanical arm sensing and control system based on the layered decoupling architecture according to claim 8, wherein the planning subsystem comprises a configuration management module, a multi-scene algorithm integration module, and a two-dimensional calibration adaptation module; the configuration management module parses the global configuration file, extracts task scene parameters and algorithm weight paths, and schedules the corresponding algorithm resources; the multi-scene algorithm integration module integrates a YOLO rotated-detection unit, an LLM semantic analysis unit, and a game decision unit, and activates a specific unit according to instructions from the configuration management module; and the two-dimensional calibration adaptation module stores a pre-calibrated homography matrix for performing the linear transformation from the pixel coordinate system to the physical coordinate system of the mechanical arm base.
  10. The multi-scene mechanical arm sensing and control system based on the layered decoupling architecture according to claim 8, wherein the trajectory smoothing interpolation module in the control subsystem is configured to receive the discrete joint angles obtained by the inverse kinematics solution, generate a series of smooth transition intermediate control points between the current angle and the target angle using linear interpolation or an S-curve velocity planning algorithm, and issue the intermediate control points to the underlying drive.
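
The camera abstraction of claim 2 (a configuration field dynamically bound to a driver implementation class behind a unified acquisition method) can be sketched as follows. This is an illustrative assumption, not the patent's actual code: the names `CameraSource`, `DRIVERS`, and the `source` config field are invented for the sketch, and real drivers would wrap e.g. an OpenCV capture or an RTSP client.

```python
from abc import ABC, abstractmethod

class CameraSource(ABC):
    """Unified acquisition interface; concrete drivers hide hardware details."""
    @abstractmethod
    def read_frame(self):
        """Return one image frame (stub strings stand in for real frames)."""

class LocalCamera(CameraSource):
    def __init__(self, device_index: int):
        self.device_index = device_index
    def read_frame(self):
        # A real driver would wrap e.g. cv2.VideoCapture(self.device_index)
        return f"frame-from-usb-{self.device_index}"

class RtspCamera(CameraSource):
    def __init__(self, url: str):
        self.url = url
    def read_frame(self):
        # A real driver would pull from the network stream at self.url
        return f"frame-from-{self.url}"

# Registry mapping the configuration field to a driver implementation class.
DRIVERS = {"local": LocalCamera, "rtsp": RtspCamera}

def make_camera(config: dict) -> CameraSource:
    """Dynamically bind a driver class from the 'source' field of the config."""
    cls = DRIVERS[config["source"]]
    return cls(config["arg"])

cam = make_camera({"source": "rtsp", "arg": "rtsp://192.168.0.10/stream"})
print(cam.read_frame())
```

Switching from a USB camera to a network stream then only changes the configuration file, not the planning- or control-layer code, which is the decoupling the claim describes.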
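
The two-dimensional calibration of claim 6 (a 3 x 3 homography fitted by least squares from pixel/physical point pairs, applied by matrix multiplication with a preset Z offset) can be illustrated with a minimal NumPy sketch; the function names and the toy calibration data are assumptions, and h33 is fixed to 1 as is conventional.

```python
import numpy as np

def fit_homography(pixels, points):
    """Least-squares fit of a 3x3 homography H mapping pixel (u, v) to plane (X, Y).
    pixels, points: arrays of shape (N, 2) with N >= 4 non-collinear pairs."""
    A, b = [], []
    for (u, v), (X, Y) in zip(pixels, points):
        A.append([u, v, 1, 0, 0, 0, -u * X, -v * X]); b.append(X)
        A.append([0, 0, 0, u, v, 1, -u * Y, -v * Y]); b.append(Y)
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)  # fix h33 = 1

def pixel_to_base(H, u, v, z_offset=0.05):
    """Map an image-plane detection to base coordinates; Z is a preset height offset."""
    w = H @ np.array([u, v, 1.0])
    return np.array([w[0] / w[2], w[1] / w[2], z_offset])

# Toy calibration: assume pixels scale at 0.001 m/px with a fixed translation.
px = np.array([[0, 0], [640, 0], [640, 480], [0, 480]])
xy = px * 0.001 + np.array([0.2, -0.1])
H = fit_homography(px, xy)
print(pixel_to_base(H, 320, 240))
```

Because only a planar mapping plus a height offset is needed, this stays far lighter than full 3D hand-eye calibration, matching the lightweight-deployment goal stated in the description.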
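
The weighted inverse kinematics of claim 7 can be sketched on a toy planar two-link arm using SciPy's SLSQP optimizer; the link lengths, cost form, and weight values are assumptions for the sketch, not parameters from the patent. The solver minimizes a weighted sum of squared position and pose errors, so lowering the attitude weight near the workspace edge lets it trade pose accuracy for reaching the target point.

```python
import numpy as np
from scipy.optimize import minimize

L1, L2 = 0.3, 0.25  # link lengths of a toy planar two-link arm (assumed)

def fk(q):
    """Forward kinematics: end-effector (x, y) position and orientation."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y]), q[0] + q[1]

def weighted_ik(target_pos, target_ori, w_pos=1.0, w_ori=0.1, q0=(0.3, 0.3)):
    """Soft-constraint IK: minimise a weighted sum of position and pose error."""
    def cost(q):
        pos, ori = fk(q)
        return (w_pos * np.sum((pos - target_pos) ** 2)
                + w_ori * (ori - target_ori) ** 2)
    return minimize(cost, q0, method="SLSQP").x

# Near the workspace edge (max reach 0.55 m) the requested pose may be
# unreachable; raising w_pos and lowering w_ori prioritises the position.
q = weighted_ik(np.array([0.54, 0.0]), target_ori=1.0, w_pos=10.0, w_ori=0.01)
pos, ori = fk(q)
print(pos, ori)
```

A rigid-constraint solver would simply fail here; the weighted formulation instead returns the nearest useful configuration, which is the "soft constraint" behaviour the description motivates.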
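
The trajectory smoothing interpolation of claim 10 can be sketched as follows; a cosine blend is used here as one simple S-type profile (an assumption — the claim also allows plain linear interpolation), giving zero velocity at both endpoints so the arm neither jerks at the start nor overshoots at the stop.

```python
import numpy as np

def interpolate_joints(q_now, q_target, steps=50):
    """Generate smooth intermediate control points between two joint vectors.
    A cosine (S-shaped) blend drives the progress variable s from 0 to 1
    with zero slope at both ends."""
    q_now = np.asarray(q_now, float)
    q_target = np.asarray(q_target, float)
    s = 0.5 * (1 - np.cos(np.linspace(0, np.pi, steps)))
    return [q_now + si * (q_target - q_now) for si in s]

# Discrete IK result -> a dense sequence of control points for the drives.
path = interpolate_joints([0.0, 0.0, 0.0], [1.0, -0.5, 0.8], steps=5)
for q in path:
    print(np.round(q, 3))
```

In the system of claim 8, each intermediate point would be issued in turn to the underlying drive, turning the single discrete joint-angle solution into a stable motion instruction stream.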

Description

Multi-scene mechanical arm sensing and control method and system based on a layered decoupling architecture

Technical Field

The invention relates to the technical field of mechanical arm sensing and control, in particular to a multi-scene mechanical arm sensing and control method and system based on a layered decoupling architecture.

Background

With the rapid development of artificial intelligence and embodied intelligence technologies, mechanical arm vision grasping systems are widely applied in many fields such as industrial sorting, home service, logistics and warehousing, and education and scientific research. Conventional mechanical arm control systems usually realize grasping through fixed trajectories that are programmed offline or taught with a teach pendant, but when facing unstructured environments (such as random object positions and varying postures), autonomous positioning and grasping must often be realized by combining computer vision techniques. Although existing mechanical arm vision grasping technology has advanced to some extent, the following significant technical defects and unsolved problems remain in practical application and secondary development:

Most existing robotic vision systems employ a "chimney" or monolithic development mode: visual perception algorithms (e.g., object detection), business decision logic (e.g., grasp order), and underlying motion control code (e.g., motor drivers) are tightly interleaved, with severe hard-coding. Such a tightly coupled architecture makes the system extremely difficult to maintain and extend. First, hardware replacement is difficult and compatibility is poor: if a user needs to replace a local USB camera with a network camera (RTSP stream) or replace the mechanical arm body, the entire code chain from the lower-layer drivers to the upper-layer logic must often be modified, with huge workload.
Second, algorithm upgrades are limited: when a visual model needs upgrading from an old version (e.g., YOLOv) to a new one, or a generic detection algorithm must be replaced with a dedicated grasp detection algorithm (e.g., GraspNet), the developer must rebuild a large amount of data-processing logic due to the lack of a uniform data interface standard. Task reusability is also poor: for example, code developed for a "color sorting" scene cannot be directly reused in a "chess playing" or "semantic grasping" scene, so development efficiency is low and code reuse is minimal.

In the field of object detection, the prior art mostly uses a horizontal rectangular frame (Axis-Aligned Bounding Box, AABB) to localize objects for the visual grasping task. In practice, however, objects (such as screws, long blocks, and pens) are often placed at arbitrary angles. An AABB yields a detection region containing a large amount of background redundancy and cannot accurately capture the object's rotation (yaw) angle. The mechanical arm can then only grasp such objects at a fixed angle, and slipping during grasping or collision with surrounding objects easily occurs. In addition, existing multi-camera systems often depend on complex 3D hand-eye calibration, which is computationally heavy and complicated to deploy, making it difficult to meet requirements for lightweight, rapid deployment.

In terms of mechanical arm control, conventional inverse kinematics (Inverse Kinematics, IK) solution algorithms are typically mathematical solutions based on rigid constraints: the goal is a solution that perfectly satisfies both the target position (x, y, z) and the target pose (roll, pitch, yaw). In practical multi-scene applications, however, there is often a "soft constraint" requirement.
For example, when the target object lies at the edge of the mechanical arm's workspace, the theoretically perfect pose may be unreachable (no solution), yet grasping could still be accomplished by tilting the jaws only slightly; or in some handling tasks the position accuracy requirement is high while the pose accuracy requirement is low. Existing control systems lack flexibility in inverse kinematics solving and cannot adjust the priority of position versus pose according to task requirements under specific working conditions (such as the workspace edge or the vicinity of singular points), which easily causes control failure or jerky motion. The lack of a weight-based dynamic adjustment mechanism makes the mechanical arm prone to planning failure or motion jitter near limit positions or singular points, greatly limiting its effective working range. Furthermore, conventional robotic arm systems are