CN-121982072-A - Lightweight model-based forklift pallet tracking method, system, equipment and medium


Abstract

The invention relates to the technical field of artificial intelligence and provides a forklift pallet tracking method, system, equipment and medium based on a lightweight model. The YOLOv algorithm is improved in three respects. First, a StarNet backbone network is introduced, which builds a high-dimensional implicit feature space through the star operation, improving feature representation while reducing the parameter count. Second, a dynamic hybrid convolution module is designed for the neck network; it adaptively extracts multi-scale features through multi-branch depthwise convolution and a dynamic weight fusion mechanism, making the model more flexible. Third, a lightweight rotated detection head is proposed that applies group normalization and a shared convolution structure, which reduces the parameter count, improves the stability of small-batch training, and efficiently predicts the rotation angle of the pallet. Finally, the improved detection model is combined with a ByteTrack tracking algorithm optimized for rotated boxes to form a complete recognition and tracking system.
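
The star operation named in the abstract can be illustrated with a minimal sketch. It assumes, in line with claim 3 and the published StarNet design, that the operation multiplies two linear projections of the same input element by element. Under that assumption a single output value expands into a sum over all pairwise products of the input features, which is the sense in which the feature space is implicitly high-dimensional. The function names and toy weights are hypothetical and are not taken from the patent.

```python
from itertools import product

def linear(weights, x):
    """Dot product of one weight vector with the input features."""
    return sum(w * v for w, v in zip(weights, x))

def star(w1, w2, x):
    """Star operation for one output channel: product of two linear branches."""
    return linear(w1, x) * linear(w2, x)

def star_expanded(w1, w2, x):
    """The same value written as an explicit sum over pairwise feature products."""
    n = len(x)
    return sum(w1[i] * w2[j] * x[i] * x[j] for i, j in product(range(n), repeat=2))

# Hypothetical toy values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
w1 = [0.5, -1.0, 2.0]
w2 = [1.0, 0.0, -1.0]

direct = star(w1, w2, x)
expanded = star_expanded(w1, w2, x)

# Count the distinct monomials x_i * x_j (i <= j) available from 3 inputs.
distinct_terms = len({tuple(sorted(p)) for p in product(range(len(x)), repeat=2)})
```

With three input features the product already spans six distinct second-order terms, without any layer of that width being materialized.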

Inventors

Ge Aidong
Li Shuchang
Zhang Chengyu
Sun Minglin
Zhao Luning
Qi Haoxiang
Feng Yu
Hou Qiuyu
Liu Yueyao
Mao Jianyu
Guo Zhongyuan

Assignees

Qilu University of Technology (Shandong Academy of Sciences)

Dates

Publication Date: 2026-05-05
Application Date: 2026-04-09

Claims (10)

1. A forklift pallet tracking method based on a lightweight model, characterized by comprising the following steps: acquiring image data containing a forklift pallet target, and annotating the pallet target with an oriented bounding box to construct a standard pallet rotated-target data set; constructing a target detection model based on the YOLOv framework, wherein the target detection model comprises a backbone network, a neck network and a detection head network connected in series, the backbone network uses a StarNet network in place of the original backbone network, the neck network uses a dynamic hybrid convolution module C3K2-DH in place of the original C3K2 module, and the detection head network uses a lightweight rotated detection head LOBB; inputting the preprocessed target data set into the target detection model for training to obtain a trained target detection model; inputting forklift operation images acquired in real time into the trained target detection model, and outputting the category, positioning coordinates and confidence of each detected pallet; and inputting the detection results into an improved ByteTrack algorithm to perform multi-target tracking of the pallet targets.
2. The method of claim 1, wherein acquiring image data containing a forklift pallet target and annotating the pallet target with an oriented bounding box to construct a standard pallet rotated-target data set comprises: capturing real photographs of standard pallets in warehouses and factories using a smartphone or a forklift-mounted depth camera; annotating the targets with oriented bounding boxes using the roLabelImg tool, which introduces a rotation angle parameter; dividing the data set into a training set, a validation set and a test set in a ratio of 8:1:1; and resizing all images to a uniform size and applying data augmentation.
3. The method of claim 1, wherein the StarNet network is a backbone network comprising 4 stages, each stage comprising a convolutional downsampling layer followed by a stack of Star Blocks, wherein each Star Block implements an implicit high-dimensional mapping by element-wise multiplication of the branch features produced by fully connected (FC) layers, and reduces the amount of computation by using depthwise separable convolutions (DWConv).
4. The method of claim 1, wherein the neck network receives multi-scale feature maps from the backbone network, the dynamic hybrid convolution module C3K2-DH is located in the neck network and performs feature fusion and extraction on an input feature map, and the input feature map is derived from the multi-scale feature maps output by the backbone network or from a higher-level feature layer of the neck network; the processing procedure of the dynamic hybrid convolution module C3K2-DH comprises the following steps: inputting the input feature map to the dynamic hybrid convolution Dynamic HybridConv d; generating dynamic routing weights from the input feature map through a dynamic path selector; distributing the feature map, according to the dynamic routing weights, to three parallel convolution branches for processing, wherein the three parallel convolution branches comprise a standard 3x3 convolution branch, a depthwise separable convolution branch and a dynamic convolution branch; and performing gated fusion of the outputs of the convolution branches through an adaptive feature fusion unit, adding the fused result to the input as a residual through a shortcut connection, and outputting the result to the next stage of the neck network or to a detection head.
5. The method of claim 1, wherein the lightweight rotated detection head LOBB uses shared convolutions with group normalization (GroupNorm) in place of batch normalization (BN), applies group normalization in the computation of the rotation-angle (OBB) branch, and scales the features with Scale layers to accommodate targets of different scales.
6. The method of claim 1, wherein inputting the forklift operation images acquired in real time into the trained target detection model and outputting the category, positioning coordinates and confidence of each detected pallet comprises: inputting a forklift operation image acquired in real time into the StarNet backbone network, and extracting multi-scale semantic features of the image through the element-wise multiplication and depthwise separable convolution in the Star Blocks; inputting the multi-scale semantic features into the C3K2-DH neck network, and adaptively fusing and enhancing them using the dynamic path selector and the parallel multi-branch structure of the dynamic hybrid convolution module to obtain a fused feature map; and inputting the fused feature map into the LOBB detection head, performing classification and regression on the features using shared convolution and group normalization, and outputting the oriented bounding box coordinates, including the rotation angle of the pallet, together with the class probability and confidence.
7. The method of claim 1, wherein improving the ByteTrack algorithm comprises: expanding the 8-dimensional state of the native ByteTrack to a 10-dimensional state to support angle dynamics, and modifying the state transition matrix to a 10 x 10 identity matrix; in the cost matrix calculation, using the rotated-box intersection over union (R-IoU) in place of the traditional horizontal-box IoU, and fusing in an angle-difference weight; and predicting the tracks using Kalman filtering, and performing two-stage data association on the high-confidence detection boxes and the low-confidence detection boxes respectively using the Hungarian algorithm.
8. A lightweight model-based pallet tracking system for a forklift, comprising: a training annotation module for acquiring image data containing a forklift pallet target, and annotating the pallet target with an oriented bounding box to construct a standard pallet rotated-target data set; a model construction module for constructing a target detection model based on the YOLOv framework, wherein the target detection model comprises a backbone network, a neck network and a detection head network connected in series, the backbone network uses a StarNet network in place of the original backbone network, the neck network uses a dynamic hybrid convolution module C3K2-DH in place of the original C3K2 module, and the detection head network uses a lightweight rotated detection head LOBB; a model training module for inputting the preprocessed target data set into the target detection model for training to obtain a trained target detection model; a real-time recognition module for inputting forklift operation images acquired in real time into the trained target detection model and outputting the category, positioning coordinates and confidence of each detected pallet; and a target tracking module for inputting the detection results into the improved ByteTrack algorithm to perform multi-target tracking of the pallet targets.
9. An apparatus, comprising: a memory for storing a lightweight model-based forklift pallet tracking program; and a processor for implementing the steps of the lightweight model-based forklift pallet tracking method according to any one of claims 1 to 7 when executing the lightweight model-based forklift pallet tracking program.
10. A computer readable storage medium storing a computer program, wherein the readable storage medium stores a lightweight model-based forklift pallet tracking program, and the lightweight model-based forklift pallet tracking program when executed by a processor implements the steps of the lightweight model-based forklift pallet tracking method according to any one of claims 1 to 7.
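
The state extension recited in claim 7 can be sketched as follows. The sketch assumes a state layout of (cx, cy, w, h, theta) followed by the five matching velocities. It also assumes that the 10 x 10 matrix starts as an identity and then receives constant-velocity coupling terms, as the reference ByteTrack Kalman filter does for its 8 x 8 matrix; the claim itself names only the identity matrix. The cost function, its weight `lam` and the 90-degree normalization are hypothetical choices, and the rotated IoU is passed in as a precomputed value rather than computed here.

```python
import math

NDIM = 5  # assumed box parameters: cx, cy, w, h, theta

def make_transition(dt=1.0):
    """10 x 10 identity with dt coupling each box term to its velocity (assumed)."""
    size = 2 * NDIM
    mat = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for i in range(NDIM):
        mat[i][NDIM + i] = dt
    return mat

def predict_mean(state, mat):
    """Kalman mean prediction x' = F x (covariance update omitted for brevity)."""
    return [sum(f * s for f, s in zip(row, state)) for row in mat]

def angle_diff(a, b):
    """Smallest gap between two box angles, treating theta as pi-periodic (assumed)."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def fused_cost(riou, theta_track, theta_det, lam=0.2):
    """Hypothetical cost: (1 - R-IoU) plus a weighted, normalized angle gap."""
    gap = angle_diff(theta_track, theta_det) / (math.pi / 2)
    return (1.0 - riou) + lam * gap

F = make_transition()
state = [100.0, 50.0, 120.0, 80.0, 0.10,  # cx, cy, w, h, theta
         2.0, -1.0, 0.0, 0.0, 0.05]       # matching velocities
predicted = predict_mean(state, F)
```

In this sketch the angle is propagated like any other box parameter, so a pallet that is turning relative to the camera keeps a consistent predicted orientation between frames.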

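The two-stage data association recited in claim 7 can be sketched in the same spirit. This is a minimal illustration under stated assumptions: a brute-force search stands in for the Hungarian algorithm (same optimum, usable only for tiny inputs), the score thresholds and cost limits are hypothetical defaults, and the one-dimensional `center_cost` is a placeholder for the R-IoU based cost. Track states, track lifecycle management and the Kalman update are omitted.

```python
from itertools import permutations

def assign(cost, limit):
    """Minimum-cost matching by brute force; a stand-in for the Hungarian algorithm."""
    n_rows = len(cost)
    n_cols = len(cost[0]) if cost else 0
    if n_rows == 0 or n_cols == 0:
        return []
    if n_rows <= n_cols:
        candidates = (list(zip(range(n_rows), p))
                      for p in permutations(range(n_cols), n_rows))
    else:
        candidates = (list(zip(p, range(n_cols)))
                      for p in permutations(range(n_rows), n_cols))
    best = min(candidates, key=lambda pairs: sum(cost[r][c] for r, c in pairs))
    # Reject matched pairs whose individual cost exceeds the limit.
    return [(r, c) for r, c in best if cost[r][c] <= limit]

def two_stage(tracks, detections, cost_fn, high=0.5, low=0.1, limit1=0.8, limit2=0.5):
    """Match tracks to high-score detections first, then leftovers to low-score ones."""
    high_ids = [j for j, d in enumerate(detections) if d["score"] >= high]
    low_ids = [j for j, d in enumerate(detections) if low <= d["score"] < high]
    matches = []
    remaining = list(range(len(tracks)))
    for ids, limit in ((high_ids, limit1), (low_ids, limit2)):
        cost = [[cost_fn(tracks[i], detections[j]) for j in ids] for i in remaining]
        pairs = assign(cost, limit)
        matches += [(remaining[r], ids[c]) for r, c in pairs]
        matched_rows = {r for r, _ in pairs}
        remaining = [t for k, t in enumerate(remaining) if k not in matched_rows]
    return matches, remaining

def center_cost(track, det):
    """Placeholder cost on 1-D centers; the claim uses an R-IoU based cost instead."""
    return min(1.0, abs(track["cx"] - det["cx"]) / 100.0)

tracks = [{"cx": 10.0}, {"cx": 200.0}]
detections = [
    {"cx": 12.0, "score": 0.9},    # high confidence: used in stage one
    {"cx": 205.0, "score": 0.3},   # low confidence: used in stage two
    {"cx": 400.0, "score": 0.05},  # below the low threshold: discarded
]
matches, unmatched = two_stage(tracks, detections, center_cost)
```

The second stage is what lets a partially occluded pallet, detected with a low score, keep its existing track identity instead of being dropped.
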
Description

Lightweight model-based forklift pallet tracking method, system, equipment and medium

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to a forklift pallet tracking method, system, equipment and medium based on a lightweight model.

Background

In recent years, with the rapid development of industries such as intelligent manufacturing and intelligent logistics, the level of automation and intelligence in warehouse logistics has become key to improving the operating efficiency and core competitiveness of enterprises. In logistics hubs such as warehouses, wharves and production lines, forklifts are the primary material handling tool, and the demand to upgrade their operation with intelligent functions is increasingly urgent. Quickly and accurately identifying and stably tracking standard pallets (such as European standard pallets and American standard pallets) is a basic prerequisite for advanced forklift functions such as automatic guidance, unmanned handling, and intelligent loading and unloading, and is important for reducing labor costs, improving handling precision and efficiency, and ensuring operational safety.

The traditional manually driven mode of forklift operation depends heavily on the experience and visual judgment of operators. In a complex and dynamic warehouse environment, factors such as fatigue and obstructed sight lines easily lead to inaccurate positioning and increased collision risk, and this mode can hardly meet the requirement of continuous 7x24-hour operation. Early automated schemes mostly located pallets using auxiliary markers such as fixed scanners, two-dimensional codes or RFID (radio frequency identification).
Although these methods raise the level of automation to some extent, they require specific physical markers to be deployed in advance, offer poor flexibility, incur high retrofitting and maintenance costs, and cannot cope with practical situations such as markers becoming soiled or detached or the environment layout changing. With the progress of computer vision technology, and especially the breakthroughs in deep learning, vision-based target recognition and tracking methods provide a new technical path for marker-free, highly flexible pallet perception.

In the prior art, there have been attempts to identify and track pallets using general-purpose target detection models (e.g., YOLO, the Fast R-CNN series) or target tracking algorithms. However, deploying these advanced algorithms directly in a forklift embedded system faces serious challenges. Mainstream high-precision deep learning models generally have large parameter counts and high computational complexity, whereas the computing resources, memory and power budget of a forklift's on-board computing platform (such as an embedded GPU or a high-performance edge computing device) are limited. Running a large model makes it difficult to meet the real-time, high-frame-rate requirement (usually at least 10 FPS) for image processing while the forklift is moving, which easily causes system delay and affects the timeliness and safety of the control response.

The warehouse environment is also complex and changeable, with interference such as uneven illumination (e.g., strong light and shadow), weather effects (in indoor-outdoor transition areas), pallet stacking, partial occlusion, surface wear, and colors and textures close to those of background objects. In such complex scenes, the recognition accuracy and tracking stability of a general-purpose model are liable to drop significantly.
Standard pallets have relatively fixed dimensions, structures (e.g., specific prong shapes, wooden slats, or plastic textures) and color features, but existing solutions mostly employ a general target detection framework that does not fully mine and exploit this prior knowledge and these structural characteristics, which limits further gains in accuracy and efficiency. Many studies now focus on improving the performance of the algorithm itself and lack overall consideration of the complete technology chain, from lightweight model design and optimized deployment to seamless integration with the forklift control system. A system suitable for practical forklift applications must achieve an optimal balance of accuracy, speed, resource consumption and robustness.

Therefore, developing a standard pallet identification and tracking method that specifically targets forklift application scenarios, is based on deep learning, and is lightweight, efficient and robust has become a key technical requirement for bringing intelligent forklift technology into practical use and solving industry pain points. The method aims at realizing rapid and accurate identification and stable continuous tracking of the standard pallet in a c