CN-121982442-A - Ball-Transformer model training method and ball target tracking method
Abstract
The invention provides a Ball-Transformer model training method and a ball target tracking method. The tracking method proceeds as follows:
- A target detection model processes each frame of a video sequence to obtain a single-frame detection box matrix.
- A history track sequence is constructed with a sliding-window strategy, and the candidate box set of the current frame is obtained.
- The three decision heads of the Ball-Transformer model generate prediction results.
- The history track buffer is updated with the sliding-window strategy: the current-frame matching result is added and historical frames beyond the window length are removed.
- A decision is then executed. If the existence probability is below an existence threshold, the target is judged to have disappeared. Otherwise, the candidate box with the highest matching probability whose rationality score meets a set threshold is selected as the current-frame tracking result, and the target's position coordinates and bounding-box information in the current frame are output.

The method thereby ensures accurate ball target tracking.
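The final decision step summarized above can be sketched as follows. This is a minimal illustration, not the patented implementation. Several details are hypothetical because the text specifies none of them:
- the `Candidate` type and the function name `select_track_result`;
- the threshold defaults of 0.5;
- the assumption that each candidate carries its own rationality score (the score of the track extended by that candidate);
- returning `None` when no candidate passes the rationality threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    box: Tuple[float, float, float, float]  # (x, y, w, h): box center and size
    match_prob: float                        # output of the matching head
    rationality: float                       # assumed: rationality of track + this box


def select_track_result(
    candidates: List[Candidate],
    existence_prob: float,
    existence_threshold: float = 0.5,    # hypothetical default
    rationality_threshold: float = 0.5,  # hypothetical default
) -> Optional[Candidate]:
    """Return the tracked box for this frame, or None if the ball is judged absent."""
    # Existence gate: below the threshold, the target is judged to have disappeared.
    if existence_prob < existence_threshold:
        return None
    # Keep only candidates whose rationality score meets the threshold.
    plausible = [c for c in candidates if c.rationality >= rationality_threshold]
    if not plausible:
        return None  # assumption: the text does not cover this case
    # Among plausible candidates, take the one with the highest matching probability.
    return max(plausible, key=lambda c: c.match_prob)


cands = [
    Candidate((0.40, 0.50, 0.02, 0.02), match_prob=0.91, rationality=0.20),
    Candidate((0.42, 0.48, 0.02, 0.02), match_prob=0.85, rationality=0.90),
    Candidate((0.70, 0.10, 0.02, 0.02), match_prob=0.30, rationality=0.95),
]
print(select_track_result(cands, existence_prob=0.8).box)  # (0.42, 0.48, 0.02, 0.02)
print(select_track_result(cands, existence_prob=0.3))      # None
```

Note that the first candidate has the highest matching probability but is rejected by the rationality filter. This reflects the intent of combining the three heads rather than trusting the matching score alone.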
Inventors
- LU YUXI
- WANG HONGBIN
- HU CANFENG
- WANG RONG
- ZHANG ZHUMING
Assignees
- 恒鸿达(福建)体育科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251210
Claims (10)
- 1. A Ball-Transformer model training method, characterized by comprising the following steps: Step 1, acquiring a video sequence dataset with ground-truth ball trajectory annotations, processing each frame of each training video with a target detection model to extract the detection boxes of that frame, and normalizing the detection box coordinates; Step 2, constructing a history track sequence and a current-frame candidate box set for each training video, and annotating the following supervision information: a matching label marking the true matching relation between each candidate box of the current frame and the history track, with 1 for positive samples and 0 for negative samples; an existence label marking whether a ball is present in the scene in the current frame, with 1 for present and 0 for absent; and a track rationality label marking whether the track is plausible according to physical laws of motion; Step 3, extracting multidimensional motion features of the history track sequence and the current candidate boxes with the feature encoder of the Ball-Transformer model, the multidimensional motion features comprising geometric coordinate features, velocity features and acceleration features; inputting the encoded features into the Transformer encoder of the Ball-Transformer model, which learns track sequence representations with a self-attention mechanism, temporal information being injected by a positional encoding module; and, based on the output of the Transformer encoder, generating prediction results with the three decision heads of the Ball-Transformer model: a matching head, which fuses the history track representation vector with the features of each candidate box of the current frame and generates a matching probability score for each candidate box through a multi-layer perceptron; an existence head, which judges from the semantic representation vector of the current frame, through a binary classification layer, whether the ball is present in the scene and outputs an existence probability; and a track rationality head, which evaluates from the history track representation vector whether the track conforms to physical laws of motion and outputs a rationality score; Step 4, optimizing with a multi-task loss function, the total loss being: [formula not reproduced]; wherein L_match is the matching loss, computed with a weighted binary cross-entropy loss function, L_velocity is the velocity consistency loss, L_acceleration is the acceleration smoothing loss, L_existence is the existence judgment loss, L_continuity is the track continuity loss, and λ_match, λ_motion, λ_existence and λ_continuity are the corresponding weight coefficients; Step 5, computing the gradient of the loss function with respect to the parameters of the Ball-Transformer model through back-propagation, and updating those parameters with an optimizer; and Step 6, evaluating the tracking accuracy and recall of the Ball-Transformer model on a validation set, and saving the model parameters that perform best on the validation set to obtain the trained Ball-Transformer model.
- 2. The Ball-Transformer model training method according to claim 1, wherein the matching loss L_match is calculated as: [formula not reproduced]; where y_i is the ground-truth matching label, ŷ_i is the predicted matching probability, and α is the class-balance weight.
- 3. The Ball-Transformer model training method according to claim 1, wherein the velocity consistency loss L_velocity and the acceleration smoothing loss L_acceleration are calculated, respectively, as: [formulas not reproduced]; where p_t is the position coordinate at time t, v_t is the instantaneous velocity at time t, and T is the sequence length.
- 4. The Ball-Transformer model training method according to claim 1, wherein the track continuity loss L_continuity is calculated as: [formula not reproduced]; where σ is a hyperparameter that controls the strictness of the continuity constraint.
- 5. A ball target tracking method based on a Ball-Transformer model, characterized in that it uses the Ball-Transformer model trained by the method according to any one of claims 1 to 4 and specifically comprises the following steps: S1, processing each frame of a video sequence with a target detection model to obtain a single-frame detection box matrix containing i detection boxes, i being the number of balls detected in a single frame, each detection box comprising center coordinates (x, y) and a bounding-box size (w, h); S2, constructing a history track sequence through a sliding-window strategy, and obtaining the candidate box set of the current frame, which contains N candidate boxes, N being the number of candidate targets detected in the current frame; S3, extracting multidimensional motion features of the history track sequence and the current candidate boxes with the feature encoder of the Ball-Transformer model, the multidimensional motion features comprising geometric coordinate features, velocity features and acceleration features; inputting the encoded features into the Transformer encoder of the Ball-Transformer model, which learns track sequence representations with a self-attention mechanism, temporal information being injected by a positional encoding module; S4, based on the output of the Transformer encoder, generating prediction results with the three decision heads of the Ball-Transformer model: a matching head, which fuses the history track representation vector with the features of each candidate box of the current frame and generates a matching probability score for each candidate box through a multi-layer perceptron; an existence head, which judges from the semantic representation vector of the current frame, through a binary classification layer, whether the ball is present in the scene and outputs an existence probability; and a track rationality head, which evaluates from the history track representation vector whether the track conforms to physical laws of motion and outputs a rationality score; S5, updating the history track buffer through the sliding-window strategy by adding the current-frame matching result to the buffer and removing historical frames that exceed the window length; and, combining the matching probability scores, the existence probability and the rationality score, executing the following decision: if the existence probability is below an existence threshold, judging that the target has disappeared; otherwise, selecting from the candidate box set the candidate box with the highest matching probability whose rationality score meets a set threshold as the current-frame tracking result, and outputting the position coordinates and bounding-box information of the target in the current frame.
- 6. The ball target tracking method according to claim 5, wherein the target detection model in step S1 is a YOLOv11 model, and in step S2 a fixed-length history track sequence is maintained through the sliding-window strategy to ensure the continuity of temporal information.
- 7. The ball target tracking method according to claim 5, wherein in step S3 the velocity feature is obtained from the coordinate difference between adjacent frames and the acceleration feature from the rate of change of velocity, calculated respectively as: velocity v_t = p_t - p_{t-1}; acceleration a_t = v_t - v_{t-1}; where p_t is the position coordinate at time t and v_t is the velocity at time t; and the geometric coordinate features comprise the center coordinates and the bounding-box size of the detection box.
- 8. The ball target tracking method according to claim 5, wherein in step S3 the Transformer encoder includes a motion-pattern attention mechanism that takes the motion state vector at the latest time step as the query vector and the feature vectors of the history track sequence as key-value pairs, and uses the resulting attention weights to retrieve the motion patterns in the history track that are relevant to the current motion trend.
- 9. The ball target tracking method according to claim 5, wherein in step S4 the matching head adopts a multi-layer perceptron structure and combines the history track representation with the candidate box features to generate the matching probability scores.
- 10. The ball target tracking method according to claim 5, wherein in step S2 the sliding window length k ranges from 5 to 30 frames.
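Claims 6, 7 and 10 can be illustrated with a short sketch of the velocity and acceleration features and the fixed-length history buffer. The sketch makes these assumptions:
- Boxes are (x, y, w, h) tuples with (x, y) the box center.
- Velocity and acceleration are zero-padded for the first frames, where no earlier frame exists. The text does not say how these frames are handled.
- `collections.deque(maxlen=k)` drops frames that fall outside the window.
- The names `motion_features` and `TrackBuffer` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): box center and size


def motion_features(track: List[Box]) -> List[List[float]]:
    """Per-frame features [x, y, w, h, vx, vy, ax, ay].

    v_t = p_t - p_{t-1} and a_t = v_t - v_{t-1}, with p_t the box center.
    Frames without enough history get zero velocity/acceleration (an assumption).
    """
    feats = []
    for t, (x, y, w, h) in enumerate(track):
        vx = vy = ax = ay = 0.0
        if t >= 1:
            vx, vy = x - track[t - 1][0], y - track[t - 1][1]
        if t >= 2:
            pvx = track[t - 1][0] - track[t - 2][0]  # v_{t-1}, x component
            pvy = track[t - 1][1] - track[t - 2][1]  # v_{t-1}, y component
            ax, ay = vx - pvx, vy - pvy
        feats.append([x, y, w, h, vx, vy, ax, ay])
    return feats


class TrackBuffer:
    """Fixed-length sliding window over the most recent k matched boxes."""

    def __init__(self, k: int = 10) -> None:
        if not 5 <= k <= 30:
            raise ValueError("claim 10 gives a window length of 5 to 30 frames")
        self._frames: Deque[Box] = deque(maxlen=k)

    def update(self, box: Box) -> None:
        # Appending to a full deque(maxlen=k) silently drops the oldest frame.
        self._frames.append(box)

    def sequence(self) -> List[Box]:
        return list(self._frames)


buf = TrackBuffer(k=5)
for i in range(7):
    buf.update((0.1 * i, 0.5, 0.02, 0.02))
print(len(buf.sequence()))  # 5
```

A bounded deque keeps the window update O(1) per frame. This matches the "add the current result, drop frames beyond the window" behavior described in step S5.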
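The loss formulas of claims 1 to 4 are not reproduced in this text, so the following is only a plausible sketch of their general shape, not the patent's equations. Every functional form below is an assumption:
- **Matching loss:** weighted binary cross-entropy with α weighting positives and 1 - α weighting negatives.
- **Velocity consistency:** the mean squared change of the finite-difference velocity v_t = p_t - p_{t-1}.
- **Acceleration smoothing:** the mean squared change of acceleration.
- **Continuity:** the averaged value of 1 - exp(-||p_t - p_{t-1}||² / (2σ²)).
- **Existence loss:** plain binary cross-entropy.
- **Total loss:** λ_motion weights the sum of the velocity and acceleration terms. This grouping is inferred only from the fact that four weights are listed for five losses.

All default values are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-7  # clamp for probabilities, to avoid log(0)


def weighted_bce(labels: Sequence[int], probs: Sequence[float], alpha: float = 0.75) -> float:
    """Assumed form: -(1/N) * sum[alpha*y*log(p) + (1-alpha)*(1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, EPS), 1.0 - EPS)
        total += alpha * y * math.log(p) + (1 - alpha) * (1 - y) * math.log(1 - p)
    return -total / len(labels)


def bce(label: int, prob: float) -> float:
    """Plain binary cross-entropy, assumed here for the existence loss."""
    p = min(max(prob, EPS), 1.0 - EPS)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def _diff(seq):
    """Successive differences of a list of vectors: out[t] = seq[t+1] - seq[t]."""
    return [tuple(b - a for a, b in zip(u, w)) for u, w in zip(seq, seq[1:])]


def _mean_sq_norm(vectors) -> float:
    return sum(sum(c * c for c in v) for v in vectors) / len(vectors) if vectors else 0.0


def velocity_consistency_loss(positions) -> float:
    """Assumed: mean ||v_t - v_{t-1}||^2 with v_t = p_t - p_{t-1}."""
    return _mean_sq_norm(_diff(_diff(positions)))


def acceleration_smoothing_loss(positions) -> float:
    """Assumed: mean ||a_t - a_{t-1}||^2 with a_t = v_t - v_{t-1}."""
    return _mean_sq_norm(_diff(_diff(_diff(positions))))


def continuity_loss(positions, sigma: float = 0.05) -> float:
    """Assumed: mean of 1 - exp(-||p_t - p_{t-1}||^2 / (2*sigma^2)); smaller sigma is stricter."""
    steps = _diff(positions)
    if not steps:
        return 0.0
    return sum(1.0 - math.exp(-sum(c * c for c in s) / (2 * sigma ** 2)) for s in steps) / len(steps)


def total_loss(l_match, l_vel, l_acc, l_exist, l_cont,
               lam_match=1.0, lam_motion=0.5, lam_exist=1.0, lam_cont=0.1) -> float:
    """Assumed grouping: lambda_motion weights the sum of the two motion terms."""
    return (lam_match * l_match + lam_motion * (l_vel + l_acc)
            + lam_exist * l_exist + lam_cont * l_cont)


track = [(0.10, 0.50), (0.12, 0.49), (0.15, 0.47), (0.19, 0.44)]
print(total_loss(weighted_bce([1, 0, 0], [0.8, 0.3, 0.1]),
                 velocity_consistency_loss(track), acceleration_smoothing_loss(track),
                 bce(1, 0.9), continuity_loss(track)))
```

In a real training loop these terms would be computed on tensors so that step 5's back-propagation can differentiate them. Plain Python is used here only to make the arithmetic visible.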
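Claims 8 and 9 describe a motion-pattern attention step and an MLP matching head. The sketch below shows the data flow in plain Python under simplifying assumptions:
- Attention is standard scaled dot-product attention.
- Keys and values are the raw history vectors rather than learned projections.
- Fusion is by concatenation.
- The MLP has a single ReLU hidden layer (no biases) with randomly initialized, untrained weights, so the scores show shapes and value ranges only.
- The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def softmax(xs: Sequence[float]) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def motion_pattern_attention(history: Sequence[Vec]) -> Tuple[Vec, Vec]:
    """Scaled dot-product attention with the latest motion state as the query.

    Simplification: keys and values are the raw history vectors (no learned projections).
    """
    query = history[-1]
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in history]
    weights = softmax(scores)
    # Context = attention-weighted sum of the history vectors (the values).
    context = [sum(w * vec[j] for w, vec in zip(weights, history)) for j in range(d)]
    return context, weights


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matching_scores(track_repr: Vec, candidates: Sequence[Vec],
                    hidden: int = 16, seed: int = 0) -> Vec:
    """One-hidden-layer MLP over [track_repr ; candidate]; weights are random, untrained."""
    if not candidates:
        return []
    rng = random.Random(seed)
    in_dim = len(track_repr) + len(candidates[0])
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
    scores = []
    for cand in candidates:
        x = list(track_repr) + list(cand)  # fusion by concatenation
        h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]  # ReLU
        scores.append(_sigmoid(sum(w * hi for w, hi in zip(w2, h))))
    return scores


history = [[0.10, 0.50, 0.01, 0.00], [0.12, 0.49, 0.02, -0.01], [0.15, 0.47, 0.03, -0.02]]
context, weights = motion_pattern_attention(history)
scores = matching_scores(context, [[0.18, 0.45, 0.03, -0.02], [0.70, 0.10, 0.00, 0.00]])
print(weights, scores)
```

Using the latest motion state as the query means that history frames whose motion resembles the current one receive larger weights. This is the "retrieve relevant motion patterns" behavior claim 8 describes.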
Description
Ball-Transformer model training method and ball target tracking method

Technical Field
The invention relates to the technical field of image processing, and in particular to a Ball-Transformer model training method and a ball target tracking method.

Background
With the development of AI, using AI algorithms to compute scores and evaluate overall performance in table tennis, badminton, volleyball, football, basketball and similar sports has become a clear trend. In sports that center on the movement of the ball, improving the accuracy of score calculation for learners is important. Existing ball score calculation usually relies on simple tracking built on target detection results, and scores are judged from the ball's motion trajectory. Existing methods include:
- tracking based on the IOU of detection boxes in consecutive frames;
- judgment based on the feature similarity of image targets;
- judgment based on motion-trajectory models such as Kalman filtering;
- judgment based on Hungarian matching.

Each approach has limitations. IOU-based tracking fails easily under fast motion and occlusion, and it depends on detection quality. Methods based on target feature similarity are computationally expensive, and they are ineffective for balls, whose appearance is simple and which may appear among many similar objects. Kalman-filter-based methods assume an overly idealized motion model and cannot handle long-term occlusion or abrupt maneuvers.

Disclosure of Invention
The technical problem to be solved by the invention is to provide a Ball-Transformer model training method and a ball target tracking method that ensure accurate ball target tracking.
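The weakness of IOU-based association under fast motion, noted in the background above, is easy to see numerically. In this sketch, boxes use the center format (x, y, w, h) of claim 5, and the pixel values are made up for illustration. A 20-pixel ball that moves 5 pixels between frames still overlaps its previous box. One that moves 60 pixels yields an IOU of zero, so no IOU threshold can link the two detections.

```python
def iou(box_a, box_b):
    """Intersection over union of two center-format boxes (x, y, w, h)."""
    # Convert centers and sizes to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    # Overlap is clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


prev = (100, 100, 20, 20)             # ball box in frame t-1 (pixels, illustrative)
print(iou(prev, (105, 100, 20, 20)))  # slow motion: 0.6
print(iou(prev, (160, 100, 20, 20)))  # fast motion: 0.0
```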
In a first aspect, the invention provides a Ball-Transformer model training method, comprising the following steps:

Step 1, acquiring a video sequence dataset with ground-truth ball trajectory annotations, processing each frame of each training video with a target detection model to extract the detection boxes of that frame, and normalizing the detection box coordinates;

Step 2, constructing a history track sequence and a current-frame candidate box set for each training video, and annotating the following supervision information: a matching label marking the true matching relation between each candidate box of the current frame and the history track, with 1 for positive samples and 0 for negative samples; an existence label marking whether a ball is present in the scene in the current frame, with 1 for present and 0 for absent; and a track rationality label marking whether the track is plausible according to physical laws of motion;

Step 3, extracting multidimensional motion features of the history track sequence and the current candidate boxes with the feature encoder of the Ball-Transformer model, the multidimensional motion features comprising geometric coordinate features, velocity features and acceleration features; inputting the encoded features into the Transformer encoder of the Ball-Transformer model, which learns track sequence representations with a self-attention mechanism, temporal information being injected by a positional encoding module; and, based on the output of the Transformer encoder, generating prediction results with the three decision heads of the Ball-Transformer model: a matching head, which fuses the history track representation vector with the features of each candidate box of the current frame and generates a matching probability score for each candidate box through a multi-layer perceptron; an existence head, which judges from the semantic representation vector of the current frame, through a binary classification layer, whether the ball is present in the scene and outputs an existence probability; and a track rationality head, which evaluates from the history track representation vector whether the track conforms to physical laws of motion and outputs a rationality score;

Step 4, optimizing with a multi-task loss function, the total loss being: [formula not reproduced]; wherein L_match is the matching loss, computed with a weighted binary cross-entropy loss function, L_velocity is the velocity consistency loss, L_acceleration is the acceleration smoothing loss, L_existence is the existence judgment loss, L_continuity is the track continuity loss, and λ_match, λ_motion, λ_existence and λ_continuity are the corresponding weight coefficients;

Step 5, computing the gradient of the loss function with respect to the parameters of the Ball-Transformer model through back-propagation, and updating those parameters with an optimizer;

And Step 6, evaluating the tracking accuracy and recall of the Ball-Transformer model on a validation set, and saving the model parameters that perform best on the validation