CN-121366169-B - Image target detection method based on intelligent traffic system
Abstract
The invention discloses an image target detection method based on an intelligent traffic system. The method comprises: obtaining an original image data set from the intelligent traffic system and preprocessing it to generate a feature map to be processed; constructing a Trifuse module with feature extraction, scale perception and context information capture; and embedding the Trifuse module into a YOLOv5s network structure to obtain an image target detection model. The Trifuse module receives the feature map to be processed, first inputs it into a CBR module to generate a processed feature map, and produces enhanced local features through a local path. The enhanced local features are divided into three branches: one preserves the original shallow information, one is transmitted into a Nested-Stream, and one is processed through a path based on dilated convolution. The processed feature map is input into the image target detection model to obtain a result data set.
Inventors
- ZHANG XU
- YU XUFENG
- FENG JIAHAO
- WANG JIACHENG
- ZHENG LI
- CHEN WEISI
- XIAO WEIDONG
Assignees
- 厦门理工学院 (Xiamen University of Technology)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-23
Claims (7)
- 1. An image target detection method based on an intelligent transportation system, characterized by comprising the following steps: acquiring an original image data set from the intelligent traffic system, preprocessing the data set, and generating a feature map to be processed; constructing a Trifuse module with feature extraction, scale perception and context information capture, and embedding the Trifuse module into a YOLOv5s network structure to obtain an image target detection model; the Trifuse module first inputs the feature map to be processed into a CBR module to generate a processed feature map and generates enhanced local features through a local path, wherein the CBR module consists of a two-dimensional convolution layer without bias terms, a batch normalization layer and a nonlinear activation function; the enhanced local features are divided into three branches: one branch retains the original shallow information, one branch is transmitted into a Nested-Stream, and one branch is processed through a path based on dilated convolution; the dilated-convolution path adopts a mixed parallel-cascade dilated structure comprising three dilated convolutions, whose dilation rates (3, 6 and 9) are selected by an adaptive serpentine expansion mechanism and which are expressed as follows:

F_1 = σ(BN(W_1 ∗_{d=3} X + b_1))
F_2 = σ(BN(W_2 ∗_{d=6} F_1 + b_2))
F_3 = σ(BN(W_3 ∗_{d=9} F_2 + b_3))

wherein X represents the initial input of the dilated-convolution path; W_1, W_2 and W_3 respectively represent convolution kernels with input channel C, output channel C and kernel size 3; b_1, b_2 and b_3 respectively represent the bias parameters added to the convolution operations; d represents the different dilation rates; F_1, F_2 and F_3 respectively represent the output feature maps of the first, second and third dilated convolution layers; BN represents batch normalization; and σ represents the activation function; the outputs of all branches are transmitted into a global path for activation, the global path captures long-distance semantic dependence and re-projects the global context into the spatial domain using global average pooling and bilinear upsampling; and inputting the processed feature map into the image target detection model to obtain a result data set.
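The cascade of three dilated convolutions in claim 1 can be sketched in PyTorch as follows. This is a minimal illustration, not the patented implementation: the cascade wiring (each layer feeding the next), the channel count, and the fixed rates (3, 6, 9) are assumptions drawn from the claim text.

```python
import torch
import torch.nn as nn

class DilatedPath(nn.Module):
    """Sketch of a cascade of three 3x3 dilated convolutions (rates 3, 6, 9),
    each followed by batch normalization and ReLU, per claim 1."""
    def __init__(self, channels: int, rates=(3, 6, 9)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                # padding = dilation rate keeps the spatial size H x W unchanged
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=r, dilation=r),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F1 -> F2 -> F3: each stage consumes the previous stage's output
        for stage in self.stages:
            x = stage(x)
        return x

x = torch.randn(1, 16, 32, 32)
out = DilatedPath(16)(x)
print(out.shape)  # torch.Size([1, 16, 32, 32])
```

Because padding equals the dilation rate for a 3×3 kernel, every stage preserves the spatial resolution, so the branch output can be concatenated with the shallow and Nested-Stream branches.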
- 2. The method of claim 1, wherein the local path uses a lightweight 1×1 convolution followed by a mixed 3×3 convolution.
- 3. The image target detection method of claim 1, wherein the specific formulas of the adaptive serpentine expansion mechanism are expressed as follows:

F = σ(Conv(X))

wherein X is the input feature map; Conv is a convolution operation that extracts spatial information from the input X; and σ is the activation function, used for increasing nonlinearity; the selection of the dilation-rate set D is predicted by the following formula:

D = FC(F)

wherein FC is a fully connected layer that maps the convolved feature map F to a dilation-rate set D, and D comprises different candidate dilation-rate sets;

D = Select(FC(F))

wherein the learning module dynamically selects the dilation-rate set D according to the image content, and Select represents the selection operation;

d_min = min(D), d_max = max(D)

wherein d_min and d_max are respectively the minimum and maximum values in the dilation-rate set D;

d_1 = d_min

in the first-layer convolution, the dilation rate d_1 is selected as d_min;

d_1 < d_2 < d_3

in the second-layer convolution, the dilation rate d_2 is chosen based on the first-layer dilation rate d_1, ensuring that d_2 is greater than d_1 while being smaller than the third-layer dilation rate d_3;

d_3 = d_max

in the third-layer convolution, the dilation rate d_3 is selected as d_max, the maximum dilation rate in the set D.
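The selection mechanism of claim 3 can be sketched as a small predictor: a convolution plus activation extracts spatial features, global pooling and a fully connected layer score candidate dilation sets, and the three layer rates are taken as the minimum, an intermediate value, and the maximum of the chosen set. The candidate sets, the pooling step, and the argmax-based `Select` are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SerpentineDilationSelector(nn.Module):
    """Hypothetical sketch of the adaptive serpentine expansion mechanism:
    predicts a dilation-rate set D and returns d1 = min(D), d3 = max(D),
    and d2 strictly between them."""
    def __init__(self, channels: int,
                 candidates=((1, 2, 3), (2, 4, 6), (3, 6, 9))):
        super().__init__()
        self.candidates = candidates
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # Conv(X)
        self.act = nn.ReLU(inplace=True)                         # sigma
        self.fc = nn.Linear(channels, len(candidates))           # FC

    def forward(self, x):
        f = self.act(self.conv(x))          # F = sigma(Conv(X))
        pooled = f.mean(dim=(2, 3))         # collapse spatial dims (assumption)
        logits = self.fc(pooled)            # scores over candidate sets
        idx = int(logits.argmax(dim=1)[0])  # Select(...)
        d = sorted(self.candidates[idx])
        d1, d3 = min(d), max(d)             # first- and third-layer rates
        # middle rate must satisfy d1 < d2 < d3
        d2 = d[1] if d1 < d[1] < d3 else (d1 + d3) // 2
        return d1, d2, d3

sel = SerpentineDilationSelector(8)
rates = sel(torch.randn(1, 8, 16, 16))
print(rates)
```

With untrained weights the selected set is arbitrary, but the returned triple always satisfies the ordering constraint d_1 < d_2 < d_3 from the claim.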
- 4. The image target detection method according to claim 1, wherein the specific calculation flow of the global path comprises: performing a linear transformation on the channels through a 1×1 convolution to obtain preliminary compressed features, and extracting the global context expression with adaptive global average pooling, the specific formula being:

Z = GAP(Conv2D(W_{1×1}, X) + b)

wherein W_{1×1} represents the weight matrix of the 1×1 convolution, with input channel C and output channel C; GAP represents average pooling over the entire spatial dimension; Conv2D represents a two-dimensional convolution operation; b represents the bias term; X represents the input feature map, X ∈ R^{B×C×H×W}, where B represents the batch size, C the channels, H the height and W the width; and Z represents the global features output after the 1×1 convolution and GAP, with dimensions R^{B×C×1×1}.
- 5. The image target detection method according to claim 4, wherein after the adaptive global average pooling extracts the global context, a channel reconstruction branch is enabled, and the channel expression capacity is enhanced through a fully connected layer, the specific formula being:

Z' = Reshape(W_{fc} Z + b_{fc})

wherein Z represents the input global features; W_{fc} represents the weight matrix of the fully connected layer; b_{fc} represents the bias term of the fully connected layer; Reshape represents the reshaping operation; and Z' represents the enhanced channel features, Z' ∈ R^{B×C×1×1}.
- 6. The method of claim 5, wherein the enhanced channel features are remapped to the original spatial dimensions H×W by bilinear upsampling as follows:

U = Upsample_{bilinear}(Z')

wherein Upsample_{bilinear} represents bilinear interpolation upsampling; Z' represents the enhanced channel features; and U represents the upsampled feature map.
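Claims 4 to 6 together describe the global path: 1×1 channel compression, adaptive global average pooling to a 1×1 context vector, a fully connected channel-reconstruction branch, and bilinear upsampling back to H×W. A minimal PyTorch sketch, assuming the output channel count equals the input channel count:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalPath(nn.Module):
    """Sketch of the global path of claims 4-6."""
    def __init__(self, channels: int):
        super().__init__()
        self.compress = nn.Conv2d(channels, channels, kernel_size=1)  # W_{1x1} X + b
        self.gap = nn.AdaptiveAvgPool2d(1)                            # -> B x C x 1 x 1
        self.fc = nn.Linear(channels, channels)                       # channel reconstruction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z = self.gap(self.compress(x))               # Z = GAP(Conv2D(W, X) + b)
        z = self.fc(z.flatten(1)).view(b, c, 1, 1)   # Z' = Reshape(W_fc Z + b_fc)
        # U = bilinear upsampling of Z' back to the original H x W
        return F.interpolate(z, size=(h, w), mode="bilinear",
                             align_corners=False)

x = torch.randn(2, 8, 20, 20)
u = GlobalPath(8)(x)
print(u.shape)  # torch.Size([2, 8, 20, 20])
```

Since Z' is a 1×1 map, the bilinear upsample simply broadcasts each channel's global context value over the full spatial grid, which is how the global semantic signal is re-projected into the spatial domain.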
- 7. A computer program product, characterized in that it stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 6.
Description
Image target detection method based on intelligent traffic system

Technical Field

The invention relates to target recognition technology in intelligent traffic systems, in particular to a multi-scale image target detection method applied to the public transportation field, which can effectively improve target recognition precision based on big data and intelligent technology, particularly for detecting small and occluded targets.

Background

With the rapid development of intelligent traffic systems, the demand for target detection technology in the public transportation field is increasing. In complex traffic environments in particular, conventional target detection methods suffer from insufficient recognition accuracy for small targets such as small vehicles, pedestrians and traffic signs. Real-time data analysis based on big data can effectively optimize traffic management systems and improve the safety and efficiency of public transportation. In automatic driving and intelligent traffic systems, detecting small targets (such as distant pedestrians and vehicles) has long been a challenge, and conventional target detection methods struggle to achieve good results against complex backgrounds. Small objects typically occupy a small area in the image, and background information may interfere with the detection results. In addition, because traffic environments are dynamic and complex, target scale changes and occlusion place higher demands on existing methods. To address these problems, many multi-scale image target detection methods have been developed in recent years.
For example, FPN (feature pyramid network) extracts feature maps at different scales in a top-down manner, and PANet (path aggregation network) enhances the utilization of low-level features by introducing a bottom-up path, thereby improving the accuracy of small-target detection. Although these methods solve the problem of multi-scale feature fusion to some extent, their real-time processing capability is still limited by complex structures and high computational overhead. The Trifuse module, as a new multi-scale image target detection approach, can effectively fuse information at different scales through local, nested and global paths. Compared with existing multi-scale fusion structures (such as FPN and BiFPN), Trifuse can remarkably improve detection efficiency while maintaining high precision by introducing an adaptive path selection mechanism. The Trifuse module provided by the invention not only enhances the detection capability for small targets, but also supports real-time analysis through big-data technology and optimizes target identification in public transportation systems.
Disclosure of Invention

To solve these problems, the invention provides an image target detection method based on an intelligent traffic system, which performs multi-scale feature enhancement through the local, nested and global paths of the Trifuse module and remarkably improves detection precision, particularly for small targets and complex occlusion scenes in public traffic environments. The method comprises the following steps: acquiring an original image data set from the intelligent traffic system, preprocessing the data set, and generating a feature map to be processed; constructing a Trifuse module with feature extraction, scale perception and context information capture, and embedding the Trifuse module into a YOLOv5s network structure to obtain an image target detection model, wherein the Trifuse module first inputs the feature map to be processed into a CBR module to generate a processed feature map and generates enhanced local features through a local path, and the CBR module consists of a two-dimensional convolution layer without bias terms, a batch normalization layer and a nonlinear activation function; transmitting the outputs of all branches into a global path for activation, wherein the global path captures long-distance semantic dependence and re-projects the global context into the spatial domain using global average pooling and bilinear upsampling; and inputting the processed feature map into the image target detection model to obtain a result data set. The input feature map is first processed by a standard Conv-BN-ReLU (CBR) module to unify channel dimensions. This mapping not only reduces redundancy, but also aligns the feature space in preparation for subsequent multi-branch processing.
The outputs of all branches are then activated into a global path, which captures long-distance semantic dependence and re-projects the global context into the spatial domain using global average pooling and bilinear upsampling; finally, the outputs of all active paths are concatenated and projected back to the target channel dimension by a 1×1 convolution.
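The overall flow described above (CBR, three branches, concatenation, 1×1 projection) can be sketched end to end as follows. This is a simplified illustration: the Nested-Stream branch internals are not specified in the text and are replaced here by an identity placeholder, and the global path is omitted for brevity, so only the structural skeleton should be read as coming from the document.

```python
import torch
import torch.nn as nn

class CBR(nn.Module):
    """Conv-BN-ReLU block: bias-free 2D convolution, batch norm, ReLU."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class TrifuseSketch(nn.Module):
    """Hypothetical Trifuse forward pass: CBR, local path, shallow branch,
    dilated branch, concatenation, and 1x1 projection back to c_out."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.cbr = CBR(c_in, c_out)
        # local path: lightweight 1x1 conv followed by a 3x3 conv (claim 2)
        self.local = nn.Sequential(CBR(c_out, c_out, 1), CBR(c_out, c_out, 3))
        # dilated branch with rates 3, 6, 9 (claim 1)
        self.dilated = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(c_out, c_out, 3, padding=r, dilation=r),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )
            for r in (3, 6, 9)])
        self.project = nn.Conv2d(3 * c_out, c_out, 1)  # back to target channels

    def forward(self, x):
        f = self.local(self.cbr(x))  # enhanced local features
        shallow = f                  # branch 1: retain shallow information
        nested = f                   # branch 2: Nested-Stream (placeholder)
        deep = self.dilated(f)       # branch 3: dilated-convolution path
        return self.project(torch.cat([shallow, nested, deep], dim=1))

out = TrifuseSketch(3, 16)(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 16, 64, 64])
```

Because every branch preserves spatial resolution, the concatenation is well defined and the final 1×1 convolution restores the target channel dimension, matching the fusion step described in the disclosure.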