CN-115527270-B - Specific behavior identification method in dense crowd environment

Abstract

The invention discloses a method for identifying specific behaviors in a dense crowd environment, comprising the following steps: S1, acquiring a data set; S2, preprocessing the data set; S3, inputting the images of the preprocessed pedestrian detection data set into a feature pyramid network to extract the corresponding features, and generating candidate regions and their category information through a region proposal network; S4, removing overlapping targets using a non-maximum suppression algorithm; S5, identifying specific-behavior targets using a classification recognition network based on a residual network; S6, training the network parameters of the classification recognition network; S7, importing the optimal network parameters obtained in step S6 into the classification recognition network and testing on the behavior recognition data set. By fusing the two stages of detection and recognition, the method accomplishes the detection and recognition of specific behaviors in a dense crowd environment.
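
Step S4 is detailed in the claims as a suppression step that is skipped whenever two bounding boxes come from the same candidate region. The following is a minimal Python sketch of that rule, not the patented implementation: the function names `iou` and `set_nms`, the `proposal_ids` argument, the corner-coordinate box format and the 0.5 default threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def set_nms(boxes: Sequence[Box], scores: Sequence[float],
            proposal_ids: Sequence[int], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is rejected only if it overlaps an already kept box that was
    predicted from a DIFFERENT candidate region (hypothetical proposal_ids).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        suppressed = False
        for j in keep:
            if proposal_ids[i] == proposal_ids[j]:
                continue  # same candidate region: skip the suppression step
            if iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep


# Boxes 0 and 1 share a candidate region; box 2 duplicates box 1 from another region.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6]
print(set_nms(boxes, scores, proposal_ids=[0, 0, 1, 2]))  # → [0, 1, 3]
```

With every box assigned a distinct proposal id, the same function behaves like ordinary greedy non-maximum suppression and keeps only boxes 0 and 3 in this example, which shows why the same-region exemption matters for heavily overlapping pedestrians.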

Inventors

CHENG SHICHAO
ZHANG JIANHAI
ZHOU JUNZHE
LIU HUASHENG

Assignees

Hangzhou Dianzi University (杭州电子科技大学)

Dates

Publication Date: 20260512
Application Date: 20221010

Claims (5)

1. A method for identifying specific behaviors in a dense crowd environment, comprising the steps of:
S1, acquiring a data set, wherein the data set comprises a pedestrian detection data set and a behavior recognition data set;
S2, preprocessing the data set:
S2-1, unifying the image sizes of the data set without losing image information, and normalizing the image data using statistics computed by sampling the ImageNet training set;
S2-2, labeling, in the normalized images, whether each pedestrian exhibits the specific behavior, wherein 0 denotes the presence of the specific behavior and 1 denotes its absence;
S3, inputting the images of the preprocessed pedestrian detection data set into a feature pyramid network to extract the corresponding features, and generating candidate regions and their category information through a region proposal network;
wherein in step S3 a region proposal network (RPN) is used to generate the candidate regions and their category information, comprising: first generating anchor boxes; judging whether each anchor box contains an object (foreground) or is background, which is a binary classification; fine-tuning the anchor boxes by bounding-box regression so that the selected anchor boxes are closer to the ground-truth boxes; and, for each generated candidate region, predicting a set of instances using a detection function with parameter $K$, expressed as follows:
$$G(b_i) = \{\, g_j \in G \mid \mathrm{IoU}(b_i, g_j) \ge \theta \,\} \quad (1)$$
$$P(b_i) = \{\, (c_i^{(1)}, l_i^{(1)}), (c_i^{(2)}, l_i^{(2)}), \dots, (c_i^{(K)}, l_i^{(K)}) \,\} \quad (2)$$
where $b_i$ denotes the $i$-th candidate region (proposal box), $G(b_i)$ denotes the set of ground-truth instances corresponding to $b_i$, $G$ denotes the set of all ground-truth boxes, $K$ denotes the maximum number of instances in a candidate box and is also the maximum cardinality of $G(b_i)$, $P(b_i)$ is the set of predicted instances, $c_i$ is the class label with confidence, $l_i$ is the corresponding position, and $\theta$ is a given intersection-over-union (IoU) threshold;
S4, removing overlapping targets using an improved non-maximum suppression algorithm: from the plurality of candidate regions obtained in step S3, selecting the one with the highest confidence as a first bounding box, then selecting another candidate region as a second bounding box; if the two bounding boxes come from the same candidate region, skipping the suppression step; otherwise computing the IoU of the two bounding boxes and rejecting the second bounding box if the value is greater than a threshold; and repeating this operation on the remaining candidate regions in turn until all candidate regions have been traversed, thereby determining the final candidate regions;
S5, identifying specific-behavior targets using a classification recognition network based on a residual network, wherein the classification recognition network comprises a first-stage detection network and a second-stage classification network; the first-stage detection network uses a three-layer fully connected convolutional neural network for preliminary classification, targets with a detection probability greater than 0.1 being directly classified into the mobile-phone-playing category and targets with a probability less than 0.1 being set as undetermined;
S6, training the network parameters of the classification recognition network;
S7, importing the optimal network parameters obtained in step S6 into the classification recognition network, and testing on the behavior recognition data set.
2. The method of claim 1, wherein the pedestrian detection data set is the CrowdHuman data set for crowded scenes, and the behavior recognition data set is obtained from self-captured photographs of pedestrian-dense traffic scenes.
3. The method for identifying specific behaviors in a dense crowd environment according to claim 1, wherein in step S3 a feature pyramid network with RoIAlign is adopted, in which the high-level features of the feature pyramid network are up-sampled and connected top-down with the low-level features, and a prediction is made at each level to obtain the corresponding features.
4. The method according to claim 1, wherein in step S6 the parameters of the first-stage detection network are trained using an earth mover's distance (EMD) function that minimizes the distance between the prediction set $P(b_i)$ and the corresponding ground-truth instance set $G(b_i)$, as expressed by equations (3) to (6), where $\pi$ denotes a particular permutation of $(1, 2, \dots, K)$ whose $k$-th item is $\pi_k$, $g_{\pi_k}$ is the $\pi_k$-th ground-truth box, $L_{cls}$ and $L_{reg}$ denote the classification loss and the box regression loss respectively, $c_i$ is the class label with confidence, $l_i$ is the corresponding position, and $\mathrm{smooth}_{L1}$ is the smooth L1 loss function.
5. The method of claim 4, wherein in step S6 the parameters of the classification recognition network are trained, the parameters being calculated using a cross-entropy loss function as expressed by equations (7) and (8), where the variable in equation (7) is the output vector of the network, and the variables in equation (8) are the output vector of the network and the true label.
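
Claim 4 trains the first-stage detection network by minimizing a distance between the set of predictions for a candidate region and its set of ground-truth instances, taken over arrangements of the ground-truth boxes. The Python sketch below illustrates that idea only; it is not the patent's formula. The names `smooth_l1`, `pair_loss` and `emd_loss` are hypothetical, the classification term `-log(confidence)` assumes every ground-truth box is a positive instance, the smooth L1 term uses the common form with a transition point of 1, and the brute-force enumeration of permutations is only practical for small K.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]
Prediction = Tuple[float, Box]  # assumed format: (confidence, box)


def smooth_l1(x: float) -> float:
    """Common smooth L1 form: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def pair_loss(pred: Prediction, gt_box: Box) -> float:
    """Classification loss plus box regression loss for one matched pair."""
    conf, box = pred
    cls = -math.log(max(conf, 1e-12))  # assumes the target class is positive
    reg = sum(smooth_l1(p - g) for p, g in zip(box, gt_box))
    return cls + reg


def emd_loss(preds: Sequence[Prediction], gts: Sequence[Box]) -> float:
    """Minimum total pair loss over all permutations of the ground-truth boxes."""
    k = len(preds)
    if len(gts) != k:
        raise ValueError("this sketch assumes exactly K ground-truth boxes")
    return min(
        sum(pair_loss(preds[i], gts[perm[i]]) for i in range(k))
        for perm in permutations(range(k))
    )


# Two predictions whose boxes match the ground truth in swapped order.
preds = [(0.9, (0.0, 0.0, 1.0, 1.0)), (0.8, (2.0, 2.0, 3.0, 3.0))]
gts = [(2.0, 2.0, 3.0, 3.0), (0.0, 0.0, 1.0, 1.0)]
print(round(emd_loss(preds, gts), 4))  # → 0.3285
```

Because the minimum is taken over all arrangements, the loss does not depend on the order in which the ground-truth boxes are listed: in the example the swapped matching has zero regression loss, so only the two classification terms remain.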

Description

Specific behavior identification method in dense crowd environment

Technical Field

The invention relates to the technical field of recognition and localization, in particular to a method for identifying specific behaviors in a dense crowd environment.

Background

As artificial intelligence technology matures, recognition technologies for targets such as pedestrians and vehicles are gradually being applied in daily life, for example face-scan payment supported by face recognition and vehicle entry and exit registration supported by license plate recognition. However, real-life scenes are rich and varied, multiple targets usually have to be detected simultaneously in complex crowds, and single-target recognition technology is insufficient for recognizing specific targets within a crowd. For dense crowd environments it is therefore necessary to provide a technology for detecting and recognizing specific crowd behaviors, which is also an essential stage in improving human life through artificial intelligence.

Considering the diversity of specific behaviors in complex environments, a solution has to be designed for a specific problem. Taking pedestrian traffic safety as its entry point, this patent focuses on recognizing the specific behavior of the "head-down tribe", that is, pedestrians who play with their mobile phones while on the road. The popularization of smartphones and their ever-growing range of functions have made people increasingly dependent on their phones, so that many members of the "head-down tribe" can be seen on the roads; traffic accidents caused by them occur frequently and pose a serious hidden danger to traffic safety. At present, the "head-down tribe" phenomenon on the road is not subject to legal constraints and is handled mainly through persuasion and education by on-duty traffic police, which entails a heavy workload.
Therefore, this patent provides a technology for automatically recognizing the "head-down tribe" in a dense crowd environment, so as to lighten the burden on traffic police, improve working efficiency and regulate pedestrian traffic behavior.

Disclosure of Invention

Aiming at the special environment of dense crowds, the invention provides a method for identifying specific behaviors in a dense crowd environment, which accomplishes the detection and recognition of specific behaviors in a dense crowd environment by fusing the two stages of detection and recognition.

In order to solve the technical problem, the technical scheme of the invention is as follows: a method for identifying specific behaviors in a dense crowd environment, comprising the steps of:
S1, acquiring a data set, wherein the data set comprises a pedestrian detection data set and a behavior recognition data set;
S2, preprocessing the data set:
S2-1, unifying the image sizes of the data set without losing image information, and normalizing the image data using statistics computed by sampling the ImageNet training set;
S2-2, labeling, in the normalized images, whether each pedestrian exhibits the specific behavior, wherein 0 denotes the presence of the specific behavior and 1 denotes its absence;
S3, inputting the images of the preprocessed pedestrian detection data set into a feature pyramid network to extract the corresponding features, generating candidate regions and their category information through a region proposal network, and predicting a set of instances for each generated candidate region using a multi-instance prediction method;
S4, removing overlapping targets using an improved non-maximum suppression algorithm;
S5, identifying specific-behavior targets using a classification recognition network based on a residual network;
S6, training the network parameters of the classification recognition network;
S7, importing the optimal network parameters obtained in step S6 into the classification recognition network, and testing on the behavior recognition data set.

Preferably, the pedestrian detection data set is the CrowdHuman data set for crowded scenes, and the behavior recognition data set is obtained from self-captured photographs of pedestrian-dense traffic scenes.

Preferably, in step S3, a feature pyramid network with RoIAlign is adopted, in which the high-level features of the feature pyramid network are up-sampled and connected top-down with the low-level features, and a prediction is made at each level to obtain the corresponding features.

Preferably, in step S3, a region proposal network in a pyramid structure is used to generate the candidate regions and their category information, comprising: first generating anchor boxes; judging whether each anchor box contains an object (foreground) or is background, which is a binary classification; and fine-tuning the anchor boxes by using