
CN-122024137-A - Police condition video accurate positioning method for cross behavior monitoring

CN122024137A

Abstract

The invention relates to the technical field of surveillance video positioning and discloses a police condition video accurate positioning method for cross behavior monitoring. The method comprises: performing entity extraction on the alert text to obtain an alert key entity set, and constructing a time-axis index for the video frames in the surveillance video stream; collecting key-behavior anchor times and key-behavior element events from the alert key entities, constructing a time-association relation between the anchor times and the time-axis index, and selecting key video frames; extracting behavior description features of the key-behavior element events and behavior monitoring features of the key video frames; and performing cross-behavior matching and frame-expansion processing on the extracted features to produce the alert video positioning result. By performing multidimensional behavior matching between the key event information in the alert text and the key video frames in the surveillance video stream, the invention obtains a video positioning result that accords with the textual description, ensuring accurate association of video frames with alert events.
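The time-axis index described in the abstract (and detailed in claim 3) maps each frame number of a surveillance stream onto a shared timeline using the stream's start timestamp and frame rate. A minimal sketch in Python; all names and values are illustrative, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class VideoFrameIndex:
    """Hypothetical time-axis index for one surveillance stream:
    maps frame numbers to timestamps using the stream's start
    timestamp (in seconds) and its frame rate."""
    start_ts: float
    fps: float

    def timestamp_of(self, frame_no: int) -> float:
        # Frame n is captured n / fps seconds after the stream starts.
        return self.start_ts + frame_no / self.fps

    def frame_at(self, ts: float) -> int:
        # Inverse lookup: nearest frame number for a given timestamp.
        return round((ts - self.start_ts) * self.fps)

idx = VideoFrameIndex(start_ts=1000.0, fps=25.0)
print(idx.timestamp_of(50))   # 1002.0
print(idx.frame_at(1002.0))   # 50
```

Inserting each frame at the time-axis position of its index then reduces to sorting frames by `timestamp_of`.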

Inventors

  • JIANG SHOUPING

Assignees

  • 青岛市公安局即墨分局信息网络管理中心 (Information Network Management Center, Jimo Branch, Qingdao Public Security Bureau)

Dates

Publication Date
2026-05-12
Application Date
2026-01-28

Claims (9)

  1. A police condition video accurate positioning method for cross behavior monitoring, characterized by comprising the following steps: S1, acquiring an alert text and a surveillance video stream, performing entity extraction on the alert text to obtain an alert key entity set, and establishing a time axis for the surveillance video stream to obtain the time-axis index of each video frame in the stream; S2, extracting key-behavior anchor times and key-behavior element events from the alert key entity set, constructing a time-association relation between the anchor times and the time-axis indices of the video frames, and selecting key video frames in the surveillance video stream based on that relation; S3, extracting behavior description features of the key-behavior element events and behavior monitoring features of the key video frames; and S4, based on the time-association relation, performing cross-behavior matching between the behavior description features and the behavior monitoring features to obtain the key video frames for which cross-behavior matching succeeds, and performing frame-expansion processing to obtain a continuous video-frame sequence, which serves as the alert positioning result of the key-behavior element event corresponding to the behavior description feature.
  2. The police condition video accurate positioning method for cross behavior monitoring according to claim 1, wherein acquiring the alert text and the surveillance video stream and performing entity extraction on the alert text comprises: collecting alert text in real time, wherein the alert text consists of structured fields and an unstructured field, the structured fields comprising the alert number, alert time, location coordinates, and reporting-person information, and the unstructured field being the reporting person's free-form description; according to the acquired alert text, synchronously calling the police video surveillance platform interface and retrieving the video streams of the area containing the location coordinates in the alert text as the surveillance video streams associated with the alert text, wherein each surveillance video stream carries time index information and camera deployment information, the time index information comprising the video stream number, the number of the camera to which the stream belongs, the stream's start and stop timestamps, and the stream's frame rate, and the deployment information comprising the camera's deployment location and shooting direction; and performing entity extraction on the acquired alert text as follows: normalizing the format of numbers and times in the alert text, and performing sentence segmentation and word segmentation on the unstructured field to obtain a word segmentation result sequence; sequentially generating, for each word segmentation result in the sequence, a word vector and an entity-aware position encoding vector, and splicing the encoding vectors of the word segmentation results to obtain the encoding vector sequence corresponding to the word segmentation result sequence; feeding the encoding vector sequence to an entity perception model fusing a conditional random field with an attention mechanism, which generates an entity type for each encoding vector as the entity type of the associated word segmentation result, the entity types comprising fuzzy-expression, time, place, person, and behavior classes; and forming the alert key entity set from the structured fields, the word segmentation results, and their entity types, wherein the alert key entity set consists of a plurality of alert key entities of the form {entity type: entity content}, the entity types comprise the field tags of the structured fields and the entity types generated from the unstructured field, the entity contents are the format-normalized and segmented word results from the alert text, and the field tags are the alert number, alert time, location coordinates, and reporting-person information.
  3. The police condition video accurate positioning method for cross behavior monitoring according to claim 2, wherein establishing a time axis for the surveillance video stream to obtain the time-axis index of each video frame comprises: extracting the frame rate and the start and stop timestamps of the surveillance video stream, dividing the stream into a plurality of consecutive video frames, and numbering the frames consecutively; and constructing a time axis covering the stream's start and stop timestamps, computing the timestamp of each video frame as that frame's time-axis index, and inserting each video frame at the time-axis position corresponding to its index.
  4. The police condition video accurate positioning method for cross behavior monitoring according to claim 1, wherein extracting the key-behavior anchor times and key-behavior element events from the alert key entity set comprises: extracting from the alert key entity set the entity contents whose types are the alert time or the time class as the key-behavior anchor times, and splicing, in order, the entity contents whose labels are the person, behavior, and place classes as the key-behavior element events.
  5. The police condition video accurate positioning method for cross behavior monitoring according to claim 4, wherein the time-association relation between a key-behavior anchor time and a video frame's time-axis index is computed as: R(t_a, t_i) = exp(-|t_a - t_i| / τ); where R(t_a, t_i) denotes the time-association relation between key-behavior anchor time t_a and time-axis index t_i, and the smaller the difference |t_a - t_i|, the closer R(t_a, t_i) is to 1 and the stronger the temporal association between the video frame at index t_i and the anchor time t_a; exp denotes the exponential function with the natural constant as its base; and τ denotes the time-sensitivity parameter. Key video frames in the surveillance video stream are selected based on this time-association relation.
  6. The police condition video accurate positioning method for cross behavior monitoring according to claim 5, wherein the video frames in the surveillance video stream whose time-association relation with a key-behavior anchor time is higher than a preset time threshold are selected as the key video frames of the surveillance video stream.
  7. The police condition video accurate positioning method for cross behavior monitoring according to claim 1, wherein step S3 comprises: the behavior description features of a key-behavior element event are extracted as follows: sequentially performing word-vector conversion on the person-class, behavior-class, and place-class entity contents of the key-behavior element event to obtain the corresponding word vectors, which together serve as the event's behavior description features; and the behavior monitoring features of a key video frame are extracted as follows: performing multi-target detection on the key video frame to obtain target detection boxes and target categories, the target categories comprising person, tool, vehicle, and scene targets; estimating human-body key points and extracting clothing-color features of the persons in the key video frame to obtain the persons' key-point features, and applying an optical-flow method to obtain the frame's optical-flow features as the persons' motion features; identifying the scene tag vector of the detection box corresponding to each scene target by semantic recognition; matching the key-point features of persons across different key video frames, and marking persons whose features match successfully as the same person for tracking; feeding the multi-target-detected key video frame and the persons' motion features into an action recognition network, which outputs the action probability distribution of the persons in the frame; and splicing the persons' key-point features, the action probability distribution, and the scene tag vectors as the behavior monitoring features of the key video frame.
  8. The police condition video accurate positioning method for cross behavior monitoring according to claim 1, wherein performing cross-behavior matching between the behavior description features and the behavior monitoring features based on the time-association relation comprises: the behavior description features of a key-behavior element event E comprise the person-class entity content word vector w_p, the behavior-class entity content word vector w_b, and the place-class entity content word vector w_l; the behavior monitoring features of a key video frame F comprise the persons' key-point features k_F, the action probability distribution a_F, and the scene tag vector v_F; and cross-behavior matching is performed using a cross-behavior matching function: S_1 = max(cos(w_p, k_F), s_min); S_2 = max(cos(w_b, a_F), s_min); S_3 = cos(w_l, v_F); M(E, F) = R_F · exp(-d_F / δ) · (S_1 + S_2 + S_3); where M(E, F) denotes the cross-behavior matching result between behavior description feature E and behavior monitoring feature F; w_p, w_b, and w_l are, in order, the person-class, behavior-class, and place-class entity content word vectors of E; k_F, a_F, and v_F are, in order, the persons' key-point features, the action probability distribution, and the scene tag vector of F; cos denotes the cosine similarity function; R_F denotes the time-association relation of the key video frame associated with F; d_F denotes the distance between the deployment location of the camera associated with F and the alert location; δ denotes the distance control coefficient; s_min denotes the preset minimum matching similarity; and max selects the maximum of its two arguments. If M(E, F) is above a preset behavior-matching threshold, the cross-behavior matching of the key video frame associated with F succeeds.
  9. The police condition video accurate positioning method for cross behavior monitoring according to claim 8, wherein obtaining the alert positioning result of the key-behavior element event corresponding to the behavior description feature comprises: setting a time-window range; for each key video frame for which cross-behavior matching succeeded, extracting all video frames within the time-window range of the same surveillance video stream as that frame's adjacent frames, and also extracting the key video frames obtained by person tracking from the successfully matched key video frames together with their adjacent frames; and sorting all extracted adjacent frames and key video frames by time-axis index as the frame-expansion result of the successfully matched key video frames, which is the alert positioning result of the key-behavior element event corresponding to the behavior description feature.
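Claims 5 and 6 describe an exponential time-association score and threshold-based key-frame selection. The sketch below follows one plausible reading of the claim-5 formula, R = exp(-|t_a - t_i| / τ); the parameter values and function names are illustrative assumptions:

```python
import math

def time_association(anchor_ts, frame_ts, tau=30.0):
    """Exponential time-association score in (0, 1]: the closer the
    frame's time-axis index is to the anchor time, the closer to 1.
    tau is the time-sensitivity parameter (illustrative default)."""
    return math.exp(-abs(anchor_ts - frame_ts) / tau)

def select_key_frames(anchor_ts, frame_timestamps, threshold=0.5, tau=30.0):
    """Claim-6-style selection: keep frames whose association with the
    anchor time exceeds a preset threshold (values are assumptions)."""
    return [i for i, ts in enumerate(frame_timestamps)
            if time_association(anchor_ts, ts, tau) > threshold]

# Frames at 0 s, 10 s, 20 s, and 100 s; anchor time at 15 s.
print(select_key_frames(15.0, [0.0, 10.0, 20.0, 100.0]))  # [0, 1, 2]
```

The frame at 100 s falls outside the association threshold and is discarded, which is how the method narrows the analysis range before feature extraction.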
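The cross-behavior matching of claim 8 can likewise be sketched. The exact combination of the three cosine-similarity channels is not fully recoverable from the text, so the weighting below (an average damped by the time-association score and an exponential distance factor) is an assumption, and all names and vectors are illustrative:

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def cross_behavior_match(desc, mon, time_assoc, cam_dist,
                         delta=500.0, s_min=0.1):
    """One plausible reading of the claim-8 matching function:
    per-channel cosine similarity (person / behavior / place), the
    first two floored at the minimum matching similarity s_min,
    averaged and damped by the time-association score and an
    exponential distance factor exp(-d / delta)."""
    s_person = max(cos_sim(desc["person"], mon["keypoints"]), s_min)
    s_action = max(cos_sim(desc["behavior"], mon["actions"]), s_min)
    s_place = cos_sim(desc["place"], mon["scene"])
    damping = time_assoc * math.exp(-cam_dist / delta)
    return damping * (s_person + s_action + s_place) / 3.0

# Hypothetical feature vectors for one event/frame pair.
desc = {"person": [1.0, 0.0], "behavior": [0.0, 1.0], "place": [1.0, 1.0]}
mon = {"keypoints": [1.0, 0.0], "actions": [0.0, 1.0], "scene": [1.0, 1.0]}
score = cross_behavior_match(desc, mon, time_assoc=1.0, cam_dist=0.0)
print(round(score, 3))  # 1.0
```

A frame whose score exceeds the preset behavior-matching threshold would then be passed to the frame-expansion step of claim 9.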

Description

Police condition video accurate positioning method for cross behavior monitoring

Technical Field

The invention relates to the technical field of big data processing, in particular to surveillance video positioning, and specifically to a police condition video accurate positioning method for cross behavior monitoring.

Background

With the continuous development and wide application of video surveillance technology, particularly in public security, urban management, and enterprise security, video surveillance systems have become an important means of real-time monitoring and event response. A surveillance system is no longer used only for recording; it also takes on the task of analyzing and processing massive amounts of video data. How to rapidly and accurately extract the key events related to a specific police incident from huge volumes of video data has become one of the technical problems to be solved in the field of intelligent surveillance. In practice, surveillance video records a large amount of scene data, including not only visual information such as persons, actions, and places, but also information such as times, events, and scenes. The video data generated by a surveillance system is often very bulky and contains a large amount of irrelevant or redundant information. Rapidly extracting the data relevant to a police incident therefore requires not only powerful video analysis technology but also the combination of visual information in the video with the event description in the alert text. Alert text typically contains information such as time, place, event type, and person features, while the video data provides dynamic scene information such as persons' behaviors and actions, the timeline of events, and changes in the scene.
The prior patent CN117453949A discloses a video positioning method and device. Its main steps are: obtaining a video data set comprising a plurality of video fragments, each containing multiple frames; extracting video features and text description features of the video data; segmenting the video features into a plurality of fragment features and mapping each video fragment to its corresponding text description; and finally training a video positioning model from the text descriptions of the fragments, through which videos can be positioned accurately. By segmenting the video features and mapping them to text descriptions, that method successfully combines video content with text information, but it still leaves open the problems of time synchronization and accurate mapping between the video stream and the text description. To address these problems, the present invention provides a police condition video accurate positioning method for cross behavior monitoring, which combines alert text data with the timeline of the surveillance video stream, intelligently and rapidly narrows the analysis range, automatically locates key-behavior anchor times, and uses cross-behavior matching to precisely position and clip the surveillance video stream, helping public security authorities and security personnel respond rapidly.
Disclosure of Invention

The invention provides a police condition video accurate positioning method for cross behavior monitoring. Traditional video analysis methods generally rely on only a single information source, either the video data or the alert text, and are therefore prone to information asymmetry or inaccurate semantic understanding. Step S1 fuses the alert text with the surveillance video stream, multidimensionally associating the event description in the text with the actual behavior features in the video and effectively overcoming the shortcomings of a single modality. Step S2 solves the time-synchronization problem between the text description and the video data by establishing a time-association relation between the key-behavior anchor times in the alert text and the video-frame time axis of the surveillance video stream, ensuring accurate localization of the time period and the specific frames related to the event and markedly improving retrieval efficiency. Step S4 performs behavior matching between the key information in the video stream and the alert text description (such as persons, behaviors, and places) and precisely positions the video frames. Steps S3 and S4 accomplish this by extracting the behavior description features of the key events and combining