CN-121999419-A - Visual recognition method and system for man-machine cooperation-oriented aerial work behaviors
Abstract
Embodiments of the application provide a visual recognition method and a visual recognition system for aerial work behaviors oriented to man-machine cooperation. The method comprises: obtaining multi-source code streams of an aerial work construction site, the streams comprising a first code stream corresponding to fixed equipment and a second code stream corresponding to an unmanned aerial vehicle; decoding and aligning the first code stream and the second code stream to obtain a unified frame sequence; performing personnel detection and scene element object generation based on the unified frame sequence to obtain a personnel track sequence and scene element objects; calculating dangerous boundary relationships based on the personnel track sequence and the scene element objects to determine a target relationship sequence, the target relationship sequence representing the relationship sequence between persons and boundaries; fusing the personnel track sequence and the target relationship sequence to obtain candidate events; and performing research and judgment on the candidate events to determine a final recognition result, which is then visually displayed. By means of this scheme, the accuracy of identifying potential safety hazards in aerial work can be improved.
Inventors
- TAN YI
- XU WENYU
- YI WEN
- CHEN BINGQUAN
Assignees
- 深圳大学 (Shenzhen University)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-04-10
Claims (10)
- 1. A man-machine cooperation-oriented visual recognition method for aerial work behaviors, characterized by comprising the following steps: obtaining a multi-source code stream of an aerial work construction site, wherein the multi-source code stream comprises a first code stream corresponding to fixed equipment and a second code stream corresponding to an unmanned aerial vehicle; performing decoding processing and alignment processing on the first code stream and the second code stream to obtain a unified frame sequence; performing personnel detection and scene element object generation based on the unified frame sequence to obtain a personnel track sequence and scene element objects; calculating dangerous boundary relationships based on the personnel track sequence and the scene element objects to determine a target relationship sequence, wherein the target relationship sequence represents a relationship sequence between a person and a boundary; fusing the personnel track sequence and the target relationship sequence to obtain candidate events; and performing research and judgment based on the candidate events to determine a final recognition result, and visually displaying the final recognition result.
- 2. The method of claim 1, wherein decoding and aligning the first and second code streams to obtain a unified frame sequence comprises: decoding the first code stream and the second code stream, and determining, for each frame of each code stream, the system time at which decoding is completed and the display timestamp obtained by decoding; determining a time offset based on the display timestamp and the system time; determining a unified timestamp for each frame of each of the first and second code streams based on the display timestamp and the time offset; and outputting the first code stream and the second code stream at a unified frame rate based on the unified timestamps to determine the unified frame sequence, wherein the unified frame sequence comprises a video source number, a video increment signal, a display timestamp, a unified timestamp, an image matrix and an additional information dictionary.
- 3. The method of claim 1, wherein performing personnel detection and scene element object generation based on the unified frame sequence to obtain a personnel track sequence and scene element objects comprises: performing multi-person worker detection on each frame in the unified frame sequence to obtain a personnel detection set for each frame; performing non-maximum suppression on each detection box in the personnel detection set of each frame to obtain a final personnel detection set for each frame; performing continuous track generation based on the final personnel detection set of each frame to determine the personnel track sequence, wherein the personnel track sequence comprises personnel drop-point pixels; performing aerial work risk cue detection on each frame in the unified frame sequence to obtain a cue detection set for each frame; and generating scene element objects based on the cue detection set of each frame to determine the scene element objects.
- 4. The method of claim 1, wherein the scene element objects include an edge polyline and an opening polygon, and wherein calculating dangerous boundary relationships based on the personnel track sequence and the scene element objects to determine a target relationship sequence comprises: determining, for each frame in the unified frame sequence, a conversion matrix from the pixel plane to the world plane; converting the personnel drop-point pixels in the personnel track sequence based on the conversion matrix to obtain final personnel drop points in the world coordinate plane; calculating a first shortest distance value between the person and the edge based on the final personnel drop point and the edge polyline, and determining a second distance value between the person and the opening based on the final personnel drop point and the opening polygon; performing smoothing and trend calculation based on the first shortest distance value and the second distance value to determine a first distance trend value and a second distance trend value; and determining the target relationship sequence based on the first shortest distance value, the second distance value, the first distance trend value and the second distance trend value (this conversion and distance computation are illustrated in the first sketch following the claims).
- 5. The method of claim 4, wherein determining, for each frame in the unified frame sequence, the conversion matrix from the pixel plane to the world plane comprises: acquiring, in any frame of the unified frame sequence, the pixel coordinates of preset mark points together with the world coordinates of those mark points, and calculating the conversion matrix from the pixel coordinates and the world coordinates; or obtaining camera intrinsic parameters, a rotation matrix and a translation vector in the world coordinate system, determining an inverse conversion matrix from the camera intrinsic parameters, the rotation matrix and the translation vector, and inverting the inverse conversion matrix to determine the conversion matrix, wherein the inverse conversion matrix is a matrix that converts from the world plane to the pixel plane.
- 6. The method of claim 1, wherein fusing the personnel track sequence and the target relationship sequence to obtain candidate events comprises: determining an event type, an event segment index, a key frame index and a trigger basis based on the personnel track sequence and the target relationship sequence, wherein the key frames corresponding to the key frame index comprise a first frame, a last frame and a minimum frame of the event segment; performing risk assessment based on the event segment index and the key frame index to determine a risk score; and determining the candidate events from the event type, the event segment index, the key frame index, the trigger basis and the risk score (illustrated in the second sketch following the claims).
- 7. The method of claim 1, wherein performing research and judgment based on the candidate event to determine a final recognition result comprises: determining a first recognition result for the aerial work based on the candidate event, using the risk score and a preset risk threshold; receiving a second recognition result for the aerial work, wherein the second recognition result is a recognition result made by a safety officer for the candidate event; and performing comprehensive analysis based on the first recognition result and the second recognition result to determine the final recognition result.
- 8. A man-machine cooperation-oriented visual recognition system for aerial work behaviors, characterized by comprising an acquisition module, a decoding and alignment module, a detection and generation module and a determination module, wherein: the acquisition module is configured to acquire a multi-source code stream of an aerial work construction site, the multi-source code stream comprising a first code stream corresponding to fixed equipment and a second code stream corresponding to an unmanned aerial vehicle; the decoding and alignment module is configured to perform decoding processing and alignment processing on the first code stream and the second code stream to obtain a unified frame sequence; the detection and generation module is configured to perform personnel detection and scene element object generation based on the unified frame sequence to obtain a personnel track sequence and scene element objects; and the determination module is configured to calculate dangerous boundary relationships based on the personnel track sequence and the scene element objects to determine a target relationship sequence, wherein the target relationship sequence represents a relationship sequence between a person and a boundary, to fuse the personnel track sequence and the target relationship sequence to obtain candidate events, to perform research and judgment based on the candidate events to determine a final recognition result, and to visually display the final recognition result.
- 9. A man-machine cooperation-oriented visual recognition device for aerial work behaviors, characterized by comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to call and run the computer program from the memory to perform the method of any one of claims 1 to 7.
- 10. A computer-readable storage medium storing executable instructions that, when executed, cause a processor to perform the method of any one of claims 1 to 7.
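The first of the two sketches below illustrates, in Python, the pixel-to-world conversion of claim 5 and the person-boundary distance and trend calculation of claim 4. It is a minimal sketch under stated assumptions: the conversion matrix is estimated from preset mark points with OpenCV's findHomography (the marked-point branch of claim 5), the drop point is mapped onto the world plane, and the function names, the moving-average smoother and the finite-difference trend are illustrative choices rather than identifiers or formulas from the application.

```python
import numpy as np
import cv2


def estimate_homography(pixel_pts, world_pts):
    """Conversion matrix H from the pixel plane to the world plane, estimated
    from four or more preset mark points (first alternative of claim 5)."""
    H, _ = cv2.findHomography(np.asarray(pixel_pts, np.float32),
                              np.asarray(world_pts, np.float32))
    return H


def to_world(H, drop_point_pixel):
    """Map a personnel drop-point pixel to world-plane coordinates."""
    u, v = drop_point_pixel
    q = H @ np.array([u, v, 1.0])
    return q[:2] / q[2]


def dist_to_edge_polyline(pt, polyline):
    """First shortest distance value: minimum distance from the world-plane
    drop point to the edge polyline (claim 4)."""
    pt = np.asarray(pt, float)
    best = np.inf
    for a, b in zip(polyline[:-1], polyline[1:]):
        a, b = np.asarray(a, float), np.asarray(b, float)
        t = np.clip(np.dot(pt - a, b - a) / max(np.dot(b - a, b - a), 1e-12), 0.0, 1.0)
        best = min(best, float(np.linalg.norm(pt - (a + t * (b - a)))))
    return best


def dist_to_opening(pt, opening_polygon):
    """Second distance value: signed distance to the opening polygon; a
    negative value here means the drop point lies inside the opening."""
    contour = np.asarray(opening_polygon, np.float32).reshape(-1, 1, 2)
    return -cv2.pointPolygonTest(contour, (float(pt[0]), float(pt[1])), True)


def smooth_and_trend(distances, window=5):
    """Moving-average smoothing plus a simple finite-difference trend value
    (a negative trend means the person is approaching the boundary)."""
    window = max(1, min(window, len(distances)))
    smoothed = np.convolve(distances, np.ones(window) / window, mode="valid")
    trend = float(smoothed[-1] - smoothed[0]) if len(smoothed) > 1 else 0.0
    return smoothed, trend
```

Either branch of claim 5 yields the same 3x3 conversion matrix; the marked-point branch is shown only because it needs no camera calibration data.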
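The second sketch outlines one way the candidate-event fusion of claim 6 and the research-and-judgment step of claim 7 could be organised. The event type, the near-distance threshold, the risk-score formula, the reading of the "minimum frame" as the minimum-distance frame, and the rule combining the machine result with the safety officer's result are all assumptions made for illustration; the claims leave these concrete choices open.

```python
from dataclasses import dataclass


@dataclass
class CandidateEvent:
    event_type: str        # assumed label, e.g. "approach_edge"
    segment: tuple         # (first frame index, last frame index) of the event segment
    key_frames: tuple      # (first frame, last frame, frame taken as the "minimum" frame)
    trigger_basis: dict    # distances / trends that triggered the event
    risk_score: float


def fuse_candidate(frame_ids, distances, trends, near_threshold=1.0):
    """Fuse consecutive frames whose person-boundary distance falls below an
    assumed threshold (in metres) into a single candidate event."""
    below = [i for i, d in enumerate(distances) if d < near_threshold]
    if not below:
        return None
    first, last = below[0], below[-1]
    k_min = min(below, key=lambda i: distances[i])     # read here as the minimum-distance frame
    risk = (near_threshold - distances[k_min]) / near_threshold   # toy score in [0, 1]
    return CandidateEvent(
        event_type="approach_edge",
        segment=(frame_ids[first], frame_ids[last]),
        key_frames=(frame_ids[first], frame_ids[last], frame_ids[k_min]),
        trigger_basis={"min_distance": distances[k_min], "trend": trends[k_min]},
        risk_score=risk,
    )


def research_and_judge(event, risk_threshold=0.6, officer_result=None):
    """First recognition result from the risk score and a preset threshold,
    optionally combined with the safety officer's second recognition result."""
    first_result = event is not None and event.risk_score >= risk_threshold
    if officer_result is None:
        return first_result
    return first_result and officer_result   # one possible comprehensive-analysis rule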
Description
Visual recognition method and system for man-machine cooperation-oriented aerial work behaviors

Technical Field

The application relates to the field of civil engineering, and in particular to a man-machine cooperation-oriented visual recognition method and system for aerial work behaviors.

Background

Aerial work is one of the high-risk operation types on construction sites, commonly encountered in scaffold operations, facade construction work platforms, edge and opening operations, roof operations, high-altitude equipment overhaul and the like. Such scenes are highly hazardous and highly complex. On the one hand, the dangerous space of aerial work has clear boundaries (edges, openings, edges of overhanging platforms and the like), and once a person approaches or crosses such a boundary the consequences are serious. On the other hand, sites contain a great deal of occlusion and interference (scaffold poles, safety nets, stacked materials, interleaved personnel), and work behaviors have an obvious temporal structure (actions such as approaching an edge, lingering, crossing and climbing usually occur as continuous segments), so single-frame recognition or schemes that rely only on static images are difficult to run stably. Traditional safety management relies mainly on manual inspection, which suffers from discontinuous coverage, many blind spots, high re-checking cost and unstructured records.

Disclosure of Invention

Embodiments of the application provide a visual recognition method and system for man-machine cooperation-oriented aerial work behaviors, which can improve the accuracy of identifying potential safety hazards in aerial work. The technical scheme of the application is realized as follows.

In a first aspect, an embodiment of the application provides a visual recognition method for aerial work behaviors oriented to man-machine cooperation, the method comprising: obtaining a multi-source code stream of an aerial work construction site, wherein the multi-source code stream comprises a first code stream corresponding to fixed equipment and a second code stream corresponding to an unmanned aerial vehicle; performing decoding processing and alignment processing on the first code stream and the second code stream to obtain a unified frame sequence; performing personnel detection and scene element object generation based on the unified frame sequence to obtain a personnel track sequence and scene element objects; calculating dangerous boundary relationships based on the personnel track sequence and the scene element objects to determine a target relationship sequence, wherein the target relationship sequence represents a relationship sequence between a person and a boundary; fusing the personnel track sequence and the target relationship sequence to obtain candidate events; and performing research and judgment based on the candidate events to determine a final recognition result, and visually displaying the final recognition result.
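Read as a pipeline, the first aspect chains the stages above in a fixed order. The following minimal Python sketch only makes that flow explicit; the stage functions are caller-supplied placeholders, and none of the names are taken from the application.

```python
from typing import Any, Callable


def recognise_aerial_work(
    fixed_stream: Any,
    uav_stream: Any,
    decode_and_align: Callable,      # -> unified frame sequence
    detect_and_generate: Callable,   # -> (personnel track sequence, scene element objects)
    boundary_relations: Callable,    # -> target relationship sequence
    fuse_events: Callable,           # -> candidate events
    research_and_judge: Callable,    # -> final recognition result
    visualise: Callable,             # visual display of the result
):
    """Chain the stages of the first aspect; each stage is supplied by the caller."""
    frames = decode_and_align(fixed_stream, uav_stream)
    tracks, elements = detect_and_generate(frames)
    relations = boundary_relations(tracks, elements)
    candidates = fuse_events(tracks, relations)
    result = research_and_judge(candidates)
    visualise(result)
    return result
```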
In the above scheme, decoding and aligning the first code stream and the second code stream to obtain a unified frame sequence includes: decoding the first code stream and the second code stream, and determining, for each frame of each code stream, the system time at which decoding is completed and the display timestamp obtained by decoding; determining a time offset based on the display timestamp and the system time; determining a unified timestamp for each frame of each of the first and second code streams based on the display timestamp and the time offset; and outputting the first code stream and the second code stream at a unified frame rate based on the unified timestamps to determine the unified frame sequence, wherein the unified frame sequence comprises a video source number, a video increment signal, a display timestamp, a unified timestamp, an image matrix and an additional information dictionary.

In the above scheme, performing personnel detection and scene element object generation based on the unified frame sequence to obtain a personnel track sequence and scene element objects includes: performing multi-person worker detection on each frame in the unified frame sequence to obtain a personnel detection set for each frame; performing non-maximum suppression on each detection box in the personnel detection set of each frame to obtain a final personnel detection set for each frame; performing continuous track generation based on the final personnel detection set of each frame to determine the personnel track sequence, wherein the personnel track sequence comprises personnel drop-point pixels; performing aerial work risk cue detection on each frame in the unified frame sequence to obtain a cue detection set for each frame; and generating scene element objects based on the cue detection set of each frame to determine the scene element objects.
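A hedged sketch of the alignment scheme above: each decoded frame carries a display timestamp and the system time at which its decoding finished, a per-stream offset maps display timestamps onto the common clock, and the stream is re-emitted at a unified frame rate. The median offset and the nearest-frame resampling policy are assumptions; the scheme only requires that an offset and a unified frame rate be used.

```python
from statistics import median


def unify_timestamps(frames):
    """frames: list of dicts with 'pts' (display timestamp from the decoder, in
    seconds) and 'sys_time' (system time when decoding of that frame finished).
    Annotates each frame with a unified timestamp on the common system clock."""
    offset = median(f["sys_time"] - f["pts"] for f in frames)   # per-stream time offset
    for f in frames:
        f["unified_ts"] = f["pts"] + offset
    return frames


def resample(frames, fps=25.0):
    """Output the stream at a unified frame rate by picking, for every tick of
    the common clock, the frame whose unified timestamp is nearest."""
    frames = sorted(frames, key=lambda f: f["unified_ts"])
    t0, t1 = frames[0]["unified_ts"], frames[-1]["unified_ts"]
    ticks = [t0 + i / fps for i in range(int((t1 - t0) * fps) + 1)]
    return [min(frames, key=lambda f: abs(f["unified_ts"] - t)) for t in ticks]
```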
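The detection scheme above relies on per-frame non-maximum suppression and on a drop-point pixel derived from each detection box. The sketch below shows one conventional reading of those two steps; the (x1, y1, x2, y2) box format, the IoU threshold and the bottom-centre drop point are assumptions rather than details fixed by the application.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / max(union, 1e-12)


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring detection boxes, dropping any box that overlaps
    an already-kept box by more than the (assumed) IoU threshold."""
    keep = []
    for i in np.argsort(scores)[::-1]:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]


def drop_point_pixel(box):
    """Bottom-centre of the detection box, used here as the personnel
    drop-point pixel fed to the pixel-to-world conversion."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)
```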