CN-121982061-A - Multi-video target fusion method based on indoor scene and related equipment
Abstract
The invention discloses a multi-video target fusion method based on an indoor scene, and related equipment, in the technical field of indoor positioning. The method comprises: obtaining a coordinate mapping relation between the plane coordinate system used by an indoor space plan corresponding to the indoor scene and the image pixel coordinates of each camera deployed in the indoor scene; obtaining the position coordinates of each personnel target in the plane coordinate system; obtaining the fused position coordinates of each personnel target in the plane coordinate system at different moments; and generating the track of each personnel target according to those fused position coordinates. When a camera changes, only its coordinate mapping relation needs to be updated, which markedly improves adaptability and ease of maintenance, while multi-view data fusion preserves the accuracy of indoor personnel positioning and the continuity of track tracking.
Inventors
- YAN JINYU
- CHEN ZHIQIANG
- WU KEWEI
- HE XIAOGANG
Assignees
- 北京卓视智通科技有限责任公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-25
Claims (10)
- 1. A multi-video target fusion method based on an indoor scene, characterized by comprising the following steps: acquiring a coordinate mapping relation between the plane coordinate system used by an indoor space plan corresponding to the indoor scene and the image pixel coordinates of each camera deployed in the indoor scene; performing personnel target detection on each image frame in the video stream shot by each camera to obtain the center pixel coordinates of each personnel target in the image frames acquired by the cameras at the same moment, and converting the center pixel coordinates through the corresponding coordinate mapping relation to obtain the position coordinates of each personnel target in the plane coordinate system; in the indoor space plan, for any personnel target, determining a plurality of positioning reference lines whose endpoints are the installation position coordinates of each camera shooting the target at the same moment and the position coordinates of the target in the plane coordinate system, and calculating the center point coordinates of the polygonal area enclosed by the intersections of the positioning reference lines determined for the target, the center point coordinates serving as the fused position coordinates of the target in the plane coordinate system at that moment, until the fused position coordinates of each personnel target in the plane coordinate system at different moments are obtained; and generating the track of each personnel target according to the fused position coordinates of each personnel target in the plane coordinate system at different moments.
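The fusion step of claim 1 can be sketched as follows. This is a minimal illustration, not the patent's exact algorithm: it assumes each camera contributes one positioning reference line running from its installation point to its single-view estimate of the target in plan coordinates, intersects the lines pairwise, and averages the intersection points as the fused position. All function names and the simple averaging are illustrative.

```python
from itertools import combinations

def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite lines through (p1, p2) and (p3, p4);
    returns None if the lines are (near-)parallel."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-9:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def fuse_position(reference_lines):
    """reference_lines: list of (camera_xy, target_xy) segments in plan
    coordinates, one per camera seeing the target at this moment.
    Returns the centroid of all pairwise intersection points."""
    points = []
    for (a1, a2), (b1, b2) in combinations(reference_lines, 2):
        p = line_intersection(a1, a2, b1, b2)
        if p is not None:
            points.append(p)
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Two cameras at (0, 0) and (10, 0), each sighting the same target near (5, 5):
lines = [((0.0, 0.0), (5.2, 5.1)), ((10.0, 0.0), (4.9, 5.0))]
print(fuse_position(lines))
```

With more than two cameras, the intersections enclose the polygonal area described in the claim, and the centroid damps the single-view errors.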
- 2. The multi-video target fusion method based on an indoor scene according to claim 1, wherein acquiring the coordinate mapping relation between the plane coordinate system used by the indoor space plan corresponding to the indoor scene and the image pixel coordinates of each camera deployed in the indoor scene comprises: selecting at least four image pixel marking points from the picture shot by any camera, determining the coordinate point of each image pixel marking point in the indoor space plan, and establishing the coordinate mapping relation from that camera's image pixel coordinates to the plane coordinate system based on all the image pixel marking points and their corresponding coordinate points, until the coordinate mapping relation from the image pixel coordinates of every camera to the plane coordinate system is obtained.
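A mapping built from four marked point pairs, as in claim 2, is commonly modeled as a planar homography. The sketch below fits one by the direct linear transform under the assumption that the patent's "coordinate mapping relation" is such a homography (the claim does not name the model); all names and the sample coordinates are illustrative, and a library routine such as OpenCV's `cv2.findHomography` would normally replace the hand-rolled solver.

```python
def solve_linear(A, b):
    """Naive Gaussian elimination with partial pivoting (stdlib only)."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def fit_homography(pixel_pts, plan_pts):
    """Homography (8 parameters, h33 fixed to 1) mapping pixel -> plan
    coordinates from exactly four point correspondences (DLT)."""
    A, b = [], []
    for (u, v), (x, y) in zip(pixel_pts, plan_pts):
        A.append([u, v, 1, 0, 0, 0, -u * x, -v * x]); b.append(x)
        A.append([0, 0, 0, u, v, 1, -u * y, -v * y]); b.append(y)
    return solve_linear(A, b)

def pixel_to_plan(h, u, v):
    """Apply the fitted mapping to one pixel coordinate."""
    w = h[6] * u + h[7] * v + 1.0
    return ((h[0] * u + h[1] * v + h[2]) / w,
            (h[3] * u + h[4] * v + h[5]) / w)

# Four marking points picked in one camera's picture, and the plan
# coordinates (in metres) assigned to them on the indoor space plan:
pixel_pts = [(100, 200), (500, 210), (480, 400), (120, 390)]
plan_pts = [(0.0, 0.0), (8.0, 0.0), (8.0, 6.0), (0.0, 6.0)]
H = fit_homography(pixel_pts, plan_pts)
print(pixel_to_plan(H, 100, 200))  # recovers the first marking point: ~ (0.0, 0.0)
```

When a camera is moved, only these four (or more) correspondences need to be re-marked, which is the maintenance advantage the patent claims over full calibration-plate recalibration.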
- 3. The multi-video target fusion method based on an indoor scene according to claim 2, wherein performing personnel target detection on each image frame in the video stream shot by each camera to obtain the center pixel coordinates of each personnel target in the image frames acquired by the cameras at the same moment comprises: acquiring in real time the video stream shot by each camera; for each camera, performing personnel target detection on the image frames in the acquired video stream; and, for each personnel target detected by each camera in each image frame, extracting the head detection frame of the target and calculating its center pixel coordinates.
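The center-pixel computation in claim 3 is straightforward; a minimal sketch, assuming the detector reports each head detection frame as an `(x_min, y_min, width, height)` tuple (the box convention is an assumption; detectors differ):

```python
def head_center(box):
    """Center pixel of a head detection frame given as
    (x_min, y_min, width, height) -- an assumed box convention."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

print(head_center((300, 120, 40, 48)))  # (320.0, 144.0)
```

This center pixel is the coordinate that the mapping relation of claim 2 then converts into the plane coordinate system.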
- 4. The multi-video target fusion method based on an indoor scene according to claim 3, further comprising: when personnel target detection is performed on each image frame in the video stream shot by each camera, also extracting the structured information of each personnel target, the structured information comprising a face feature code and clothing color information; wherein determining, in the indoor space plan, a plurality of positioning reference lines whose endpoints are the installation position coordinates of each camera shooting any personnel target at the same moment and the position coordinates of the target in the plane coordinate system comprises: acquiring the data generated by processing the images acquired at any moment, the data comprising the position coordinates of the personnel targets, the structured information, and the installation position coordinates of the cameras; associating the data corresponding to the same personnel target from different cameras according to the face feature codes and the clothing color information in the structured information, to obtain multiple groups of data; and, in the indoor space plan, determining one positioning reference line from each group of the associated data for each personnel target associated at that moment, taking the installation position coordinates of the camera in that group and the position coordinates of the personnel target in that group as endpoints.
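The cross-camera association of claim 4 can be sketched as a greedy grouping by face-feature similarity plus clothing color. Everything here is an illustrative assumption rather than the patent's specification: the cosine metric, the 0.8 threshold, exact color-string equality, and the greedy first-match strategy.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def associate(detections, face_thresh=0.8):
    """Greedily group per-camera detections that belong to the same person.
    Each detection: dict with 'camera', 'face' (feature code as a vector),
    'color' (clothing color) and 'pos' (plan coordinates).
    Threshold and strategy are illustrative assumptions."""
    groups = []
    for det in detections:
        for g in groups:
            ref = g[0]
            if (det["color"] == ref["color"]
                    and cosine(det["face"], ref["face"]) >= face_thresh
                    and det["camera"] not in {d["camera"] for d in g}):
                g.append(det)
                break
        else:
            groups.append([det])
    return groups

dets = [
    {"camera": "cam1", "face": [0.90, 0.10, 0.20], "color": "red", "pos": (5.1, 4.9)},
    {"camera": "cam2", "face": [0.88, 0.12, 0.21], "color": "red", "pos": (4.9, 5.0)},
    {"camera": "cam1", "face": [0.10, 0.90, 0.30], "color": "blue", "pos": (2.0, 1.0)},
]
print(len(associate(dets)))  # 2: one person seen by two cameras, one by a single camera
```

Each resulting group then yields one positioning reference line per member, from that member's camera installation point to the target's single-view plan position.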
- 5. A multi-video target fusion system based on an indoor scene, characterized by comprising a coordinate mapping relation acquisition module, a position coordinate acquisition module, a fused position coordinate generation module and a track generation module; the coordinate mapping relation acquisition module is used for acquiring the coordinate mapping relation between the plane coordinate system used by an indoor space plan corresponding to the indoor scene and the image pixel coordinates of each camera deployed in the indoor scene; the position coordinate acquisition module is used for performing personnel target detection on each image frame in the video stream shot by each camera to obtain the center pixel coordinates of each personnel target in the image frames acquired by the cameras at the same moment, and converting the center pixel coordinates through the corresponding coordinate mapping relation to obtain the position coordinates of each personnel target in the plane coordinate system; the fused position coordinate generation module is used for determining, in the indoor space plan, a plurality of positioning reference lines whose endpoints are the installation position coordinates of each camera shooting any personnel target at the same moment and the position coordinates of the target in the plane coordinate system, and calculating the center point coordinates of the polygonal area enclosed by the intersections of the positioning reference lines determined for the target, the center point coordinates serving as the fused position coordinates of the target in the plane coordinate system at that moment, until the fused position coordinates of each personnel target in the plane coordinate system at different moments are obtained; and the track generation module is used for generating the track of each personnel target according to the fused position coordinates of each personnel target in the plane coordinate system at different moments.
- 6. The multi-video target fusion system based on an indoor scene according to claim 5, wherein the coordinate mapping relation acquisition module is specifically configured to: select at least four image pixel marking points from the picture shot by any camera, determine the coordinate point of each image pixel marking point in the indoor space plan, and establish the coordinate mapping relation from that camera's image pixel coordinates to the plane coordinate system based on all the image pixel marking points and their corresponding coordinate points, until the coordinate mapping relation from the image pixel coordinates of every camera to the plane coordinate system is obtained.
- 7. The system of claim 6, wherein the position coordinate acquisition module is further specifically configured to: acquire in real time the video stream shot by each camera; for each camera, perform personnel target detection on the image frames in the acquired video stream; and, for each personnel target detected by each camera in each image frame, extract the head detection frame of the target and calculate its center pixel coordinates.
- 8. The system of claim 7, further comprising a structured information acquisition module for also extracting the structured information of each personnel target when personnel target detection is performed on each image frame in the video stream shot by each camera, the structured information comprising a face feature code and clothing color information; wherein the fused position coordinate generation module is further specifically configured to: acquire the data generated by processing the images acquired at any moment, the data comprising the position coordinates of the personnel targets, the structured information, and the installation position coordinates of the cameras; associate the data corresponding to the same personnel target from different cameras according to the face feature codes and the clothing color information in the structured information, to obtain multiple groups of data; and, in the indoor space plan, determine one positioning reference line from each group of the associated data for each personnel target associated at that moment, taking the installation position coordinates of the camera in that group and the position coordinates of the personnel target in that group as endpoints.
- 9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the multi-video target fusion method based on an indoor scene according to any one of claims 1 to 4.
- 10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the multi-video target fusion method based on an indoor scene according to any one of claims 1 to 4.
Description
Multi-video target fusion method based on indoor scene and related equipment

Technical Field

The invention relates to the technical field of indoor positioning, in particular to a multi-video target fusion method based on an indoor scene and related equipment.

Background

With the development of digital twin technology, applications that accurately track and position personnel targets in indoor scenes are increasingly widespread. In particular, in certain high-security premises, the historical activity track of personnel needs to be completely reproduced so as to improve the level of security management. However, conventional GPS devices are costly for indoor positioning and their deployment process is complex, making large-scale, non-intrusive and accurate positioning difficult to achieve. The industry therefore needs a simple method that can use existing vision equipment to achieve continuous and accurate positioning of indoor personnel at low cost.

In the prior art, a common implementation is to place a number of physical calibration plates in advance in an indoor geodetic coordinate system, where the spatial coordinate relationship of these calibration plates is known. By capturing images containing the calibration plates, the intrinsic and extrinsic parameters of each camera can be calculated. With these parameters, the image pixel coordinates captured by a camera can be converted into a unified geodetic plane coordinate system, thereby realizing position mapping from a two-dimensional image to three-dimensional space.

However, this prior-art solution has significant drawbacks. In a scenario where multiple indoor cameras work cooperatively, once the position of a camera is moved, or a rotatable pan-tilt camera is used, the extrinsic parameters relative to the geodetic coordinate system change. This requires re-running the full workflow of calibration plate deployment, image acquisition and parameter calculation. The process is complex, reconfiguring the algorithm is time-consuming, the flexibility and maintainability of the system are seriously affected, and the method is difficult to adapt to practical environments in which the camera layout or viewing angle must be adjusted frequently.

Aiming at the time-consuming, complex recalibration caused by camera position changes in the prior art, the invention provides a more convenient and stable solution, which realizes accurate and continuous positioning and track tracking of indoor personnel targets through multi-view data fusion established on a simplified mapping relation.

Disclosure of Invention

The invention aims to solve the above technical problems of the prior art, and provides a multi-video target fusion method based on an indoor scene and related equipment. In a first aspect, the invention provides a multi-video target fusion method based on an indoor scene, the specific technical scheme of which is as follows: acquiring a coordinate mapping relation between the plane coordinate system used by an indoor space plan corresponding to the indoor scene and the image pixel coordinates of each camera deployed in the indoor scene; performing personnel target detection on each image frame in the video stream shot by each camera to obtain the center pixel coordinates of each personnel target in the image frames acquired by the cameras at the same moment, and converting the center pixel coordinates through the corresponding coordinate mapping relation to obtain the position coordinates of each personnel target in the plane coordinate system; in the indoor space plan, determining a plurality of positioning reference lines whose endpoints are the installation position coordinates of each camera shooting any personnel target at the same moment and the position coordinates of the target in the plane coordinate system, calculating the center point coordinates of the polygonal area enclosed by the intersections of the positioning reference lines determined for the target, and taking the center point coordinates as the fused position coordinates of the target in the plane coordinate system at that moment, until the fused position coordinates of each personnel target in the plane coordinate system at different moments are obtained; and generating the track of each personnel target according to the fused position coordinates of each personnel target in the plane coordinate system at different moments.

The multi-video target fusion method based on the indoor scene has the following beneficial effects: by establishing the coordinate mapping relation between the plane coordinate system used by the indoor space plan and the image pixel coordinates of each camera, the complex process of camera p