CN-122023505-A - Monocular ranging method, monocular ranging device, monocular ranging equipment and storage medium


Abstract

The application provides a monocular ranging method, device, equipment and storage medium. The method comprises: acquiring, with a monocular camera, a video to be processed of a region to be measured, the video comprising consecutive video frame images; performing target detection on each video frame image to obtain an object detection frame of a target object; acquiring internal reference information (intrinsic parameters) of the monocular camera, and performing camera distortion correction on the object detection frame based on the internal reference information to obtain a correction detection frame; performing pose estimation on the target object, and performing angle compensation on the correction detection frame based on the pose estimation result to obtain a target detection frame; calculating a single-frame initial distance between the target object and the monocular camera in each video frame image based on size information of the target detection frame, the internal reference information, and set size information of the target object; and performing robust fitting on the single-frame initial distances of the target object across the video frame images to obtain the target distance.

Inventors

Request for anonymity
Request for anonymity

Assignees

北京中科慧灵机器人技术有限公司

Dates

Publication Date: 20260512
Application Date: 20251210

Claims (10)

1. A monocular ranging method, the method comprising: acquiring a video to be processed of an area to be detected based on a monocular camera, wherein the video to be processed comprises consecutive video frame images; performing target detection on each video frame image to obtain an object detection frame of a target object; acquiring internal reference information (intrinsic parameters) of the monocular camera, and performing camera distortion correction on the object detection frame based on the internal reference information to obtain a correction detection frame; performing pose estimation on the target object, and performing angle compensation on the correction detection frame based on a pose estimation result of the target object to obtain a target detection frame; calculating a single-frame initial distance between the target object and the monocular camera in each video frame image based on size information of the target detection frame, the internal reference information, and set size information of the target object; and performing robust fitting on the single-frame initial distance of the target object in each video frame image to obtain a target distance.
2. The method of claim 1, wherein performing target detection on each video frame image comprises: inputting each video frame image into a lightweight object detection model to obtain an initial detection frame of at least one target object and a confidence corresponding to the initial detection frame; and determining the object detection frame from the initial detection frame based on a comparison of the confidence with a confidence threshold, the confidence threshold being determined based on lighting information and scene demand information of the current video frame image.
3. The method of claim 1, wherein performing camera distortion correction on the object detection frame based on the internal reference information comprises: extracting initial pixel coordinates of a plurality of key points of the object detection frame; mapping the initial pixel coordinates of the plurality of key points to an undistorted image coordinate system by a table look-up method, based on the distortion coefficients and the intrinsic matrix indicated by the internal reference information, to obtain target pixel coordinates of the plurality of key points; and reconstructing the object detection frame based on the target pixel coordinates of the plurality of key points to obtain the correction detection frame.
4. The method of claim 3, wherein extracting initial pixel coordinates of a plurality of key points of the object detection frame comprises: determining the number of rows and the number of columns for sampling the boundary of the object detection frame based on size information of the object detection frame and resolution information of the current video frame image; and sampling on the boundary of the object detection frame based on the determined numbers of rows and columns to generate the plurality of key points, and extracting the initial pixel coordinates of each key point.
5. The method of claim 1, wherein performing angle compensation on the correction detection frame based on the pose estimation result of the target object comprises: determining a deflection angle between the orientation of the target object and the optical axis direction of the monocular camera based on the pose estimation result; calculating a compensation coefficient corresponding to the deflection angle by using a cosine function; determining a target pixel height based on the pixel height of the correction detection frame and the compensation coefficient; and updating the pixel height of the correction detection frame based on the target pixel height to obtain the target detection frame.
6. The method of claim 1, wherein calculating a single-frame initial distance between the target object and the monocular camera in each video frame image based on the size information of the target detection frame, the internal reference information, and the set size information of the target object comprises: calculating the single-frame initial distance between the target object and the monocular camera in each video frame image by using a ranging formula, based on the pixel height indicated by the size information of the target detection frame, the focal length indicated by the internal reference information, and the standard physical height of the target object indicated by the set size information.
7. The method of claim 1, wherein performing robust fitting on the single-frame initial distance of the target object in each video frame image comprises: performing robust fitting on the single-frame initial distance of the target object in each video frame image by using a random sample consensus (RANSAC) algorithm.
8. A monocular ranging device, the device comprising: an acquisition module, configured to acquire a video to be processed of an area to be detected based on a monocular camera, wherein the video to be processed comprises consecutive video frame images; a detection module, configured to perform target detection on each video frame image to obtain an object detection frame of a target object; a correction module, configured to acquire internal reference information of the monocular camera and perform camera distortion correction on the object detection frame based on the internal reference information to obtain a correction detection frame; a compensation module, configured to perform pose estimation on the target object and perform angle compensation on the correction detection frame based on a pose estimation result of the target object to obtain a target detection frame; a calculation module, configured to calculate a single-frame initial distance between the target object and the monocular camera in each video frame image based on size information of the target detection frame, the internal reference information, and set size information of the target object; and a fitting module, configured to perform robust fitting on the single-frame initial distance of the target object in each video frame image to obtain a target distance.
9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
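
As an illustration only, the angle compensation, ranging, and robust fitting steps recited in claims 5 to 7 can be sketched as follows. The excerpt does not give the exact formulas, so several points are assumptions: the compensation coefficient is taken as cos(angle) and the pixel height is divided by it; the ranging formula is taken to be the usual pinhole relation D = f * H / h; and RANSAC is applied to a straight-line model of distance against frame index. All function and parameter names are hypothetical.

```python
import math
import random


def compensated_pixel_height(pixel_height, deflection_deg):
    """Undo foreshortening, ASSUMING apparent height scales with cos(angle)."""
    k = math.cos(math.radians(deflection_deg))  # compensation coefficient (assumed form)
    if k <= 1e-6:
        raise ValueError("deflection angle too close to 90 degrees")
    return pixel_height / k


def single_frame_distance(focal_px, real_height_m, pixel_height):
    """Pinhole similar-triangles relation D = f * H / h (assumed ranging formula)."""
    return focal_px * real_height_m / pixel_height


def ransac_line(ts, ds, tol=0.5, iters=200, seed=0):
    """Fit d = a*t + b with a basic RANSAC loop; returns (a, b, inlier_indices).

    The linear-in-time model and the tolerance (in metres) are assumptions.
    """
    n = len(ts)
    if n < 2:
        raise ValueError("need at least two frames")
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        if ts[i] == ts[j]:
            continue  # cannot define a line from two samples at the same time
        a = (ds[j] - ds[i]) / (ts[j] - ts[i])
        b = ds[i] - a * ts[i]
        inliers = [k for k in range(n) if abs(ds[k] - (a * ts[k] + b)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no consensus set found")
    # Least-squares refit on the largest consensus set.
    m = len(best)
    mean_t = sum(ts[k] for k in best) / m
    mean_d = sum(ds[k] for k in best) / m
    var_t = sum((ts[k] - mean_t) ** 2 for k in best)
    a = sum((ts[k] - mean_t) * (ds[k] - mean_d) for k in best) / var_t if var_t else 0.0
    b = mean_d - a * mean_t
    return a, b, best
```

Under these assumptions, the target distance at the latest frame `t_last` would be read off the fitted line as `a * t_last + b`, so isolated single-frame outliers do not affect the reported value.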

Description

Monocular ranging method, monocular ranging device, monocular ranging equipment and storage medium

Technical Field

The present application relates to the field of ranging technologies, and in particular to a monocular ranging method, device, equipment, and storage medium.

Background

Pedestrian ranging is a key technology for safety functions such as collision warning and automatic braking in vehicle-mounted driver assistance systems. Among related ranging schemes, monocular vision is widely studied and applied because it needs only a single, low-cost camera. It estimates distance mainly from the proportional relation between the apparent size of the target in the image and the target's real physical size. In practice, however, quality defects in the original image, the diversity of target poses, and the natural fluctuation of single-frame detection results can significantly degrade the accuracy and stability of monocular ranging, which limits its use in safety systems with high reliability requirements. Binocular vision, infrared, and similar schemes can provide more accurate ranging, but their hardware cost and system complexity are also higher.

Disclosure of Invention

The application provides a monocular ranging method, device, equipment, and storage medium, to at least solve the above technical problems in the prior art.
According to a first aspect of the present application, there is provided a monocular ranging method, the method comprising: acquiring a video to be processed of an area to be detected based on a monocular camera, wherein the video to be processed comprises consecutive video frame images; performing target detection on each video frame image to obtain an object detection frame of a target object; acquiring internal reference information (intrinsic parameters) of the monocular camera, and performing camera distortion correction on the object detection frame based on the internal reference information to obtain a correction detection frame; performing pose estimation on the target object, and performing angle compensation on the correction detection frame based on a pose estimation result of the target object to obtain a target detection frame; calculating a single-frame initial distance between the target object and the monocular camera in each video frame image based on size information of the target detection frame, the internal reference information, and set size information of the target object; and performing robust fitting on the single-frame initial distance of the target object in each video frame image to obtain a target distance.

In one embodiment, performing target detection on each video frame image includes: inputting each video frame image into a lightweight object detection model to obtain an initial detection frame of at least one target object and a confidence corresponding to the initial detection frame; and determining the object detection frame from the initial detection frame based on a comparison of the confidence with a confidence threshold, the confidence threshold being determined based on lighting information and scene demand information of the current video frame image.
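
A minimal sketch of the confidence filtering described in this embodiment is given below. The text states only that the threshold depends on lighting information and scene demand information; the specific rule used here (lowering a base threshold for dim frames and for scenes that need high recall), the brightness cut-off, and all names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels
    confidence: float   # detector score in [0, 1]


def confidence_threshold(mean_brightness, high_recall_scene, base=0.5):
    """Hypothetical rule: relax the threshold in dim frames and high-recall scenes."""
    thr = base
    if mean_brightness < 60:    # dim frame on a 0-255 scale (assumed cut-off)
        thr -= 0.1
    if high_recall_scene:       # e.g. a scene where missed detections are costly
        thr -= 0.1
    return max(0.1, min(0.9, thr))  # keep the threshold in a sane range


def select_detections(detections, threshold):
    """Keep only the initial detection frames whose confidence reaches the threshold."""
    return [d for d in detections if d.confidence >= threshold]
```

The threshold is computed once per frame and then applied to every initial detection frame from that frame, so the same detector output can be filtered more or less strictly as conditions change.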
In one embodiment, performing camera distortion correction on the object detection frame based on the internal reference information includes: extracting initial pixel coordinates of a plurality of key points of the object detection frame; mapping the initial pixel coordinates of the plurality of key points to an undistorted image coordinate system by a table look-up method, based on the distortion coefficients and the intrinsic matrix indicated by the internal reference information, to obtain target pixel coordinates of the plurality of key points; and reconstructing the object detection frame based on the target pixel coordinates of the plurality of key points to obtain the correction detection frame.

In one embodiment, extracting initial pixel coordinates of a plurality of key points of the object detection frame includes: determining the number of rows and the number of columns for sampling the boundary of the object detection frame based on size information of the object detection frame and resolution information of the current video frame image; and sampling on the boundary of the object detection frame based on the determined numbers of rows and columns to generate the plurality of key points, and extracting the initial pixel coordinates of each key point.

In one embodiment, performing angle compensation on the correction detection frame based on the pose estimation result of the target object includes: determining a deflection angle between the orientation of the target object and the optical axis direction of the monocular camera based on the