CN-122024129-A - Long video positioning method, device, computer equipment and storage medium
Abstract
The invention discloses a long video positioning method, a long video positioning device, computer equipment and a storage medium, and relates to the field of artificial intelligence. The method comprises: in response to receiving an abnormal short video, extracting a first target image from it, acquiring a timestamp associated with the first target image, and determining, based on the timestamp, a target positioning time range in the original long video stream corresponding to the abnormal short video; performing target detection on the first target image and extracting at least one first target detection frame of the first target image; performing target detection on a second target image in the original long video stream within the target positioning time range and extracting at least one second target detection frame of the second target image; matching the first target image with the second target image based on the first target detection frame and the second target detection frame; and determining the precise position of the abnormal short video in the original long video stream based on the second target image successfully matched with the first target image. The invention thereby locates an abnormal short video in its original long video accurately and efficiently.
Inventors
- WANG YUJIAN
- SHAO YU
Assignees
- 深圳市亚略特科技股份有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-30
Claims (10)
- 1. A long video positioning method, comprising: in response to receiving an abnormal short video, extracting a first target image from the abnormal short video, acquiring a timestamp associated with the first target image, and determining, based on the timestamp, a target positioning time range in an original long video stream corresponding to the abnormal short video; performing target detection on the first target image, and extracting at least one first target detection frame of the first target image; performing target detection on a second target image in the original long video stream within the target positioning time range, and extracting at least one second target detection frame of the second target image; matching the first target image with the second target image based on the first target detection frame and the second target detection frame; and determining the precise position of the abnormal short video in the original long video stream based on the second target image successfully matched with the first target image.
- 2. The long video positioning method according to claim 1, wherein determining the target positioning time range in the original long video stream corresponding to the abnormal short video based on the timestamp comprises: extending forward and backward, respectively, by a preset duration from the time indicated by the timestamp, so as to form the target positioning time range.
- 3. The long video positioning method according to claim 1, wherein determining the target positioning time range in the original long video stream corresponding to the abnormal short video based on the timestamp comprises: obtaining a confidence score for the timestamp, the confidence score being determined based on the reliability of the device or system that generated the timestamp; determining a duration of the target positioning time range based on the confidence score, wherein the greater the confidence score, the shorter the duration; and determining the target positioning time range based on the duration, wherein the timestamp is located at the midpoint of the target positioning time range.
- 4. The long video positioning method according to claim 1, wherein, if the first target image has only one first target detection frame and the second target image has only one second target detection frame, matching the first target image with the second target image based on the first target detection frame and the second target detection frame comprises: determining a first similarity of the first target image and the second target image based on the first target detection frame and the second target detection frame; if the first similarity is greater than a preset similarity threshold, determining that the first target image matches the second target image; and if the first similarity is not greater than the preset similarity threshold, determining that the first target image does not match the second target image.
- 5. The long video positioning method according to claim 4, wherein determining the first similarity of the first target image and the second target image based on the first target detection frame and the second target detection frame comprises: calculating a position similarity component and a size similarity component, respectively, based on the position parameter and the size parameter of the first target detection frame and the position parameter and the size parameter of the second target detection frame; and performing a weighted calculation on the position similarity component and the size similarity component according to a preset position weight and a preset size weight to obtain the first similarity.
- 6. The long video positioning method according to claim 4, wherein determining the first similarity of the first target image and the second target image based on the first target detection frame and the second target detection frame comprises: determining a first feature vector of the first target detection frame based on the position parameter and the size parameter of the first target detection frame; determining a second feature vector of the second target detection frame based on the position parameter and the size parameter of the second target detection frame; and performing a similarity calculation based on the first feature vector and the second feature vector to determine the first similarity.
- 7. The long video positioning method according to claim 1, wherein, if the first target image includes a plurality of first target detection frames and the second target image includes a plurality of second target detection frames, matching the first target image with the second target image based on the first target detection frames and the second target detection frames comprises: if the number of first target detection frames differs from the number of second target detection frames, determining that the first target image does not match the second target image; if the number of first target detection frames is the same as the number of second target detection frames, determining, for each first target detection frame, the corresponding second target detection frame; calculating a first similarity between each first target detection frame and its corresponding second target detection frame; acquiring first relative position features of the plurality of first target detection frames of the first target image and second relative position features of the plurality of second target detection frames of the second target image, and determining a second similarity based on the first relative position features and the second relative position features; determining an overall similarity of the first target image and the second target image based on the second similarity and on the first similarity of each first target detection frame and its corresponding second target detection frame; if the overall similarity is greater than a preset similarity threshold, determining that the first target image matches the second target image; and if the overall similarity is not greater than the preset similarity threshold, determining that the first target image does not match the second target image.
- 8. A long video positioning device comprising means for performing the method of any of claims 1-7.
- 9. A computer device, characterized in that it comprises a memory on which a computer program is stored and a processor which, when executing the computer program, implements the method according to any of claims 1-7.
- 10. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
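The detection-frame matching in claims 4, 5 and 7 can be illustrated with a minimal Python sketch. The claims do not fix the form of the similarity components, the correspondence rule, or the relative position features, so everything concrete here is an assumption for illustration: the `Box` layout in normalized coordinates, the exponential position component, the area-ratio size component, the left-to-right pairing of frames, the consecutive-center offsets, and all weights and thresholds.

```python
from dataclasses import dataclass
from math import exp, hypot
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """A target detection frame in normalized image coordinates (hypothetical layout)."""
    cx: float  # center x in [0, 1]
    cy: float  # center y in [0, 1]
    w: float   # width in [0, 1]
    h: float   # height in [0, 1]


def position_similarity(a: Box, b: Box) -> float:
    """Assumed form: 1.0 for coincident centers, decaying with center distance."""
    return exp(-hypot(a.cx - b.cx, a.cy - b.cy))


def size_similarity(a: Box, b: Box) -> float:
    """Assumed form: ratio of the smaller area to the larger one."""
    small, large = sorted((a.w * a.h, b.w * b.h))
    return 1.0 if large == 0 else small / large


def first_similarity(a: Box, b: Box, w_pos: float = 0.5, w_size: float = 0.5) -> float:
    """Weighted sum of the position and size components (claim 5)."""
    return w_pos * position_similarity(a, b) + w_size * size_similarity(a, b)


def single_frame_match(a: Box, b: Box, threshold: float = 0.9) -> bool:
    """Claim 4: match only if the similarity is strictly greater than the threshold."""
    return first_similarity(a, b) > threshold


def _center_offsets(boxes: List[Box]) -> List[Tuple[float, float]]:
    """Relative position feature (assumed): offsets between consecutive centers."""
    return [(q.cx - p.cx, q.cy - p.cy) for p, q in zip(boxes, boxes[1:])]


def multi_frame_match(first: List[Box], second: List[Box],
                      threshold: float = 0.9, w_rel: float = 0.5) -> bool:
    """Claim 7: compare two images that each carry several detection frames."""
    if len(first) != len(second) or not first:
        return False  # differing (or zero) counts never match
    if len(first) == 1:
        return single_frame_match(first[0], second[0], threshold)
    # Assumed correspondence rule: pair frames in left-to-right, top-to-bottom order.
    order = lambda box: (box.cx, box.cy)
    first, second = sorted(first, key=order), sorted(second, key=order)
    per_frame = [first_similarity(a, b) for a, b in zip(first, second)]
    # Second similarity: how closely the two layouts of centers agree.
    gaps = [hypot(u[0] - v[0], u[1] - v[1])
            for u, v in zip(_center_offsets(first), _center_offsets(second))]
    second_similarity = exp(-sum(gaps) / len(gaps))
    overall = (1 - w_rel) * sum(per_frame) / len(per_frame) + w_rel * second_similarity
    return overall > threshold
```

Because only box geometry is compared, no pixel-level feature extraction is needed, which is presumably what keeps the matching cheap; a real system would tune the weights and threshold on its own footage.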
Description
Long video positioning method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a long video positioning method, a long video positioning device, computer equipment, and a storage medium.
Background
In video surveillance, short video clips containing abnormal content are typically cut from a long-running original video stream so that an abnormal event can be presented quickly. Although a short video shows the abnormal event quickly, its limited duration often prevents it from showing the full picture of the event, such as what led up to it, how it developed afterwards, related changes in the environment, and other key context. In practice it is therefore often necessary to locate a generated short video back to its corresponding period in the original long video, so that the whole course of the event can be traced and analyzed. Common positioning approaches currently rely on the generation timestamp attached to the short video. However, because of device clock deviation, system recording delay, or storage synchronization errors, the timestamp is often inaccurate or offset. Searching the long video by timestamp alone therefore rarely achieves frame-level precision, and may even locate a completely unrelated video segment.
Disclosure of Invention
Embodiments of the invention provide a long video positioning method, a long video positioning device, computer equipment and a storage medium, aiming to solve the technical problem of how to locate a short video quickly and accurately in a long video based on the short video's content, without depending on an accurate timestamp.
In a first aspect, an embodiment of the present invention provides a long video positioning method, including: in response to receiving an abnormal short video, extracting a first target image from the abnormal short video, acquiring a timestamp associated with the first target image, and determining, based on the timestamp, a target positioning time range in an original long video stream corresponding to the abnormal short video; performing target detection on the first target image, and extracting at least one first target detection frame of the first target image; performing target detection on a second target image in the original long video stream within the target positioning time range, and extracting at least one second target detection frame of the second target image; matching the first target image with the second target image based on the first target detection frame and the second target detection frame; and determining the precise position of the abnormal short video in the original long video stream based on the second target image successfully matched with the first target image.
In a further technical solution, determining the target positioning time range in the original long video stream corresponding to the abnormal short video based on the timestamp includes: extending forward and backward, respectively, by a preset duration from the time indicated by the timestamp, so as to form the target positioning time range.
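The timestamp-based window and the search inside it can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of seconds as the time unit, the linear mapping from confidence score to duration (for the confidence-based variant of claim 3), the duration bounds, and the `is_match` callback standing in for detection-frame matching are all assumptions.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
Window = Tuple[float, float]  # (start, end) in seconds on the long video's timeline


def fixed_window(timestamp: float, preset: float) -> Window:
    """Extend forward and backward from the timestamp by a preset duration."""
    return (timestamp - preset, timestamp + preset)


def confidence_window(timestamp: float, confidence: float,
                      min_duration: float = 10.0,
                      max_duration: float = 120.0) -> Window:
    """Claim 3 variant: higher confidence gives a shorter window centered on the timestamp.

    The linear interpolation and the duration bounds are assumed, not from the source.
    """
    c = min(max(confidence, 0.0), 1.0)  # clamp the score to [0, 1]
    duration = max_duration - c * (max_duration - min_duration)
    half = duration / 2
    return (timestamp - half, timestamp + half)


def locate(frames: Iterable[Tuple[float, Frame]], window: Window,
           is_match: Callable[[Frame], bool]) -> Optional[float]:
    """Return the time of the first frame inside the window that matches, else None.

    `frames` yields (time, frame) pairs from the long video; `is_match` is a
    hypothetical callback that compares a frame against the first target image.
    """
    start, end = window
    for t, frame in frames:
        if start <= t <= end and is_match(frame):
            return t
    return None
```

For example, `fixed_window(100.0, 30.0)` gives `(70.0, 130.0)`, so only frames between 70 s and 130 s of the long video are passed to detection and matching. A real system would also clamp the window to the bounds of the recording.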
In a further technical solution, determining the target positioning time range in the original long video stream corresponding to the abnormal short video based on the timestamp includes: obtaining a confidence score for the timestamp, the confidence score being determined based on the reliability of the device or system that generated the timestamp; determining a duration of the target positioning time range based on the confidence score, wherein the greater the confidence score, the shorter the duration; and determining the target positioning time range based on the duration, wherein the timestamp is located at the midpoint of the target positioning time range.
In a further technical solution, if the first target image has only one first target detection frame and the second target image has only one second target detection frame, matching the first target image with the second target image based on the first target detection frame and the second target detection frame includes: determining a first similarity of the first target image and the second target image based on the first target detection frame and the second target detection frame; if the first similarity is greater than a preset similarity threshold, determining that the first target image matches the second target image; and if the first similarity is not greater than the preset similarity threshold, determining that the first target image does not match the second target image.
In a further technical solution, determining the first similarity between the first target image and the