CN-122001993-A - Video fusion method, system and program product based on shadow mapping

CN122001993A

Abstract

The invention relates to the technical field of video fusion and discloses a video fusion method, system, and program product based on shadow mapping. The method comprises the following steps: creating a virtual perspective camera in a three-dimensional scene and generating a shadow map according to the view frustum of the virtual perspective camera; converting pixels in the three-dimensional scene from the view-space coordinate system of the physical-world camera to the virtual perspective camera coordinate system to obtain the NDC coordinates of the pixels; aligning the NDC coordinates with the texture coordinates of the shadow map to obtain sampling texture coordinates; and judging whether each pixel within the visual range of the virtual perspective camera lies in the visible area of the shadow map, thereby obtaining the color of the current pixel of the fused three-dimensional scene. The invention solves problems of the prior art such as incorrect handling of occluded areas.

Inventors

  • HU QINGCHAO
  • ZHOU ZHOU

Assignees

  • 安徽智汇云舟科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2025-12-05

Claims (10)

  1. A video fusion method based on shadow mapping, comprising the following steps: creating a virtual perspective camera in a three-dimensional scene according to information of the physical-world camera corresponding to the video to be fused, and generating a shadow map according to the view frustum of the virtual perspective camera; converting pixels in the three-dimensional scene from the view-space coordinate system of the physical-world camera to the virtual perspective camera coordinate system, acquiring the NDC coordinates of the pixels, and aligning the NDC coordinates of the pixels with the texture coordinates of the shadow map, the aligned pixel coordinates being the sampling texture coordinates; and judging whether each pixel within the visual range of the virtual perspective camera lies in the visible area of the shadow map: if so, sampling the texture of the video stream at the sampling texture coordinates to obtain the video-stream color, and using the sampled video-stream color as the color of the current pixel of the three-dimensional scene; if not, retaining the color of the current pixel of the three-dimensional scene, thereby obtaining the color of the current pixel of the fused three-dimensional scene.
  2. The video fusion method based on shadow mapping according to claim 1, wherein converting the pixels in the three-dimensional scene from the view-space coordinate system of the physical-world camera to the virtual perspective camera coordinate system, acquiring the NDC coordinates of the pixels, and aligning the NDC coordinates of the pixels with the texture coordinates of the shadow map, the aligned pixel coordinates being the sampling texture coordinates, comprises the following steps: reading a picture from each frame of the video stream as a texture, and casting the video stream into the visible area of the shadow map according to the texture coordinates of the shadow map.
  3. The video fusion method based on shadow mapping according to claim 2, wherein converting the pixels in the three-dimensional scene from the view-space coordinate system of the physical-world camera to the virtual perspective camera coordinate system, acquiring the NDC coordinates of the pixels, and aligning the NDC coordinates of the pixels with the texture coordinates of the shadow map, the aligned pixel coordinates being the sampling texture coordinates, comprises the following steps: when the three-dimensional scene is rendered, the pixel shader projects the pixel coordinates of the image under the virtual camera coordinates into the camera's NDC space and calculates the texture coordinates of the shadow map; the fragment shader then samples the texture object of the video stream using the texture coordinates of the shadow map to obtain the video color.
  4. The video fusion method based on shadow mapping according to claim 3, wherein converting the pixels in the three-dimensional scene from the view-space coordinate system of the physical-world camera to the virtual perspective camera coordinate system, acquiring the NDC coordinates of the pixels, and aligning the NDC coordinates of the pixels with the texture coordinates of the shadow map, the aligned pixel coordinates being the sampling texture coordinates, comprises the following steps: sampling the shadow map according to the texture coordinates of the shadow map to calculate a shadow demarcation value; comparing the shadow demarcation value, under NDC coordinates, of the pixels of the image rendered by the virtual perspective camera with the shadow demarcation value of the sampled shadow map to calculate a shadow-demarcation-value difference; and if at least one of the depth-value difference and the shadow-demarcation-value difference falls outside the set offset range, using the original color of the pixel of the image rendered by the virtual perspective camera under NDC coordinates as the final color of the pixel.
  5. The shadow-mapping-based video fusion method according to claim 1, wherein creating a virtual perspective camera in a three-dimensional scene according to information of the physical-world camera corresponding to the video to be fused comprises the following step: setting the position of the virtual perspective camera as the light-source position and its viewing direction as the light-source direction.
  6. The shadow-mapping-based video fusion method according to claim 1, wherein the parameters of the virtual perspective camera can be dynamically adjusted by script to adapt in real time to changes in camera position and/or angle.
  7. The shadow-mapping-based video fusion method according to claim 6, wherein the parameters of the virtual perspective camera include one or more of: position, representing the coordinates of the virtual perspective camera in world space, for determining the starting point for viewing the three-dimensional scene; orientation, for defining the viewing direction of the virtual perspective camera object; field of view, for controlling the extent of the view of the virtual perspective camera object; near and far clipping planes, for defining the visible depth range; and aspect ratio, for controlling the image scale.
  8. The shadow-mapping-based video fusion method according to any one of claims 1 to 7, wherein creating a virtual perspective camera in a three-dimensional scene comprises the following step: creating a WebGL-based virtual perspective camera in the three-dimensional scene.
  9. A shadow-mapping-based video fusion system comprising a memory, a processor, and a computer program stored on the memory, wherein the processor implements the steps of the shadow-mapping-based video fusion method according to any one of claims 1 to 8 when executing the computer program.
  10. A program product comprising a computer program which, when executed, performs the steps of the shadow-mapping-based video fusion method according to any one of claims 1 to 8.
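The coordinate pipeline of claims 1 to 3 (view space of the virtual perspective camera, perspective divide to NDC, alignment of NDC with shadow-map texture coordinates) can be sketched as follows. This is a minimal illustration assuming an OpenGL/WebGL-style clip space; all function and variable names are hypothetical, not taken from the patent.

```python
# Sketch of claims 1-3: project a view-space point through the virtual
# perspective camera, obtain NDC coordinates, and align them with the
# shadow map's [0,1] texture coordinates (the "sampling texture coordinates").
import numpy as np

def perspective_matrix(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection matrix of the virtual camera."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    m = np.zeros((4, 4))
    m[0, 0] = f / aspect
    m[1, 1] = f
    m[2, 2] = (far + near) / (near - far)
    m[2, 3] = (2.0 * far * near) / (near - far)
    m[3, 2] = -1.0
    return m

def to_sampling_texcoords(p_view, proj):
    """View-space point -> NDC -> shadow-map texture coordinates."""
    clip = proj @ np.append(p_view, 1.0)
    ndc = clip[:3] / clip[3]          # perspective divide to NDC in [-1,1]
    uv = ndc[:2] * 0.5 + 0.5          # align NDC with texture coords in [0,1]
    depth = ndc[2] * 0.5 + 0.5        # depth in [0,1] for the shadow comparison
    in_frustum = bool(np.all(np.abs(ndc) <= 1.0))
    return uv, depth, in_frustum

proj = perspective_matrix(60.0, 16 / 9, 0.1, 100.0)
uv, depth, visible = to_sampling_texcoords(np.array([0.0, 0.0, -10.0]), proj)
# A point on the camera axis maps to the center (0.5, 0.5) of the shadow map.
```

In a WebGL implementation these transforms would run in the shaders, as claim 3 describes; the sketch only mirrors the arithmetic on the CPU.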

Description

Video fusion method, system and program product based on shadow mapping

Technical Field

The invention relates to the technical field of video fusion, and in particular to a video fusion method, system, and program product based on shadow mapping.

Background

Video fusion is a technology for combining a plurality of video sources into a single continuous video stream, and is widely applied in fields such as multi-view presentation, augmented reality, and intelligent monitoring. Its core aim is to generate a composite video with greater information integrity and visual readability by integrating multi-source video data.

The prior video fusion schemes and their defects are analyzed as follows.

In the first prior art, texture mapping onto a manually modeled patch model: the principle is that a patch model matching the shape of the video is manually created in modeling software, loaded into the three-dimensional scene, and its texture material is updated frame by frame from the video. The disadvantage is that each video requires its own custom patch model, which cannot automatically adapt to different video shapes or to dynamically changing camera viewing angles (such as a rotating dome camera). The scene must be remodeled whenever the camera position or angle changes, so maintenance cost is high. Only fixed-viewpoint video fusion is supported, and camera movement or rotation cannot be adapted to in real time.

In the second prior art, video projection based on WebGL decal technology (a technology for adding local details to the surface of a 3D model by projecting textures, whose core principle is projecting a 2D texture onto a target mesh along the normal direction): the principle is that the video is projected frame by frame onto a designated area of the three-dimensional scene, directly covering the surface material.
The disadvantage is that the decal technique disregards the geometric occlusion relationships of the three-dimensional scene, so the video material covers occluded areas (e.g. objects behind walls), resulting in visual errors. The invisible parts of the projection area are forcibly filled with video content, which violates physical illumination rules and yields low accuracy.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides a video fusion method, system, and program product based on shadow mapping, which solve problems of the prior art such as incorrect handling of occluded areas. The technical scheme for solving the technical problems is as follows: a video fusion method based on shadow mapping comprises the following steps: creating a virtual perspective camera in a three-dimensional scene according to information of the physical-world camera corresponding to the video to be fused, and generating a shadow map according to the view frustum of the virtual perspective camera; converting pixels in the three-dimensional scene from the view-space coordinate system of the physical-world camera to the virtual perspective camera coordinate system, acquiring the NDC coordinates of the pixels, and aligning the NDC coordinates of the pixels with the texture coordinates of the shadow map, the aligned pixel coordinates being the sampling texture coordinates; and judging whether each pixel within the visual range of the virtual perspective camera lies in the visible area of the shadow map: if so, sampling the texture of the video stream at the sampling texture coordinates to obtain the video-stream color, and using the sampled video-stream color as the color of the current pixel of the three-dimensional scene; if not, retaining the color of the current pixel of the three-dimensional scene, thereby obtaining the color of the current pixel of the fused three-dimensional scene.
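The per-pixel fusion decision described above can be sketched as a small function: the video color is used only where the pixel falls inside the shadow map's visible area and its depth agrees with the shadow map within a small bias (the "set offset range" of claim 4); otherwise the original scene color is retained. This is a toy CPU sketch with hypothetical names, not the patent's shader implementation.

```python
# Sketch of the fusion decision: keep the scene color for occluded or
# out-of-frustum pixels, sample the video texture for visible ones.
def fuse_pixel(scene_color, video_texture, shadow_map, uv, depth,
               in_frustum, bias=0.005):
    if not in_frustum:
        return scene_color                  # outside the virtual camera's frustum
    u, v = uv
    h, w = len(shadow_map), len(shadow_map[0])
    x = min(int(u * w), w - 1)              # nearest-texel sampling
    y = min(int(v * h), h - 1)
    occluder_depth = shadow_map[y][x]       # nearest depth seen by the virtual camera
    if depth - occluder_depth > bias:
        return scene_color                  # occluded: retain original scene color
    return video_texture[y][x]              # visible: use the sampled video color

# Toy 2x2 example: the shadow map records an occluder at depth 0.3 everywhere.
shadow = [[0.3, 0.3], [0.3, 0.3]]
video = [["red", "green"], ["blue", "white"]]
fused_visible = fuse_pixel("gray", video, shadow, (0.25, 0.25), 0.30, True)
fused_hidden = fuse_pixel("gray", video, shadow, (0.25, 0.25), 0.90, True)
# fused_visible samples the video; fused_hidden keeps the scene color.
```

The depth bias plays the role of the offset range in claim 4: without it, floating-point quantization of the shadow map would cause visible pixels to be misclassified as occluded.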
The beneficial effects of the invention are as follows: the dome camera and mobile camera can be rotated freely as needed, overcoming the defect of the first prior-art scheme that remodeling or manual calibration is required; the video is projected only onto the visible area of the shadow map, while the original scene content is retained in occluded parts, which conforms to real physical occlusion logic and solves the problem that the decal technique of the second prior art covers occluded areas. The invention thus addresses the industrial pain points of traditional video fusion, namely reliance on manual modeling and erroneous handling of occluded areas, realizes efficient, realistic, and fully automatic dynamic video fusion in three-dimensional scenes, and is suitable for complex scenes that must adapt to camera changes in real time. On the basis of the above technical scheme, the invention can be further improved as follows. As a preferred technical solution, the converting the pixel in the three-dimensional scene from the view space coordinate system of the physical world camera to the virtual perspective camera