CN-122023739-A - Constraint semantic mask enhancement system based on physical law and space calculation reasoning method


Abstract

The invention provides a semantic mask enhancement system constrained by physical laws and a spatial computing reasoning method. The system comprises the following modules: an initial semantic mask generation module, a physical base construction module, a semantic mask enhancement module, a temporal consistency module, a dynamic confidence evaluation and mode switching module, a spatial computing reasoning module, a physical consistency audit module, a rendering and security admission module, and a spatial reference alignment module. Together these form a physical-visual dual closed-loop spatial computing system that can be deployed on a head-mounted terminal, a mobile terminal, a vehicle-mounted terminal or a device-cloud collaborative system. The system establishes an auditable and executable spatial reality layer between physical entities and AI vision, which keeps boundaries stable and interactions realistic in complex environments. It applies consistently across different physical field sensing modalities, keeps mask positions stable during high-speed motion and visual blur, and markedly reduces cross-frame drift and flicker.

Inventors

CHEN HUICHONG

Assignees

武汉华创全息数字科技有限公司 (Wuhan Huachuang Holographic Digital Technology Co., Ltd.)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-30

Claims (11)

1. A semantic mask enhancement system based on physical laws, comprising the following modules: an initial semantic mask generation module, configured to perform semantic segmentation/instance segmentation on the current frame image and output a pixel-level or instance-level semantic mask; a physical base construction module, configured to construct physical base data from at least one non-visual physical field data source and output a set of physical credibility indices; a semantic mask enhancement module, configured to map the initial semantic mask to a physical coordinate system and use the physical base data to perform topology clipping/repair and edge reinforcement, obtaining a single-frame enhanced mask; a temporal consistency module, configured to perform cross-frame prediction and update of the mask based on a physical motion vector, achieving temporal coherence and outputting a temporally consistent enhanced mask; a dynamic confidence evaluation and mode switching module, configured to calculate a mixed confidence factor and switch between a physics-dominant mode and a vision + physical clipping mode according to a threshold τ; a spatial computing reasoning module, configured to generate object pose/trajectory/contact state predictions from the enhanced mask, the physical base and the interaction instruction; a physical consistency audit module, configured to perform physical conflict detection on the predicted pose/trajectory/contact state or the rendering/interaction instructions output by the spatial computing reasoning module, trigger forced correction, projection back to a feasible domain or a rollback strategy when necessary, and output rendering and interaction instructions that pass the audit; and a rendering and security admission module, configured to perform occlusion, collision and rendering instruction verification based on the enhanced mask and to apply physical consistency admission control to third-party rendering/interaction instructions. When the angular deviation residual between the virtual vertical axis obtained by visual localization/mapping or pose estimation and the gravity vector exceeds a threshold, the system dynamically corrects the visual coordinate system or the rendering coordinate system through a rotation compensation operator, so as to forcibly eliminate the angular drift of the virtual horizontal plane relative to the real ground plane and keep the enhanced semantic mask aligned with the physical topology.
2. The semantic mask enhancement system based on physical laws of claim 1, wherein the system manages visual data and physical field data on a unified time base: each image frame and its corresponding physical observation are timestamped, and clock drift among the sensors is estimated and corrected so that the data can be aligned at the same or an equivalent instant.
3. The semantic mask enhancement system based on physical laws of claim 2, wherein the system performs extrinsic calibration between the visual sensor coordinate system and the physical sensor coordinate system to obtain a coordinate transformation, and uses this transformation to achieve spatial alignment in a unified physical coordinate system during mask projection/mapping and boundary constraint; when the synchronization quality or the extrinsic credibility falls below a threshold, the system triggers a degradation strategy, namely, it applies a degradation or enters a physics-dominant/static alignment mode.
4. The semantic mask enhancement system based on physical laws of claim 1, wherein the acquisition modes of the physical base data comprise real-time sensor acquisition, spatio-temporal snapshot backtracking based on historical perception data, and retrieval of a pre-stored digital twin prior from cloud or local storage.
5. The semantic mask enhancement system based on physical laws of claim 4, wherein the sources of the physical base data include, but are not limited to: electromagnetic/radio physical field sensing, namely UWB channel impulse response (CIR), Wi-Fi/6G multipath fingerprints, millimeter-wave/microwave radar echoes, and RSSI/phase/angle of arrival/Doppler; acoustic echo sensing, namely distance/contour information formed from ultrasonic/sonar echoes; inertial measurement envelopes, namely velocity, displacement and motion envelopes formed by an IMU/odometer; pre-stored digital twin priors, namely hard boundaries provided by CAD/BIM/point cloud/three-dimensional map/structural models; depth/point cloud sensing, namely LiDAR/ToF/structured light/binocular depth; and monocular visual structure from motion, namely depth or structural features computed by monocular visual SfM/monocular VIO combined with a gravity constraint, used to form a physical boundary/feasible region with spatial dimensions.
6. The semantic mask enhancement system based on physical laws of claim 5, wherein the physical base data are expressed as any one or a combination of the following: a distance field/signed distance field; an occupancy grid/voxel representation; a hard boundary set; and a boundary surface or feasible region set obtained by inverting radio/acoustic reflection characteristics.
7. The semantic mask enhancement system based on physical laws of claim 1, wherein the physical consistency audit module/audit filter performs a gravity-referenced virtual-real physical consistency check against the enhanced semantic mask; when a predicted result violates the physical boundary constraint, the gravity direction consistency constraint or the contact feasibility constraint, the audit filter holds veto power, and the rendering/interaction instruction is forcibly intercepted and corrected by at least one of projection back to a feasible domain, rigid-body collision correction, dynamics constraint correction or a rollback strategy, so as to suppress penetration, drift or implausible displacement of a virtual object relative to a physical entity.
8. A spatial computing reasoning method for the semantic mask enhancement system of claim 1, comprising a reasoning generation flow in which the spatial computing reasoning module generates an interaction prediction output from a user interaction instruction, the enhanced mask and the physical base.
9. The spatial computing reasoning module generates an interaction prediction output from the user interaction instruction, the enhanced mask and the physical base, and feeds it to the physical consistency audit module/audit filter, which performs consistency verification based on the enhanced-mask boundary constraints and the gravity reference. If verification passes, rendering and interaction instructions that pass the audit are output; if it fails, at least one forced correction strategy is triggered, namely rigid-body collision correction, Newtonian dynamics constraint correction, projection back to a feasible domain or a rollback strategy, and the audited rendering and interaction instructions are output after correction.
10. The method of claim 8, wherein the interaction prediction output comprises at least the virtual object pose, the motion trajectory or velocity, the contact point/contact normal/collision depth, and interaction parameters.
11. The spatial computing reasoning method of claim 9, further comprising physical conflict detection, wherein the audit filter performs physical conflict detection on the predicted pose/trajectory/contact state or the rendering/interaction instructions output by the spatial computing reasoning module, including at least boundary conflict detection, gravity consistency detection, infeasible contact detection and abrupt velocity change detection, and wherein the audit filter is independent of the AI model architecture.
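Claim 2 requires a unified time base but leaves the drift model open. The following minimal sketch assumes a linear clock model (reference time = scale × sensor time + offset) fitted by least squares from paired timestamps. It then matches physical observations to image frames by nearest timestamp within a tolerance. The function names and the `tolerance` parameter are hypothetical illustrations and are not taken from the patent.

```python
from bisect import bisect_left

def fit_clock_model(sensor_ts, ref_ts):
    """Least-squares fit of ref = scale * sensor + offset (assumed linear drift model)."""
    n = len(sensor_ts)
    if n < 2 or n != len(ref_ts):
        raise ValueError("need at least two paired timestamps")
    mean_s = sum(sensor_ts) / n
    mean_r = sum(ref_ts) / n
    sxx = sum((s - mean_s) ** 2 for s in sensor_ts)
    if sxx == 0.0:
        raise ValueError("sensor timestamps must not all be equal")
    sxy = sum((s - mean_s) * (r - mean_r) for s, r in zip(sensor_ts, ref_ts))
    scale = sxy / sxx
    offset = mean_r - scale * mean_s
    return scale, offset

def to_reference_time(t, model):
    """Map a sensor timestamp onto the unified time base."""
    scale, offset = model
    return scale * t + offset

def align_to_frames(frame_ts, obs_ts, tolerance):
    """For each frame, return the index of the nearest observation within
    `tolerance` (same time base, obs_ts sorted ascending), or None."""
    matches = []
    for t in frame_ts:
        i = bisect_left(obs_ts, t)
        best = None
        for j in (i - 1, i):  # neighbours on either side of the insertion point
            if 0 <= j < len(obs_ts) and abs(obs_ts[j] - t) <= tolerance:
                if best is None or abs(obs_ts[j] - t) < abs(obs_ts[best] - t):
                    best = j
        matches.append(best)
    return matches
```

In use, each physical observation timestamp would first be passed through `to_reference_time` and then matched against the frame timestamps. Frames left with `None` lack a synchronized observation, which is the kind of low synchronization quality that claim 3 treats as a trigger for degradation.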
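The topology clipping/repair step of claim 1 can be illustrated with the signed distance field representation listed in claim 6. This sketch makes several assumptions. The visual mask has already been mapped onto the same grid as the SDF. The SDF is negative inside the physical object and positive outside. A hypothetical `margin` defines an uncertain band around the physical boundary, and inside that band the visual decision is kept. The per-pixel rule is an illustration only, not the patented algorithm.

```python
def enhance_mask(mask, sdf, margin=1.0):
    """Clip/repair a binary mask against a signed distance field on the same grid.
    Assumed convention: sdf < 0 inside the physical object, sdf > 0 outside."""
    if len(mask) != len(sdf) or any(len(a) != len(b) for a, b in zip(mask, sdf)):
        raise ValueError("mask and sdf must have the same shape")
    out = []
    for mrow, drow in zip(mask, sdf):
        row = []
        for m, d in zip(mrow, drow):
            if d > margin:       # clearly outside the physical boundary: clip
                row.append(False)
            elif d < -margin:    # clearly inside the object: repair holes
                row.append(True)
            else:                # boundary band: keep the visual decision
                row.append(bool(m))
        out.append(row)
    return out
```

The design keeps vision authoritative only near the boundary, where its pixel accuracy is useful. Regions that are far inside or far outside the physical boundary are settled by the physical base, which removes leaks and fills holes.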
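The temporal consistency module of claim 1 predicts the mask across frames from a physical motion vector. A minimal sketch follows, assuming that the motion has already been projected into the image plane as an integer pixel shift (dx, dy). It also assumes a fixed blending weight `alpha`, which is hypothetical and could instead be driven by the mixed confidence factor. This weight balances the motion-predicted mask against the current frame's segmentation probabilities. All names are illustrative.

```python
def shift_map(prob, dx, dy, fill=0.0):
    """Translate a 2D map by an integer pixel motion (dx, dy); uncovered pixels get `fill`."""
    if not prob:
        return []
    h, w = len(prob), len(prob[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel that moves to (x, y)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = prob[sy][sx]
    return out

def temporal_update(prev_prob, curr_prob, motion, alpha=0.6, thresh=0.5):
    """Predict the previous mask forward with the physical motion vector and
    blend it with the current frame's segmentation probabilities."""
    predicted = shift_map(prev_prob, motion[0], motion[1])
    fused = [[alpha * p + (1.0 - alpha) * c for p, c in zip(prow, crow)]
             for prow, crow in zip(predicted, curr_prob)]
    mask = [[v >= thresh for v in row] for row in fused]
    return fused, mask
```

Suppose a blurred current frame gives only a weak probability (for example 0.4) at the object's new location. The motion-predicted prior then keeps the mask in place instead of letting it drop out.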
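Claim 1 does not define the mixed confidence factor. Claim 3 states only that poor synchronization or extrinsic quality triggers a fallback. The sketch below therefore rests on assumptions throughout. The hypothetical factor is the share of total evidence carried by vision, and the system switches to the physics-dominant mode below τ. The sketch also adds a hysteresis band around τ, which is its own addition and not part of the patent, so that the mode does not flicker when the factor hovers near the threshold.

```python
from enum import Enum

class Mode(Enum):
    PHYSICS_DOMINANT = "physics_dominant"
    VISION_PLUS_PHYSICS_CLIP = "vision_plus_physics_clip"

def mixed_confidence(visual_conf, physical_conf, eps=1e-9):
    """Hypothetical mixed factor in [0, 1]: share of evidence carried by vision.
    Both inputs are assumed to be non-negative confidence scores."""
    return visual_conf / (visual_conf + physical_conf + eps)

def select_mode(prev, c_mix, tau, hysteresis=0.05, calib_quality=1.0, calib_min=0.5):
    """Choose the operating mode for the next frame."""
    # Claim 3 style fallback: poor sync/extrinsic quality forces the physics-dominant mode.
    if calib_quality < calib_min:
        return Mode.PHYSICS_DOMINANT
    # Hysteresis band around tau (an assumption of this sketch) prevents mode flicker.
    if prev is Mode.PHYSICS_DOMINANT:
        return Mode.VISION_PLUS_PHYSICS_CLIP if c_mix > tau + hysteresis else prev
    return Mode.PHYSICS_DOMINANT if c_mix < tau - hysteresis else prev
```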
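Claims 7, 9 and 11 describe an audit filter with veto power that checks each prediction and, on failure, projects it back to a feasible domain or rolls it back. A minimal sketch follows for a single predicted point position. It assumes a signed distance field in which values of zero or above are free space, and it assumes hypothetical parameters `dt` and `max_dv`. It covers only two of the checks in claim 11: boundary conflict, resolved by gradient projection, and abrupt velocity change, resolved by rollback. Gravity consistency and contact feasibility checks are omitted. The interface is illustrative and does not describe how the patent's filter is built.

```python
import math
from dataclasses import dataclass

def numerical_gradient(sdf, p, h=1e-4):
    """Central-difference gradient of a scalar field at 3D point p."""
    grad = []
    for i in range(3):
        step = [h if j == i else 0.0 for j in range(3)]
        plus = tuple(pc + s for pc, s in zip(p, step))
        minus = tuple(pc - s for pc, s in zip(p, step))
        grad.append((sdf(plus) - sdf(minus)) / (2.0 * h))
    return tuple(grad)

def project_to_feasible(sdf, p, iters=10, tol=1e-6):
    """Push p out of the solid along the SDF gradient (sdf >= 0 is assumed feasible)."""
    for _ in range(iters):
        d = sdf(p)
        if d >= -tol:
            return p
        g = numerical_gradient(sdf, p)
        gn2 = sum(c * c for c in g)
        if gn2 == 0.0:
            return None
        p = tuple(pc - d * gc / gn2 for pc, gc in zip(p, g))
    return p if sdf(p) >= -tol else None

@dataclass
class AuditResult:
    position: tuple
    action: str  # "pass", "projected" or "rolled_back"

def audit(pred_pos, prev_pos, prev_vel, dt, sdf, max_dv, tol=1e-6):
    """Veto-style check of one predicted position (hypothetical interface)."""
    vel = tuple((a - b) / dt for a, b in zip(pred_pos, prev_pos))
    if math.dist(vel, prev_vel) > max_dv:       # abrupt velocity change: roll back
        return AuditResult(tuple(prev_pos), "rolled_back")
    if sdf(pred_pos) >= -tol:                   # no boundary conflict
        return AuditResult(tuple(pred_pos), "pass")
    fixed = project_to_feasible(sdf, tuple(pred_pos), tol=tol)
    if fixed is None:                           # cannot repair: roll back
        return AuditResult(tuple(prev_pos), "rolled_back")
    return AuditResult(fixed, "projected")
```

For example, with a floor plane `sdf = lambda p: p[2]`, a prediction at z = -0.05 is projected back to the floor surface rather than rendered as penetrating it.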

Description

Technical Field

The invention relates to the technical fields of spatial computing, artificial intelligence and augmented reality/mixed reality (AR/MR), and in particular to a semantic mask enhancement system constrained by physical laws and a spatial computing reasoning method.

Background

With the development of spatial computing and AR/MR applications, a system usually has to accomplish two tasks in a real scene: (1) semantic understanding and region division of real objects, used for occlusion relationships, interaction-region definition and content attachment; and (2) reasoning about and predicting the spatial pose, motion trajectory and interaction behavior of virtual objects, used for realistic interactions such as edge attachment, collision, leaning against a wall and grasping.

The prior art mainly follows a pure-vision semantic segmentation/instance segmentation route. Under weak light, shadows, reflections, low texture, occlusion, or image blur caused by fast motion, the edges of the masks output by pure-vision segmentation tend to jitter, merge with neighboring regions, go missing or drift. These edges are also difficult to align with the physical boundaries of real objects, which causes incorrect occlusion relationships, drift of attached content, misjudged interaction regions and similar problems. In addition, when a virtual object interacts with a real object in the prior art, phenomena such as interpenetration, boundary crossing, floating and unrealistic contact often occur.
In particular, in scenarios that require strict boundary constraints, such as leaning against a wall, edge attachment, grasping and collision rebound, AI reasoning easily produces pose changes that violate physical laws, which impairs immersion and usability. Moreover, even when the prior art introduces a physical boundary, the mask is corrected only at the single-frame level.

Disclosure of Invention

In view of these shortcomings of the prior art, the invention aims to provide a semantic mask enhancement system constrained by physical laws and a spatial computing reasoning method that solve the problems described in the background. It does so by establishing an auditable and executable spatial reality layer (Truth Layer) between physical entities and AI vision, so that boundaries stay stable and interactions stay realistic in complex environments.

To achieve this purpose, the invention adopts the following technical solution: a semantic mask enhancement system constrained by physical laws, comprising the following modules: an initial semantic mask generation module, configured to perform semantic segmentation/instance segmentation on the current frame image and output a pixel-level or instance-level semantic mask; a physical base construction module, configured to construct physical base data from at least one non-visual physical field data source and output a set of physical credibility indices; a semantic mask enhancement module, configured to map the initial semantic mask to a physical coordinate system and use the physical base data to perform topology clipping/repair and edge reinforcement, obtaining a single-frame enhanced mask; a temporal consistency module, configured to perform cross-frame prediction and update of the mask based on a physical motion vector, achieving temporal coherence and outputting a temporally consistent enhanced mask; a dynamic confidence evaluation and mode switching module, configured to calculate a mixed confidence factor and switch between a physics-dominant mode and a vision + physical clipping mode according to a threshold τ; a spatial computing reasoning module, configured to generate object pose/trajectory/contact state predictions from the enhanced mask, the physical base and the interaction instruction; a physical consistency audit module, configured to perform physical conflict detection on the output of the spatial computing reasoning module, trigger forced correction, projection back to a feasible domain or a rollback strategy when necessary, and output rendering and interaction instructions that pass the audit; and a rendering and security admission module, configured to perform occlusion, collision and rendering instruction verification based on the enhanced mask and to apply physical consistency admission control to third-party rendering/interaction instructions. When the angular deviation residual between the virtual vertical axis obtained by visual localization/mapping or pose estimation and the gravity vector exceeds a threshold, the system dynamically corrects the visual coordinate system or the rendering coordinate system through a rotation compensation operator so as to forcibly eliminate the angular drift of the virtual horizontal plane relative to