CN-122018696-A - Intelligent interactive control method and system based on visual integration

CN122018696ACN 122018696 ACN122018696 ACN 122018696ACN-122018696-A

Abstract

The invention relates to an intelligent interactive control method and system based on visual integration, and belongs to the field of visual analysis. The method comprises the steps of collecting visual features of a user in a writing process through intelligent equipment, carrying out multi-scale visual feature fusion to obtain a space-time feature tensor, carrying out intention inference based on the space-time feature tensor, outputting an intention-confidence degree pair, carrying out visual-action mapping based on the intention-confidence degree pair, outputting a preliminary interaction instruction set, and carrying out stability optimization on the preliminary interaction instruction set to obtain a final instruction set. The invention realizes the transition from passive recording to active understanding by fusing multi-scale visual perception, implicit state modeling and probability intention inference, not only can accurately identify writing content, but also can deeply understand user intention and provide proper interactive guidance.

Inventors

LI HENGTAO

Assignees

上海有我科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260211

Claims (6)

1. An intelligent interactive control method based on visual integration is characterized by comprising the following steps: step S1, acquiring visual features of a user in the writing process through intelligent equipment, and carrying out multi-scale visual feature fusion to obtain a space-time feature tensor; s2, intention inference is carried out based on the space-time characteristic tensor, and an intention-confidence degree pair is output; Step S3, performing vision-action mapping based on the intention-confidence level pair, and outputting a preliminary interaction instruction set; And S4, performing stability optimization on the preliminary interaction instruction set to obtain a final instruction set.
2. The visual integration-based intelligent interactive control method according to claim 1, wherein the multi-scale visual feature fusion in step S1 is specifically: Acquiring the visual characteristics through intelligent equipment, wherein the visual characteristics comprise spatial layout characteristics, motion trail characteristics and detail texture characteristics; Deriving the spatio-temporal feature tensor based on the visual features, mathematically described as , wherein, As a tensor of the temporal-spatial characteristics, K is the number of cameras and is a nonlinear fusion operator, In order to mask the spatial attention, For the convolution operator, a local feature fusion is represented, For the weight of the texture feature, For the normalized laplace characteristic, In order for the laplace operator to act on the image, For the image intensity captured by the kth camera at position (x, y) and time t, In order to prevent a small constant of zero removal, For the weight of the motion feature, For the normalized time-varying characteristics, As the time rate of change of the image intensity, For the reference time to be used, As a function of the scale-specific transformation, In order to transform the parameters of the model, For element-by-element multiplication.
3. The visual integration-based intelligent interactive control method according to claim 1, wherein the intention inference in the step S2 is specifically: presetting a prediction time window delta t, and simultaneously obtaining a historical space-time characteristic tensor of [0, t ]; Presetting an implicit context state h; acquiring initial probability density of the implicit context state, and outputting the probability density of the implicit context state given the historical spatiotemporal feature tensor through a context extraction function; System transition to predicted future state given the implicit context state and the historical spatiotemporal feature tensor via state transition kernel function output Conditional probability density of (2); obtaining a predicted future state The conditional probability density given the historical spatiotemporal feature tensor is mathematically described as , wherein, In order to predict the future state of the vehicle, A historical spatiotemporal feature tensor of [0, t ], For predicted future states The conditional probability density given the historical spatiotemporal feature tensor, In order for the context space to be implicit, For the context extraction function, As a function of the state transition kernel, An initial probability density that is an implicit context state; By predicted future states The conditional probability density given the historical spatiotemporal feature tensor, results in a confidence, mathematically described as Wherein C is the confidence, var (& gt) is the variance, Is the square root of the maximum allowable variance; matching predicted future states The intent-confidence pair is derived given the conditional probability density and the confidence given the historical spatiotemporal feature tensor.
4. The visual integration-based intelligent interactive control method according to claim 1, wherein the visual-action mapping in step S3 is specifically: Mapping writing intention to action through action manifold to obtain interaction action instruction vector, which is mathematically described as , wherein, In order to interact with the motion instruction vector, For the projection operator, the action is mapped onto the action manifold M, As a result of the standard instruction weight, For writing purposes Is used in the geometric representation of the (c), For the uncertainty response weight to be a function of, For the gradient operator with respect to the intent, For the uncertainty sensitivity parameter, C is the confidence, For adaptive noise, σ (C) is the noise strength; And outputting the preliminary interaction instruction set based on the interaction instruction vector.
5. The visual integration-based intelligent interactive control method according to claim 4, wherein the stability optimization in step S4 is specifically: Evaluating consistency of the historical interaction instructions and the visual feedback through a visual-action consistency function; deriving final instructions based on the vision-action consistency function, mathematically described as , In order to be the final instruction, For the reference time to be used, For the exponentially decaying weight, For the historical feedback decay factor, As a function of the consistency of the vision-action, In order for the smoothness to be a parameter for adjustment, For the reference to the smoothing coefficient(s), Is a command acceleration; The final instruction set is derived based on the final instruction.
6. An intelligent interactive control system based on visual integration, which is characterized in that the system is applied to the intelligent interactive control method based on visual integration according to any one of claims 1-5, and comprises a visual feature fusion module, an intention inference module, a visual-action mapping module and an optimization module; The visual characteristic fusion module is used for collecting visual characteristics of a user in the writing process through intelligent equipment and carrying out multi-scale visual characteristic fusion to obtain a space-time characteristic tensor; the intention inference module is used for carrying out intention inference based on the space-time characteristic tensor and outputting an intention-confidence degree pair; the vision-action mapping module is used for performing vision-action mapping based on the intention-confidence level pair and outputting a preliminary interaction instruction set; And the optimization module is used for performing stability optimization on the preliminary interaction instruction set to obtain a final instruction set.

Description

Intelligent interactive control method and system based on visual integration Technical Field The invention belongs to the technical field of visual analysis, and particularly relates to an intelligent interactive control method and system based on visual integration. Background In the development process of man-machine interaction technology, from an early keyboard mouse to a touch screen and then to voice and gesture recognition in recent years, the interaction mode is continuously evolved to a more natural and intuitive direction, however, in the basic field of writing interaction, the prior art still has obvious limitations that the traditional dot matrix pen mainly relies on infrared or electromagnetic induction technology to capture the position of a pen point, the function is limited to track recording and simple recognition, the commercial intelligent pen realizes pressure sensing and inclination detection but lacks deep understanding of the intention of a user, an interaction system based on computer vision can recognize writing content but cannot understand the hidden state in the writing process, and although the academic world has attempts to combine biological signals (such as myoelectricity and eye movement) to enhance interaction, the schemes are complex, expensive and difficult to popularize. The basic defects of the prior art are that the perception dimension is single, the surface layer characteristics such as position and pressure are excessively relied on, deep information such as space layout, motion dynamics and micro texture is ignored, the context understanding capability is lacking, users with different proficiency, emotional states and usage scenes cannot be adapted, the response mechanism is driven by a mechanical machine, only completed actions can not be recorded, the user behaviors cannot be predicted and guided, and the system robustness is insufficient, so that the performance is rapidly reduced under the actual complex environment (such as light change and paper material difference). These problems lead to the difficulty in providing a truly intelligent and personalized user experience for the existing writing interaction system, and severely restrict the deep application of digital writing in the fields of education, creation, remote collaboration and the like. Disclosure of Invention In order to solve the problems in the prior art, the invention provides an intelligent interactive control method and system based on visual integration. The aim of the invention can be achieved by the following technical scheme: An intelligent interactive control method based on visual integration, which comprises the following steps: step S1, acquiring visual features of a user in the writing process through intelligent equipment, and carrying out multi-scale visual feature fusion to obtain a space-time feature tensor; s2, intention inference is carried out based on the space-time characteristic tensor, and an intention-confidence degree pair is output; Step S3, performing vision-action mapping based on the intention-confidence level pair, and outputting a preliminary interaction instruction set; And S4, performing stability optimization on the preliminary interaction instruction set to obtain a final instruction set. Preferably, the multi-scale visual feature fusion in step S1 specifically includes: Acquiring the visual characteristics through intelligent equipment, wherein the visual characteristics comprise spatial layout characteristics, motion trail characteristics and detail texture characteristics; Deriving the spatio-temporal feature tensor based on the visual features, mathematically described as , wherein,As a tensor of the temporal-spatial characteristics,K is the number of cameras and is a nonlinear fusion operator,In order to mask the spatial attention,For the convolution operator, a local feature fusion is represented,For the weight of the texture feature,For the normalized laplace characteristic,In order for the laplace operator to act on the image,For the image intensity captured by the kth camera at position (x, y) and time t,In order to prevent a small constant of zero removal,For the weight of the motion feature,For the normalized time-varying characteristics,As the time rate of change of the image intensity,For the reference time to be used,As a function of the scale-specific transformation,In order to transform the parameters of the model,For element-by-element multiplication. Preferably, the intention inference in the step S2 is specifically: presetting a prediction time window delta t, and simultaneously obtaining a historical space-time characteristic tensor of [0, t ]; Presetting an implicit context state h; acquiring initial probability density of the implicit context state, and outputting the probability density of the implicit context state given the historical spatiotemporal feature tensor through a context extraction function; System transition to predicted futu