
CN-121995596-A - Optical focal length automatic regulating control system based on intelligent sensor

CN121995596A

Abstract

The invention discloses an optical focal length automatic adjustment control system based on an intelligent sensor, relating to the technical field of optical engineering. The system comprises a multi-modal feature fusion module, a predictive tracking modeling module, a dynamic relation graph construction module, a graph network priority evaluation module, a target trajectory prediction module, a camera control instruction generation module and a lens driving execution module. The multi-modal feature fusion module generates a multi-modal feature map; the predictive tracking modeling module introduces the scanning and state-update mechanism of the Mamba network to improve the TRANSTRACK model and outputs multi-target tracking results; the dynamic relation graph construction module forms a dynamic spatio-temporal relation graph; the graph network priority evaluation module evaluates target priority; the target trajectory prediction module predicts target trajectories; the camera control instruction generation module generates focusing and aperture control instructions; and the lens driving execution module executes focusing and depth-of-field adjustment actions. The invention overcomes the response lag, blind decision-making and neglect of inter-target associations that limit traditional optical focal length adjustment methods, and provides an efficient and accurate solution.
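The seven-module pipeline listed in the abstract can be sketched as a chain of stages, each consuming the previous stage's output. All function names, interfaces and placeholder values below are illustrative assumptions for exposition, not the patented implementation:

```python
# Hypothetical sketch of the seven-module control pipeline from the abstract.
# Every name and return value here is an assumed placeholder.

def fuse_features(image, distance_map, imu):           # multi-modal feature fusion
    return {"feature_map": (image, distance_map, imu)}

def track_targets(feature_map, detections):            # Mamba-improved TRANSTRACK tracking
    return [{"id": i, "box": d} for i, d in enumerate(detections)]

def build_relation_graph(tracks):                      # dynamic spatio-temporal relation graph
    return {"nodes": tracks, "edges": []}

def rank_priorities(graph):                            # TGN-based priority scoring
    return {t["id"]: 1.0 / (t["id"] + 1) for t in graph["nodes"]}

def predict_trajectories(priorities, tracks):          # LSTM trajectory prediction
    return {tid: {"pos": (0.0, 0.0, 5.0), "radial_v": 0.1} for tid in priorities}

def make_control_commands(trajectories):               # focusing / aperture instruction generation
    return [{"focus_m": t["pos"][2], "aperture": 2.8} for t in trajectories.values()]

def drive_lens(commands):                              # lens driving execution
    return len(commands)                               # count of executed actions

feats = fuse_features(None, None, None)["feature_map"]
tracks = track_targets(feats, ["person", "car"])
graph = build_relation_graph(tracks)
cmds = make_control_commands(predict_trajectories(rank_priorities(graph), tracks))
assert drive_lens(cmds) == 2
```

Each stage is elaborated by one of claims 3 through 10 below.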

Inventors

  • JIN JIAN
  • SHI LONG

Assignees

  • 美视(杭州)人工智能科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-02-05

Claims (10)

  1. An optical focal length automatic adjustment control system based on an intelligent sensor, characterized by comprising the following modules: the multi-modal feature fusion module is used for synchronously collecting multi-source sensor data, extracting and fusing vision, geometry and motion modal features, and generating a multi-modal feature map; the predictive tracking modeling module is used for introducing the scanning and state-update mechanism of the Mamba network to improve the TRANSTRACK model, performing predictive modeling on the historical track, fusing the dual contexts of the multi-modal feature map and the target detection result, and outputting a multi-target tracking result and a posterior tracking query matrix; the dynamic relation graph construction module is used for instantiating the multi-target tracking results as graph nodes, decoding raw relation scores in parallel through dual branches and generating weighted graph edges to form a dynamic spatio-temporal relation graph; the graph network priority evaluation module is used for aggregating neighborhood information and updating node features through a TGN network based on the dynamic spatio-temporal relation graph, generating a priority score for each target using a priority evaluation linear head and outputting a priority classification result; the target trajectory prediction module is used for screening corresponding query vectors from the posterior tracking query matrix based on the priority classification result, and predicting the future three-dimensional spatial position and future radial velocity of the target through an LSTM network; the camera control instruction generation module is used for calculating a focus plane based on the future three-dimensional spatial position and future radial velocity of the target, judging target interaction relations and determining the depth-of-field range in combination with the dynamic spatio-temporal relation graph, and generating focusing and aperture control instructions; and the lens driving execution module is used for sending the focusing and aperture control instructions to the lens driving unit and controlling the lens motor and aperture blades to execute the corresponding focusing and depth-of-field adjustment actions.
  2. The optical focal length automatic adjustment control system based on an intelligent sensor as claimed in claim 1, wherein the modules are realized through the following steps: S1, synchronously acquiring multi-source sensor data, extracting and fusing vision, geometry and motion modal features, and generating a multi-modal feature map; S2, introducing the scanning and state-update mechanism of the Mamba network to improve the TRANSTRACK model, performing predictive modeling on the historical track, fusing the dual contexts of the multi-modal feature map and the target detection result, and outputting a multi-target tracking result and a posterior tracking query matrix; S3, instantiating the multi-target tracking results as graph nodes, decoding raw relation scores in parallel through dual branches and generating weighted graph edges to form a dynamic spatio-temporal relation graph; S4, based on the dynamic spatio-temporal relation graph, aggregating neighborhood information and updating node features through a TGN network, generating a priority score for each target using a priority evaluation linear head and outputting a priority classification result; S5, based on the priority classification result, screening corresponding query vectors from the posterior tracking query matrix, and predicting the future three-dimensional spatial position and future radial velocity of the target through an LSTM network; S6, calculating a focus plane based on the future three-dimensional spatial position and future radial velocity of the target, judging target interaction relations and determining the depth-of-field range in combination with the dynamic spatio-temporal relation graph, and generating focusing and aperture control instructions; and S7, sending the focusing and aperture control instructions to the lens driving unit to control the lens motor and aperture blades to execute the corresponding focusing and depth-of-field adjustment actions.
  3. The intelligent sensor-based optical focal length automatic adjustment control system according to claim 2, wherein S1 comprises: S11, acquiring a main image, a raw distance map and inertial measurement unit data in real time through a sensor array; S12, inputting the main image into a ResNet network and progressively generating first, second, third and fourth scale feature maps through initial convolution, residual block processing and multi-stage downsampling, which jointly form a multi-scale visual feature map; S13, performing median filtering and bilinear interpolation upsampling on the raw distance map, projecting it into the main image coordinate system through the extrinsic parameters for alignment, and generating a geometric feature map through convolution; S14, generating a motion encoding vector through Kalman filtering and linear transformation based on the inertial measurement unit data, and constructing a motion feature map aligned with the visual feature map through spatial expansion and broadcasting operations; and S15, concatenating the second scale feature map, the geometric feature map and the motion feature map along the channel dimension to generate an initial fused feature map, and outputting a multi-modal feature map after convolution processing.
  4. The intelligent sensor-based optical focal length automatic adjustment control system of claim 2, wherein the improved TRANSTRACK model comprises a query decoupling layer, a temporal association modeling layer, a detection feature projection layer, a detection interaction coding layer, a tracking feature fusion projection layer, a tracking interaction coding layer and a prediction output generation layer: the query decoupling layer is used for receiving the posterior tracking query matrix output by the previous frame and a preset learnable detection query matrix for the current frame, inputting the posterior tracking query matrix into the temporal association modeling layer, and inputting the detection query matrix into the detection feature projection layer; the temporal association modeling layer is used for modeling the temporal dependencies of the input posterior tracking query matrix through the scanning and state-update mechanism of the Mamba network, predicting and generating a prior tracking query matrix for the current frame; the detection feature projection layer is used for receiving the detection query stream and the multi-modal feature map, mapping the detection query matrix into a detection query embedding matrix through a detection query projection layer, and mapping the multi-modal feature map into a visual feature key matrix through a feature-map key projection layer; the detection interaction coding layer is used for performing parallelized cross-attention computation on the detection query embedding matrix, the visual feature key matrix and the visual feature value matrix, distributing context information from the feature map to each query vector in the detection query embedding matrix, and generating an enhanced detection query matrix; the tracking feature fusion projection layer maps the prior tracking query, the enhanced detection query and the multi-modal feature map into corresponding query, key and value matrices through multiple parallel projection layers, fuses the key-value information of visual and detection semantics, and generates fused context key and value matrices for tracking interaction; the tracking interaction coding layer is used for performing parallelized cross-attention computation on the tracking query matrix, the fused context key matrix and the fused context value matrix, distributing dual context information from the feature map and the enhanced detection query matrix to each query vector in the tracking query matrix, and generating the posterior tracking query matrix of the current frame; and the prediction output generation layer is used for decoding the posterior tracking query matrix of the current frame, generating a target instance for each query vector of the posterior tracking query matrix, taking the original posterior tracking query matrix as the embedded feature of each target instance and associating it with the corresponding target instance, and outputting the multi-target tracking result and the posterior tracking query matrix.
  5. The optical focal length automatic adjustment control system based on an intelligent sensor according to claim 2, wherein S3 specifically comprises: S31, taking each target instance in the multi-target tracking result as a graph node, and extracting and concatenating node feature vectors from the bounding box, category and embedded features of the target instance; S32, combining the feature vectors of all nodes into a node feature matrix, and performing a parallelized matrix multiplication of the node feature matrix with its transpose to generate a raw relation score matrix; S33, inputting the raw relation score matrix into a relation strength evaluation linear layer and a relation type classification linear layer executed in parallel, wherein the relation strength evaluation linear layer computes and outputs a relation strength matrix through a fully connected layer and a Sigmoid activation function; S34, generating weighted graph edges connecting the corresponding nodes based on the relation strength matrix and the relation type matrix; and S35, constructing a dynamic spatio-temporal relation graph from all graph nodes, node feature vectors and weighted graph edges.
  6. The optical focal length automatic adjustment control system based on an intelligent sensor according to claim 2, wherein S4 specifically comprises: S41, receiving the dynamic spatio-temporal relation graph, aggregating the neighborhood information of each graph node through the message-passing mechanism of a TGN (temporal graph network), updating the initial node features, and generating an updated node feature matrix; S42, inputting the updated node feature matrix into a priority evaluation linear head, and computing and outputting a target priority score for each graph node through a fully connected linear transformation and a Sigmoid activation function; and S43, sorting all targets in descending order of target priority score, dividing them into three priority levels of key, important and common according to preset score thresholds, and outputting the target priority classification result.
  7. The intelligent sensor-based optical focal length automatic adjustment control system according to claim 2, wherein the TGN network comprises a node memory module, a message encoder, a temporal attention module, a neighborhood aggregator and an embedding function layer: the node memory module is used for maintaining, for each node in the graph, a state vector that evolves over time and serves as the node's memory state vector at the current moment; the message encoder is used for encoding the interaction event whenever any two nodes in the graph interact, generating an interaction message vector; the temporal attention module is used for computing the attention weights between each node's memory state vector and the corresponding historical interaction message vectors; the neighborhood aggregator is used for performing weighted aggregation of the historical message vectors according to the attention weights to generate a neighborhood-aggregated message vector; and the embedding function layer is used for combining the node's current memory state vector with the neighborhood-aggregated message vector, updating the node's memory state to generate updated node features, and assembling the final node features of all nodes at the current moment into the updated node feature matrix.
  8. The optical focal length automatic adjustment control system based on an intelligent sensor according to claim 2, wherein S5 specifically comprises: S51, receiving the posterior tracking query matrix and the target identifiers of all key-level and important-level targets, indexing and extracting the corresponding query vectors from the posterior tracking query matrix according to the target identifiers, and generating a key-level and important-level query vector set; S52, extracting the corresponding historical position sequence and velocity information for each query vector in the key-level and important-level query vector set; and S53, organizing the historical position sequence and velocity information into an input sequence by time step, inputting it into an LSTM network, updating through the internal state, and outputting the predicted three-dimensional spatial position and future radial velocity over a preset future time period.
  9. The optical focal length automatic adjustment control system based on an intelligent sensor according to claim 2, wherein S6 specifically comprises: S61, receiving the future three-dimensional spatial positions and future radial velocities of the important-level and key-level targets and the dynamic spatio-temporal relation graph; S62, projecting the future three-dimensional spatial position along the depth axis to obtain a future depth value, inputting the future depth value into a preset focusing distance mapping table, querying the index closest to the future depth value in the focusing distance mapping table, and extracting the corresponding initial focusing distance according to the index; S63, inputting the future radial velocity into a velocity compensation coefficient table, querying and extracting the corresponding velocity compensation coefficient, and multiplying the future radial velocity by the velocity compensation coefficient to generate a distance compensation amount; S64, adding the initial focusing distance and the distance compensation amount to generate the future focus plane position; S65, judging whether the key-level and important-level targets form a strong interaction unit; S66, if a strong interaction unit is judged to be formed, extracting the future depth values of the key-level and important-level targets, computing the depth maximum and minimum, and generating an interaction depth-of-field range; and S67, generating a focusing control instruction and an aperture control instruction according to the future focus plane position and the interaction depth-of-field range or the independent depth-of-field range.
  10. The optical focal length automatic adjustment control system based on an intelligent sensor according to claim 2, wherein S7 specifically comprises: S71, receiving the focusing control instruction and the aperture control instruction; S72, parsing the target drive distance and drive direction from the focusing control instruction; S73, parsing the target aperture value and drive direction from the aperture control instruction; S74, converting the target drive distance and drive direction into a first pulse-width modulation signal for the drive motor; S75, converting the target aperture value into a second pulse-width modulation signal for driving the aperture blades; and S76, sending the first pulse-width modulation signal to the lens motor to control lens focusing, and sending the second pulse-width modulation signal to the aperture driving unit to control the aperture blades to adjust to the target depth of field.
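The channel-wise fusion in claim 3 (S14-S15) amounts to broadcasting a motion encoding vector over the spatial grid and concatenating it with the visual and geometric maps along the channel axis. A minimal sketch with assumed channel counts (64 visual, 16 geometric, 8 motion):

```python
import numpy as np

# Sketch of claim 3's fusion step. Spatial size and channel counts are
# illustrative assumptions; only the broadcast-then-concatenate pattern
# (S14-S15) comes from the claim.
h, w = 32, 32
visual = np.zeros((64, h, w))     # second-scale visual feature map (S12)
geometry = np.zeros((16, h, w))   # aligned geometric feature map (S13)

motion_vec = np.arange(8.0)       # motion encoding vector from IMU data (S14)
# spatial expansion via broadcasting, so every pixel carries the same vector
motion = np.broadcast_to(motion_vec[:, None, None], (8, h, w))

# initial fused feature map: concatenate along the channel dimension (S15)
fused = np.concatenate([visual, geometry, motion], axis=0)
assert fused.shape == (88, h, w)
```

A convolution over `fused` would then produce the multi-modal feature map of S15.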
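Claim 5's graph construction (S31-S34) stacks per-target feature vectors into a matrix X, obtains all pairwise raw relation scores at once as X Xᵀ, and passes them through a sigmoid strength head to weight the edges. A minimal numpy sketch; the dimensions and the 0.5 edge threshold are assumptions, since the claim does not state them:

```python
import numpy as np

# Sketch of claim 5 (S31-S34): node feature matrix -> pairwise raw scores
# -> sigmoid relation strength -> weighted edges. Sizes are illustrative.
rng = np.random.default_rng(0)
num_targets, feat_dim = 4, 8
X = rng.standard_normal((num_targets, feat_dim))  # node feature matrix (S31-S32)

raw_scores = X @ X.T                              # parallelized pairwise scores (S32)
strength = 1.0 / (1.0 + np.exp(-raw_scores))      # sigmoid relation strength (S33)

# weighted edges between distinct nodes whose strength clears the threshold (S34)
edges = [(i, j, float(strength[i, j]))
         for i in range(num_targets)
         for j in range(num_targets)
         if i != j and strength[i, j] > 0.5]

assert raw_scores.shape == (num_targets, num_targets)
assert np.all((strength > 0.0) & (strength < 1.0))
```

The parallel relation-type branch of S33 would run alongside the strength head on the same raw score matrix.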
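The priority classification of claim 6 (S43) is a descending sort followed by threshold bucketing into key / important / common tiers. The 0.8 and 0.5 thresholds below are assumed for illustration; the claim only says "preset score thresholds":

```python
# Sketch of claim 6, S43: sort priority scores descending, then bucket into
# three tiers. Threshold values are assumptions, not from the patent.
def classify_priorities(scores, hi=0.8, lo=0.5):
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    tiers = {}
    for target_id, s in ranked:
        tiers[target_id] = "key" if s >= hi else "important" if s >= lo else "common"
    return ranked, tiers

ranked, tiers = classify_priorities({"t1": 0.91, "t2": 0.42, "t3": 0.66})
assert [tid for tid, _ in ranked] == ["t1", "t3", "t2"]
assert tiers == {"t1": "key", "t3": "important", "t2": "common"}
```

Per claim 8, only the key-level and important-level targets (here `t1` and `t3`) proceed to trajectory prediction.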
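Claim 9's focus-plane computation (S62-S64) is a nearest-index table lookup on the projected depth plus a velocity-proportional compensation term. The table contents and the compensation coefficient below are invented for illustration; the claim specifies only the lookup-then-compensate structure:

```python
import bisect

# Sketch of claim 9, S62-S64. Table values and comp_coeff are assumptions.
depth_index = [0.5, 1.0, 2.0, 4.0, 8.0]     # depths (m) in the mapping table
focus_table = [0.52, 1.03, 2.05, 4.10, 8.20]  # corresponding focus distances (m)

def nearest_focus(depth):
    # query the index closest to the future depth value (S62)
    i = bisect.bisect_left(depth_index, depth)
    if i == 0:
        return focus_table[0]
    if i == len(depth_index):
        return focus_table[-1]
    before, after = depth_index[i - 1], depth_index[i]
    return focus_table[i] if after - depth < depth - before else focus_table[i - 1]

def focus_plane(pos_xyz, radial_v, comp_coeff=0.04):
    depth = pos_xyz[2]                           # depth-axis projection (S62)
    compensation = radial_v * comp_coeff         # distance compensation (S63)
    return nearest_focus(depth) + compensation   # future focus plane (S64)

assert nearest_focus(1.9) == 2.05
assert abs(focus_plane((0.0, 0.0, 1.9), 2.5) - 2.15) < 1e-9
```

In the patent the coefficient itself comes from a velocity compensation coefficient table (S63); a single constant is used here for brevity.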
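The interaction depth-of-field range of claim 9 (S66) and the PWM conversion of claim 10 (S74-S75) can be sketched together. The pulse-width scaling constants are pure assumptions; the patent states only that drive distance and aperture value are converted into two PWM signals:

```python
import math

# Sketch of claim 9 S66 and claim 10 S74-S75. All scaling constants assumed.
def interaction_depth_range(depths):
    # depth-of-field range spanning all strongly interacting targets (S66)
    return min(depths), max(depths)

def focus_pwm(drive_distance_mm, us_per_mm=50.0):
    # first PWM signal: pulse width proportional to the drive distance (S74)
    return drive_distance_mm * us_per_mm

def aperture_pwm(f_number, base_us=1000.0, us_per_stop=100.0):
    # second PWM signal: pulse width offset per aperture stop (S75);
    # one stop corresponds to doubling the f-number squared
    return base_us + us_per_stop * math.log2(f_number ** 2)

lo, hi = interaction_depth_range([2.1, 3.4, 2.8])
assert (lo, hi) == (2.1, 3.4)
assert focus_pwm(3.0) == 150.0
assert aperture_pwm(2.0) == 1200.0
```

Per S76, the first signal would go to the lens motor and the second to the aperture driving unit.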

Description

Optical focal length automatic regulating control system based on intelligent sensor

Technical Field

The invention relates to the technical field of optical engineering, in particular to an optical focal length automatic adjustment control system based on an intelligent sensor.

Background

With the deep fusion of intelligent visual perception and automatic control technology, traditional optical imaging systems face the challenge of increasingly complex dynamic scenes. In modern intelligent monitoring, automatic driving and human-machine interaction applications, clear imaging of fast-moving, multi-target interaction scenes is not only an important means of capturing key information, but also a core link in improving the overall perception precision and decision reliability of the system. However, most currently mainstream optical focal length adjustment methods rely on passively triggered, local optimization algorithms based on contrast or phase detection, which respond slowly and lack foresight. Although traditional methods can meet basic focusing requirements in static or simple scenes, they are difficult to adapt to complex environments with rapidly moving multiple targets, frequent occlusion and severe depth changes, because they lack a global understanding of the dynamic evolution of the scene and the ability to predict motion trends. The main limitations of the traditional optical focus adjustment method are response lag and decision blindness. Existing methods generally make focusing judgments based on the image information of the current frame; when the target moves rapidly or the system has large inertia, the focusing action seriously lags behind the actual position of the target, causing imaging blur.
When multiple potential focusing targets exist in a scene, traditional methods struggle to intelligently evaluate the importance of each target, so they often focus on non-key targets or switch frequently between targets, and cannot focus stably on the core target. In particular, when faced with complex spatio-temporal interactions among targets, traditional single-target or simple multi-target tracking methods can hardly predict future focus points efficiently and accurately, leaving the focusing system in a state of passive catch-up and seriously affecting imaging quality and the effectiveness of information acquisition. In addition, during focusing and depth-of-field control, traditional methods often break target identification, motion prediction and physical camera control into independent serial modules, ignoring the tight coupling among the three. For example, in a multi-target interaction scene, a conventional focusing strategy cannot effectively fuse the future motion tracks of targets, the spatial relations among targets and the physical motion characteristics of the lens, so decisions on the focus plane and depth-of-field range lack a sound basis and an optimal imaging effect is difficult to achieve. Even methods that introduce simple target tracking cannot fully mine the deep spatio-temporal relations and future states of targets, cannot generate intelligent control instructions that are predictive, relation-aware and physically constrained, and thus can hardly realize efficient, accurate and adaptive automatic optical focal length adjustment. Therefore, how to provide an optical focal length automatic adjustment control system based on an intelligent sensor is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention

The invention provides an optical focal length automatic adjustment control system based on an intelligent sensor, which improves the TRANSTRACK model through a Mamba network and, combined with a TGN network, realizes the effective fusion of target tracking, priority evaluation and future track prediction. This greatly improves the optical system's perception of and response precision to key targets, improves on the traditional passive focusing strategy by enabling the system to calculate the focus plane in advance and intelligently determine the depth-of-field range, and effectively overcomes the imaging challenges caused by motion blur and target interaction. The invention overcomes the limitations of response lag, blind decision-making and neglect of inter-target associations in traditional optical focal length adjustment methods, and provides an efficient and accurate solution for intelligent imaging and automatic control in complex dynamic scenes. According to an embodiment of the invention, the optical focal length automatic adjustment control system based on an intelligent sensor comprises the following modules: the multi-modal feature fusion module is used for synchronously collecting multi-source sensor data, extracting and fusing vision, geometry and motion modal features