CN-122020505-A - High-precision marine ranch shellfish harvesting method and device based on multi-source information and vision enablement
Abstract
The invention discloses a high-precision marine ranch shellfish harvesting method and device based on multi-source information and vision enablement, belonging to the interdisciplinary field of computer vision, underwater robotics and intelligent marine equipment. The method builds a multimodal input by fusing optical, sonar and vibration information, and extracts sand-surface distortion features using deformable convolution and a weighted bidirectional feature pyramid. A deformable local texture enhancement module then actively strengthens shellfish-related local textures, and graph-structure reasoning improves the robustness of region perception. Finally, temporal fusion and probabilistic generative modeling produce a high-precision probabilistic positioning result with an uncertainty measure. The invention addresses the poor underwater image quality, weak target features and unreliable positioning results of existing approaches, improves the accuracy, robustness and decision-making intelligence with which an underwater robot identifies and locates buried shellfish, and is suitable for automated, precise harvesting operations in marine ranches.
Inventors
- JIA YINGXIN
- LIU BO
- XU BO
- LI XIN
- LI BAOLU
- WU MOGUANG
- GAO LINHE
- ZHANG JIANCHAO
Assignees
- 河北省机电一体化中试基地有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20251215
Claims (10)
- 1. A high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement, characterized by comprising the following steps: S1, information acquisition and deep fusion: synchronously acquiring an optical image, a sonar density map and a vibration response map of a target area, performing spatial registration and scale normalization, and then concatenating them along the channel dimension to form an initial fusion tensor; S2, adaptive multi-scale feature extraction: taking the weighted multimodal input features as input, and extracting multi-scale features with a backbone network based on deformable convolution; S3, region perception: selecting a feature map from the enhanced multi-scale fusion features and inputting it to a deformable local texture enhancement module, wherein the module dynamically generates deformable convolution kernels related to potential shellfish texture features, performs adaptive convolution on the input features to enhance the local texture response, and outputs texture enhancement features; S4, positioning and uncertainty estimation: taking the space-time fusion features as a condition, modeling the posterior probability distribution of the target position with a framework based on a conditional variational autoencoder and a mixture density network, and outputting a probabilistic positioning result comprising the target coordinates and an uncertainty measure thereof.
- 2. The high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement according to claim 1, characterized in that, in step S1, the channel attention module is implemented as follows: global average pooling is applied to the initial fusion tensor to obtain a global statistical descriptor vector over all channels; the descriptor vector is input to a two-layer fully connected network with a bottleneck structure to learn an importance weight for each channel; the importance weights are normalized to the interval (0, 1) with a Sigmoid function; and the corresponding channels of the initial fusion tensor are weighted by the normalized weights to obtain the weighted multimodal input features.
- 3. The high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement according to claim 1 or 2, characterized in that, in step S2, the weighted bidirectional feature pyramid network introduces a learnable weight parameter for each input feature map during feature fusion and fuses the features by a normalized weighted sum, expressed as: $O = \mathrm{Conv}\left(\frac{\sum_{i} w_i \cdot I_i}{\epsilon + \sum_{j} w_j}\right)$, wherein $I_i$ are the input feature maps from different resolution paths, $\epsilon = 0.0001$ prevents numerical instability, $w_i$ is the learnable weight corresponding to $I_i$, and $\mathrm{Conv}$ denotes a convolution operation.
- 4. The high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement according to claim 1, characterized in that, in step S3, the deformable local texture enhancement module operates as follows: S31, constructing a library of learnable prototype convolution kernels, wherein each kernel represents a potential local structure detector; S32, predicting, through a lightweight subnetwork and according to the local context of the input feature map, a set of kernel offsets and kernel weights for each spatial position and each prototype kernel; S33, deforming the prototype kernels with the predicted kernel offsets, combining the deformed kernels by weighting them with the kernel weights, and performing an adaptive convolution on the input feature map to generate the texture enhancement features.
- 5. The high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement according to claim 1, characterized in that, in step S3, the graph neural network performs inference as follows: superpixel regions are taken as graph nodes, with each node feature being the mean of the features within its region; edges are established according to spatial proximity and feature similarity; message passing is performed through a graph convolution or graph attention mechanism to update the node features; and the updated node features are finally mapped back to the original spatial dimensions to generate the regional perception features.
- 6. The high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement according to claim 1, characterized in that, in step S4, the conditional variational autoencoder and mixture density network framework comprises a prior network, a posterior network and a decoder; the prior network takes the space-time fusion features as input and outputs the prior distribution parameters of the latent variable; the posterior network takes the space-time fusion features and the ground-truth position labels as input during training and outputs the posterior distribution parameters of the latent variable; and the decoder samples from the latent variable distribution and, combined with the space-time fusion features, regresses the parameters of a Gaussian mixture model, the parameters comprising the mixture weight, mean vector and covariance matrix of each component.
- 7. The high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement according to claim 6, characterized in that, in step S4, the uncertainty measure is represented by the covariance matrix of the component with the largest mixture weight in the Gaussian mixture model, and the final predicted target position is the mean vector of that component.
- 8. The high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement according to claim 6, characterized in that, in step S4, the loss function used for model training is: $\mathcal{L} = -\mathbb{E}_{q_\phi(z \mid x, y)}\left[\log p_\theta(y \mid x, z)\right] + \beta \, D_{\mathrm{KL}}\left(q_\phi(z \mid x, y) \,\|\, p_\psi(z \mid x)\right)$, wherein $x$ is the space-time fusion feature, $y$ is the true position, $z$ is the latent variable, $q_\phi$ is the posterior network, $p_\psi$ is the prior network, $p_\theta$ is the decoder, $D_{\mathrm{KL}}$ is the KL divergence, and $\beta$ is the hyperparameter balancing the reconstruction loss and the regularization term.
- 9. The high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement according to claim 1, characterized in that, in step S4, the space-time fusion is performed as follows: using pose transformation information from a motion sensor or visual odometry, the optimized feature map of the previous time step is warped and aligned to the coordinate system of the current frame; the aligned historical features are concatenated with the regional perception features of the current frame along the channel dimension; and the result is fused by a convolution layer to generate the space-time fusion features.
- 10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, performs the steps of the high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement according to any one of claims 1 to 9.
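As an illustration of the computations recited in claims 2, 3 and 7, the following is a minimal Python sketch, not the patented implementation. It uses nested lists rather than a tensor library. The function names, the ReLU between the two fully connected layers, the clamping of fusion weights at zero and the omission of the convolution that follows the weighted sum are all assumptions made for this example.

```python
import math


def channel_attention(tensor, w1, b1, w2, b2):
    """Claim 2 sketch: reweight the channels of a C x H x W nested-list tensor.

    w1/b1 and w2/b2 are the (hypothetical) parameters of the two fully
    connected layers: w1 is hidden x C, w2 is C x hidden.
    """
    # Global average pooling: one scalar descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in tensor]
    # Bottleneck layer. ReLU is an assumption; the claim names no activation.
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc)) + b)
              for row, b in zip(w1, b1)]
    # Expansion layer followed by Sigmoid, giving one weight in (0, 1) per channel.
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    # Scale every value of each channel by that channel's weight.
    scaled = [[[g * v for v in row] for row in ch] for ch, g in zip(tensor, gates)]
    return scaled, gates


def weighted_fusion(feature_maps, weights, eps=1e-4):
    """Claim 3 sketch: normalized weighted sum of same-shaped H x W maps.

    The convolution applied after the sum in the claim is omitted here.
    """
    # Clamp weights at zero so the normalizer stays positive (an assumption).
    w = [max(0.0, x) for x in weights]
    total = eps + sum(w)
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wi * fm[r][c] for wi, fm in zip(w, feature_maps)) / total
             for c in range(cols)]
            for r in range(rows)]


def dominant_component(mix_weights, means, covariances):
    """Claim 7 sketch: read out the component with the largest mixture weight."""
    k = max(range(len(mix_weights)), key=lambda i: mix_weights[i])
    # The mean is the predicted position; the covariance is its uncertainty.
    return means[k], covariances[k]


if __name__ == "__main__":
    # Toy two-component mixture over 2-D positions (made-up numbers).
    position, covariance = dominant_component(
        [0.3, 0.7],
        [(0.0, 0.0), (1.5, 2.5)],
        [[[1.0, 0.0], [0.0, 1.0]], [[0.2, 0.0], [0.0, 0.4]]],
    )
    print("predicted position:", position)
    print("uncertainty (covariance):", covariance)
```

A practical system would more likely express these steps with a tensor library and learn the parameters by training; the plain-list form is used here only to keep the arithmetic of each claim visible.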
Description
High-precision marine ranch shellfish harvesting method and device based on multi-source information and vision enablement
Technical Field
The invention belongs to the interdisciplinary field of computer vision, underwater robotics and intelligent marine equipment, and particularly relates to a high-precision marine ranch shellfish harvesting method and device based on multi-source information and vision enablement, which build on deep fusion of multimodal sensing information, deformable local texture enhancement, graph-structure reasoning, spatio-temporal context fusion and probabilistic generative modeling.
Background
With the development of underwater robots (such as ROVs and AUVs), using robots for fixed-point, fine-grained and automated harvesting has become a research hotspot in the industry. However, the underwater vision system remains a key bottleneck for automated harvesting. The underwater environment poses three difficulties. First, light is strongly attenuated and scattered as it travels through water, which causes low image contrast, color distortion and blurred detail. Second, the seabed sand bed is disturbed by currents, biological activity and the robot itself, which readily stirs up suspended particles that act as visual noise. Third, target shellfish (such as razor clams) are often buried beneath the sand layer and reveal themselves only through indirect cues, such as micro-pits, bulges or local texture distortion left on the sand surface after they burrow; these cues are barely distinguishable from the surrounding sand background, which makes visual detection difficult.
Most existing underwater identification and positioning methods rely on a single visual modality and lack effective fusion of multi-source information such as sonar and vibration, so they are prone to large positioning errors and insufficient robustness on complex seabed terrain. In addition, conventional convolutional neural networks and attention mechanisms lack active, targeted perception of the local structural texture distortions caused by burrowing shellfish (such as depression edges in specific directions or radial cracks). As for the positioning output, mainstream methods generally use deterministic coordinate regression and cannot quantify the positioning uncertainty caused by occlusion, overlap, blurred features and similar conditions, which makes it hard to meet the reliability requirements of high-risk, high-precision underwater operations. There is therefore an urgent need for a visual identification and positioning method that overcomes these bottlenecks of underwater vision, achieves deep fusion of multi-source information, has active local feature perception, and outputs a probabilistic, interpretable positioning result together with its uncertainty, so as to support precise, autonomous and reliable operation of next-generation intelligent harvesting equipment for underwater robots.
Disclosure of Invention
The invention aims to provide a high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement, so as to identify and locate marine ranch shellfish targets quickly and accurately through multimodal feature fusion and vision enablement.
A further object of the invention is to provide an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the program. The technical solution adopted by the invention to achieve these objects is as follows. The high-precision marine ranch shellfish harvesting method based on multi-source information and vision enablement comprises the following steps: S1, information acquisition and deep fusion: synchronously acquiring an optical image, a sonar density map and a vibration response map of a target area, performing spatial registration and scale normalization, and then concatenating them along the channel dimension to form an initial fusion tensor; S2, adaptive multi-scale feature extraction: taking the weighted multimodal input features as input, and extracting multi-scale features with a backbone network based on deformable convolution; S3, region perception: selecting a feature map from the enhanced multi-scale fusion features and inputting it to a deformable local texture enhancement module, wherein the module dynamically generates deformable convolution kernels related to potential shellfish texture features, performs adaptive convolution on the input features to enhance the local texture response, and outputs texture enhancement features; S4, positioning and uncerta