CN-122023799-A - Projection-near-end agricultural instance segmentation method and system based on capacity limitation
Abstract
The invention provides a projection-near-end agricultural instance segmentation method and system based on capacity limitation. The method establishes a capacity-calculation contract that constrains model parameter drift and frame-rate drop. In the feature space, an orthogonal residual aggregation module enhances the edge-consistency gradient through a Cayley-type orthogonal perturbation. In the geometric space, an anisotropic offset correction module stabilizes sampling in corridor-type scenes through cone clipping and group-level low-rank sharing. In the output space, an overlapping prototype clamping module suppresses ambiguous predictions through a proximal update and an exclusive loss. The invention meets the real-time inference requirement of embedded devices while significantly improving the segmentation accuracy and connectivity of crop rows, drivable areas and obstacles in unstructured environments.
Inventors
- QU DELIN
- JIN JING
- ZHANG JIERU
- GUO YING
Assignees
- Harbin Institute of Technology (哈尔滨工业大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-26
Claims (10)
- 1. A projection-near-end agricultural instance segmentation method based on capacity limitation, characterized by comprising a feature space orthogonal residual aggregation stage, a geometric space anisotropic offset correction stage and an output space overlapping prototype clamping stage; the feature space orthogonal residual aggregation stage addresses the boundary blurring caused by texture degradation in agricultural scenes by introducing a direction-controllable orthogonal residual path at the shallow features of the backbone network, computing depthwise separable direction responses along the row and column directions, constructing a signed direction imbalance term, and enhancing the edge-consistency gradient while suppressing false texture noise through a Cayley-type orthogonal perturbation and a lightweight projection, thereby providing clear boundary cues for subsequent segmentation without changing the feature channel dimension; the geometric space anisotropic offset correction stage addresses the longitudinal structural dependence and transverse sampling jitter in corridor-type agricultural scenes by modeling the feature sampling offsets at the pyramid levels, decomposing the original offset into components along the longitudinal and transverse directions of the corridor, clipping the transverse component with a cone aperture parameter and combining it with a group-level shared low-rank transverse direction, thereby suppressing high-frequency transverse aliasing while preserving long-range longitudinal dependence and achieving stable sampling of elongated crop rows and road boundaries; and the output space overlapping prototype clamping stage mines overlapping neighborhoods by combining Softmax probability, prototype similarity and a region gating mechanism, suppresses ambiguous logit values in mixed regions through a single-step proximal update, and introduces an exclusive loss to force semantic separation, thereby improving inter-class separability and safety.
- 2. The method according to claim 1, wherein, in the feature space orthogonal residual aggregation stage, given a shallow feature tensor, direction responses are first computed to construct a direction imbalance term; once the direction imbalance term is obtained, a Cayley-form orthogonal perturbation and a lightweight projection are applied to obtain the output features.
- 3. The method according to claim 2, characterized in that the direction imbalance term is constructed according to formula (1), whose terms are: the depthwise separable convolution response along the row direction of the feature map and the depthwise separable convolution response along the column direction of the feature map, which capture horizontal and vertical texture features in agricultural scenes; and two learnable channel-wise scaling coefficients, which adaptively weight edge information in the different directions.
- 4. The method according to claim 3, characterized in that the output features are given by formula (2), wherein I is the identity matrix; the result is the enhanced output feature; a hyperparameter controls the perturbation strength; in implementation, the Cayley-form term is decomposed into stable channel-wise and position-wise rational transforms, which avoids the computational cost of a global matrix inversion; and the projection layer, consisting of convolution, normalization and nonlinear activation, is injected as a proximal operator into the edge-focused residual.
- 5. The method of claim 4, wherein the geometric space anisotropic offset correction stage models the pyramid-level offset feature U and outputs a corrected sampling offset field according to formula (3), whose terms are: the predicted original dense offset; a basis vector along the longitudinal direction of the corridor and a basis vector along the transverse direction of the corridor; the low-rank transverse direction vector shared by the g-th group; the spatially varying weight after cone clipping; and a differentiable spatial transformer that applies the computed anisotropic offset field to the feature map.
- 6. The method of claim 5, wherein the output space overlapping prototype clamping stage constructs an exclusive loss function that penalizes highly overlapping classes according to formula (4), whose terms are: the network parameters; the Softmax probability maps of categories a and b; the class prototype similarity maps computed by a lightweight ProtoConv; the region gating weight obtained through global average pooling, used to suppress background noise; element-wise multiplication; an indicator function; and a predefined binarization threshold; and, based on the neighborhood set of category a mined by overlap proximity, performs a proximal clamp update in logit space during forward propagation in both inference and training according to formula (5), whose terms are: the original category logits; the updated logits; the overlap proximity weight computed from the mined IoU; a stability mask that prevents overcorrection in non-overlapping areas; and the proximal step size.
- 7. The method of claim 6, wherein the overall optimization objective combines the baseline loss with the exclusivity constraint, and the overall loss function is defined by formula (6), whose terms are: the classification loss; the distance regression loss; the mask prediction loss; and the balance coefficient of the exclusive loss term; the training of the whole model follows the capacity-calculation contract, ensuring that parameter drift and frame-rate drop are strictly kept within the allowable range.
- 8. A projection-near-end agricultural instance segmentation system based on capacity limitation, characterized by comprising a feature space orthogonal residual aggregation module, a geometric space anisotropic offset correction module and an output space overlapping prototype clamping module; the feature space orthogonal residual aggregation module is configured to receive the shallow features output by a backbone network, compute depthwise separable direction responses along the row and column directions, construct a direction imbalance term, and generate output features with enhanced edge consistency through a local Cayley transform and a lightweight projection; the geometric space anisotropic offset correction module is configured to predict the sampling offset in the feature pyramid decoder, decompose it into longitudinal and transverse components, apply cone clipping and group-level low-rank sharing to the transverse component, and resample the feature map through a differentiable spatial transformer; and the output space overlapping prototype clamping module is configured to mine overlapping neighborhoods at the output end of the classification head by combining Softmax probability, prototype similarity and region gating, compute the exclusive loss, and perform a proximal clamp update in logit space to correct ambiguous predictions.
- 9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1-7 when the computer program is executed.
- 10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-7.
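Two of the operations named in claims 3 to 5 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the subtraction form of the signed imbalance, the pairing of the row and column responses into a 2D vector, and the rule that limits the transverse component to `tan(half_angle) * |longitudinal component|` are all assumptions made for illustration. What the sketch does rely on is a standard identity: the Cayley transform of a 2x2 skew-symmetric matrix is a rotation whose entries are rational functions, so it can be evaluated per position without any matrix inversion.

```python
import math


def direction_imbalance(row_resp, col_resp, alpha=1.0, beta=1.0):
    """Signed imbalance between row and column responses.

    Assumed form (hypothetical): alpha * row - beta * col.
    """
    return alpha * row_resp - beta * col_resp


def cayley_rotate(row_resp, col_resp, gamma=0.5):
    """Apply a per-position Cayley rotation to the (row, col) response pair.

    For A = [[0, -a], [a, 0]], (I - A)^{-1} (I + A) equals
    1 / (1 + a^2) * [[1 - a^2, -2a], [2a, 1 - a^2]].
    The denominator is always >= 1, so the map is numerically stable.
    """
    a = gamma * direction_imbalance(row_resp, col_resp)
    denom = 1.0 + a * a
    cos_t = (1.0 - a * a) / denom
    sin_t = 2.0 * a / denom
    return (cos_t * row_resp - sin_t * col_resp,
            sin_t * row_resp + cos_t * col_resp)


def cone_clip_offset(offset, e_long, e_lat, half_angle_rad):
    """Clip the transverse part of a 2D offset to a cone around e_long.

    e_long and e_lat are assumed to be orthonormal basis vectors.
    """
    along = offset[0] * e_long[0] + offset[1] * e_long[1]
    across = offset[0] * e_lat[0] + offset[1] * e_lat[1]
    limit = math.tan(half_angle_rad) * abs(along)
    across = max(-limit, min(limit, across))
    return (along * e_long[0] + across * e_lat[0],
            along * e_long[1] + across * e_lat[1])


if __name__ == "__main__":
    # The rotation preserves the norm of the response pair.
    print(cayley_rotate(3.0, 4.0, gamma=0.5))
    # An offset that points mostly sideways is pulled back into the cone.
    print(cone_clip_offset((3.0, 2.0), (0.0, 1.0), (1.0, 0.0), math.pi / 4))
```

Because the rotation is orthogonal, it redistributes energy between the two directional responses without changing their magnitude, which is one way to read the requirement that the perturbation leave the feature channel dimension unchanged. The cone clip leaves the longitudinal component untouched and bounds only the transverse one, matching the stated goal of keeping long-range longitudinal dependence while suppressing transverse jitter.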
Description
Projection-near-end agricultural instance segmentation method and system based on capacity limitation
Technical Field
The invention relates to the technical fields of computer vision, artificial intelligence and agricultural robot perception, and in particular to a projection-near-end agricultural instance segmentation method and system based on capacity limitation for resource-limited edge computing devices, which achieves high-precision perception in severe weather and unstructured environments.
Background
With the rapid development of smart agriculture, autonomous agricultural machinery (such as intelligent tractors, autonomous picking robots and unmanned plant-protection vehicles) plays an increasingly important role in modern agricultural operations. The visual perception system serves as the "eyes" of an agricultural robot. Its core task is accurate instance segmentation of the key elements in the farmland environment (such as crop rows, drivable areas, obstacles, weeds and soil), which provides reliable semantic and geometric information for downstream path planning and operation control. However, compared with urban road autonomous-driving scenes, unstructured agricultural scenes pose distinct challenges, and existing general-purpose instance segmentation techniques face several bottlenecks in practice:
First, the geometric specificity of agricultural scenes. The farm environment typically presents a strongly anisotropic long-corridor structure (long-corridor scenes), that is, large longitudinal depth and repetitive transverse texture.
Existing feature extraction networks (such as CNNs based on square convolution kernels) adapt poorly to such elongated geometry with large depth spans, so sampling at the far end of a crop row or along a narrow ridge is unstable, and broken segmentation or discontinuous structures readily occur.
Second, poor perception robustness under severe weather. Agricultural machinery often has to operate in all weather and frequently encounters fog, snow, glare, low illumination, dust and other adverse conditions. These factors compress image contrast and degrade texture features (texture degradation), so that the boundaries between crops and background (e.g., weeds, soil) become blurred. Conventional lightweight segmentation models lack an enhancement mechanism for such degraded features and are prone to missed detections or boundary localization drift.
Third, category overlap and semantic ambiguity. In complex farm environments, objects of different categories are often highly overlapping or adjacent at the pixel level. For example, weeds may grow between crop rows, and obstacles may be half-buried in the soil. Such spatial aliasing leads to ambiguous predictions in the logit space at the output end, that is, several categories receive high confidence for the same pixel, which compromises operational safety.
Fourth, the resource limits of edge computing devices conflict with the demand for high accuracy. To meet the low-cost and low-power requirements of agricultural operations, agricultural robots are usually equipped with embedded edge computing devices (e.g., the NVIDIA Jetson series).
The high-accuracy segmentation models that dominate current academic work (such as the Transformer-based Mask2Former) have huge parameter counts and high computational complexity and cannot run in real time on edge devices. Lightweight models that trade accuracy for speed (such as the original YOLO series) meet the speed requirement, but their segmentation accuracy (especially boundary IoU) under the severe conditions and complex geometry described above does not meet the requirements of safe navigation. Therefore, there is a need for an agricultural instance segmentation method and system that strictly adheres to the computing and storage capacity limits of edge devices (capacity limitation) while addressing severe weather, long-corridor geometry and category overlap through an efficient structural feature correction mechanism.
Disclosure of Invention
The invention aims to solve the problem that existing agricultural autonomous-driving perception systems, under the limited resources of edge computing devices, struggle to achieve both robustness and high-precision boundary localization in complex severe weather (such as fog, snow and glare), and provides a projection-near-end agricultural instance segmentation method and system based on capacity limitation. The method establishes a capacity-calculation contract, and improves the se