CN-122018677-A - End-to-end visual detection-based recognition system for glasses for the blind

Abstract

The invention discloses a recognition system for glasses for the blind based on end-to-end visual detection. The system comprises a visual perception module, a feature fusion module, a detection prediction module, and a lightweight distillation deployment module. The visual perception module is connected to the high-definition camera of the glasses, receives the original image, and extracts multi-scale features. The feature fusion module integrates shallow texture features with deep semantic features and supports parallel processing of multi-scale features. The detection prediction module directly predicts the category, position, and confidence of each target through an NMS-free end-to-end detection head. The lightweight distillation deployment module adapts the model to the hardware through distillation, quantization, and pruning, and links a voice module for immediate feedback. Through the cooperation of these modules, the invention improves feature extraction efficiency, predicts directly without NMS, fits lightweight hardware, and gives immediate voice feedback, which makes environment recognition more convenient and more timely for blind users.

Inventors

ZHANG HUI
ZHAO TIANCHENG
JIANG KELEI
HE XUAN
CHEN GAOYUN

Assignees

杭州联汇科技股份有限公司

Dates

Publication Date: 20260512
Application Date: 20251229

Claims (5)

1. A recognition system for glasses for the blind based on end-to-end visual detection, characterized by comprising:
a visual perception module, connected to the high-definition camera on the glasses, which receives the original image input from the camera and extracts multi-scale image features through visual encoding;
a feature fusion module, connected to the visual perception module, which receives the multi-scale features output by the visual perception module and integrates the context of shallow texture features and deep semantic features through a hierarchical feature path design, while the features of different scales are processed in parallel at the detection head;
a detection prediction module, connected to the feature fusion module, which, based on the optimized features output by the feature fusion module, directly predicts the category, bounding-box position, and confidence of each target through an NMS-free end-to-end detection head; and
a lightweight distillation deployment module, connected to the visual perception module, the feature fusion module, and the detection prediction module, which optimizes the complete model formed by these three modules through model distillation, quantization, and pruning so that the model fits the resource-limited hardware of the glasses, and which links a voice interaction module to give immediate feedback of the detection results.
2. The recognition system for glasses for the blind based on end-to-end visual detection of claim 1, wherein the detection head output of the detection prediction module comprises three branches: category probability, position offset, and target confidence, and the joint loss function is defined as: [formula missing from the source text] wherein the terms of the formula represent the classification loss, the bounding-box regression loss, and the target-confidence loss, and the two remaining symbols are weighting coefficients.
3. The recognition system for glasses for the blind based on end-to-end visual detection of claim 2, wherein, during training, the detection prediction module uses a dynamic small-target-aware label assignment method that adjusts the proportion of positive samples according to target size and feature response, and the core weight function is: [formula missing from the source text] wherein the function takes the scale score of the i-th candidate region, and two parameters control the slope and the threshold of the assignment curve, respectively.
4. The recognition system for glasses for the blind based on end-to-end visual detection of claim 3, wherein the detection prediction module introduces a "spatial proximity compensation mechanism" in sample matching, which allows adjacent grid cells to share part of the labels during small-target detection in order to improve recall.
5. The recognition system for glasses for the blind based on end-to-end visual detection of claim 4, wherein the lightweight distillation deployment module performs knowledge distillation with a teacher-student structure so that a small model inherits the multi-scale semantic capability of a large model, and the distillation loss function is defined as: [formula missing from the source text] wherein the two distributions in the formula are the output distributions of the teacher model and the student model, respectively, for a given class; and the student model is quantized to 8 bits and then deployed on the chip of the glasses for the blind, so that real-time detection and voice feedback can be achieved.
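
The formulas of claims 2, 3, and 5 are not present in this text, so the Python sketch below only illustrates commonly used forms that agree with the surrounding definitions; it should not be read as the patent's own formulation. It assumes a weighted-sum joint loss, a sigmoid weight over the scale score, and a KL-divergence distillation loss. Every name in it (`joint_loss`, `lambda_box`, `lambda_obj`, `scale_weight`, `alpha`, `beta`, `distillation_loss`) is hypothetical.

```python
import math


def joint_loss(l_cls, l_box, l_obj, lambda_box=1.0, lambda_obj=1.0):
    """Assumed form: L = L_cls + lambda_box * L_box + lambda_obj * L_obj.

    The claim names three loss terms and two weighting coefficients;
    which terms carry the weights is an assumption made here.
    """
    return l_cls + lambda_box * l_box + lambda_obj * l_obj


def scale_weight(s_i, alpha=10.0, beta=0.5):
    """Assumed sigmoid weight for the scale score s_i of candidate region i.

    alpha plays the role of the slope parameter and beta the threshold;
    the direction of the curve is also an assumption.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (s_i - beta)))


def distillation_loss(teacher_probs, student_probs, eps=1e-12):
    """Assumed KL(teacher || student) over per-class output distributions."""
    return sum(
        p_t * math.log((p_t + eps) / (p_s + eps))
        for p_t, p_s in zip(teacher_probs, student_probs)
        if p_t > 0.0
    )
```

Under these assumptions, a score exactly at the threshold receives weight 0.5, and the distillation loss is zero when the student reproduces the teacher's distribution and positive otherwise.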

Description

End-to-end visual detection-based recognition system for glasses for the blind

Technical Field

The invention relates to the field of artificial intelligence and computer vision, and in particular to a recognition system for glasses for the blind based on end-to-end visual detection.

Background

Visual detection technology is increasingly widely used on mobile devices, where it enables rapid identification and detection of target objects and provides technical support for a variety of scenarios. However, existing visual detection algorithms are computationally heavy and suffer high latency when running on mobile devices, and the detection process relies on a non-maximum suppression (NMS) step in post-processing, which further increases processing complexity. As a result, these algorithms struggle to meet the real-time response requirements of mobile devices, which limits the effective application of visual detection technology in mobile scenarios.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide an NMS-free end-to-end detection method that combines small-target-aware label assignment with a lightweight distillation training strategy, so that glasses for the blind can achieve fast and accurate scene recognition and voice-assisted feedback under limited computing resources.
To achieve this purpose, the invention provides the following technical solution: a recognition system for glasses for the blind based on end-to-end visual detection, characterized by comprising:
a visual perception module, connected to the high-definition camera on the glasses, which receives the original image input from the camera and extracts multi-scale image features through visual encoding;
a feature fusion module, connected to the visual perception module, which receives the multi-scale features output by the visual perception module and integrates the context of shallow texture features and deep semantic features through a hierarchical feature path design, while the features of different scales are processed in parallel at the detection head;
a detection prediction module, connected to the feature fusion module, which, based on the optimized features output by the feature fusion module, directly predicts the category, bounding-box position, and confidence of each target through an NMS-free end-to-end detection head; and
a lightweight distillation deployment module, connected to the visual perception module, the feature fusion module, and the detection prediction module, which optimizes the complete model formed by these three modules through model distillation, quantization, and pruning so that the model fits the resource-limited hardware of the glasses, and which links a voice interaction module to give immediate feedback of the detection results.
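
One way to read the NMS-free detection head described above: if training matches each target to a single prediction, inference reduces to scoring and thresholding, with no suppression pass over overlapping boxes. The Python sketch below assumes that each prediction carries the three branch outputs (class probabilities, an already decoded box, and a target confidence) and that the final score is the product of the top class probability and the confidence. The names `Prediction`, `decode`, `score_threshold`, `max_detections`, and `to_message`, and the message format, are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    class_probs: List[float]                # category-probability branch
    box: Tuple[float, float, float, float]  # decoded box (x1, y1, x2, y2)
    confidence: float                       # target-confidence branch


def decode(preds, labels, score_threshold=0.5, max_detections=5):
    """Score, threshold, and rank predictions; no NMS pass is applied."""
    detections = []
    for p in preds:
        # Pick the most probable class for this prediction.
        class_id = max(range(len(p.class_probs)), key=p.class_probs.__getitem__)
        # Assumed scoring rule: class probability times target confidence.
        score = p.class_probs[class_id] * p.confidence
        if score >= score_threshold:
            detections.append((labels[class_id], score, p.box))
    # Highest-scoring detections first, capped at max_detections.
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections[:max_detections]


def to_message(detections):
    """Build a short text that a voice module could speak (hypothetical format)."""
    if not detections:
        return "No objects detected"
    return ", ".join(f"{label} ({score:.0%})" for label, score, _ in detections)
```

Because no pairwise overlap comparisons are made, this step costs a single pass over the predictions plus a sort, on the assumption that training has already discouraged duplicate predictions for the same target.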
In the recognition system of claim 1, the detection head output of the detection prediction module comprises three branches: category probability, position offset, and target confidence. The joint loss function is defined as: [formula missing from the source text] wherein the terms of the formula represent the classification loss, the bounding-box regression loss, and the target-confidence loss, and the two remaining symbols are weighting coefficients.

In the recognition system of claim 2, during training, the detection prediction module uses a dynamic small-target-aware label assignment method that adjusts the proportion of positive samples according to target size and feature response. The core weight function is: [formula missing from the source text] wherein the function takes the scale score of the i-th candidate region, and two parameters control the slope and the threshold of the assignment curve, respectively.

In the recognition system of claim 3, the detection prediction module introduces a "spatial proximity compensation mechanism" in sample matching, which allows adjacent grid cells to share part of the labels during small-target detection in order to improve recall.

In the recognition system of claim 4, the lightweight distillation deployment module performs knowledge distillation with a teacher-student structure so that a small model inherits the multi-scale semantic capability of a large model. The distillation loss function is defined as: [formula missing from the source text] wherein the two distributions in the formula are the output distributions of the teacher model and the student model, respectively, for a given class. The student model is quantized to 8 bits and then deployed on the chip of the glasses for the blind, so that real-time detection and voice feedback can be achieved.

The inventio