
CN-121982489-A - Video content intelligent analysis method based on edge computing platform

CN121982489A

Abstract

The invention provides an intelligent analysis method for video content based on an edge computing platform. The method comprises: collecting a video stream through an image acquisition interface of the edge computing platform and preprocessing video frames on the CPU side to obtain standardized image data; converting a pre-trained neural network model into a special-format model adapted to the NPU, adopting a hierarchical quantization strategy during conversion in which at least one network layer responsible for basic feature extraction in the model is quantized with a first numerical precision while the remaining network layers responsible for classification and regression are quantized with a second numerical precision lower than the first; dynamically scheduling the number of active processing cores in the NPU based on a processing load determined by analyzing the content of the current video frame represented by the standardized image data; loading the special-format model onto the activated processing cores to perform inference on the standardized image data; and post-processing the inference output of the NPU to generate a structured analysis result.
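The hierarchical quantization strategy described in the abstract (higher precision for feature-extraction layers, lower precision for classification/regression heads) can be illustrated with a toy precision-assignment plan. This is not part of the patent; the layer names, roles, and the choice of FP16/INT8 as the two precisions are illustrative assumptions.

```python
# Sketch of a hierarchical, function-based quantization plan: layers that do
# basic feature extraction keep a higher numerical precision (FP16 here),
# while classification/regression head layers get a lower precision (INT8).
# Layer names and role labels are hypothetical, not from the patent.

FEATURE_EXTRACTION = "feature_extraction"
HEAD = "head"  # classification / regression layers

def assign_precision(layers):
    """Map each (name, role) pair to a quantization precision string."""
    plan = {}
    for name, role in layers:
        plan[name] = "fp16" if role == FEATURE_EXTRACTION else "int8"
    return plan

model_layers = [
    ("backbone.stem",   FEATURE_EXTRACTION),
    ("backbone.stage1", FEATURE_EXTRACTION),
    ("head.cls",        HEAD),
    ("head.reg",        HEAD),
]

plan = assign_precision(model_layers)
# feature-extraction layers -> "fp16", head layers -> "int8"
```

An actual NPU toolchain would consume such a per-layer plan during model conversion; the dictionary form here only makes the first-precision/second-precision split concrete.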

Inventors

  • CHEN HUIXIANG
  • ZHANG ZHIYUAN
  • CHEN YOUWU
  • ZHENG JUNQIANG
  • SUN ZHIXIN
  • LUO HAIBO
  • LIU WEI

Assignees

  • 福建中锐网络股份有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-01-22

Claims (10)

  1. An intelligent analysis method for video content based on an edge computing platform, characterized in that the edge computing platform comprises a central processing unit (CPU) and a neural network processing unit (NPU) with multiple processing cores, and the method comprises the following steps: collecting a video stream through an image acquisition interface of the edge computing platform, and preprocessing video frames on the CPU side to obtain standardized image data; converting a pre-trained neural network model into a special-format model adapted to the NPU, wherein a hierarchical quantization strategy based on network-layer function is adopted during conversion, namely, at least one network layer responsible for basic feature extraction in the model is quantized with a first numerical precision, and the remaining network layers responsible for classification and regression are quantized with a second numerical precision lower than the first numerical precision; dynamically scheduling the number of processing cores in an activated state in the NPU based on a processing load determined by analyzing the content of the current video frame represented by the standardized image data, and loading the special-format model onto the activated processing cores to perform inference on the standardized image data; and post-processing the inference output of the NPU to generate a structured analysis result.
  2. The intelligent video content analysis method based on an edge computing platform of claim 1, wherein the first numerical precision is floating-point precision and the second numerical precision is integer precision; the network layer responsible for basic feature extraction is an initial layer of the model's backbone network responsible for core feature extraction, and the network layers responsible for classification and regression are the classification head and regression head of the model.
  3. The intelligent video content analysis method based on an edge computing platform of claim 1, wherein the processing load is determined by the number of targets to be detected in the current video frame, and the rule for dynamically scheduling the number of active processing cores in the NPU is: single-core NPU operation is activated when the number of targets to be detected is smaller than a first threshold, dual-core operation is activated when the number is between the first threshold and a second threshold, and triple-core operation is activated when the number is larger than the second threshold.
  4. The intelligent video content analysis method based on an edge computing platform of claim 1, wherein the preprocessing on the CPU side comprises frame extraction, noise reduction, size normalization, color-space conversion and numerical normalization, wherein the noise reduction adopts a Gaussian filter or another noise-reduction algorithm suited to edge computing, the size normalization scales a video frame by interpolation to a preset size matching the model, the color-space conversion converts the color format of the original video frame to a format compatible with the model, and the numerical normalization maps pixel values to a numerical range suited to model inference; the preprocessing tasks are dispatched to low-power cores of the CPU, reserving the high-performance cores of the CPU for core task processing.
  5. The intelligent video content analysis method based on an edge computing platform of claim 1, wherein the model conversion is performed by a model conversion tool adapted to the NPU, converting a pre-trained neural network model in ONNX or another common format into an NPU-compatible special-format model; dynamic switching between special-format models with at least two different functions is supported, and at least one loaded model is cached in the memory of the edge computing platform to shorten model-switching time, wherein the neural network model comprises at least two of a target detection model, an industrial defect detection model and a behavior recognition model.
  6. The intelligent video content analysis method based on an edge computing platform of claim 1, wherein the post-processing comprises non-maximum suppression and inverse coordinate transformation, wherein the non-maximum suppression filters overlapping detection boxes using a preset intersection-over-union (IoU) threshold, the inverse coordinate transformation restores normalized coordinates to the pixel dimensions of the original video frame based on the scaling between the standardized image data and the original video frame, and the structured analysis result comprises a timestamp, target type, target position and confidence information.
  7. The intelligent video content analysis method based on an edge computing platform, characterized by further comprising an exception handling mechanism, wherein: when video acquisition is interrupted, a reconnection mechanism with a preset time interval is triggered, and an alarm is output if reconnection fails; when NPU inference times out or fails, the current task is added to a retry queue and retried at most a preset number of times, and if the retries fail, the system automatically degrades to lightweight model inference on the CPU; and when the available space of the storage medium of the edge computing platform falls below a preset threshold, structured analysis results older than a preset time interval are automatically deleted to avoid storage overflow.
  8. The intelligent video content analysis method based on an edge computing platform of claim 1, wherein the preprocessing, inference and post-processing tasks are executed in a pipeline-parallel manner, wherein the preprocessing tasks are dispatched to low-power cores of the CPU, the inference tasks are executed by the NPU, and the post-processing tasks are dispatched to high-performance cores of the CPU; collaborative scheduling among the tasks is achieved through a standardized communication interface and an event-driven framework, thereby improving overall processing throughput.
  9. The intelligent video content analysis method based on an edge computing platform of claim 1, wherein the structured analysis result supports two output modes, local storage and network upload, wherein during local storage files are divided according to a preset period and stored in a dedicated storage area of the storage medium, and the network upload is carried out over a high-speed Ethernet interface using a preset communication protocol, so that upload delay is kept within a preset threshold.
  10. An intelligent video content analysis system based on an edge computing platform, characterized in that the edge computing platform comprises a central processing unit (CPU), a neural network processing unit (NPU) with multiple processing cores, an image acquisition interface, a memory and a storage medium, and the system comprises: a video acquisition and preprocessing module, which acquires a video stream through the image acquisition interface, preprocesses video frames on low-power cores of the CPU, and outputs standardized image data; a model conversion module, which converts a pre-trained neural network model into a special-format model adapted to the NPU, wherein a hierarchical quantization strategy based on network-layer function is adopted during conversion, namely, at least one network layer responsible for basic feature extraction in the model is quantized with a first numerical precision, and the other network layers responsible for classification and regression are quantized with a second numerical precision lower than the first numerical precision; a dynamic scheduling and inference module, which dynamically schedules the number of active processing cores in the NPU based on a processing load determined by analyzing the content of the current video frame represented by the standardized image data, loads the special-format model onto the activated processing cores, performs inference on the standardized image data and outputs an inference result; and a post-processing module, which post-processes the inference result on high-performance cores of the CPU to generate and output a structured analysis result; wherein the video acquisition and preprocessing module, the dynamic scheduling and inference module and the post-processing module form a pipeline, with collaborative scheduling achieved through a standardized communication interface and an event-driven framework.
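The threshold rule of claim 3 (single-core below a first threshold, dual-core between the two thresholds, triple-core above the second) maps directly to a small function. The concrete threshold values below are illustrative placeholders; the claim leaves them as presets.

```python
def cores_to_activate(num_targets, first_threshold=4, second_threshold=10):
    """Return how many NPU cores to activate for the current frame.

    Implements the scheduling rule of claim 3: single-core operation when
    the number of detected targets is below the first threshold, dual-core
    between the two thresholds, triple-core above the second threshold.
    The default threshold values are illustrative assumptions only.
    """
    if num_targets < first_threshold:
        return 1
    if num_targets <= second_threshold:
        return 2
    return 3
```

In a real scheduler this value would drive the NPU driver's core power-gating interface; the function only captures the load-to-core-count mapping.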
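The post-processing of claim 6 combines greedy non-maximum suppression with an inverse coordinate transformation back to original-frame pixels. The sketch below is a generic textbook implementation of both steps, not the patent's own code; box format and the default IoU threshold are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    detections: list of (box, score) pairs. Keeps boxes in descending score
    order, dropping any box whose IoU with an already-kept box exceeds the
    preset threshold (claim 6's overlapping-box filter)."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept

def to_pixels(box, orig_w, orig_h):
    """Inverse coordinate transform: normalized [0, 1] coordinates back to
    the pixel dimensions of the original video frame."""
    x1, y1, x2, y2 = box
    return (x1 * orig_w, y1 * orig_h, x2 * orig_w, y2 * orig_h)
```

A structured result record would then bundle each kept, denormalized box with the timestamp, target type and confidence mentioned in the claim.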

Description

Video content intelligent analysis method based on edge computing platform

Technical Field

The invention belongs to the technical fields of computer vision, embedded edge computing and intelligent analysis, and particularly relates to an intelligent video content analysis method based on an edge computing platform.

Background

With the deep integration of artificial intelligence and Internet of Things technology, intelligent analysis of video content is increasingly widely applied in fields such as security monitoring, industrial quality inspection and intelligent traffic, and has become a key enabling technology for industrial digitization and intelligent transformation. However, when complex intelligent analysis tasks sink from the cloud to resource-constrained edge devices, a series of interleaved and coupled technical challenges commonly arise, and existing schemes struggle to strike a good balance among analysis accuracy, real-time response, energy efficiency and cost. First, in terms of model deployment and computational accuracy, edge devices often carry specialized neural network processing units to accelerate inference, which requires converting pre-trained floating-point models into device-supported specific formats and applying quantization compression. However, while general post-training quantization or uniform full-model quantization strategies can markedly reduce model size and improve inference speed, they easily introduce non-negligible accuracy loss in complex scenes.
In particular, for fine-grained feature extraction tasks (such as detecting minute industrial defects), the basic feature maps extracted by the shallow layers of the model's backbone network are extremely sensitive to quantization errors, and uniform low-precision quantization can seriously damage the feature representation capability, greatly reducing final analysis accuracy and making it difficult to meet the reliability requirements of practical applications. How to effectively suppress the accuracy loss of model deployment on edge devices without significantly increasing computational cost is therefore a problem to be solved. In terms of computing resource scheduling and energy efficiency, although edge SoC chips with multi-core NPUs provide considerable peak computing power, traditional static or simple polling scheduling strategies can hardly adapt to the dynamically changing computational load of video stream analysis. For example, the number, size and density of targets in a video scene may fluctuate in real time; a fixed NPU core activation strategy (e.g., full-time all-core operation) wastes computation and energy during low-load periods, while insufficient activated computing power during high-load periods causes processing-queue accumulation and delay surges. This mismatch between computing resource supply and real-time analysis demand keeps the system's overall compute utilization low, and optimal energy efficiency cannot be achieved while satisfying real-time constraints (such as high-frame-rate video stream processing). Furthermore, in terms of system-level collaboration and task pipeline optimization, the edge computing platform integrates multiple processing units such as the CPU and the NPU.
Existing schemes usually adopt a serial execution mode: the CPU completes all preprocessing, hands the data over to the NPU for inference, and finally the data is handed back to the CPU for post-processing. This mode fails to fully exploit the heterogeneous big/little cores inside the CPU and the parallel processing capability between the CPU and the NPU, leaving each processing unit idle for long periods, limiting overall system throughput and making it difficult to further reduce end-to-end processing delay. Designing an efficient software-hardware collaboration mechanism to construct a refined task pipeline is key to releasing the overall performance potential of the edge platform. In addition, when facing diversified application scenarios, existing edge analysis devices often hard-wire a single analysis model and processing strategy and lack the ability to adapt at runtime. The demand for model types and computing resources varies significantly from scene to scene (e.g., perimeter security versus production-line inspection). Such a rigid system cannot flexibly switch analysis models or adjust resource allocation strategies according to real-time scene content or task instructions, so its application range is narrow and its deployment and maintenance costs are high. Therefore, the industry urgently needs a new intelligent video content analysis method for an edge computing platform, and
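The pipeline-parallel organisation the background contrasts with serial execution can be sketched with worker threads connected by queues, so that preprocessing of frame N+1 overlaps inference of frame N. This is a generic three-stage pipeline skeleton under stated assumptions, not the platform's actual scheduler; in the patent's design the stages would be bound to CPU efficiency cores, the NPU, and CPU performance cores respectively.

```python
import queue
import threading

def run_pipeline(frames, preprocess, infer, postprocess):
    """Three-stage pipeline: stages run concurrently, linked by bounded
    queues, so successive frames overlap across stages instead of each
    frame passing serially through preprocess -> infer -> postprocess."""
    q1, q2, out = queue.Queue(maxsize=4), queue.Queue(maxsize=4), queue.Queue()
    results = []

    def stage(src, fn, dst):
        while True:
            item = src.get()
            if item is None:       # poison pill: propagate and stop
                dst.put(None)
                return
            dst.put(fn(item))

    t1 = threading.Thread(target=stage, args=(q1, preprocess, q2))
    t2 = threading.Thread(target=stage, args=(q2, infer, out))
    t1.start()
    t2.start()
    for frame in frames:
        q1.put(frame)
    q1.put(None)
    while True:                    # post-processing runs on the caller thread
        item = out.get()
        if item is None:
            break
        results.append(postprocess(item))
    t1.join()
    t2.join()
    return results
```

The bounded queues provide back-pressure so a fast producer cannot outrun a slow stage; an event-driven framework as in claim 8 would replace the blocking `get` calls with callbacks.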