
CN-121996506-A - Deep learning model resource monitoring system and method for mobile terminal


Abstract

A deep learning model resource monitoring system and method for mobile terminals. The system comprises a device communication layer, a data processing layer, a service logic layer, and a user interface layer, which exchange data and event notifications through a message bus. The device communication layer handles the communication connection and command interaction with the mobile terminal; the data processing layer parses and standardizes monitoring data; the service logic layer acts as the core scheduling center, coordinating task execution and data flow; and the user interface layer displays the system running state and model performance results through visualization components and provides the entry point for task control.

Inventors

  • ZHANG NING
  • MA CONG
  • LI QINGSHAN
  • LIU CHENGJI
  • LI FEI

Assignees

  • Xidian University (西安电子科技大学)

Dates

Publication Date
2026-05-08
Application Date
2026-01-26

Claims (9)

  1. A deep learning model resource monitoring system for a mobile terminal, characterized by comprising a device communication layer, a data processing layer, a service logic layer, and a user interface layer, wherein the layers exchange data and event notifications through a message bus; the device communication layer is responsible for the communication connection and command interaction with the mobile terminal; the data processing layer is responsible for parsing and standardizing monitoring data; the service logic layer serves as the core scheduling center that coordinates task execution and data flow; and the user interface layer displays the system running state and model performance results through visualization components and provides a task control entry point.
  2. The deep learning model resource monitoring system for a mobile terminal according to claim 1, wherein the device communication layer sits at the base of the system and is used to establish and maintain an ADB communication connection with the mobile terminal, issue control commands to the mobile terminal over that connection, and receive the command execution results returned by the mobile terminal; it continuously supplies the data processing layer with raw monitoring data on the device running state, performance logs, resource usage indicators, and model inference output; internally it consists of a connection management module, a task scheduling module, and a device state maintenance module and provides a unified access interface and data channel for the data processing layer; and it forms a bidirectional communication relationship with the service logic layer through the message bus, receiving from the service logic layer the control commands that trigger device access, command execution, and task start and stop; The connection management module is the entry submodule of the device communication layer and handles discovery of and access to mobile terminals: it periodically calls the "adb devices" command to obtain the list of currently available devices, parses out the serial number of every device, and registers each parsed serial number in the system as a device fingerprint; if the user connects to the mobile terminal via Wi-Fi ADB, it attempts to establish a remote connection using the IP address and port entered by the user and adds each successfully connected node to the device pool; The task scheduling module is the execution control submodule built on the connection management module; it shares device fingerprints and task context information with the device state maintenance module and manages the command execution requests issued by the service logic layer; it selects an execution mode according to the task type and executes system-level monitoring commands through the ADB interface, without code instrumentation of the application under test and without obtaining root permission on the device; resource queries and state commands use a synchronous execution mode, in which the output is obtained directly through a short-lived task subprocess and returned immediately; The device state maintenance module relies on the device fingerprints and the device state table provided by the connection management module to monitor the reliability of the ADB connection during task execution and to apply an automatic recovery strategy when communication fails; it maintains a lightweight heartbeat in the background, periodically attempting non-invasive interactions with the device to confirm that the connection is stable; when it detects a command execution failure, a subprocess interruption, or the disappearance of a device, it marks the device as unavailable and records the context in which the anomaly occurred, including the command content, thread information, and the last successfully output segment, to facilitate subsequent log analysis, and at the same time reports the anomaly to the service logic layer so that the service logic layer can terminate the current task or attempt to re-establish the connection.
  3. The deep learning model resource monitoring system for a mobile terminal according to claim 1, wherein the data processing layer runs on top of the device communication layer, receives and parses the raw monitoring data from the device communication layer, and converts performance indicators, including CPU occupancy, memory occupancy, network rate, power consumption, temperature, model initialization time, and inference latency, into structured data in a unified format; the data processing layer parses, cleans, time-aligns, and aggregates system resource information and inference logs to construct normalized multidimensional performance-indicator time-series records, which it supplies to the service logic layer; and the processing chain consists of a multi-source fusion parsing module, a time reference alignment module, and a caching and statistics module, which operate in sequence; The multi-source fusion parsing module is the entry module of the data processing layer, connected between the device communication layer and the time reference alignment module, and uniformly parses and fuses the multi-source raw text output that the task scheduling module of the device communication layer collects and forwards while executing resource monitoring tasks and TensorFlow Lite Benchmark tasks; on the one hand, for the output of the top and dumpsys meminfo commands periodically collected by the resource monitoring thread, it maintains a set of configurable regular-expression matching templates and parsing strategies that account for output-format differences between ROMs and Android versions, and dynamically matches and extracts key fields, including total CPU occupancy, per-thread occupancy, scheduling priority, and process PSS, RSS, swap, and system anonymous pages, to obtain CPU- and memory-related resource data; when it detects a missing field or a non-standard format, it handles the abnormal sample with fault-tolerant logic, either discarding records that cannot be parsed or filling in the most recent valid sample value, to keep the data continuous and usable; it then merges the parsed CPU resource data and memory resource data at field level by timestamp and process identifier and, combining the uplink and downlink rates calculated from the network monitoring command, generates system resource snapshot entries containing a resource sampling timestamp, CPU indicators, memory indicators, and network indicators; on the other hand, for the log output of the TensorFlow Lite Benchmark tool, it extracts inference performance indicators including model initialization time, average inference latency, minimum inference latency, warm-up time, and memory occupancy, retains the millisecond-level timestamp or first-frame timestamp output in the Benchmark log as the timestamp field of each inference performance record, and outputs these records to the time reference alignment module for processing; The time reference alignment module is connected to the multi-source fusion parsing module and time-aligns and resamples the system resource snapshots and inference performance records on a unified time reference: it first parses the system resource snapshot sequence carrying resource sampling timestamps and the inference performance record sequence carrying Benchmark timestamps, both output by the multi-source fusion parsing module; taking the millisecond-level timestamp or first-frame timestamp output by the Benchmark tool as the reference, it compares the inference performance record sequence against the local resource monitoring sampling time axis, calculates the time offset between the two, records that offset in an internal time offset management unit, and applies it to the resource sampling path so that both time series are mapped onto the same time axis; after time mapping and resampling are complete, it combines the aligned system resource indicators and inference performance indicators on the unified time axis to generate multidimensional performance-indicator time-series records, each corresponding to one sampling point and containing the system resource indicators and inference performance indicators at that point; The caching and statistics module receives the multidimensional performance-indicator time-series records output by the time reference alignment module and mapped onto the unified time axis, encapsulates them structurally, and manages their life cycle; it maintains a circular (ring) buffer that stores the recent time-series data for CPU occupancy, memory occupancy, network rate, temperature change, power consumption change, model initialization time, inference time, warm-up time, and model inference memory occupancy; on each write it updates global statistics from the new points, including the mean, median (P50), P90, maximum, and mutation (abrupt-change) detection results; if the data fluctuate abnormally, it automatically triggers anomaly rollback logic that replaces the abnormal point with the last valid value; and it outputs the processed structured indicators to the service logic layer in a unified JSON structure.
  4. The deep learning model resource monitoring system for a mobile terminal according to claim 1, wherein the service logic layer serves as the core scheduling center of the system and, based on the structured data provided by the data processing layer, performs resource monitoring, performance analysis, and task orchestration and scheduling decisions, while integrating the TensorFlow Lite Benchmark tool set to automate testing and evaluation of model inference performance; the service logic layer comprises a task scheduler, a Benchmark management module, a state synchronization module, and a result aggregation module; The task scheduler generates, organizes, and executes resource monitoring tasks and deep learning model Benchmark test tasks: after the user configures the model path, delegate type, number of threads, and number of warm-up runs in the user interface layer, the task scheduler packages these parameters into a task description object and generates a task sequence according to a preset strategy; The Benchmark management module controls the running of the TensorFlow Lite Benchmark tool: it automatically constructs the command-line parameters from the task scheduler's task description object and handles starting the Benchmark tool, capturing its logs, and monitoring its exit status; When a Benchmark task is triggered, the state synchronization module starts the resource monitoring thread and records a synchronization point at the same time; during task execution it continuously obtains the latest system resource snapshots from the data processing layer and integrates them into the task context, and after the acquisition process ends it extracts the indicator segments that fall within the task's time window to support subsequent result analysis; The result aggregation module integrates and computes statistics over all data after task execution: it first extracts the inference time series recorded during the task and calculates the model's key performance indicators, namely initialization time, per-inference time, average and minimum inference latency, warm-up time, and memory occupancy; it also aggregates the peak CPU occupancy, peak memory occupancy, network traffic consumption, and temperature change to form a complete multidimensional performance summary; and if abnormal execution is detected, it packages and archives the relevant logs and execution context, generates a corresponding alarm event, and feeds it back to the user interface layer through the message bus.
  5. The deep learning model resource monitoring system for a mobile terminal according to claim 1, wherein the user interface layer sits at the top of the system, provides interaction and visualization functions for users, and implements device monitoring, model management, task configuration, and result display based on feedback from the service logic layer; the user interface layer comprises a device management interface, a resource monitoring interface, a model configuration interface, a Benchmark control interface, and a report center interface; The device management interface displays all mobile terminals currently connected to the system together with each device's online state, serial number, model, and authorization status in real time; through it the user can establish Wi-Fi ADB connections, refresh connections, and disconnect; it receives online-state update events sent by the service logic layer through the message bus so as to stay synchronized with the actual connection state of the mobile terminals maintained by the device communication layer; The resource monitoring interface displays monitoring data collected while the device runs, including CPU, memory, and network traffic consumption; it reads the latest data in the circular buffer from the service logic layer at intervals of 200 ms or longer, updates line charts through an embedded graphics renderer, and compresses large numbers of data points using a Level of Detail 3 (LOD-3) downsampling strategy to balance detail retention against real-time performance; After the user sets the parameters, the model configuration interface generates a structured task description object and transmits it to the service logic layer through the message bus, and the service logic layer triggers the corresponding execution flow; The Benchmark control interface is used to start, pause, or stop the Benchmark test flow and displays the inference state, phase transitions, and current device load in real time; during the test it updates the inference progress bar according to the timestamps and labels fed back by the service logic layer and renders the execution state on the interface; The report center interface displays the performance results after task execution finishes, including latency statistics, resource peaks, temperature change curves, and an execution summary; it supports exporting report files in CSV and JSON formats and, by collecting key indicators and visual charts, provides developers with complete performance diagnosis capability.
  6. A deep learning model resource monitoring method for a mobile terminal, characterized by comprising the following steps: Step 1: after the connection with the target mobile terminal has been established successfully, the device communication layer, according to the task command issued by the service logic layer, starts a resource monitoring thread and a Benchmark execution thread on the target mobile terminal, executes the system resource sampling commands and the TensorFlow Lite Benchmark test task, continuously collects the raw monitoring data generated by the device running state, the system resource usage indicators, and the model inference process, and forwards the raw monitoring data to the data processing layer; Step 2: the data processing layer receives the multi-source raw text output generated by the resource monitoring thread and the inference Benchmark execution thread and forwarded by the device communication layer; parses, cleans, and structures this raw monitoring data; extracts system resource indicators and inference performance indicators, including CPU occupancy, memory occupancy, network rate, temperature, power consumption, model initialization time, and inference latency; time-aligns and aggregates these indicators on a unified time reference to construct normalized multidimensional performance-indicator time-series records; and outputs the records to the service logic layer through a standardized interface; Step 3: the service logic layer receives the multidimensional performance-indicator time-series records output by the data processing layer, performs cache management and statistical analysis on the time-series data, calculates performance evaluation results for model initialization time, inference latency statistics, and system resource peaks, completes the performance analysis of the inference task based on these results, and presents the real-time monitoring information and evaluation results visually and as output through the user interface layer.
  7. The deep learning model resource monitoring method for a mobile terminal according to claim 6, wherein Step 1 specifically comprises the following steps: Step 1.1, device discovery and access: the connection management module of the device communication layer periodically calls the "adb devices" command to obtain the list of currently connected devices, parses out the serial number of each device, registers the serial numbers in the system as device fingerprints, and records in a device state table the online state, the authorization state, and the timestamp of the last successful communication; when the user configures Wi-Fi ADB connection parameters in the user interface layer, the connection management module establishes a remote ADB connection using the IP address and port number entered by the user, adds each successfully connected device to the device pool, and updates the device state table accordingly; Step 1.2, command execution channel establishment: according to the task configuration issued by the user interface layer, the service logic layer generates a resource monitoring task and a deep learning model Benchmark test task and sends them through the message bus to the task scheduling module of the device communication layer; for short-lived commands such as CPU occupancy queries, memory occupancy queries, and device state queries, the task scheduling module executes the ADB command synchronously in an independent subprocess and returns the result immediately; Step 1.3, connection reliability maintenance: relying on the device state table, the device state maintenance module of the device communication layer monitors the connection reliability of each registered device, periodically sending it a non-invasive probe command to confirm that the ADB channel is stable; when it detects that a command has failed, a subprocess has terminated abnormally, or the device has disappeared from the "adb devices" list, it marks the corresponding device as unavailable, records the command content, thread identifier, and last valid output fragment at the time of the anomaly, and reports to the service logic layer through the message bus, which then decides whether to terminate the current task or trigger the reconnection flow.
  8. The deep learning model resource monitoring method for a mobile terminal according to claim 6, wherein Step 2 specifically comprises the following steps: Step 2.1, multi-source fusion parsing: the multi-source fusion parsing module of the data processing layer receives the multi-source raw text output collected by the task scheduling module and forwarded by the device communication layer during execution of resource monitoring tasks and TensorFlow Lite Benchmark tasks; the top and dumpsys meminfo output periodically collected by the resource monitoring thread is parsed according to regular-expression matching templates preconfigured in the system, extracting key fields including total CPU (central processing unit) occupancy, per-thread occupancy and scheduling priority, and process PSS, RSS, swap, and system anonymous pages, to generate CPU resource data and memory resource data; the uplink and downlink rates between adjacent sampling points are calculated from the output of the network statistics command to generate network resource data; from the log output of the TensorFlow Lite Benchmark tool, the model initialization time, per-inference time series, average inference latency, minimum inference latency, warm-up duration, and memory occupancy during inference are extracted according to keywords and format rules, and the time-marker information in the Benchmark output is retained; when a missing field or format anomaly is detected, the module discards records that cannot be parsed or fills them with the last valid sample value, keeping the monitoring data continuous in time; after parsing, the module merges the CPU, memory, and network resource data at field level by timestamp and process identifier to form system resource snapshots containing CPU, memory, and network information, and outputs them, together with the timestamped inference performance records, to the time reference alignment module; Step 2.2, time reference alignment: the time reference alignment module receives the system resource snapshot sequence carrying resource sampling timestamps and the inference performance record sequence carrying Benchmark timestamps output by the multi-source fusion parsing module and analyzes the relationship between the resource sampling timestamps and the Benchmark log timestamps; taking the millisecond-level timestamp or first-frame timestamp output by the Benchmark as the time reference, it compares that reference against the local resource monitoring sampling time axis, calculates the time offset between the two, writes the offset into the internal time offset management unit, and applies it to all subsequent resource monitoring samples so that the resource monitoring sequence and the inference log sequence are mapped onto the same time axis; where adjacent time points have sampling gaps or the sampling times do not coincide exactly, the module applies linear interpolation or nearest-neighbor holding according to a preset strategy, and then combines the aligned system resource indicators and inference performance indicators on the unified time axis to generate multidimensional performance-indicator time-series records that remain continuous on the time axis; Step 2.3, caching and statistical processing: the caching and statistics module structurally encapsulates and manages the life cycle of the multidimensional performance-indicator time-series records output by the time reference alignment module and mapped onto the unified time axis; it maintains an internal circular buffer storing the recent time-series data for CPU occupancy, memory occupancy, network rate, temperature change, power consumption change, model initialization time, per-inference time, average inference latency, minimum inference latency, warm-up time, and memory occupancy during inference; when new data points are written, it incrementally computes statistics over the monitoring window, including the mean, median (P50), P90, and maximum of each indicator, and performs mutation detection based on the magnitude of value changes and a threshold; when a unit configuration error, an abrupt value change, or a physically implausible negative value appears, it triggers anomaly rollback logic that replaces the abnormal point with the latest normal value or an interpolated value and marks the anomaly event; finally, the data processing layer, through the caching and statistics module, encapsulates the multidimensional performance indicators and their associated information into time-series records with a unified JSON structure and provides them to the service logic layer.
  9. The deep learning model resource monitoring method for a mobile terminal according to claim 6, wherein Step 3 specifically comprises the following steps: Step 3.1, orchestration of the resource monitoring task and Benchmark task: the task scheduler of the service logic layer encapsulates the model file path, delegate type, number of threads, and number of warm-up runs configured in the user interface layer into a task description object and generates a task sequence comprising a resource monitoring task and a TensorFlow Lite Benchmark test task; in the task execution stage, the Benchmark management module automatically constructs the Benchmark tool's command-line parameters from the task description object provided by the task scheduler, controls the starting and running of the Benchmark tool, and captures its log output during the run; Step 3.2, monitoring data synchronization, result aggregation, and performance evaluation: when the Benchmark test task is triggered, the state synchronization module simultaneously starts the resource monitoring thread and records synchronization point information, ensuring that the resource monitoring time window matches the Benchmark execution time window; on this basis, the result aggregation module summarizes the multidimensional performance-indicator time series recorded during the task, extracts the inference performance indicators, namely model initialization time, per-inference time, average and minimum inference latency, warm-up time, and memory occupancy during inference, and combines them with peak CPU occupancy, peak memory occupancy, and temperature change to generate a complete performance evaluation result; Step 3.3, visual display and data export: based on the real-time monitoring data and aggregation results pushed by the service logic layer, the user interface layer displays the online state, serial number, model, and authorization information of connected devices on the device management interface; displays line charts of CPU occupancy, memory occupancy, network rate, temperature, and power consumption over time on the resource monitoring interface; displays inference progress and phase-switching state on the Benchmark control interface; displays model initialization time, inference latency statistics, system resource peaks, and temperature change curves on the report center interface; and provides functions for exporting monitoring data and evaluation results as CSV and JSON files.
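Claim 1 states only that the four layers exchange data and event notifications through a message bus; it does not specify how the bus works. As a minimal sketch, assuming an in-process, synchronous publish/subscribe bus keyed by string topics (the class `MessageBus` and the topic name "device.state" are hypothetical), the coupling between layers could look like this:

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class MessageBus:
    """Minimal synchronous publish/subscribe bus between layers (illustrative)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler, e.g. the UI layer listening for device events."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> int:
        """Deliver payload to every handler of the topic; return how many ran."""
        # Copy the list so a handler that subscribes during delivery is safe.
        handlers = list(self._subscribers.get(topic, ()))
        for handler in handlers:
            handler(payload)
        return len(handlers)


# Usage: hypothetical topic for device online-state updates.
bus = MessageBus()
received: List[Any] = []
bus.subscribe("device.state", received.append)
delivered = bus.publish("device.state", {"serial": "emulator-5554", "online": True})
```

A real deployment would probably dispatch on a worker thread so that a slow UI handler cannot stall the device communication layer; the claims leave that choice open.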
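Claim 2 (and Step 1.1 of claim 7) has the connection management module periodically call `adb devices`, parse each serial number, and register it as a device fingerprint in a device state table. The sketch below assumes the usual tab-separated "serial, state" lines printed after the "List of devices attached" header; `DeviceRecord`, `refresh_registry`, `list_devices`, and the "missing" marker are illustrative names, not part of the patent.

```python
import subprocess
from dataclasses import dataclass
from typing import Dict


@dataclass
class DeviceRecord:
    serial: str        # adb serial number, used as the device fingerprint
    state: str         # raw adb state: "device", "offline", "unauthorized", ...
    last_seen: float   # time of the last listing that contained this device

    @property
    def online(self) -> bool:
        return self.state == "device"

    @property
    def authorized(self) -> bool:
        return self.state != "unauthorized"


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Parse `adb devices` text into {serial: state}."""
    devices: Dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        # Skip the header and daemon start-up messages such as "* daemon started *".
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            devices[parts[0]] = parts[1].strip()
    return devices


def refresh_registry(registry: Dict[str, DeviceRecord], output: str, now: float) -> None:
    """Register new fingerprints, update known ones, flag devices that vanished."""
    seen = parse_adb_devices(output)
    for serial, state in seen.items():
        if serial in registry:
            registry[serial].state = state
            registry[serial].last_seen = now
        else:
            registry[serial] = DeviceRecord(serial, state, now)
    for serial, record in registry.items():
        if serial not in seen:
            record.state = "missing"   # hypothetical marker, not an adb state


def list_devices(timeout: float = 5.0) -> str:
    """Run `adb devices` on the host (requires adb on PATH)."""
    return subprocess.run(["adb", "devices"], capture_output=True,
                          text=True, timeout=timeout, check=True).stdout
```

For the Wi-Fi ADB path, the module would first run `adb connect` with the user-supplied IP address and port; the device then appears in the same listing under an "ip:port" serial.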
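Claims 2 and 7 describe synchronous, short-lived ADB queries run in their own subprocess, plus a background heartbeat that marks a device unavailable and records the failing command, the thread, and the last good output. A minimal sketch, assuming `adb shell echo ok` as the non-invasive probe and a caller-supplied `report` callback standing in for the message bus (`DeviceStateMaintainer`, `FailureContext`, and `adb_shell` are hypothetical names):

```python
import subprocess
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Runner = Callable[[str, List[str]], subprocess.CompletedProcess]


@dataclass
class FailureContext:
    serial: str
    command: List[str]
    thread_name: str
    last_good_output: str    # last successful output segment, for log analysis


def adb_shell(serial: str, args: List[str], timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Short-lived synchronous query in its own subprocess, e.g. args=["dumpsys", "meminfo"]."""
    return subprocess.run(["adb", "-s", serial, "shell", *args],
                          capture_output=True, text=True, timeout=timeout)


class DeviceStateMaintainer:
    def __init__(self, report: Callable[[FailureContext], None]) -> None:
        self.available: Dict[str, bool] = {}
        self.last_output: Dict[str, str] = {}
        self.report = report       # e.g. publish to the service logic layer

    def probe(self, serial: str, command: List[str], runner: Runner = adb_shell) -> bool:
        result: Optional[subprocess.CompletedProcess] = None
        try:
            result = runner(serial, command)
        except (subprocess.TimeoutExpired, OSError):
            pass                   # timeout or adb missing: treated as a failure
        if result is not None and result.returncode == 0:
            self.available[serial] = True
            self.last_output[serial] = result.stdout
            return True
        self.available[serial] = False
        self.report(FailureContext(serial, list(command), threading.current_thread().name,
                                   self.last_output.get(serial, "")))
        return False

    def heartbeat(self, serial: str, stop: threading.Event, interval: float = 2.0) -> None:
        """Background loop; `echo ok` is an assumed non-invasive probe command."""
        while not stop.wait(interval):
            self.probe(serial, ["echo", "ok"])
```

The `runner` parameter lets the logic be exercised without a device attached. Deciding whether to terminate the task or reconnect stays with the service logic layer, as in the claims.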
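Claims 4 and 9 have the Benchmark management module build the TensorFlow Lite Benchmark command line from a task description object (model path, delegate type, thread count, warm-up runs). A sketch, assuming the `benchmark_model` binary has already been pushed to `/data/local/tmp` (a common convention, not stated in the patent). The flags `--graph`, `--num_threads`, `--warmup_runs`, `--num_runs`, `--use_gpu`, `--use_nnapi`, and `--use_xnnpack` are options of the TFLite benchmark tool; `TaskDescription`, `build_benchmark_command`, and the delegate labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Delegate label -> benchmark_model flags. The flag names are TFLite benchmark
# tool options; the labels on the left are hypothetical UI values.
DELEGATE_FLAGS: Dict[str, List[str]] = {
    "cpu": [],                          # no delegate flag: the tool's defaults apply
    "xnnpack": ["--use_xnnpack=true"],
    "gpu": ["--use_gpu=true"],
    "nnapi": ["--use_nnapi=true"],
}


@dataclass(frozen=True)
class TaskDescription:
    serial: str                 # target device fingerprint
    model_path: str             # .tflite path on the device
    delegate: str = "cpu"
    num_threads: int = 4
    warmup_runs: int = 1
    num_runs: int = 50
    binary: str = "/data/local/tmp/benchmark_model"   # assumed push location


def build_benchmark_command(task: TaskDescription) -> List[str]:
    """Translate a task description into an `adb shell benchmark_model ...` argv."""
    if task.delegate not in DELEGATE_FLAGS:
        raise ValueError(f"unsupported delegate: {task.delegate}")
    if task.num_threads < 1 or task.warmup_runs < 0 or task.num_runs < 1:
        raise ValueError("invalid thread or run count")
    return [
        "adb", "-s", task.serial, "shell", task.binary,
        f"--graph={task.model_path}",
        f"--num_threads={task.num_threads}",
        f"--warmup_runs={task.warmup_runs}",
        f"--num_runs={task.num_runs}",
        *DELEGATE_FLAGS[task.delegate],
    ]
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`, so that stdout is captured for log parsing and `returncode` serves as the exit status the module monitors.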
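Claims 3 and 8 call for configurable regular-expression templates that tolerate ROM- and Android-version differences in `top` and `dumpsys meminfo` output, plus fault tolerance that fills a failed sample with the last valid value. The sketch below tries each template in order. The two PSS layouts and the toybox-style `top` header ("800%cpu ... 786%idle") are assumptions about typical output, not formats given in the patent; `TEMPLATES`, `extract`, and `ForwardFiller` are hypothetical names.

```python
import re
from typing import Dict, List, Optional

# Hypothetical template set: several patterns per metric, tried in order,
# to tolerate output-format differences between ROMs and Android versions.
TEMPLATES: Dict[str, List[re.Pattern]] = {
    "pss_kb": [
        re.compile(r"TOTAL PSS:\s+(\d+)"),        # e.g. "TOTAL PSS:   51234   TOTAL RSS: ..."
        re.compile(r"^\s*TOTAL\s+(\d+)", re.M),   # e.g. "      TOTAL    51234    40000 ..."
    ],
    "cpu_pct": [
        re.compile(r"(\d+)%cpu.*?(\d+)%idle"),    # toybox-style top header line
    ],
}


def extract(metric: str, text: str) -> Optional[float]:
    """Return the metric from the first matching template, or None if none match."""
    for pattern in TEMPLATES[metric]:
        m = pattern.search(text)
        if m is None:
            continue
        if metric == "cpu_pct":
            total, idle = float(m.group(1)), float(m.group(2))
            # total is 100% per core, so usage = busy share of total capacity
            return 100.0 * (total - idle) / total if total > 0 else None
        return float(m.group(1))
    return None


class ForwardFiller:
    """Fault tolerance: substitute the last valid value for an unparsable sample."""

    def __init__(self) -> None:
        self.last: Dict[str, float] = {}

    def fill(self, metric: str, value: Optional[float]) -> Optional[float]:
        if value is None:
            # None here means "no valid sample yet": the caller drops the record.
            return self.last.get(metric)
        self.last[metric] = value
        return value
```

Supporting a new ROM layout then means appending a pattern to the relevant list rather than changing the parsing code.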
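Claims 3 and 8 align resource samples to the Benchmark time reference by computing one offset at a synchronization point and applying it to all later samples, then filling gaps by linear interpolation or by holding a neighboring value. A sketch under these assumptions: local samples are `(host_ms, cpu_pct)` pairs sorted by time, Benchmark records are `(benchmark_ms, latency_ms)` pairs, "hold" means a zero-order hold of the previous sample (one reading of the patent's nearest-neighbor holding), and query times outside the sampled range are clamped to the nearest end value. Function names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def compute_offset(local_sync_ms: float, benchmark_ref_ms: float) -> float:
    """Offset mapping the local sampling axis onto the Benchmark time axis."""
    return local_sync_ms - benchmark_ref_ms


def sample_at(times: Sequence[float], values: Sequence[float], t: float,
              mode: str = "linear") -> float:
    """Series value at t: 'linear' interpolation or 'hold' (previous sample)."""
    if mode not in ("linear", "hold"):
        raise ValueError(f"unknown mode: {mode}")
    if not times:
        raise ValueError("empty series")
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return values[i]            # exact hit
    if i == 0:
        return values[0]            # before the first sample: clamp
    if i == len(times):
        return values[-1]           # after the last sample: clamp
    if mode == "hold":
        return values[i - 1]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align(resource: List[Tuple[float, float]], inference: List[Tuple[float, float]],
          offset_ms: float, mode: str = "linear") -> List[Dict[str, float]]:
    """Merge (local_ms, cpu_pct) samples into (benchmark_ms, latency_ms) records."""
    times = [t - offset_ms for t, _ in resource]   # shift onto the Benchmark axis
    values = [v for _, v in resource]
    return [{"t_ms": t, "latency_ms": lat, "cpu_pct": sample_at(times, values, t, mode)}
            for t, lat in inference]
```

A single offset recorded at the synchronization point assumes the host and device clocks do not drift noticeably within one run; the claims do not address re-estimating it during long runs.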
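Claims 3 and 8 describe a circular buffer that updates the mean, P50, P90, and maximum on each write and rolls an abnormal point back to the last valid value. This sketch assumes a fixed absolute jump threshold and treats negative readings as physically implausible. It compares each reading to the previous plausible raw reading, so a sustained level shift is accepted after one rolled-back sample instead of being suppressed indefinitely. P90 uses the nearest-rank method, one of several percentile definitions, and statistics are recomputed over the window for clarity rather than maintained truly incrementally. `MetricRingBuffer` is a hypothetical name.

```python
import math
import statistics
from collections import deque
from typing import Deque, Dict, Optional, Sequence


def nearest_rank(sorted_data: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (q in (0, 1]) of already-sorted data."""
    k = max(0, math.ceil(q * len(sorted_data)) - 1)
    return sorted_data[k]


class MetricRingBuffer:
    """Ring buffer for one metric with rolling statistics and anomaly rollback.

    Assumptions: `jump_threshold` is an absolute change limit for mutation
    detection, and negative readings are physically implausible.
    """

    def __init__(self, capacity: int, jump_threshold: float) -> None:
        self.values: Deque[float] = deque(maxlen=capacity)  # oldest points drop off
        self.jump_threshold = jump_threshold
        self.anomalies = 0
        self._last_raw: Optional[float] = None  # last plausible raw reading

    def push(self, value: float) -> Optional[float]:
        jump = (self._last_raw is not None
                and abs(value - self._last_raw) > self.jump_threshold)
        implausible = value < 0
        if not implausible:
            self._last_raw = value
        if jump or implausible:
            self.anomalies += 1
            if not self.values:
                return None              # nothing to roll back to: drop the point
            value = self.values[-1]      # rollback: reuse the last stored value
        self.values.append(value)
        return value

    def stats(self) -> Dict[str, float]:
        """Window statistics; raises statistics.StatisticsError if empty."""
        data = sorted(self.values)
        return {"mean": statistics.fmean(data), "p50": statistics.median(data),
                "p90": nearest_rank(data, 0.90), "max": data[-1]}
```

The dictionary returned by `stats()` can be serialized with `json.dumps` to produce the unified JSON structure the claims mention.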
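Claim 5 says the resource monitoring interface compresses large point sets with an "LOD-3" downsampling strategy but does not define it. As a stand-in, the sketch below uses min-max bucket decimation, a common chart-downsampling technique that keeps each bucket's extremes so CPU or memory spikes survive; it is not necessarily what the patent means by LOD-3. `minmax_downsample` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (timestamp_ms, value)


def minmax_downsample(points: List[Point], max_points: int) -> List[Point]:
    """Reduce a time-ordered series to at most max_points, keeping each bucket's min and max."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    if len(points) <= max_points:
        return list(points)
    n_buckets = max_points // 2            # each bucket emits at most two points
    out: List[Point] = []
    for b in range(n_buckets):
        # Integer bucket bounds, so the last bucket ends exactly at len(points).
        start = b * len(points) // n_buckets
        end = (b + 1) * len(points) // n_buckets
        bucket = points[start:end]
        if not bucket:
            continue
        lo = min(bucket, key=lambda p: p[1])
        hi = max(bucket, key=lambda p: p[1])
        # Emit in time order so the rendered polyline never runs backwards.
        out.extend(sorted({lo, hi}, key=lambda p: p[0]))
    return out
```

At the claimed refresh interval of 200 ms or longer, the renderer would run this over the latest buffer window before redrawing each line chart.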
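Claims 4 and 9 aggregate a run into average and minimum latency, resource peaks, and temperature change, and claims 5 and 9 export results as CSV and JSON. A sketch using only the standard library; the record field names (`t_ms`, `latency_ms`, `cpu_pct`, `mem_mb`, `temp_c`) are hypothetical, and temperature change is taken simply as the last reading minus the first.

```python
import csv
import io
import json
import statistics
from typing import Dict, List


def summarize(records: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate aligned records into a run summary (field names hypothetical)."""
    latencies = [r["latency_ms"] for r in records]
    return {
        "avg_latency_ms": statistics.fmean(latencies),
        "min_latency_ms": min(latencies),
        "peak_cpu_pct": max(r["cpu_pct"] for r in records),
        "peak_mem_mb": max(r["mem_mb"] for r in records),
        "temp_delta_c": records[-1]["temp_c"] - records[0]["temp_c"],
    }


def to_csv(records: List[Dict[str, float]]) -> str:
    """One row per time point; the header comes from the first record's keys."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records: List[Dict[str, float]], summary: Dict[str, float]) -> str:
    """Summary plus raw records in one JSON document."""
    return json.dumps({"summary": summary, "records": records}, indent=2)
```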

Description

Deep learning model resource monitoring system and method for mobile terminal

Technical Field

The invention belongs to the technical field of resource monitoring, and particularly relates to a deep learning model resource monitoring system and method for a mobile terminal.

Background

The rapid spread of on-device artificial intelligence on mobile terminals depends on the simultaneous evolution of three forces. First, terminal hardware capability has leapt forward: mainstream flagship SoCs integrate a GPU, an ISP, and a dedicated NPU, and heterogeneous scheduling releases their computing power. Second, deep learning models keep advancing in pruning, quantization, and distillation, compressing large networks to within a few tens of megabytes so that they can run inference in real time on mobile devices. Third, lightweight frameworks such as TensorFlow Lite, PyTorch Mobile, MNN, and NCNN have matured and provide a unified path for model deployment. However, hardware bottlenecks remain: mobile terminals are limited by heat dissipation and battery capacity, so CPU and GPU frequencies must be throttled dynamically; system memory is typically only about one tenth of that of a desktop platform; and the network environment has large latency jitter and is cost-sensitive. If developers lack a quantitative understanding of how model inference, system load, and energy consumption are coupled, they cannot make fine-grained framework and parameter choices, and the result is applications that overheat, stutter, or are even rejected at the app-store review stage. A solution must therefore satisfy three requirements at once: non-intrusive monitoring, cross-ROM compatibility, and hardware-accelerated evaluation. Academia and industry have proposed several tools, but each has limitations.
Google introduced the Profiler in Android Studio in 2017. It uses JVMTI and ART runtime hooks to record Java method and memory allocation information, allowing users to inspect thread stacks and object leaks in the IDE. However, the Profiler cannot measure the utilization of a standalone NPU in the SoC, cannot attribute AI workloads frame by frame on the GPU, and has no structured linkage with deep learning frameworks. Perfetto and Systrace capture scheduling, context-switch, and I/O events through kernel tracing at finer granularity, but they produce trace files of hundreds of megabytes that must be parsed offline in a web-based UI, which makes them hard to use in day-to-day iteration. Third-party monitoring SDKs tend to collect general metrics such as first-frame time, jank, and network requests; the code they inject poses an intrusion risk for commercial APKs, and on-device inference latency and energy-consumption data remain disconnected from each other. For model benchmarking, the TFLite team provides the Benchmark Tool, whose thread count, delegate type, and number of warm-up runs can be configured on the command line and which finally outputs the average inference latency and initialization time. PyTorch Mobile's speed benchmark and NCNN's benchncnn offer similar functionality. However, these tools exist as standalone offline executables: developers must manually push the executable and the model to the device, assemble the parameters in a shell, and, after the test finishes, parse stdout and copy it into a spreadsheet by hand. To observe the CPU temperature or memory curve at the same time, they must open another terminal to run top or dumpsys meminfo and finally align the timestamps by eye.
Chaining multiple tools in series produces long workflows and disconnected data, is difficult to integrate into a CI pipeline, and keeps algorithm engineers who are not performance specialists from taking part in performance tuning. On the industry side, Qualcomm's Snapdragon Profiler can read PMU counters and GPU frequency points, providing strong support for SoC kernel-level optimization, but it supports only Qualcomm's own chips and requires an authorized driver to be loaded in developer mode, so it cannot cover the Kirin and MediaTek (MTK) ecosystems. The Arm Mobile Studio profiling suite focuses on Mali GPUs and inherently does not cover Adreno. Because each vendor works in isolation, a unified toolchain is hard to establish, and cross-platform comparison becomes meaningless. In summary, the approach that currently comes closest to the goal is the semi-automated "ADB script + TFLite Benchmark" approach. It uses ADB to obtain part of the system information without root and relies on the official Benchmark tool to measure model inference performance, but it still requires manual operation, cannot automatically correlate the two types of data, and further lacks real-time visualization