CN-121998683-A - AI equipment full life cycle management system
Abstract
The invention provides a full life cycle management system for an AI device, relating to the technical field of product life cycle management. The system comprises an acquisition module, a judgment module, a synthesis module, a quantization module, a correction module, an extrapolation module and an output module. The acquisition module acquires operating parameters of the AI device. The judgment module judges whether the AI device is normal using bifurcation theory based on a Jacobian matrix, and calls the synthesis module if the AI device is normal. The synthesis module synthesizes the degree of proximity between the current state and the fault state of the AI device from the maximum real-part eigenvalue of the Jacobian matrix (the largest real part among its eigenvalues). The quantization module quantifies the instantaneous fault probability of the AI device at that degree of proximity through a generalized extreme value (GEV) distribution. The correction module corrects the instantaneous fault probability using the Chebyshev inequality. The extrapolation module extrapolates the instantaneous fault probabilities of different time steps to obtain predicted instantaneous fault probabilities for a plurality of future time steps. The output module outputs the remaining life interval of the AI device at a preset confidence level.
Inventors
- LI YONGCHUN
- LI HAIWEI
Assignees
- 达和晟控股集团有限公司
Dates
- Publication Date: 20260508
- Application Date: 20260205
Claims (10)
- 1. An AI device full life cycle management system, comprising: an acquisition module, configured to acquire operating parameters of the AI device within a preset duration; a judgment module, configured to judge, according to the operating parameters and in combination with bifurcation theory based on a Jacobian matrix, whether the AI device is normal, and to call the synthesis module if so and the acquisition module otherwise; a synthesis module, configured to synthesize a degree of proximity between the current state and a fault state of the AI device in combination with the maximum real-part eigenvalue of the Jacobian matrix; a quantization module, configured to quantify the instantaneous fault probability of the AI device at the degree of proximity through a generalized extreme value (GEV) distribution; a correction module, configured to correct the instantaneous fault probability in combination with the Chebyshev inequality; an extrapolation module, configured to extrapolate the instantaneous fault probabilities of different time steps to obtain predicted instantaneous fault probabilities for a plurality of future time steps; and an output module, configured to output, in combination with each predicted instantaneous fault probability, the remaining life interval of the AI device at a preset confidence level.
- 2. The AI device full life cycle management system of claim 1, wherein the operating parameters include throughput, error rate, output latency, and instantaneous power consumption.
- 3. The AI device full life cycle management system of claim 1, wherein the judgment module, in judging whether the AI device is normal in combination with the bifurcation theory based on the Jacobian matrix, is specifically configured to: calculate the rate of change of each operating parameter, wherein the rate of change is the derivative of the operating parameter with respect to time; establish, in combination with the Jacobian matrix, a description equation relating the operating parameters to their rates of change, wherein the Jacobian matrix has four rows and four columns, and the element in the i-th row and j-th column represents the influence coefficient of a unit change in the j-th operating parameter on the rate of change of the i-th operating parameter; estimate each element of the Jacobian matrix to obtain an estimated Jacobian matrix; perform eigenvalue decomposition on the estimated Jacobian matrix to obtain a plurality of eigenvalues; extract the real part of each eigenvalue; and judge whether the maximum real part is less than zero, and if so, determine according to the bifurcation theory that the AI device is normal, and if not, determine that the AI device is abnormal.
- 4. The AI device full life cycle management system of claim 1, wherein the synthesis module is specifically configured to: calculate the standard deviation of all maximum real-part eigenvalues extracted within the preset duration; calculate, taking the maximum real-part eigenvalue as a fixed critical value, a standardized stability deviation of the AI device, wherein the standardized stability deviation is the quotient of the maximum real-part eigenvalue and the standard deviation; establish a noise suppression term for the standardized stability deviation, so as to avoid fault misjudgment caused by jumps in the operating parameters; and take the product of the standardized stability deviation and the noise suppression term as the degree of proximity.
- 5. The AI device full life cycle management system of claim 1, wherein the quantization module is specifically configured to: acquire historical operating parameters of the AI device; determine proximity sample values at a plurality of fault occurrence moments under the historical operating parameters; estimate the distribution parameters of the generalized extreme value (GEV) distribution from the proximity sample values, wherein the distribution parameters comprise a location parameter describing the central tendency of the proximity, a scale parameter describing the dispersion of the proximity, and a shape parameter describing the heavy-tailed behavior of the proximity; and calculate the instantaneous fault probability through the GEV distribution based on the estimated distribution parameters.
- 6. The AI device full life cycle management system of claim 5, wherein said estimating the distribution parameters of the GEV distribution from the proximity sample values comprises: establishing a log-likelihood function for the GEV distribution; initializing the location parameter, the scale parameter and the shape parameter, wherein the initial value of the location parameter is the median of the proximity sample values, the initial value of the scale parameter is the standard deviation of the proximity sample values, and the initial value of the shape parameter is a very small value greater than zero; and, starting from the initial values of the location parameter, the scale parameter and the shape parameter, and with maximization of the log-likelihood function as the objective, outputting a location parameter estimate, a scale parameter estimate and a shape parameter estimate.
- 7. The AI device full life cycle management system of claim 1, wherein the correction module is specifically configured to: calculate the accumulated energy consumption of the AI device within the preset duration, wherein the accumulated energy consumption is the integral, over the preset duration, of the squared Euclidean norm of the vector of operating parameter rates of change; calculate an accumulated energy consumption mean and an accumulated energy consumption variance, wherein the accumulated energy consumption mean is the quotient of the accumulated energy consumption and the preset duration; treat the accumulated energy consumption as a random variable and estimate, through the Chebyshev inequality, the vulnerability probability that the AI device exceeds the average aging level of the device, wherein the vulnerability probability is the ratio of the accumulated energy consumption variance to the square of the accumulated energy consumption mean; and multiply the instantaneous fault probability by the vulnerability probability to obtain a corrected fault probability.
- 8. The AI device full life cycle management system of claim 1, wherein the extrapolation module is specifically configured to: extrapolate the instantaneous fault probabilities of different time steps through an adaptive weighted moving average model to obtain the predicted instantaneous fault probabilities for a plurality of future time steps.
- 9. The AI device full life cycle management system of claim 1, wherein the output module is specifically configured to: accumulate the predicted instantaneous fault probabilities to obtain the cumulative fault probability that the AI device fails before each of different future moments, wherein the absolute value of the difference between the cumulative fault probability and 1 is the survival probability that the AI device is still operating normally after the same future moment; acquire distribution data of the cumulative fault probability; combine the distribution data at the preset confidence level and invert the survival probability to obtain an upper limit and a lower limit of the service life of the AI device; and subtract the elapsed running time of the AI device from the upper limit and the lower limit of the service life, respectively, to obtain an upper limit and a lower limit of the remaining service life of the AI device, wherein the upper limit and the lower limit of the remaining service life form the remaining life interval of the AI device.
- 10. The AI device full life cycle management system of claim 9, further comprising: an early warning module, configured to issue an early warning when the lower limit of the remaining service life of the AI device is smaller than a preset lower limit of the remaining service life.
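The stability check of claim 3 can be illustrated with a minimal sketch. The claim does not say how the Jacobian elements are estimated, so the least-squares fit below is an assumption, as is measuring the operating parameters as deviations from their mean over the window. The function names are hypothetical and NumPy is assumed to be available.

```python
import numpy as np

def estimate_jacobian(states, rates):
    """Least-squares estimate of J in the assumed linear model rate = J @ state.

    states, rates: arrays of shape (T, 4), one row per sampling instant.
    J[i, j] is the effect of a unit change in parameter j on the rate of parameter i.
    """
    # Row-wise the model reads rates = states @ J.T, so solve for J.T and transpose.
    jt, *_ = np.linalg.lstsq(states, rates, rcond=None)
    return jt.T

def max_real_eigenvalue(jacobian):
    """Largest real part among the eigenvalues of the estimated Jacobian."""
    return float(np.max(np.linalg.eigvals(jacobian).real))

def is_normal(samples, dt):
    """Return (normal?, maximum real part) for a window of operating parameters.

    samples: shape (T, 4), e.g. throughput, error rate, output delay, power.
    dt: sampling interval. Centering on the window mean is an assumption.
    """
    samples = np.asarray(samples, dtype=float)
    deviations = samples - samples.mean(axis=0)
    rates = np.gradient(samples, dt, axis=0)  # finite-difference time derivative
    lam = max_real_eigenvalue(estimate_jacobian(deviations, rates))
    return lam < 0.0, lam
```

A negative maximum real part means small disturbances decay, which is the "normal" branch of claim 3. A value at or above zero marks the device as abnormal.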
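The parameter estimation of claims 5 and 6 can be sketched as follows, using only the Python standard library. The initial values follow claim 6 (median, standard deviation, a small positive shape). The claims do not name an optimizer, so the coordinate (compass) search here is a stand-in chosen for brevity. Evaluating the fitted CDF at the current proximity to obtain the instantaneous fault probability is likewise an assumption, since the claims do not state which GEV quantity is used. All function names are hypothetical.

```python
import math
import statistics

def gev_log_likelihood(samples, mu, sigma, xi):
    """Log-likelihood of GEV(mu, sigma, xi) for xi != 0; -inf outside the support."""
    if sigma <= 0 or xi == 0:
        return -math.inf
    total = -len(samples) * math.log(sigma)
    try:
        for x in samples:
            t = 1.0 + xi * (x - mu) / sigma
            if t <= 0:
                return -math.inf
            total -= (1.0 + 1.0 / xi) * math.log(t) + t ** (-1.0 / xi)
    except OverflowError:
        return -math.inf
    return total

def fit_gev(samples, iters=200):
    """Maximize the log-likelihood by compass search from the claim 6 initial values."""
    params = [statistics.median(samples), statistics.stdev(samples), 1e-3]
    steps = [0.5 * params[1], 0.5 * params[1], 0.1]
    best = gev_log_likelihood(samples, *params)
    for _ in range(iters):
        improved = False
        for i in range(3):
            for sign in (1, -1):
                trial = list(params)
                trial[i] += sign * steps[i]
                ll = gev_log_likelihood(samples, *trial)
                if ll > best:  # accept only strict improvements
                    params, best, improved = trial, ll, True
        if not improved:
            steps = [s / 2 for s in steps]  # refine the search grid
    return tuple(params), best

def gev_cdf(x, mu, sigma, xi):
    """GEV cumulative distribution function for xi != 0."""
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0:
        return 0.0 if xi > 0 else 1.0
    try:
        return math.exp(-t ** (-1.0 / xi))
    except OverflowError:
        return 0.0

def instantaneous_fault_probability(proximity, params):
    """Assumed mapping: probability that a fault has set in at or below this proximity."""
    return gev_cdf(proximity, *params)
```

In practice a library optimizer would normally replace the compass search. If `scipy.stats.genextreme` is used instead, note that its shape parameter `c` has the opposite sign to the `xi` convention used here.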
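The correction of claim 7 and the interval output of claim 9 can be sketched together. Several points are assumptions rather than statements of the claims: the variance is taken over the instantaneous squared norms within the window, the vulnerability probability is capped at 1, the cumulative fault probability is capped at 1, and the interval bounds are read off as the times at which the cumulative fault probability crosses (1 - confidence)/2 and (1 + confidence)/2. Time is measured from the present, so the elapsed running time is already excluded. The function names are hypothetical.

```python
import math

def vulnerability_probability(rates, dt):
    """Chebyshev-style ratio variance / mean^2 of the squared rate norm (claim 7).

    rates: list of rate-of-change vectors, one per sampling instant.
    dt: sampling interval.
    """
    power = [sum(r * r for r in row) for row in rates]  # squared Euclidean norm
    duration = dt * len(power)
    energy = sum(p * dt for p in power)  # rectangle-rule integral over the window
    mean = energy / duration
    if mean == 0:
        return 0.0  # assumption: no parameter change means no extra vulnerability
    variance = sum((p - mean) ** 2 for p in power) / len(power)
    return min(1.0, variance / mean ** 2)  # cap is an assumption

def corrected_probability(p_inst, rates, dt):
    """Instantaneous fault probability multiplied by the vulnerability probability."""
    return p_inst * vulnerability_probability(rates, dt)

def remaining_life_interval(predicted, dt, confidence):
    """Return (lower, upper) remaining life from predicted per-step fault probabilities.

    math.inf is returned for a bound the forecast horizon never reaches.
    """
    lo_q, hi_q = (1 - confidence) / 2, (1 + confidence) / 2
    cumulative, lower, upper = 0.0, math.inf, math.inf
    for step, p in enumerate(predicted, start=1):
        cumulative = min(1.0, cumulative + p)  # cumulative fault probability
        if lower == math.inf and cumulative >= lo_q:
            lower = step * dt
        if cumulative >= hi_q:
            upper = step * dt
            break
    return lower, upper
```

For example, four predicted steps of 0.25 each with a step length of 2.0 and a confidence of 0.5 give the interval (2.0, 6.0). Comparing the lower bound against a preset threshold would then correspond to the early warning of claim 10.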
Description
AI equipment full life cycle management system
Technical Field
The invention relates to the technical field of product life cycle management, in particular to an AI device full life cycle management system.
Background
AI devices are hardware devices that perform artificial intelligence related tasks, typically including, but not limited to, computers, sensors, robots, automation devices and intelligent terminals. These devices are commonly used in machine learning, data analysis, automation control and similar fields to help accomplish complex computational tasks or intelligent decisions.
Full life cycle management of AI devices is important for several reasons. First, AI devices generally need continuous maintenance and optimization to ensure long-term efficient operation. Full life cycle management makes it possible to monitor the usage, performance changes and potential problems of a device so that it can be repaired or updated in time. In addition, as AI technology continues to develop, devices are upgraded and replaced quickly, and timely management of the device life cycle helps ensure continued technical innovation and stable operation of the system. Good life cycle management can also extend the service life of a device and reduce its running cost.
However, in the prior art, fault prediction for AI devices relies on simple threshold alarms and the traditional Gaussian distribution, which cannot effectively handle the nonlinear dynamic characteristics of device operation or heavy-tailed extreme events. As a result, fault detection accuracy is low, remaining life estimation is unreliable, and reliable maintenance and management of AI devices is difficult to achieve.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of an embodiment of the present invention is to provide an AI device full life cycle management system that solves the following technical problem of the prior art: fault prediction for AI devices relies on simple threshold alarms and the traditional Gaussian distribution and cannot effectively handle the nonlinear dynamic characteristics of device operation or heavy-tailed extreme events, so that fault detection accuracy is low, remaining life estimation is unreliable, and reliable maintenance and management of AI devices is difficult.
The embodiment of the invention provides an AI device full life cycle management system, comprising:
- an acquisition module, configured to acquire the operating parameters of the AI device within a preset duration;
- a judgment module, configured to judge, according to the operating parameters and in combination with the bifurcation theory based on the Jacobian matrix, whether the AI device is normal, and to call the synthesis module if so and the acquisition module if not;
- a synthesis module, configured to synthesize the degree of proximity between the current state and the fault state of the AI device in combination with the maximum real-part eigenvalue of the Jacobian matrix;
- a quantization module, configured to quantify the instantaneous fault probability of the AI device at the degree of proximity through a generalized extreme value (GEV) distribution;
- a correction module, configured to correct the instantaneous fault probability in combination with the Chebyshev inequality;
- an extrapolation module, configured to extrapolate the instantaneous fault probabilities of different time steps to obtain predicted instantaneous fault probabilities for a plurality of future time steps;
- an output module, configured to output, in combination with each predicted instantaneous fault probability, the remaining life interval of the AI device at the preset confidence level.
The technical solution provided by the embodiment of the invention has at least the following beneficial effects:
In the embodiment of the invention, the bifurcation theory based on the Jacobian matrix is used to judge the inherent dynamic stability of the AI device. This initial judgment avoids performing remaining life estimation under every operating condition, which reduces resource consumption while ensuring normal operation of the AI device. Calculating the maximum real-part eigenvalue captures early signs that the AI device is tending toward instability (i.e., failure), from which a fault proximity index is synthesized. Then, to address the heavy-tailed character of fault data, a generalized extreme value (GEV) distribution is used in place of the traditional Gaussian distribution, together with the proximity index, to quantify the instantaneous probability of an extreme abnormal event, and the Chebyshev inequality is then used to correct this probability conservatively so as to ensure the reliability of the early warning. Finally, by extrapol