
CN-121977875-A - Intelligent coffee machine running state monitoring method and system based on reinforcement learning

CN 121977875 A

Abstract

The invention discloses an intelligent coffee machine running state monitoring method and system based on reinforcement learning, comprising the following steps: S1, collecting data and preprocessing it to form an operation parameter sequence; S2, performing dimension reduction on the operation parameters to construct a running state sequence; S3, performing multi-step prediction on the running state sequence through a DLinear model and generating a state prediction sequence with a linear predictor; S4, calculating the instant reward value of each regulation action from the running state sequence and the state prediction sequence; S5, using an A3C algorithm to generate and execute regulation instructions according to the instant reward values; S6, constructing state transition groups and writing them into an experience data set; and S7, updating the parameters of the DLinear model and the A3C algorithm from the experience data set. The method fuses a Markov model, principal component analysis, a DLinear model and the A3C algorithm, and offers strong adaptability, high prediction precision and good stability.

Inventors

  • WU XUCHAO

Assignees

  • 慈溪市起源电器有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-01-23

Claims (10)

  1. An intelligent coffee machine running state monitoring method based on reinforcement learning, characterized by comprising the following steps: S1, acquiring multidimensional operation data of the intelligent coffee machine at a preset frequency and preprocessing the data to form an operation parameter sequence; S2, in a Markov model, performing dimension reduction on the operation parameters of each time step by principal component analysis, and extracting principal component features to construct a running state sequence; S3, performing multi-step prediction on the running state sequence through a DLinear model, constructing a sliding time window, extracting a long-term trend component and a short-term disturbance component, and generating a state prediction sequence with a linear predictor; S4, constructing a state reward function from the feature difference between the running state sequence and the state prediction sequence, and calculating the instant reward value of each regulation action in a preset action library; S5, using an A3C algorithm to select regulation instructions according to the instant reward values, generating the regulation instruction of each time step through a policy network and executing it; S6, forming a state transition group from the running state sequence, the regulation instruction, the reward value and the running state sequence generated in real time after the regulation instruction is executed, performing temporal-difference error calculation and priority binding, and writing the binding result into an existing experience data set; and S7, selecting a training sample set from the experience data set according to a prioritized experience replay mechanism, and updating the parameters of the DLinear model and the A3C algorithm through backpropagation.
  2. The method of claim 1, wherein the multidimensional operation data includes temperature, fluid pressure, water flow rate, current values and operation duration; the preprocessing includes time alignment, noise filtering and value normalization; the sliding time window represents continuous time segments truncated by sliding, at a fixed step size and length, over the running state sequence; the preset action library represents a set of candidate regulation actions defined in advance for regulating the operating state of the coffee machine; and the experience data set represents a set of state transition groups with priority weights, accumulated continuously during operation.
  3. The method for monitoring the running state of an intelligent coffee machine based on reinforcement learning according to claim 1, wherein S2 specifically comprises: S21, expanding the operation parameter sequence in the Markov model in time order, organizing the operation parameters of each time step into parameter vectors, and constructing a state transition graph from the parameter vectors of adjacent time steps, which specifically comprises: dividing the operation parameter sequence into a plurality of time slices in time order, each time slice comprising a group of operation parameter values arranged by a preset index; combining the operation parameter values within each time slice into a parameter vector in a fixed order, forming a parameter vector sequence arranged by time step; pairing the parameter vectors of adjacent time steps in the Markov model and recording the state transition relation between each pair, which specifically comprises: pairing every two adjacent parameter vectors in the parameter vector sequence to construct a set of state transition pairs; numbering each state transition pair and assigning it a directed connection mark, the directed connection mark representing the unidirectional evolution relation within the state transition pair; marking the parameter vector of the earlier time step in each state transition pair as the current state node, and the parameter vector of the later time step as the target state node; recording the state transition relation formed by the directed connection mark between each current state node and its target state node; and taking all parameter vectors as state nodes and constructing the state transition graph from the corresponding state transition relations; S22, performing mean removal on the parameter vectors by principal component analysis, calculating the correlation matrix of the mean-removed parameter vectors, and extracting a principal direction set from the correlation matrix; S23, projecting the parameter vector of each time step onto the principal direction set and extracting the principal component features of that time step; S24, sequentially associating the principal component features of each time step according to the state transition graph to construct a running state sequence arranged by time step.
  4. The method for monitoring the running state of an intelligent coffee machine based on reinforcement learning according to claim 3, wherein the extraction of the principal component features specifically comprises: arranging the parameter vector sequence into a two-dimensional parameter matrix by parameter dimension, each row of the parameter matrix corresponding to the parameter vector of one time step; computing the mean of each column of the parameter matrix to obtain a mean vector of each parameter dimension over all time steps; subtracting the corresponding mean vector from each row of the parameter matrix to generate a zero-mean matrix; calculating from the zero-mean matrix a covariance matrix between the parameter dimensions, the elements of the covariance matrix representing the degree of joint variation between two parameter dimensions; performing eigenvalue decomposition on the covariance matrix and extracting all eigenvalues and their corresponding eigenvectors; sorting the eigenvalues in descending order and, according to a preset cumulative contribution ratio, selecting the eigenvectors corresponding to the leading eigenvalues to construct the principal direction set; and projecting each parameter vector onto the principal direction set to obtain the principal component features of each time step.
  5. The method for monitoring the running state of an intelligent coffee machine based on reinforcement learning according to claim 1, wherein S3 specifically comprises: S31, constructing a fixed-length sliding time window over the running state sequence and dividing the sequence into a plurality of state segments with a preset overlap, each state segment representing the evolution of the running state of the intelligent coffee machine within its time period; S32, inputting each state segment into the DLinear model, performing trend modeling and disturbance modeling along the time dimension, and extracting a long-term trend component and a short-term disturbance component; S33, concatenating the long-term trend component and the short-term disturbance component of each state segment along the feature dimension to form a fused feature matrix, the fused feature matrix representing the combined evolution features within the state segment; S34, inputting the fused feature matrix into a linear predictor to perform multi-step prediction, encoding the predictions with future time steps, and establishing a time mapping between the encoded results and the time steps of the running state sequence to generate a state prediction sequence, which specifically comprises: dividing the fused feature matrix into a plurality of sub-windows, each sub-window comprising a fixed number of fused feature vectors of consecutive time steps; inputting each sub-window into the linear predictor and performing a prediction of set step length based on the overall trend of the fused feature vectors in the sub-window, generating a set of predicted state vectors of matching length; attaching future time-step coding information to each predicted state vector, the time-step coding information representing the prediction time index of that predicted state; and arranging all time-coded predicted state vectors by time index and establishing a time mapping with the time steps of the running state sequence, to generate a state prediction sequence structurally consistent with the running state sequence.
  6. The method for monitoring the running state of an intelligent coffee machine based on reinforcement learning according to claim 5, wherein S32 specifically comprises: S321, arranging the running state vectors of a state segment in time order to construct a state segment matrix, each row corresponding to the running state vector of one time step and each column to a feature dimension of the running state vector; S322, treating each column of the state segment matrix as the time evolution sequence of the corresponding feature dimension, performing column-wise least-squares fitting on the state segment matrix in the trend modeling channel of the DLinear model, extracting a trend estimation sequence for the time evolution sequence of each feature dimension, and concatenating the columns into a long-term trend matrix, which specifically comprises: extracting each column of the state segment matrix and arranging its state feature values in time order into a time evolution sequence; constructing a time index sequence for each time evolution sequence, forming a coordinate pair from each time index and its state feature value, and building the fitting sample set of that feature dimension; calculating a slope and an intercept by the least-squares method from the fitting sample set, and computing the trend fitting value of each time step from the slope and intercept to generate the trend estimation sequence, which specifically comprises: for each feature dimension, computing the mean time index and the mean state feature value of its fitting sample set; subtracting the mean state feature value from the state feature value of each time step to obtain the state feature deviation, and subtracting the mean time index from the time index of each time step to obtain the time index deviation; multiplying the state feature deviation of each time step by its time index deviation and averaging the products to obtain the covariance between the feature dimension and the time index; averaging the squared time index deviations to obtain the variance of the time index; dividing the covariance by the variance to obtain the slope of the feature dimension; subtracting the product of the slope and the mean time index from the mean state feature value to obtain the intercept of the feature dimension; multiplying each time index by the slope and adding the intercept to obtain the trend fitting value of the corresponding time step; and arranging all trend fitting values in time order to form the trend estimation sequence of the feature dimension; and concatenating the trend estimation sequences of all feature dimensions by column to construct a long-term trend matrix structurally consistent with the state segment matrix; S323, in the disturbance modeling channel, subtracting the long-term trend matrix from the state segment matrix element-wise to obtain the residual variation of each time step in each feature dimension and construct a short-term disturbance matrix; S324, restructuring the long-term trend matrix and the short-term disturbance matrix by time step and marking their feature dimension origins, to form the long-term trend component and the short-term disturbance component.
  7. The method for monitoring the running state of an intelligent coffee machine based on reinforcement learning according to claim 1, wherein S4 specifically comprises: S41, time-aligning the running state sequence and the state prediction sequence according to the time mapping to construct a state alignment sequence; S42, computing the feature difference of each pair of running state vector and predicted state vector in the state alignment sequence to generate a feature difference sequence; S43, mapping each feature difference vector to a state offset score according to a preset mapping rule, the state offset score representing the degree of deviation between the running state vector and the predicted state vector; S44, constructing a state reward function from the state offset scores, the state reward function comprising a dense reward component and a sparse reward component, which specifically comprises: calculating the continuous change rate and the prediction error amplitude from the state offset score of each time step to form the dense reward component; identifying abrupt-change time steps in the state offset scores, marking those whose state offset score exceeds a set reference threshold as high-risk time steps, and assigning a fixed reward or penalty value at the high-risk time steps to form the sparse reward component; and superimposing the dense reward component and the sparse reward component by time step to construct the state reward function; S45, calculating from the state reward function the instant reward value of each regulation action in the preset action library at each time step.
  8. The method for monitoring the running state of an intelligent coffee machine based on reinforcement learning according to claim 1, wherein S5 specifically comprises: S51, initializing the parameters of a policy network with the A3C algorithm; S52, performing action probability calculation on the running state vector of each time step in the policy network, generating the action probability distribution of each time step, weighting the action probability distribution by the instant reward values, selecting regulation actions according to the weighted result, and generating a regulation instruction set, which specifically comprises: forward-propagating the running state vector of each time step through the policy network to obtain the probability scores of all regulation actions at the current time step and construct the action probability distribution; using the instant reward values as weighting factors, weighting the probability scores of the regulation actions of the corresponding time step in the action probability distribution according to a preset weighting mechanism; selecting the regulation action with the highest probability score in the weighted action probability distribution as the optimal regulation action of the time step; and extracting the action description of the optimal regulation action from the preset action library, constructing the regulation instruction of the time step, and collecting all regulation instructions into a regulation instruction set; S53, executing the regulation instruction set in the intelligent coffee machine in time order, triggering the corresponding regulation action at each time step.
  9. The method for monitoring the running state of an intelligent coffee machine based on reinforcement learning according to claim 1, wherein S6 specifically comprises: S61, acquiring in real time the multidimensional operation data of the time steps corresponding to the state prediction sequence, and generating the running state sequence after execution of the regulation instruction as a state comparison sequence; S62, combining the running state sequence before execution of the regulation instruction, the regulation instruction set, all instant reward values, and the state comparison sequence into a state transition group; S63, performing temporal-difference error calculation from the instant reward values and the state comparison sequence in the state transition group to generate a temporal-difference sequence; S64, applying Min-Max normalization to the temporal-difference sequence to obtain priority scores as priority weights, binding the priority scores to the corresponding state transition group, and writing the state transition group with its priority weight into the experience data set.
  10. An intelligent coffee machine running state monitoring system based on reinforcement learning, which performs the intelligent coffee machine running state monitoring method based on reinforcement learning according to any one of claims 1 to 9, comprising: a data acquisition module for acquiring multidimensional operation data of the intelligent coffee machine at a preset frequency and preprocessing the data to form an operation parameter sequence; a feature extraction module for performing dimension reduction on the operation parameters of each time step by principal component analysis in the Markov model, and extracting principal component features to construct a running state sequence; a state prediction module for performing multi-step prediction on the running state sequence through a DLinear model, constructing a sliding time window, extracting a long-term trend component and a short-term disturbance component, and generating a state prediction sequence with a linear predictor; a reward calculation module for constructing a state reward function from the feature difference between the running state sequence and the state prediction sequence, and calculating the instant reward value of each regulation action in the preset action library; an instruction generation module for selecting regulation instructions according to the instant reward values with the A3C algorithm, generating the regulation instruction of each time step through the policy network and executing it; a data storage module for forming a state transition group from the running state sequence, the regulation instruction, the reward value and the running state sequence generated in real time after the regulation instruction is executed, performing temporal-difference error calculation and priority binding, and writing the binding result into the existing experience data set; and a feedback updating module for selecting a training sample set from the experience data set according to a prioritized experience replay mechanism, and updating the parameters of the DLinear model and the A3C algorithm through backpropagation.
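The preprocessing named in claim 2 (noise filtering and value normalization) can be sketched as follows. The centered moving-average filter, the window size, and the function names are illustrative assumptions; the patent does not specify a filter type.

```python
# Illustrative sketch of the preprocessing in claim 2: noise filtering by a
# centered moving average, then Min-Max value normalization. The filter type
# and window size are assumptions, not disclosed in the patent.

def moving_average(values, window=3):
    """Smooth a parameter series with a centered moving average."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        segment = values[lo:hi]          # shrinks at the series boundaries
        smoothed.append(sum(segment) / len(segment))
    return smoothed

def min_max_normalize(values):
    """Scale a series into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Example: a short temperature series (hypothetical values, degrees Celsius).
temps = [90.0, 91.5, 95.0, 92.0, 90.5]
normalized = min_max_normalize(moving_average(temps))
```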
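The state transition graph construction of claim 3 (pairing adjacent parameter vectors, numbering each pair, and recording a directed edge from current to target state node) can be sketched as:

```python
# Illustrative sketch of claim 3's state transition graph: adjacent parameter
# vectors are paired, each pair is numbered, and a directed edge runs from the
# current state node to the target state node. The dict-based edge record is
# an assumption made for readability.

def state_transition_graph(parameter_vectors):
    """Return (nodes, edges) for a sequence of per-time-step parameter vectors."""
    nodes = [tuple(v) for v in parameter_vectors]      # state nodes
    edges = [
        {"id": idx, "current": nodes[idx], "target": nodes[idx + 1]}
        for idx in range(len(nodes) - 1)               # one edge per adjacent pair
    ]
    return nodes, edges

# Example with hypothetical (temperature, pressure) vectors.
nodes, edges = state_transition_graph([[92.0, 9.1], [93.5, 9.0], [95.0, 8.8]])
```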
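The covariance construction of claim 4 can be sketched as below. To keep the example self-contained, the principal direction is extracted in closed form for the two-dimensional case only; a real implementation over more parameter dimensions would use a full symmetric eigendecomposition (e.g. `numpy.linalg.eigh`).

```python
import math

# Illustrative sketch of claim 4: mean vector, zero-mean matrix, covariance
# matrix, and (for 2-D only, as an assumption) the leading principal direction
# of a 2x2 symmetric covariance matrix in closed form.

def covariance_matrix(rows):
    """Rows are time steps, columns are parameter dimensions."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]          # mean vector
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]   # zero-mean matrix
    return [[sum(centered[t][i] * centered[t][j] for t in range(n)) / n
             for j in range(d)] for i in range(d)]

def principal_direction_2d(cov):
    """Unit eigenvector of the largest eigenvalue of a 2x2 symmetric matrix."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    trace, det = a + c, a * c - b * b
    lam = (trace + math.sqrt(max(trace * trace - 4 * det, 0.0))) / 2
    vx, vy = ((lam - c, b) if abs(b) > 1e-12
              else ((1.0, 0.0) if a >= c else (0.0, 1.0)))
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)

# Perfectly correlated 2-D data: principal direction is the diagonal.
cov = covariance_matrix([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
direction = principal_direction_2d(cov)
```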
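The fixed-length sliding window with preset overlap from claim 5 (step S31) reduces to a simple slicing rule; the window length and step below are assumed values, with overlap equal to length minus step.

```python
# Illustrative sketch of claim 5's sliding time window: fixed-length segments
# truncated by sliding over the running state sequence with a preset overlap.

def sliding_windows(sequence, length=4, step=2):
    """Slice a state sequence into overlapping fixed-length segments."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]

segments = sliding_windows(list(range(8)), length=4, step=2)
```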
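The column-wise least-squares trend fit of claim 6 is fully specified by the claim itself: slope = covariance(time index, feature) / variance(time index), intercept from the means, and residual = value minus trend (the short-term disturbance). A sketch for one feature column:

```python
# Sketch of claim 6's per-column trend fit: least-squares slope and intercept
# computed exactly as the claim describes, with the residual forming the
# short-term disturbance of that feature dimension.

def trend_and_residual(series):
    """Fit a least-squares line to one feature column; return (trend, residual)."""
    n = len(series)
    t_mean = (n - 1) / 2                      # mean of time indices 0..n-1
    x_mean = sum(series) / n                  # mean state feature value
    cov = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(series)) / n
    var = sum((t - t_mean) ** 2 for t in range(n)) / n
    slope = cov / var                         # covariance / variance (claim 6)
    intercept = x_mean - slope * t_mean       # mean minus slope * mean index
    trend = [slope * t + intercept for t in range(n)]
    residual = [x - f for x, f in zip(series, trend)]
    return trend, residual

# A perfectly linear series has zero residual.
trend, residual = trend_and_residual([1.0, 3.0, 5.0, 7.0])
```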
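The state reward function of claim 7 combines a dense component (built from the offset score and its change rate) with a sparse penalty at high-risk time steps. The sign convention, threshold, and penalty value below are assumptions; the patent only names the components.

```python
# Illustrative sketch of claim 7: dense reward from the state offset score and
# its continuous change rate, plus a sparse fixed penalty where the score
# exceeds a reference threshold. Threshold and penalty values are assumed.

def reward_sequence(offset_scores, threshold=0.8, penalty=-1.0):
    """Combine dense and sparse reward components per time step."""
    rewards = []
    for t, score in enumerate(offset_scores):
        change = abs(score - offset_scores[t - 1]) if t > 0 else 0.0
        dense = -(score + change)                        # less deviation -> more reward
        sparse = penalty if score > threshold else 0.0   # high-risk time step
        rewards.append(dense + sparse)
    return rewards

rewards = reward_sequence([0.1, 0.2, 0.9])
```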
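The reward-weighted action selection of claim 8 can be sketched as follows: policy scores are turned into a softmax probability distribution, the probabilities are weighted by each action's instant reward value, and the argmax is the optimal regulation action. The raw logits here stand in for a real policy-network forward pass, which the patent does not detail.

```python
import math

# Illustrative sketch of claim 8's selection step. The logits are a stand-in
# for policy-network output; the multiplicative reward weighting is one
# reasonable reading of the claim's "preset weighting mechanism".

def select_action(logits, instant_rewards):
    """Return the index of the reward-weighted most probable action."""
    m = max(logits)                            # subtract max to stabilize softmax
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    weighted = [p * r for p, r in zip(probs, instant_rewards)]
    return max(range(len(weighted)), key=weighted.__getitem__)

# With equal logits, the instant reward values decide the action.
best = select_action([1.0, 1.0, 1.0], [0.2, 0.9, 0.5])
```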
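Claim 9's priority computation — temporal-difference errors Min-Max normalized into priority weights — can be sketched as below. The one-step TD form and the state values (standing in for a learned critic) are assumptions; the discount factor is an assumed hyperparameter.

```python
# Illustrative sketch of claim 9: one-step temporal-difference errors
# |r_t + gamma * V(s_{t+1}) - V(s_t)|, then Min-Max normalization into
# [0, 1] priority weights for the experience data set.

def priorities_from_td(rewards, values, gamma=0.99):
    """Return Min-Max-normalized absolute TD errors as priority weights."""
    td = [abs(rewards[t] + gamma * values[t + 1] - values[t])
          for t in range(len(rewards))]
    lo, hi = min(td), max(td)
    if hi == lo:
        return [1.0 for _ in td]              # all transitions equally important
    return [(e - lo) / (hi - lo) for e in td]

weights = priorities_from_td([1.0, 0.0], [0.5, 0.2, 0.0])
```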

Description

Intelligent coffee machine running state monitoring method and system based on reinforcement learning

Technical Field

The invention relates to the technical field of equipment state monitoring, in particular to an intelligent coffee machine running state monitoring method and system based on reinforcement learning.

Background

With the continuous development of the Internet of Things and embedded intelligent equipment, the intelligent coffee machine, as a typical consumer-facing intelligent device, has gradually acquired multifunctional, multimodal and multi-parameter-adjustable operating characteristics. In actual use, the running stability and regulation accuracy of the intelligent coffee machine significantly affect the quality of the finished product, which places higher requirements on real-time monitoring and intelligent optimization of its running state. Current running state monitoring approaches rely mainly on threshold judgment, fixed rule bases, or strategy templates built from manual experience. Such methods generally make static judgments based on a single parameter or a few related parameters, cannot perceive trends in running state changes, and cannot finely model fluctuations in equipment performance, environmental disturbances, or the slight degradation caused by long-term operation. Moreover, owing to static strategies and lagging feedback, these systems are prone to misjudgment, missed judgment, or delayed response in complex multi-state switching scenarios, and can hardly achieve adaptive regulation of operating behavior.
In some high-end devices, attempts have been made to introduce predictive models or control algorithms for forward estimation and rule-triggered control of the running state, but most methods remain at the level of constant-value regression or short-time-window modeling and cannot capture the long-term evolution trend of the running state. Meanwhile, existing methods generally lack a reward-driven closed-loop feedback mechanism and the ability to update the state monitoring strategy dynamically. Reinforcement learning, as a reward-driven decision optimization method, has strong adaptive capability in environment modeling, action selection and strategy updating, and provides a new technical path to the real-time, flexibility and long-term adaptability problems in monitoring the running state of an intelligent coffee machine. However, the prior art lacks a complete solution that fuses multidimensional data dimension reduction, time series trend modeling and reward-driven intelligent strategy optimization. Therefore, how to provide a method and system for monitoring the running state of an intelligent coffee machine based on reinforcement learning is a problem to be solved by those skilled in the art.
Disclosure of Invention

The invention aims to provide an intelligent coffee machine running state monitoring method and system based on reinforcement learning, which fully fuses Markov state modeling, principal component analysis, the DLinear prediction model and the A3C reinforcement learning algorithm; constructs an intelligent state monitoring flow with state modeling, trend prediction, reward evaluation and strategy self-updating capability; details the whole process from multidimensional operation data acquisition, state extraction, trend prediction and reward function construction to regulation strategy generation and feedback optimization; and offers strong adaptability, high prediction precision and good long-term stability. According to an embodiment of the invention, the intelligent coffee machine running state monitoring method based on reinforcement learning comprises the following steps: S1, acquiring multidimensional operation data of an intelligent coffee machine at a preset frequency and preprocessing the data to form an operation parameter sequence; S2, in a Markov model, performing dimension reduction on the operation parameters of each time step by principal component analysis, and extracting principal component features to construct a running state sequence; S3, performing multi-step prediction on the running state sequence through a DLinear model, constructing a sliding time window, extracting a long-term trend component and a short-term disturbance component, and generating a state prediction sequence with a linear predictor; S4, constructing a state reward function from the feature difference between the running state sequence and the state prediction sequence, and calculating the instant reward