CN-121786593-B - Personalized federated learning method for fault diagnosis tasks of oil and gas field equipment


Abstract

The invention belongs to the technical field of fault diagnosis of oil and gas field equipment and discloses a personalized federated learning method for fault diagnosis tasks of oil and gas field equipment. In terms of model structure, a parameter decoupling strategy divides the complete model into a shared feature extractor and a client-private classifier, which improves the model's adaptability to the heterogeneous data distribution of each client. In terms of client selection, a utility score based on differences in update direction is introduced to dynamically identify the clients that are more representative or more exploratory in the current training round, which improves training efficiency and model generalization. In terms of parameter aggregation, a difference-driven weighted aggregation method dynamically assigns aggregation weights according to the utility score of each client's uploaded update, which suppresses the negative influence of abnormal or low-quality clients on the global model and improves system stability and convergence speed. The method is suitable for privacy-preserving federated modeling of equipment fault diagnosis tasks in oil and gas field production scenarios.

Inventors

WANG DANXIN
Sun Ruige
ZHOU HUANYU
JIN LONG

Assignees

China University of Petroleum (East China) [中国石油大学(华东)]

Dates

Publication Date: 20260508
Application Date: 20260303

Claims (9)

1. A personalized federated learning method for fault diagnosis tasks of oil and gas field equipment, characterized by comprising the following steps:
Step 1: constructing a global model at the server side and decoupling the parameters of the global model into a base layer (Base) and a classifier (Head), wherein the Base is a shared feature extractor;
Step 2: each selected client performs two-stage local training: in the personalized training stage, the Base parameters are frozen and only the Head parameters are updated; in the shared training stage, the Head parameters are frozen and the Base parameters are updated, after which the parameter difference between the Base of the current round and the Base of the previous round is recorded to obtain a local update vector;
Step 3: each client uploads its trained Base parameters to the server side together with its local update vector for the current round, while the Head parameters remain on the client and are not uploaded;
Step 4: after receiving the uploaded local update vectors and Base parameters, the server calculates a utility score for each client, the utility score being defined as the difference between the client's local update vector and the global update vector of the previous round; assigns a corresponding weight to each selected client based on the calculated utility scores; performs weighted aggregation of the Base parameters uploaded by the selected clients using these weights to update the global Base and form the updated global model of the current round; and obtains the global update vector of the current round from the parameter difference between the global model updated in the current round and that of the previous round; step 4 specifically comprises the following steps:
Step 4.1: the utility score is calculated as $U_i^{t} = 1 - \cos(\Delta_i^{t}, \Delta_g^{t-1})$, where $U_i^{t}$ is the utility score of client $i$ after the $t$-th round of training, $\Delta_i^{t}$ is the local update vector of client $i$ in round $t$ (the current round), $\Delta_g^{t-1}$ is the global update vector of round $t-1$ (the previous round), and $\cos(\cdot)$ denotes cosine similarity;
Step 4.2: the server assigns a weight to each selected client according to the calculated utility score and normalizes the weights; the weight of client $i$ is calculated as $w_i = $ [formula missing from the source text], and the normalized weight is $\tilde{w}_i = w_i / \sum_{j=1}^{k} w_j$, where $\varepsilon$ is a regularization constant, $w_i$ is the weight of client $i$, $\tilde{w}_i$ is the final normalized weight of client $i$, $k$ is the number of clients, and $i = 1, \ldots, k$;
Step 4.3: the Base parameters uploaded by all clients are aggregated by weighting, the global model is updated, and a copy of the model is stored for computing the global update vector in the next round; the global model is updated as $\theta_g^{t} = \sum_{i \in S_t} \tilde{w}_i \, \theta_i^{t}$, where $S_t$ is the set of clients selected to participate in training in the current round, $\theta_i^{t}$ is the Base parameter of client $i$ after the $t$-th round of training, and $\theta_g^{t}$ is the updated global model of round $t$;
Step 4.4: the global update vector of round $t$ (the current round) is calculated as $\Delta_g^{t} = \theta_g^{t} - \theta_g^{t-1}$, where $\Delta_g^{t}$ is the global update vector of round $t$ and $\theta_g^{t-1}$ is the updated global model of round $t-1$;
Step 5: the server selects a proportion of the clients to participate in the next round of training based on the calculated utility scores and sends the Base parameters of the global model to the selected clients, while the Head remains local to each client;
Step 6: steps 2 to 5 are repeated until the preset number of training rounds is reached or the model accuracy converges, thereby realizing personalized federated learning for the fault diagnosis task of oil and gas field equipment.
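The server-side computation of step 4 can be illustrated with a minimal NumPy sketch. It is not the patented implementation. It assumes that the "difference" in step 4.1 means one minus the cosine similarity, and, because the raw weight formula of step 4.2 is not available in this text, it uses a hypothetical inverse weighting `1 / (U + eps)` purely as a stand-in. All function names are illustrative.

```python
import numpy as np

def cosine_similarity(a, b, tiny=1e-12):
    """Cosine similarity of two flat vectors; `tiny` avoids division by zero."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + tiny))

def utility_score(local_update, prev_global_update):
    # Assumption: utility = 1 - cos(local update, previous global update).
    # With an all-zero previous global update (first round) every score is 1.
    return 1.0 - cosine_similarity(local_update, prev_global_update)

def aggregation_weights(scores, eps=1e-3):
    # Assumption (hypothetical stand-in): raw weight w_i = 1 / (U_i + eps).
    # The source does not show the actual formula; swap this line as needed.
    raw = 1.0 / (np.asarray(scores, dtype=float) + eps)
    return raw / raw.sum()  # normalization: w_i / sum_j w_j

def aggregate_base(client_bases, weights):
    # Weighted sum of the uploaded Base parameter vectors (step 4.3).
    return np.tensordot(weights, np.stack(client_bases), axes=1)

def server_round(prev_global_base, prev_global_update, client_bases, local_updates, eps=1e-3):
    scores = [utility_score(u, prev_global_update) for u in local_updates]
    weights = aggregation_weights(scores, eps)
    new_global_base = aggregate_base(client_bases, weights)
    # Step 4.4: global update vector = new global Base - previous global Base.
    new_global_update = new_global_base - prev_global_base
    return new_global_base, new_global_update, scores
```

Here each Base is treated as one flattened parameter vector; a multi-layer model would need its layers concatenated before the cosine similarity is taken.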
2. The personalized federated learning method for fault diagnosis tasks of oil and gas field equipment according to claim 1, wherein in step 2 the optimization objective of the personalized training stage is: $\min_{\theta_h} \mathcal{L}_{\mathrm{head}}\big(h(\phi(x; \theta_b); \theta_h),\, y\big)$, where $\theta_h$ denotes the parameters of the classifier $h$, $y$ is the true label of the sample, $\mathcal{L}_{\mathrm{head}}$ is the loss function optimized in the personalized training stage, $x$ is the input data, and $\min$ denotes minimization of the loss function; the optimization objective of the shared training stage is: $\min_{\theta_b} \mathcal{L}_{\mathrm{base}}\big(h(\phi(x; \theta_b); \theta_h),\, y\big)$, where $\theta_b$ denotes the parameters of the feature extractor, $\phi(x; \theta_b)$ is the feature representation of the input sample $x$ under the current Base parameters, and $\mathcal{L}_{\mathrm{base}}$ is the loss function optimized in the shared training stage.
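The two-stage local training of step 2 can be sketched as follows. This is a simplified illustration, not the claimed model: it assumes a linear Base (a matrix mapping inputs to features), a linear Head with softmax cross-entropy loss, and plain gradient descent. The step counts, the learning rate and the function names are all hypothetical. The Base snapshot taken at the start of the round stands in for the previous-round Base when the local update vector is formed.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y_onehot):
    return float(-np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1)))

def local_training(x, y_onehot, base, head, lr=0.1, head_steps=5, base_steps=5):
    """Return (new_base, new_head, local_update_vector) without mutating inputs."""
    base, head = base.copy(), head.copy()
    base_before = base.copy()  # snapshot used for the local update vector
    n = x.shape[0]

    # Personalized stage: Base frozen, only the Head is updated.
    for _ in range(head_steps):
        feats = x @ base
        err = softmax(feats @ head) - y_onehot  # gradient of the loss w.r.t. logits
        head -= lr * feats.T @ err / n

    # Shared stage: Head frozen, only the Base is updated.
    for _ in range(base_steps):
        err = softmax(x @ base @ head) - y_onehot
        base -= lr * x.T @ (err @ head.T) / n

    # Local update vector: flattened difference of the Base parameters.
    local_update = (base - base_before).ravel()
    return base, head, local_update
```

Only `base` and `local_update` would be uploaded to the server in this sketch; `head` stays on the client.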
3. The personalized federated learning method for fault diagnosis tasks of oil and gas field equipment according to claim 1, wherein in step 2 the local update vector is calculated as: $\Delta_i^{t} = \theta_i^{t} - \theta_i^{t-1}$, where $\Delta_i^{t}$ is the local update vector of client $i$ in round $t$, $\theta_i^{t}$ is the Base parameter of client $i$ after the $t$-th round of training, and $\theta_i^{t-1}$ is the Base parameter of client $i$ after the $(t-1)$-th round of training.
4. The personalized federated learning method for fault diagnosis tasks of oil and gas field equipment according to claim 1, wherein in step 5, the greater the difference between a client's local update vector and the global update vector of the previous round, the larger the calculated utility score, which indicates that the client's update direction is more exploratory, and the client is therefore preferentially selected; before each round of training, the server side ranks all clients by their utility scores from the historical training data and selects the top proportion K of clients to participate in the next round of training, where 0 < K ≤ 1 and the number of selected clients is k = K × (total number of clients); in step 5, the server side sends the latest globally shared Base model to the selected clients, and each selected client continues to use its locally retained Head model when entering the next training round.
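The utility-based selection described in this claim amounts to a top-k ranking, sketched below. The function name, the dictionary layout of the scores, and the choice to round k upward and to keep at least one client are assumptions. The claim only states that k equals the proportion K times the total number of clients.

```python
import math

def select_clients(utility_scores, proportion):
    """Pick the clients with the highest utility scores.

    utility_scores: dict mapping a client id to its latest utility score.
    proportion: selection proportion K, with 0 < K <= 1.
    """
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must satisfy 0 < K <= 1")
    # Assumption: round k up and always keep at least one client.
    k = max(1, math.ceil(proportion * len(utility_scores)))
    ranked = sorted(utility_scores, key=utility_scores.get, reverse=True)
    return ranked[:k]

# Example: four clients, half of them selected for the next round.
scores = {"well_a": 0.2, "well_b": 0.9, "well_c": 0.5, "well_d": 0.1}
selected = select_clients(scores, 0.5)  # ["well_b", "well_c"]
```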
5. The personalized federated learning method for fault diagnosis tasks of oil and gas field equipment according to claim 1, wherein in step 6, after model training is finished, the accuracy of the final model measured on the test data set is output; the final model comprises the commonly shared Base feature extractor, which is deployed on a server or edge device, and the Head retained by each client, which is deployed as a personalized component according to local business requirements.
6. The personalized federated learning method for fault diagnosis tasks of oil and gas field equipment according to claim 1, wherein in step 1, an initial neural network model is built at the server side as the global model, and the parameters of the global model are structurally divided into two parts, namely a Base layer and a Head layer; after the model is initialized, the server sets the number of training rounds T, the client selection proportion K and a balance weight factor; the global model copies are then distributed to the clients participating in the current round of training.
7. A personalized federated learning system for implementing the personalized federated learning method for fault diagnosis tasks of oil and gas field equipment according to claim 1, comprising a server side and a plurality of clients, characterized in that the server side is used for global model updating and the clients are used for local model updating; a global model is constructed at the server side, and the global model parameters are decoupled into a base layer (Base) and a classifier (Head), wherein the Base is a shared feature extractor; each selected client performs two-stage local training as follows: in the personalized training stage, the Base parameters are frozen and only the Head parameters are updated; in the shared training stage, the Head parameters are frozen and the Base parameters are updated, after which the parameter difference between the Base of the current round and the Base of the previous round is recorded to obtain a local update vector; each client uploads its trained Base parameters to the server side together with its local update vector for the current round as the basis for subsequent weighted aggregation, while the Head parameters are retained on the client and are not uploaded; after receiving the local update vectors and Base parameters uploaded by the clients, the server calculates the utility score of each client, the utility score being defined as the difference between the client's local update vector and the global update vector of the previous round; the server assigns a corresponding weight to each selected client based on the calculated utility scores and performs weighted aggregation of the Base parameters uploaded by the selected clients to update the global Base and form the updated global model of the current round; the global update vector of the current round is obtained from the parameter difference between the global model updated in the current round and that of the previous round; the server side is further used for selecting a proportion of the clients to participate in the next round of training based on the calculated utility scores and for sending the Base parameters of the global model to the selected clients, while the Head remains local to each client.
8. A computer device comprising a memory and one or more processors, characterized in that executable code is stored in the memory, and the one or more processors, when executing the executable code, carry out the steps of the personalized federated learning method for fault diagnosis tasks of oil and gas field equipment according to any one of claims 1 to 6.
9. A computer-readable storage medium having a program stored thereon, characterized in that the program, when executed by a processor, performs the steps of the personalized federated learning method for fault diagnosis tasks of oil and gas field equipment according to any one of claims 1 to 6.

Description

Personalized federated learning method for fault diagnosis tasks of oil and gas field equipment

Technical Field

The invention belongs to the technical field of fault diagnosis of oil and gas field equipment, and relates to a personalized federated learning method for fault diagnosis tasks of oil and gas field equipment, in particular to privacy-preserving federated modeling for such tasks.

Background

With the deep integration of artificial intelligence and big data technology in the oil and gas industry, building intelligent oil fields has become an important direction of industry development. Deep learning models show great potential in reservoir dynamic analysis, production condition diagnosis, equipment fault early warning and similar fields. However, a traditional centralized model requires massive production data distributed across different wells, sites or blocks to be uploaded to a cloud or central server for training, which incurs heavy network communication overhead and raises serious data security problems. Federated learning (FL) has therefore emerged as a distributed machine learning framework: each client trains the model locally on its own data and uploads only encrypted model parameters or gradients for aggregation, so that a more accurate global model is built cooperatively without sharing raw data, and the security and compliance of oilfield production data are ensured.

Although FL shows great potential for data security, current mainstream methods still face many challenges in complex and diverse oil and gas production environments. First, the sensor state data generated during long-term operation of oil and gas field production equipment is markedly non-independent and identically distributed (Non-IID).
Different oil and gas field blocks, production units and pieces of equipment differ considerably in equipment type, operating conditions, load, service life, maintenance strategy and so on, so the local equipment state data collected by each client follows inconsistent distributions. In this setting, a single unified global model struggles to fit the complex operating characteristics of different production equipment simultaneously, which degrades local diagnostic accuracy and personalized adaptability in fault diagnosis tasks for oil and gas field production equipment.

Second, most existing federated modeling methods select the clients for joint training by random selection or round-robin scheduling, and do not fully consider the actual diagnostic value or information contribution of each device in the current training round. For example, equipment in steady-state operation differs markedly from equipment experiencing abnormal vibration, temperature fluctuation or sudden pressure change, and the sensor data of the two contribute very differently to the optimization of the fault diagnosis model. Indiscriminate device selection tends to waste computation and communication resources, reduces training efficiency, and may weaken the model's ability to perceive potential equipment faults because key abnormal data is not introduced in time.

In addition, commonly used model aggregation algorithms (such as FedAvg) generally combine the model updates uploaded by the clients with equal weights or with weights based on the number of samples, and lack a comprehensive evaluation of the quality, operating state and representativeness of each device's data.
When the sensors of some oil and gas field production equipment fail, the collected data is very noisy, or the equipment is in atypical conditions such as overhaul or shutdown, the resulting abnormal model updates easily disturb the global model, degrading the performance of the fault diagnosis model, slowing convergence and even making training unstable.

To address these problems, researchers have proposed improvements such as model personalization methods for individual equipment, equipment scheduling strategies based on operating conditions, and robust aggregation mechanisms that improve the model's noise immunity. However, most of these approaches target a single problem, and device-level model personalization, client selection efficiency and global model aggregation optimization have not been systematically integrated. Therefore, in the technical field of oil and gas field production, a unified and efficient personalized federated learning framework is provided, and on the premise of guaran