
CN-122021809-A - Federated feature selection method, device and storage medium based on evolutionary multitask optimization

CN122021809A

Abstract

The application relates to the field of federated learning and provides a federated feature selection method based on evolutionary multitask optimization. In the method, each client generates an initial feature-selection population from its local data and uploads its current best feature subset to the server; the server stratifies the clients into tiers and computes inter-task similarity via cross-evaluation of the uploaded best subsets; each client updates its local population according to the server's stratification strategy and, during local optimization, performs similarity-aware knowledge transfer according to the server-guided task similarity and transfer strategy while dynamically maintaining population diversity; finally, each client uploads its optimized result to the server, which screens globally elite solutions and updates the system state for the next iteration.

Inventors

  • LIANG JUNWEI
  • HUANG JIEHONG
  • YANG GENG
  • HUANG WEIPENG
  • CAI TIE
  • JIANG KAI

Assignees

  • Shenzhen Institute of Information Technology (深圳信息职业技术大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-06

Claims (10)

  1. A federated feature selection method based on evolutionary multitask optimization, characterized in that a server cooperates with a plurality of clients to execute iterative optimization, a single iteration comprising the following steps: each client generates an initial feature-selection population based on its local data and uploads its current best feature subset to the server; the server performs client stratification and calculates inter-task similarity through cross-evaluation based on the current best feature subsets uploaded by the clients; each client updates its local population based on the server's stratification strategy and, in the local optimization process, performs similarity-aware knowledge transfer according to the server-guided task similarity and transfer strategy while dynamically maintaining population diversity; and each client uploads its optimized result to the server, and the server screens globally elite solutions and updates the system state for the next iteration.
  2. The federated feature selection method based on evolutionary multitask optimization of claim 1, wherein each client generating an initial feature-selection population based on its local data comprises: calculating a first Filter-type evaluation index and a second Filter-type evaluation index for the local data features; fusing the first and second Filter-type evaluation indices to obtain feature-importance scores; and screening an important-feature subset according to the feature-importance scores and guiding population initialization based on that subset, so that important features appear with higher probability in the initial solutions.
  3. The federated feature selection method based on evolutionary multitask optimization of claim 1, wherein the server performing client stratification through cross-evaluation based on the current best feature subsets uploaded by the clients comprises: the server organizing the clients to cross-validate one another's best feature subsets; ranking the clients' performance by average validation accuracy; and determining the top-ranked client as the optimal client and dividing the remaining clients into a high-performance group, a medium-performance group, and a low-performance group.
  4. The federated feature selection method based on evolutionary multitask optimization of claim 3, wherein each client updating its local population based on the server's stratification strategy comprises: initializing high-performance-group clients by mixing the local historical best solution with the global historical best solution; initializing medium-performance-group clients by mixing the local historical best solutions of the client and of similar partners with the global historical best solution; and initializing low-performance-group clients by combining guidance from the global historical best solution with random exploration.
  5. The federated feature selection method based on evolutionary multitask optimization of claim 1, wherein performing similarity-aware knowledge transfer according to the server-guided task similarity and transfer strategy comprises: setting an adaptive transfer probability for each client; extracting, according to the adaptive transfer probability, feature-selection information of the shared feature domain from the selected transfer partner; and integrating the feature-selection information into the current client's individuals under optimization, with the task similarity as the weight.
  6. The federated feature selection method based on evolutionary multitask optimization of claim 5, wherein the adaptive transfer probability is dynamically adjusted based on how effectively knowledge transfer improved individual quality in historical iterations.
  7. The federated feature selection method based on evolutionary multitask optimization of claim 1, wherein dynamically maintaining population diversity comprises: calculating the dissimilarity between individuals in the population; dynamically adjusting a dissimilarity threshold according to statistics of the number of features selected by the current population; and removing redundant solutions whose inter-individual dissimilarity falls below the dissimilarity threshold and supplementing new individuals.
  8. A federated feature selection apparatus based on evolutionary multitask optimization, wherein iterative optimization is performed by a server in cooperation with a plurality of clients, the apparatus comprising: an initialization module, used for each client to generate an initial feature-selection population based on its local data and upload its current best feature subset to the server; a stratification module, used for the server to perform client stratification and calculate inter-task similarity through cross-evaluation based on the current best feature subsets uploaded by the clients; an optimization module, used for each client to update its local population based on the server's stratification strategy, perform similarity-aware knowledge transfer according to the server-guided task similarity and transfer strategy in the local optimization process, and dynamically maintain population diversity; and an updating module, used for each client to upload its optimized result to the server, the server screening globally elite solutions and updating the system state for the next iteration.
  9. An apparatus comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
  10. A storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
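As an illustration of the Filter-index fusion and guided initialization described in claim 2, the following Python sketch fuses two hypothetical Filter-type indices (absolute label correlation and feature variance; the patent does not specify which indices are used) and biases the initial binary population toward the highest-scoring features. The probabilities `base_p` and `boost_p` and the subset size `top_k` are illustrative assumptions, not values from the patent.

```python
import numpy as np

def fused_importance(X, y, alpha=0.5):
    """Fuse two Filter-type scores into one importance score per feature.
    Assumed indices: |Pearson correlation| with the label, and variance."""
    # First Filter index: absolute correlation of each feature with y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    # Second Filter index: feature variance.
    var = X.var(axis=0)
    # Min-max normalize both indices so they are comparable, then fuse.
    def norm(s):
        return (s - s.min()) / (s.max() - s.min() + 1e-12)
    return alpha * norm(corr) + (1 - alpha) * norm(var)

def init_population(scores, pop_size, base_p=0.2, boost_p=0.7,
                    top_k=None, rng=None):
    """Generate binary feature-selection individuals; features in the
    important subset are selected with a higher probability."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = scores.size
    top_k = top_k or max(1, d // 4)
    p = np.full(d, base_p)
    p[np.argsort(scores)[-top_k:]] = boost_p  # important-feature subset
    return (rng.random((pop_size, d)) < p).astype(int)
```

Features placed in the important subset are sampled with probability `boost_p` instead of `base_p`, so they appear more often in initial individuals, as claim 2 requires.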

Description

Federated feature selection method, device and storage medium based on evolutionary multitask optimization

Technical Field

The application relates to the field of federated learning, and in particular to a federated feature selection method, device, and storage medium based on evolutionary multitask optimization.

Background

With the advent of the big-data age, data is increasingly valuable as a key element driving the development of artificial intelligence. However, owing to privacy regulations and data-security requirements, data is often dispersed as "islands" across different institutions or terminal devices, making it difficult to pool for model training. Federated learning (FL) is an emerging distributed machine-learning paradigm that allows multiple participants to jointly train a global model without exchanging raw local data, thereby achieving knowledge sharing and model-performance improvement while protecting data privacy. Feature selection is an important preprocessing technique in machine learning whose purpose is to select, from the original feature set, a low-redundancy feature subset that contributes significantly to model performance. In the federated-learning scenario, achieving efficient, accurate, and privacy-preserving joint feature selection when client data are not independent and identically distributed (non-IID) has become a key challenge.
Prior-art schemes fall mainly into two categories. The first is federated feature selection based on filter or embedded methods: each client independently computes feature importance, and the server performs simple aggregation. This approach has low communication overhead but ignores task correlation between clients; in a non-IID scenario, a globally aggregated feature subset may fail to fit every client's local data distribution, degrading model performance. The second introduces optimization algorithms such as evolutionary computation into the federated framework and models the problem as distributed optimization; however, existing methods generally treat each client's feature selection as an independent or homogeneous task, adopt unified initialization, update, and aggregation strategies, lack personalized guidance for the heterogeneous data distributions of different clients, and fail to exploit potential inter-task similarity for guided knowledge transfer. The optimization process is therefore inefficient and prone to local optima, and cannot fully exploit the advantages of federated collaboration to improve each local model's generalization under its own data distribution.
These methods have the following defects: 1) they lack effective measurement and utilization of inter-client task similarity, their knowledge-sharing mechanisms are rigid, efficient cross-task knowledge transfer is hard to achieve, and optimization performs poorly in non-IID scenarios; 2) they apply a one-size-fits-all strategy to the clients during optimization and cannot provide differentiated guidance based on each client's model performance and data characteristics, so some clients converge slowly or perform poorly, and overall collaborative efficiency is low.

Disclosure of Invention

The application provides a federated feature selection method, device, and storage medium based on evolutionary multitask optimization, which effectively improve the generalization capability and optimization efficiency of federated feature selection on non-IID data through a co-evolution and hierarchical knowledge-transfer mechanism.
In one aspect, the application provides a federated feature selection method based on evolutionary multitask optimization, wherein a server and a plurality of clients cooperatively execute iterative optimization, a single iteration comprising the following steps: each client generates an initial feature-selection population based on its local data and uploads its current best feature subset to the server; the server performs client stratification and calculates inter-task similarity through cross-evaluation based on the current best feature subsets uploaded by the clients; each client updates its local population based on the server's stratification strategy and, in the local optimization process, performs similarity-aware knowledge transfer according to the server-guided task similarity and transfer strategy while dynamically maintaining population diversity; and each client uploads its optimized result to the server, and the server screens globally elite solutions and updates the system state for the next iteration.
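The server-side stratification and similarity step above can be sketched as follows: clients are ranked by the mean accuracy their best feature subset achieves on the other clients' data, split into an optimal client plus high/medium/low groups, and a symmetric inter-task similarity is derived from the same cross-evaluation results. The tier sizes and the particular similarity definition are illustrative assumptions; the patent does not fix them.

```python
import numpy as np

def stratify_and_similarity(client_ids, cross_acc):
    """Server step (sketch): rank clients by mean cross-evaluation
    accuracy, split them into tiers, and derive inter-task similarity.

    cross_acc[(i, j)]: validation accuracy of client i's best feature
                       subset evaluated on client j's local data.
    """
    ids = sorted(client_ids)
    # Mean accuracy of each client's subset on all other clients' data.
    mean_acc = {i: float(np.mean([cross_acc[(i, j)] for j in ids if j != i]))
                for i in ids}
    ranked = sorted(ids, key=lambda i: mean_acc[i], reverse=True)
    best, rest = ranked[0], ranked[1:]
    k = max(1, len(rest) // 3)  # illustrative tier sizes
    tiers = {"optimal": [best], "high": rest[:k],
             "medium": rest[k:2 * k], "low": rest[2 * k:]}
    # Symmetric task similarity from mutual cross-evaluation accuracy.
    sim = {(i, j): 0.5 * (cross_acc[(i, j)] + cross_acc[(j, i)])
           for i in ids for j in ids if i < j}
    return tiers, sim
```

The returned `tiers` drive the tier-specific population re-initialization of claim 4, while `sim` supplies the weights used by the similarity-aware knowledge transfer of claim 5.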