CN-121981212-A - Federated learning method and system for multi-modal prostate cancer data

Abstract

The invention provides a federated learning method and system for multi-modal prostate cancer data, relating to the technical field at the intersection of intelligent healthcare and distributed artificial intelligence. The method comprises: selecting, based on updated global feature parameters, a plurality of key feature regions in a feature space as reference benchmarks to construct a feature distribution topology; partitioning the feature distribution topology to generate a plurality of feature subspaces; calculating a spatial feature index from the feature density and distribution characteristics of each feature subspace; generating a weight adjustment coefficient based on the spatial feature index; and dynamically optimizing the updated global feature parameters with the weight adjustment coefficient to obtain optimized federated learning model parameters. Through multi-modal feature alignment, compressed gradient transmission, global gradient aggregation and dynamic parameter optimization, the invention constructs a complete federated learning framework for multi-modal prostate cancer data and improves model performance and cross-institution collaboration efficiency.

Inventors

CHEN CUIYING
LI CHENG
XU LEI
SUN CHENG

Assignees

先思达（南京）生物科技有限公司
江苏先思达生物科技有限公司

Dates

Publication Date: 20260505
Application Date: 20260407

Claims (10)

1. A federated learning method for multi-modal prostate cancer data, the method comprising: Step 100, performing feature extraction and heterogeneity alignment on image data, pathological data, clinical data and genetic data, and mapping the data of the different modalities into a unified feature space to obtain aligned multi-modal feature vectors; Step 200, performing local training based on the aligned multi-modal feature vectors to generate model gradient information, evaluating the importance of the model gradient information to obtain priority weights, and compressing the gradients with a layered quantization strategy according to the priority weights to obtain a compressed gradient data packet; Step 300, decompressing and structurally reconstructing the compressed gradient data packet to obtain reconstructed gradient information, calculating node data weights based on the reconstructed gradient information, and performing weighted aggregation of the reconstructed gradient information with the node data weights to generate updated global feature parameters; Step 400, selecting, based on the updated global feature parameters, a plurality of key feature regions in the feature space as reference benchmarks to construct a feature distribution topology, partitioning the feature distribution topology to generate a plurality of feature subspaces, calculating a spatial feature index from the feature density and distribution characteristics of each feature subspace, generating a weight adjustment coefficient based on the spatial feature index, and dynamically optimizing the updated global feature parameters with the weight adjustment coefficient to obtain optimized federated learning model parameters.
2. The federated learning method for multi-modal prostate cancer data according to claim 1, wherein said step 100 comprises: extracting tissue structure features from the pathological data to obtain pathological features containing dimension information; based on the dimensionality of the image features and the pathological features, converting the clinical data into clinical feature vectors through one-hot encoding and standardization, and converting the genetic data into gene feature vectors through sequence encoding and dimensionality reduction; reducing the dimensionality of the image features, the pathological features, the clinical feature vectors and the gene feature vectors separately with a principal component analysis algorithm, so that all four are unified to a preset common dimension, to obtain dimension-reduced standardized features; calculating, based on the dimension-reduced standardized features, correlation weights among the image features, the pathological features, the clinical feature vectors and the gene feature vectors through canonical correlation analysis, and applying the correlation weights in a weighted concatenation of the dimension-reduced standardized features to obtain fused feature vectors; and standardizing the fused feature vectors, unifying their dimensions, and obtaining the aligned multi-modal feature vectors through weighted feature fusion.
3. The federated learning method for multi-modal prostate cancer data according to claim 2, wherein said step 200 comprises: performing forward propagation based on the aligned multi-modal feature vectors to obtain a forward inference result; calculating, through a back propagation algorithm, the gradient of the loss function between the forward inference result and the local label data to obtain complete model gradient information; evaluating the gradient magnitude of the model gradient information by calculating the modulus of each gradient component; ranking the gradient components by importance according to the modulus of each component to obtain priority weights representing gradient importance; dividing the gradient components into gradient sets of different importance levels according to the priority weights, and quantizing the gradient sets of different importance levels with numerical representations of different precision, thereby completing layered compression of the gradients and obtaining layer-compressed gradient data; applying entropy coding to the layer-compressed gradient data to obtain optimized coded data from which statistical redundancy has been removed; adaptively partitioning the optimized coded data to obtain standardized data fragments; and combining and packaging the standardized data fragments with transmission control signaling to obtain the compressed gradient data packet.
4. The federated learning method for multi-modal prostate cancer data according to claim 3, wherein said step 300 comprises: parsing the protocol of the compressed gradient data packet to separate the transmission control signaling from the standardized data fragments; reordering and reassembling the data fragments based on the sequence information in the transmission control signaling to reconstruct a complete entropy-coded data stream; applying Huffman decoding to the entropy-coded data stream to obtain the intermediate structured data that existed before statistical compression; and, based on the numerical distribution of the intermediate structured data and in combination with the layered quantization strategy, performing high-precision numerical reconstruction of the high-importance gradient components and standard-precision numerical reconstruction of the low-importance gradient components, thereby restoring the precision of the gradient values and obtaining the reconstructed gradient information.
5. The federated learning method for multi-modal prostate cancer data according to claim 4, wherein said step 300 further comprises: calculating, based on the reconstructed gradient information, the overall norm of the gradient matrix formed by all gradient data of each node; evaluating the overall magnitude of each node's gradient update from the overall norm of its gradient matrix, and performing a normalized weighted calculation on the overall magnitude and the number of data samples of each node to obtain node data weight coefficients; feeding the node data weight coefficients and the reconstructed gradient information into a weighted average calculation, in which the node gradient information is superposed according to the data weight coefficients, to obtain an aggregated global gradient update; and adjusting the current global feature parameters through a parameter update operation based on the aggregated global gradient update to obtain the updated global feature parameters.
6. The federated learning method for multi-modal prostate cancer data according to claim 5, wherein said step 400 comprises: performing density-based cluster analysis on the set of feature vectors, and identifying density peak points as initial cluster centers by calculating the local density distribution of sample points in the feature space; clustering the feature vectors around the initial cluster centers through an iterative optimization algorithm to form a plurality of high-density feature regions; constructing a k-d tree spatial index covering the feature space, with the key feature region reference benchmarks as spatial anchor points; performing nearest neighbor search on the k-d tree spatial index and assigning each feature point to its nearest spatial anchor point to form an initial Voronoi region division; applying boundary-smoothing optimization to the initial Voronoi region division, adjusting region boundaries to reduce irregular geometric shapes and form a feature distribution topology with regular boundaries; and, based on the feature distribution topology with regular boundaries, merging adjacent Voronoi regions according to spatial proximity and feature similarity, thereby partitioning the whole feature space into a plurality of feature subspaces.
7. The federated learning method for multi-modal prostate cancer data according to claim 6, wherein said step 400 further comprises: calculating the average distance from all feature points in the current feature subspace to the centroid of that subspace as a distribution dispersion index; normalizing the feature density value and the distribution dispersion index separately to obtain a normalized feature density value and a normalized distribution dispersion index; computing a weighted sum of the normalized feature density value and the normalized distribution dispersion index with preset weight coefficients to obtain a composite spatial feature index for each feature subspace; and applying a weighted adjustment to the updated global feature parameters based on the weight adjustment coefficients, completing the dynamic parameter optimization through a matrix multiplication operation, to obtain the optimized federated learning model parameters.
8. A federated learning system for multi-modal prostate cancer data, implementing the method of any one of claims 1 to 7 and comprising: an alignment module, configured to perform feature extraction and heterogeneity alignment on image data, pathological data, clinical data and genetic data, and to map the data of the different modalities into a unified feature space to obtain aligned multi-modal feature vectors; a gradient compression module, configured to perform local training based on the aligned multi-modal feature vectors to generate model gradient information, to evaluate the importance of the model gradient information to obtain priority weights, and to compress the gradients with a layered quantization strategy according to the priority weights to obtain a compressed gradient data packet; an updating module, configured to decompress and structurally reconstruct the compressed gradient data packet to obtain reconstructed gradient information, to calculate node data weights based on the reconstructed gradient information, and to perform weighted aggregation of the reconstructed gradient information with the node data weights to generate updated global feature parameters; and an optimizing module, configured to select, based on the updated global feature parameters, a plurality of key feature regions in the feature space as reference benchmarks to construct a feature distribution topology, to partition the feature distribution topology to generate a plurality of feature subspaces, to calculate a spatial feature index from the feature density and distribution characteristics of each feature subspace, to generate a weight adjustment coefficient based on the spatial feature index, and to dynamically optimize the updated global feature parameters with the weight adjustment coefficient to obtain optimized federated learning model parameters.
9. A computing device, comprising: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program which, when executed by a processor, implements the method according to any of claims 1 to 7.
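
Illustrative note on claim 2 (not part of the claims): the conversion of clinical data by one-hot encoding and standardization can be sketched as follows. This is a minimal sketch under stated assumptions; the field names `psa` and `stage`, the category list, and the function names are hypothetical and do not come from the patent text.

```python
import statistics

def one_hot(value, categories):
    """Return a 0/1 indicator vector marking which category `value` is."""
    return [1.0 if value == c else 0.0 for c in categories]

def standardize(column):
    """Z-score a numeric column (population standard deviation)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [0.0 if std == 0 else (x - mean) / std for x in column]

def clinical_vectors(records, numeric_keys, categorical):
    """Build one feature vector per record.

    records: list of dicts (hypothetical clinical records).
    numeric_keys: fields to standardize.
    categorical: mapping field -> ordered list of allowed categories.
    """
    cols = {k: standardize([r[k] for r in records]) for k in numeric_keys}
    vectors = []
    for i, record in enumerate(records):
        vec = [cols[k][i] for k in numeric_keys]
        for key, cats in categorical.items():
            vec += one_hot(record[key], cats)
        vectors.append(vec)
    return vectors

# Hypothetical example records, for illustration only.
records = [{"psa": 4.0, "stage": "T1"}, {"psa": 8.0, "stage": "T2"}]
vectors = clinical_vectors(records, ["psa"], {"stage": ["T1", "T2"]})
```

Each resulting vector holds the standardized numeric fields followed by the one-hot blocks; the later principal component analysis and canonical correlation analysis steps of claim 2 are not covered by this sketch.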
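
Illustrative note on claim 3 (not part of the claims): ranking gradient components by modulus and quantizing the important ones at higher precision might look like the sketch below. The two-level split, the parameters `high_fraction`, `high_bits` and `low_bits`, and the shared scale factor are assumptions chosen for illustration; the claim does not fix the number of levels or the bit widths, and entropy coding and packetization are omitted.

```python
import math

def layered_quantize(grads, high_fraction=0.25, high_bits=16, low_bits=4):
    """Quantize a flattened gradient with precision chosen by magnitude rank.

    Returns (codes, meta); codes[i] is (bit width, signed integer level).
    All tuning parameters are hypothetical.
    """
    if not grads:
        return [], {"scale": 0.0}
    scale = max(abs(g) for g in grads)
    # Rank components by modulus, largest first (the importance ordering).
    order = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    n_high = max(1, math.ceil(high_fraction * len(grads)))
    high = set(order[:n_high])
    codes = []
    for i, g in enumerate(grads):
        bits = high_bits if i in high else low_bits
        levels = 2 ** (bits - 1) - 1  # symmetric signed range
        level = 0 if scale == 0 else round(g / scale * levels)
        codes.append((bits, level))
    return codes, {"scale": scale}

def dequantize(codes, meta):
    """Map integer levels back to floats using the shared scale."""
    out = []
    for bits, level in codes:
        levels = 2 ** (bits - 1) - 1
        out.append(level / levels * meta["scale"])
    return out
```

With a 4-bit low-precision tier, each low-importance component is recovered to within half a quantization step (scale / 14), while the high-importance tier keeps far finer resolution.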
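
Illustrative note on claim 5 (not part of the claims): one possible reading of the node weighting and aggregation is sketched below. The claim does not state how the gradient norm and the sample count are combined, so the convex mix controlled by `alpha`, the learning rate `lr`, and the function names are all assumptions made for this sketch.

```python
import math

def node_weights(node_grads, sample_counts, alpha=0.5):
    """Combine each node's share of gradient norm and share of samples.

    node_grads: one flattened gradient (list of floats) per node.
    alpha: hypothetical mixing factor between the two shares.
    The returned weights sum to 1.
    """
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in node_grads]
    total_norm = sum(norms)
    total_samples = sum(sample_counts)
    weights = []
    for norm, n in zip(norms, sample_counts):
        norm_share = norm / total_norm if total_norm > 0 else 1.0 / len(norms)
        sample_share = n / total_samples
        weights.append(alpha * norm_share + (1 - alpha) * sample_share)
    return weights

def aggregate(node_grads, weights):
    """Weighted superposition of the node gradients, component by component."""
    dim = len(node_grads[0])
    return [sum(w * grad[j] for w, grad in zip(weights, node_grads))
            for j in range(dim)]

def apply_update(params, global_grad, lr=0.1):
    """Plain gradient step on the global parameters (assumed update rule)."""
    return [p - lr * g for p, g in zip(params, global_grad)]
```

For two nodes with equal gradient norms but 100 and 300 samples, `alpha=0.5` gives weights 0.375 and 0.625, so the larger node contributes more without the smaller node being ignored.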
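
Illustrative note on claims 6 and 7 (not part of the claims): the nearest-anchor region assignment and the composite spatial feature index can be sketched as follows. Several simplifications are assumed: a linear scan stands in for the k-d tree nearest neighbor search, boundary smoothing and region merging are skipped, the feature density is taken as each region's share of all points (the claims do not define it), normalization is min-max across regions, and the weights `w_density` and `w_dispersion` are hypothetical.

```python
import math

def assign_to_anchors(points, anchors):
    """Assign each point to its nearest anchor (a discrete Voronoi division)."""
    regions = {k: [] for k in range(len(anchors))}
    for p in points:
        nearest = min(range(len(anchors)), key=lambda a: math.dist(p, anchors[a]))
        regions[nearest].append(p)
    return regions

def dispersion(region):
    """Average distance from the region's points to the region centroid."""
    dim = len(region[0])
    centroid = [sum(p[j] for p in region) / len(region) for j in range(dim)]
    return sum(math.dist(p, centroid) for p in region) / len(region)

def min_max(values):
    """Min-max normalize to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def spatial_index(regions, total, w_density=0.6, w_dispersion=0.4):
    """Weighted sum of normalized density and dispersion per non-empty region."""
    keys = [k for k, r in regions.items() if r]
    density = min_max([len(regions[k]) / total for k in keys])
    spread = min_max([dispersion(regions[k]) for k in keys])
    return {k: w_density * d + w_dispersion * s
            for k, d, s in zip(keys, density, spread)}

points = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12)]
anchors = [(1, 1), (10, 11)]
regions = assign_to_anchors(points, anchors)
index = spatial_index(regions, total=len(points))
```

How the index is turned into weight adjustment coefficients is left open by the claims and is not modeled here.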

Description

Federated learning method and system for multi-modal prostate cancer data

Technical Field

The invention relates to the technical field at the intersection of intelligent healthcare and distributed artificial intelligence, and in particular to a federated learning method and system for multi-modal prostate cancer data.

Background

Accurate diagnosis and treatment of prostate cancer relies on comprehensive analysis of multi-modal data, including medical images, pathological sections, clinical indices and genomic data. Federated learning, as a distributed machine learning technology, can coordinate a plurality of medical institutions to build a model jointly while preserving data privacy, and thus offers a feasible path for solving the problem of medical data islands. In the practice of multi-modal federated learning, however, the data processing chain has several key shortcomings that restrict further improvement of model performance.

First, in the feature alignment stage, traditional methods such as canonical correlation analysis can capture only linear correlations between modalities and struggle to model the complex nonlinear relationship between image texture features and gene expression profiles; this can cause alignment deviations in the feature space that accumulate over iterations.

Second, when the topology of the feature space is constructed, space-partitioning methods are sensitive to changes in feature density and readily produce geometric distortion in the transition regions between high and low density; this can make the boundaries of feature subspaces inaccurate and impair the ability to discriminate continuous changes in pathological features.

Third, in the dynamic parameter optimization stage, weight adjustment strategies based on the overall statistics of a feature subspace are too coarse-grained: they cannot adapt to the highly heterogeneous nature of prostate cancer and ignore molecular subtype differences within the same grade, which can limit the generalization ability of the model on complex cases.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a federated learning method and system for multi-modal prostate cancer data that construct a complete federated learning framework for multi-modal prostate cancer data through multi-modal feature alignment, compressed gradient transmission, global gradient aggregation and dynamic parameter optimization, and that improve model performance and cross-institution collaboration efficiency.

To solve the above technical problem, the technical scheme of the invention is as follows:

In a first aspect, a federated learning method for multi-modal prostate cancer data, the method comprising:

performing feature extraction and heterogeneity alignment on image data, pathological data, clinical data and genetic data, and mapping the data of the different modalities into a unified feature space to obtain aligned multi-modal feature vectors;

performing local training based on the aligned multi-modal feature vectors to generate model gradient information, evaluating the importance of the model gradient information to obtain priority weights, and compressing the gradients with a layered quantization strategy according to the priority weights to obtain a compressed gradient data packet;

decompressing and structurally reconstructing the compressed gradient data packet to obtain reconstructed gradient information, calculating node data weights based on the reconstructed gradient information, and performing weighted aggregation of the reconstructed gradient information with the node data weights to generate updated global feature parameters;

selecting, based on the updated global feature parameters, a plurality of key feature regions in the feature space as reference benchmarks to construct a feature distribution topology, partitioning the feature distribution topology to generate a plurality of feature subspaces, calculating a spatial feature index from the feature density and distribution characteristics of each feature subspace, generating a weight adjustment coefficient based on the spatial feature index, and dynamically optimizing the updated global feature parameters with the weight adjustment coefficient to obtain optimized federated learning model parameters.

In a second aspect, a federated learning system for multi-modal prostate cancer data, comprising:

an alignment module, configured to perform feature extraction and heterogeneity alignment on image data, pathological data, clinical data and genetic data, and to map the data of the different modalities into a unified feature space to obtain aligned multi-modal feature vectors;

a gradient compression module, configured to perform local training based on the aligned multi-modal feature vectors to generate model gradient