CN-122020724-A - Multimodal biomedical data security fusion query governance method and system


Abstract

The application discloses a multimodal biomedical data security fusion query governance method and system. The method comprises: receiving a fusion query request; verifying its validity through a blockchain smart contract; decomposing the request into sub-query tasks based on a multimodal metadata ontology model and distributing those tasks; generating, at each data holder node within a local privacy-preserving computing environment, an intermediate result identified by a unified pseudonym identifier; performing privacy-preserving data alignment and aggregation; and returning the fusion result while recording key events in the blockchain smart contract. The application achieves integrated governance in which data is usable but invisible, queries are flexible, the process is auditable, and contributions are incentivized, and it is suitable for secure collaborative analysis of multimodal biomedical data such as genomes, medical images, and electronic medical records.

Inventors

LIU YAO
LIAO YING
YANG ZAILIN
QI JIANGTAO

Assignees

重庆华信英翡智能科技研究院有限公司

Dates

Publication Date: 20260512
Application Date: 20260416

Claims (7)

1. A method for secure fusion query governance of multimodal biomedical data, the method comprising: receiving a fusion query request from a querying party, wherein the fusion query request involves at least two heterogeneous biomedical data modalities; verifying the validity of the fusion query request through a blockchain smart contract; parsing the fusion query request that passes the validity verification and decomposing it into a plurality of sub-query tasks based on a preset multimodal metadata ontology model, and distributing the sub-query tasks to the corresponding data holder nodes; executing, by each data holder node within its local privacy-preserving computing environment, the corresponding sub-query task on the locally stored original biomedical data to generate an intermediate result identified by a unified pseudonym identifier, wherein the original biomedical data never leaves its local storage location; performing privacy-preserving data alignment and aggregation on the intermediate results to obtain a fusion query result; and returning the fusion query result to the querying party, and recording key events of the query execution process in the blockchain smart contract to enable verifiable compliance governance; wherein parsing and decomposing the fusion query request that passes the validity verification into a plurality of sub-query tasks based on the preset multimodal metadata ontology model comprises: identifying each data modality field involved in the fusion query request and determining the logical relationships among the data modality fields; mapping, according to the multimodal metadata ontology model, each data modality field to a data resource descriptor registered by the corresponding data holder node; and generating, based on the logical relationships and the mapping result, a structured sub-query task for each target node, with a unified pseudonym identifier generation rule attached; wherein executing, by each data holder node within its local privacy-preserving computing environment, the corresponding sub-query task on the locally stored original biomedical data comprises: loading, by each data holder node, a signature-verified sub-query script into its local privacy-preserving computing environment, wherein the privacy-preserving computing environment is a hardware-based trusted execution environment or a cryptography-based encrypted computing environment; reading the corresponding original biomedical data from the local database of the data holder node and executing the query logic of the sub-query script within the privacy-preserving computing environment; applying a unified deterministic hash function to the patient identifiers in the query result to generate unified pseudonym identifiers; and outputting an intermediate result containing only the unified pseudonym identifiers and the queried fields, and clearing the original biomedical data from the privacy-preserving computing environment; and wherein performing privacy-preserving data alignment and aggregation on the intermediate results comprises: extracting, by each data holder node, the unified pseudonym identifiers from its intermediate result to form its own unified pseudonym identifier set; cooperatively executing, by the data holder nodes and based on their respective unified pseudonym identifier sets, a private set intersection protocol to determine a common pseudonym identifier subset corresponding to patients common across modalities, wherein the private set intersection protocol is implemented through secure multi-party computation or homomorphic encryption; and extracting, by each data holder node within its local privacy-preserving computing environment, the query field values corresponding to the common pseudonym identifier subset, and performing statistical aggregation or desensitization on those values through secure multi-party computation or homomorphic encryption to generate the fusion query result; wherein the private set intersection protocol and the aggregation process are completed without exposing the original biomedical data, the real identity information of patients, or the private data of other nodes in plaintext to any participant or third party.
2. The multimodal biomedical data security fusion query governance method according to claim 1, wherein verifying the validity of the fusion query request through a blockchain smart contract comprises: reading, by the blockchain smart contract, a preset access control policy, wherein the access control policy covers at least one of the querying party's identity, the data modality type, the research purpose, and the patients' informed consent status; checking whether the fusion query request satisfies the access control policy; and if so, determining that the fusion query request is valid and allowing it to proceed to the subsequent parsing flow; otherwise, rejecting the fusion query request and recording an audit log.
3. The method according to claim 1, wherein recording key events of the query execution process in the blockchain smart contract comprises: after the fusion query is completed, submitting, by each participating data holder node, an operation log digest to the blockchain smart contract, wherein the operation log digest includes the task receipt time, the sub-query execution time, and a hash value of the intermediate result; automatically generating, by the blockchain smart contract, a query audit record based on the received operation log digests and storing the query audit record on the blockchain, wherein the query audit record includes a query ID, a list of participating nodes, a hash value of the fusion query result, a timestamp, and a contribution score for each node; and further automatically executing, by the blockchain smart contract, incentive distribution logic according to a preset contribution model, distributing to each data holder node platform digital equity rewards commensurate with its contribution, wherein the platform digital equity rewards represent the value of data contributions and can be used to exchange resources for subsequent query services or to obtain governance rights.
4. The multimodal biomedical data security fusion query governance method according to any one of claims 1 to 3, further comprising: if any data holder node fails to return its intermediate result within a preset time limit, triggering, by the querying party or the blockchain smart contract, a partial fusion flow in which privacy-preserving data alignment and aggregation are performed only on the returned intermediate results, and marking the missing data modality types and the corresponding data coverage rate in the fusion query result; and/or, when the privacy-preserving computing environment or the sub-query task of any data holder node is detected to fail security verification, automatically terminating, by that node, the sub-query task, clearing the intermediate result generated in its local privacy-preserving computing environment, and submitting the security verification failure to the blockchain smart contract as a key event for recording.
5. A multimodal biomedical data security fusion query governance system, the system comprising: a query receiving module, configured to receive a fusion query request from a querying party, wherein the fusion query request involves at least two heterogeneous biomedical data modalities; a validity verification module, configured to verify the validity of the fusion query request through a blockchain smart contract; a task parsing and distribution module, configured to parse the fusion query request that passes the validity verification, decompose it into a plurality of sub-query tasks based on a preset multimodal metadata ontology model, and distribute the sub-query tasks to the corresponding data holder nodes; an intermediate result generation module, deployed at each data holder node and configured to execute, within the local privacy-preserving computing environment of that node, the corresponding sub-query task on the locally stored original biomedical data to generate an intermediate result identified by a unified pseudonym identifier, wherein the original biomedical data never leaves its local storage location; a data fusion module, configured to perform privacy-preserving data alignment and aggregation on the intermediate results to obtain a fusion query result; and a result return and on-chain recording module, configured to return the fusion query result to the querying party and record key events of the query execution process in the blockchain smart contract to enable verifiable compliance governance; wherein parsing and decomposing the fusion query request that passes the validity verification into a plurality of sub-query tasks based on the preset multimodal metadata ontology model comprises: identifying each data modality field involved in the fusion query request and determining the logical relationships among the data modality fields; mapping, according to the multimodal metadata ontology model, each data modality field to a data resource descriptor registered by the corresponding data holder node; and generating, based on the logical relationships and the mapping result, a structured sub-query task for each target node, with a unified pseudonym identifier generation rule attached; wherein executing, by each data holder node within its local privacy-preserving computing environment, the corresponding sub-query task on the locally stored original biomedical data comprises: loading, by each data holder node, a signature-verified sub-query script into its local privacy-preserving computing environment, wherein the privacy-preserving computing environment is a hardware-based trusted execution environment or a cryptography-based encrypted computing environment; reading the corresponding original biomedical data from the local database of the data holder node and executing the query logic of the sub-query script within the privacy-preserving computing environment; applying a unified deterministic hash function to the patient identifiers in the query result to generate unified pseudonym identifiers; and outputting an intermediate result containing only the unified pseudonym identifiers and the queried fields, and clearing the original biomedical data from the privacy-preserving computing environment; and wherein performing privacy-preserving data alignment and aggregation on the intermediate results comprises: extracting, by each data holder node, the unified pseudonym identifiers from its intermediate result to form its own unified pseudonym identifier set; cooperatively executing, by the data holder nodes and based on their respective unified pseudonym identifier sets, a private set intersection protocol to determine a common pseudonym identifier subset corresponding to patients common across modalities, wherein the private set intersection protocol is implemented through secure multi-party computation or homomorphic encryption; and extracting, by each data holder node within its local privacy-preserving computing environment, the query field values corresponding to the common pseudonym identifier subset, and performing statistical aggregation or desensitization on those values through secure multi-party computation or homomorphic encryption to generate the fusion query result; wherein the private set intersection protocol and the aggregation process are completed without exposing the original biomedical data, the real identity information of patients, or the private data of other nodes in plaintext to any participant or third party.
6. An electronic device comprising a memory and a processor, wherein the memory has a computer program stored thereon, and the processor, when executing the computer program, implements the multimodal biomedical data security fusion query governance method of any of claims 1-4.
7. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the multimodal biomedical data security fusion query governance method of any of claims 1-4.
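The decomposition step of claim 1 (identify the modality fields and their logical relationship, map each field to a registered data resource descriptor, and emit one structured sub-query task per target node with the pseudonym rule attached) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the ontology table, field names, descriptor strings, node names, and the PSEUDONYM_RULE dictionary are all hypothetical, and only conjunctive (AND) queries are handled.

```python
from dataclasses import dataclass, field

# Hypothetical multimodal metadata ontology: modality field -> (holder node,
# data resource descriptor registered by that node).
ONTOLOGY = {
    "pathology.diagnosis": ("node_pathology", "desc:pathology/v1"),
    "ihc.myc": ("node_pathology", "desc:ihc/v1"),
    "ihc.bcl2": ("node_pathology", "desc:ihc/v1"),
    "petct.hypermetabolic_lesion": ("node_imaging", "desc:petct/v1"),
}

# Hypothetical pseudonym generation rule attached to every sub-query task.
PSEUDONYM_RULE = {"algorithm": "HMAC-SHA256", "input": "normalized_patient_id"}


@dataclass
class SubQueryTask:
    node: str
    logic: str
    conditions: list = field(default_factory=list)  # (field, op, value, descriptor)
    pseudonym_rule: dict = field(default_factory=lambda: dict(PSEUDONYM_RULE))


def decompose(request: dict) -> list:
    """Split a conjunctive fusion query into one structured task per target node."""
    if request["logic"] != "AND":
        raise NotImplementedError("this sketch only handles AND across fields")
    tasks = {}
    for fld, op, value in request["conditions"]:
        if fld not in ONTOLOGY:
            raise KeyError(f"field {fld!r} has no registered data resource descriptor")
        node, descriptor = ONTOLOGY[fld]
        task = tasks.setdefault(node, SubQueryTask(node=node, logic="AND"))
        task.conditions.append((fld, op, value, descriptor))
    return list(tasks.values())


# The DLBCL example from the Background section, expressed as a fusion query.
request = {
    "logic": "AND",
    "conditions": [
        ("pathology.diagnosis", "==", "DLBCL"),
        ("ihc.myc", "==", "positive"),
        ("ihc.bcl2", "==", "positive"),
        ("petct.hypermetabolic_lesion", "==", True),
    ],
}
tasks = decompose(request)
for t in tasks:
    print(t.node, [c[0] for c in t.conditions])
```

Each node receives only the conditions it can evaluate locally; the AND across nodes is resolved later by the private set intersection step.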
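Claim 1 requires each node to pseudonymize patient identifiers with a unified deterministic hash function and to output only the pseudonyms plus the queried fields. The sketch below makes several assumptions: the unified hash is instantiated as HMAC-SHA256 under a key shared by all nodes (CONSORTIUM_KEY is hypothetical), the ID normalization rule is an assumed convention, and clearing a Python list merely stands in for wiping memory inside a real trusted execution environment.

```python
import hashlib
import hmac

# Assumption: the "unified deterministic hash function" is HMAC-SHA256 under a
# key shared by all data holder nodes. The key value here is hypothetical.
CONSORTIUM_KEY = b"hypothetical-consortium-key"


def normalize_id(raw_id: str) -> str:
    # All nodes must apply identical normalization, or pseudonyms will not match.
    return raw_id.strip().upper()


def pseudonym(raw_id: str) -> str:
    return hmac.new(CONSORTIUM_KEY, normalize_id(raw_id).encode(), hashlib.sha256).hexdigest()


def run_sub_query(records: list, predicate, query_fields: list) -> list:
    """Evaluate the local query and emit only pseudonym + queried fields."""
    intermediate = []
    for rec in records:
        if predicate(rec):
            row = {"pid": pseudonym(rec["patient_id"])}
            row.update({f: rec[f] for f in query_fields})
            intermediate.append(row)
    # Drop the raw records; a real TEE would also zeroize the enclave memory.
    records.clear()
    return intermediate


local_db = [
    {"patient_id": " p001", "diagnosis": "DLBCL", "myc": "positive", "name": "patient-1"},
    {"patient_id": "P002", "diagnosis": "FL", "myc": "negative", "name": "patient-2"},
]
out = run_sub_query(local_db, lambda r: r["diagnosis"] == "DLBCL", ["myc"])
print(out)
```

A keyed hash is used here because an unkeyed hash of a short, structured patient ID could be reversed by enumerating candidate IDs; the claim itself does not state whether a key is involved.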
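The private set intersection in claim 1 can be realized in several ways. One well-known option, shown here only as a toy, is the Diffie-Hellman-style commutative-exponentiation protocol for two semi-honest parties: each side raises its hashed pseudonyms to a secret exponent, and because (H(x)^a)^b = (H(x)^b)^a mod p, common items match after double blinding. The modulus (the Mersenne prime 2^127 - 1) and all function names are assumptions chosen for brevity; this group is not secure for production use, where a vetted prime-order group such as an elliptic curve would be needed.

```python
import hashlib
import math
import secrets

# Toy group: multiplicative group modulo the Mersenne prime 2**127 - 1.
# NOT secure for production; use a vetted prime-order group instead.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    # Map an item to an integer in [1, P - 1].
    digest = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return digest % (P - 1) + 1


def keygen() -> int:
    # An exponent coprime to P - 1 makes x -> x**k mod P a bijection, so
    # distinct items never collide after blinding.
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, key: int) -> list:
    return [pow(v, key, P) for v in values]


def psi_two_party(set_a, set_b) -> set:
    """Semi-honest DH-style PSI; party A learns which of its items B also holds."""
    a_key, b_key = keygen(), keygen()  # each party's secret exponent
    items_a = list(set_a)
    # Round 1: each party blinds its hashed items with its own key and sends them.
    a_once = blind([hash_to_group(x) for x in items_a], a_key)
    b_once = blind([hash_to_group(y) for y in set_b], b_key)
    # Round 2: B re-blinds A's values (keeping order); A re-blinds B's values.
    a_twice = blind(a_once, b_key)
    b_twice = set(blind(b_once, a_key))
    # (H(x)^a)^b == (H(x)^b)^a mod P, so common items produce equal values.
    return {x for x, v in zip(items_a, a_twice) if v in b_twice}


node_genome = {"pid-1", "pid-2", "pid-3"}
node_imaging = {"pid-2", "pid-3", "pid-4"}
print(sorted(psi_two_party(node_genome, node_imaging)))  # ['pid-2', 'pid-3']
```

Extending this to more than two nodes by folding the running intersection through pairwise runs is simple but reveals intermediate intersections to whichever party holds them; dedicated multi-party PSI protocols avoid that leak.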
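For the statistical aggregation step of claim 1, one simple secure multi-party computation primitive is additive secret sharing: each node splits its private partial statistic into random shares modulo a public prime, so no single share reveals anything and only the final total is opened. This sketch simulates all parties in one process and assumes semi-honest nodes, non-negative integer (fixed-point scaled) inputs smaller than the modulus Q, and a scenario in which several nodes hold partial sums over the common pseudonym subset; the numbers are made up.

```python
import secrets

# Public modulus for the shares (the Mersenne prime 2**61 - 1). Every true
# total must be non-negative and smaller than Q, or it will wrap around.
Q = 2**61 - 1


def share(value: int, n: int) -> list:
    """Split value into n random additive shares that sum to value mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def secure_sum(private_values: list) -> int:
    """Simulate n semi-honest parties computing the sum of their private inputs."""
    n = len(private_values)
    # Row i holds the shares party i sends out; party j receives column j.
    dealt = [share(v, n) for v in private_values]
    # Each party adds up the shares it received, seeing only random-looking values.
    partials = [sum(row[j] for row in dealt) % Q for j in range(n)]
    # Only the combination of all partial sums (the total) is revealed.
    return sum(partials) % Q


# Hypothetical per-node statistics over the common pseudonym subset, with the
# numeric field scaled by 100 so that inputs are integers.
scaled_sums = [1250, 980, 1510]
counts = [3, 2, 4]
total, count = secure_sum(scaled_sums), secure_sum(counts)
print(total, count, total / count / 100)
```

Opening the sum and the count separately lets the querying party compute a mean without any node revealing its own partial statistic; homomorphic encryption, the other option named in the claim, would serve the same purpose.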
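The policy check of claim 2 can be modelled as the decision logic a smart contract would run. The Python classes below only simulate that logic off-chain; the policy fields mirror the four criteria listed in the claim, while representing informed consent as the set of purposes each modality's dataset is consented for is an assumption, as are all identifiers.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    # Hypothetical fields mirroring the criteria listed in claim 2.
    allowed_parties: set
    allowed_modalities: set
    allowed_purposes: set


@dataclass
class PolicyContract:
    """Off-chain simulation of the smart contract's validity check."""
    policy: AccessPolicy
    consent_scopes: dict  # modality -> purposes covered by patients' informed consent
    audit_log: list = field(default_factory=list)

    def verify(self, req: dict) -> bool:
        p, reasons = self.policy, []
        if req["party"] not in p.allowed_parties:
            reasons.append("querying party not authorized")
        disallowed = sorted(set(req["modalities"]) - p.allowed_modalities)
        if disallowed:
            reasons.append(f"modalities not allowed: {disallowed}")
        if req["purpose"] not in p.allowed_purposes:
            reasons.append("research purpose not allowed")
        if any(req["purpose"] not in self.consent_scopes.get(m, set()) for m in req["modalities"]):
            reasons.append("informed consent does not cover this purpose")
        if reasons:
            # Rejected requests are logged for audit, as claim 2 requires.
            self.audit_log.append({"query_id": req["query_id"], "reasons": reasons, "ts": time.time()})
            return False
        return True


contract = PolicyContract(
    policy=AccessPolicy({"lab_x"}, {"genome", "imaging", "emr"}, {"oncology_research"}),
    consent_scopes={"genome": {"oncology_research"}, "imaging": {"oncology_research"}},
)
ok = contract.verify({"query_id": "q1", "party": "lab_x",
                      "modalities": ["genome", "imaging"], "purpose": "oncology_research"})
denied = contract.verify({"query_id": "q2", "party": "lab_x",
                          "modalities": ["genome", "emr"], "purpose": "oncology_research"})
print(ok, denied, contract.audit_log[0]["reasons"])
```

The second request is rejected because no consent scope is recorded for the emr modality, and only that rejection is written to the audit log.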
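Claim 3's audit record and contribution-based incentive distribution can be sketched with a hash-chained, append-only list standing in for blockchain storage. The contribution model (each node's share of matched records), the integer reward units, and all field and function names are hypothetical; the claim only states that a preset contribution model is used.

```python
import hashlib
import json
import time


def sha256_hex(obj) -> str:
    # Canonical JSON so that equal records always hash identically.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class AuditLedger:
    """Append-only, hash-chained list standing in for on-chain storage."""

    def __init__(self):
        self.blocks = []

    def append(self, record: dict) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        self.blocks.append({"record": record, "prev": prev,
                            "hash": sha256_hex({"record": record, "prev": prev})})

    def verify_chain(self) -> bool:
        prev = "0" * 64
        for b in self.blocks:
            if b["prev"] != prev or b["hash"] != sha256_hex({"record": b["record"], "prev": prev}):
                return False
            prev = b["hash"]
        return True


def settle_query(ledger, query_id, log_digests, fused_result, reward_pool: int) -> dict:
    # Hypothetical contribution model: each node's share of matched records.
    total = sum(d["matched_records"] for d in log_digests.values())
    record = {
        "query_id": query_id,
        "nodes": sorted(log_digests),
        "digest_hashes": {n: sha256_hex(d) for n, d in log_digests.items()},
        "result_hash": sha256_hex(fused_result),
        "timestamp": int(time.time()),
        "scores": {n: d["matched_records"] / total for n, d in log_digests.items()},
        # Integer reward units; any rounding remainder stays in the pool.
        "rewards": {n: reward_pool * d["matched_records"] // total for n, d in log_digests.items()},
    }
    ledger.append(record)
    return record


# Hypothetical operation log digests (fields named in claim 3, plus matched_records).
digests = {
    "node_pathology": {"received_at": 1700000000, "exec_ms": 820,
                       "intermediate_hash": sha256_hex(["r1"]), "matched_records": 30},
    "node_imaging": {"received_at": 1700000001, "exec_ms": 1430,
                     "intermediate_hash": sha256_hex(["r2"]), "matched_records": 10},
}
ledger = AuditLedger()
rec = settle_query(ledger, "q1", digests, {"count": 10}, reward_pool=1000)
print(rec["rewards"], ledger.verify_chain())
```

Modifying any stored record afterwards makes verify_chain return False, which is the tamper-evidence property the on-chain audit record is meant to provide.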
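The first branch of claim 4 (continue with a partial fusion after a deadline and report the missing modalities and the coverage rate) can be sketched with Python's concurrent.futures module. Node sub-queries are simulated as local callables, the plaintext set intersection stands in for the private set intersection protocol, and "coverage" is taken, as an assumption, to mean the fraction of nodes that returned in time. The cancel_futures argument requires Python 3.9 or later.

```python
import concurrent.futures as cf
import time


def partial_fusion(node_tasks: dict, timeout_s: float) -> dict:
    """node_tasks: node -> (modality, callable returning a list of pseudonyms)."""
    pool = cf.ThreadPoolExecutor(max_workers=len(node_tasks))
    futures = {pool.submit(fn): (node, modality) for node, (modality, fn) in node_tasks.items()}
    done, not_done = cf.wait(futures, timeout=timeout_s)
    # Stop waiting for late nodes (cancel_futures needs Python 3.9+).
    pool.shutdown(wait=False, cancel_futures=True)
    returned = [set(f.result()) for f in done]
    # Plaintext intersection stands in for the private set intersection protocol.
    common = set.intersection(*returned) if returned else set()
    return {
        "common_pids": sorted(common),
        "missing_modalities": sorted(futures[f][1] for f in not_done),
        # Assumed definition: fraction of participating nodes that returned in time.
        "coverage": len(done) / len(node_tasks),
    }


def slow_imaging():
    time.sleep(0.5)  # simulates a node that misses the deadline
    return ["p3"]


result = partial_fusion({
    "node_genome": ("genome", lambda: ["p1", "p2", "p3"]),
    "node_emr": ("emr", lambda: ["p2", "p3"]),
    "node_imaging": ("imaging", slow_imaging),
}, timeout_s=0.2)
print(result)
```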
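Claim 1 requires sub-query scripts to be signature-verified before loading, and the second branch of claim 4 requires a node to stop the task, clear its intermediate results, and report the failed security verification as a key event. In this sketch an HMAC tag stands in for a real digital signature (a deployment would more likely use an asymmetric scheme such as Ed25519), and a plain list stands in for the smart contract's event log; PLATFORM_KEY, NodeRuntime, and the event format are hypothetical.

```python
import hashlib
import hmac

# Assumption: an HMAC tag under a platform key stands in for a digital signature.
# A shared-key MAC lets any key holder forge tags; a real system would more
# likely use an asymmetric signature scheme.
PLATFORM_KEY = b"hypothetical-platform-key"


def sign_script(script: str) -> str:
    return hmac.new(PLATFORM_KEY, script.encode(), hashlib.sha256).hexdigest()


class NodeRuntime:
    """Hypothetical node runtime; chain_events stands in for the smart contract log."""

    def __init__(self, node_id: str, chain_events: list):
        self.node_id = node_id
        self.chain_events = chain_events
        self.intermediate = []

    def execute(self, task_id: str, script: str, tag: str, run):
        # `run` is a callable standing in for executing the verified script.
        if not hmac.compare_digest(sign_script(script), tag):
            self.intermediate.clear()  # discard intermediate results
            self.chain_events.append({"node": self.node_id, "task": task_id,
                                      "event": "security_verification_failed"})
            return None  # the sub-query task is stopped
        self.intermediate = run()
        return self.intermediate


events = []
node = NodeRuntime("node_emr", events)
script = "SELECT pid FROM emr WHERE dx = 'DLBCL'"
print(node.execute("t1", script, sign_script(script), lambda: ["p1"]))
print(node.execute("t2", script + " OR 1=1", sign_script(script), lambda: ["p1", "p2"]))
print(events)
```

hmac.compare_digest is used for the tag check so that the comparison time does not leak how many leading characters matched.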

Description

Technical Field

The application belongs to the interdisciplinary field of biomedical big data and privacy-preserving computation, and in particular relates to a platform governance method, system, electronic device, and storage medium for cross-source fusion queries over multimodal biomedical data such as genomes, medical images, and electronic medical records while ensuring data privacy and security.

Background

Biomedical research has fully entered the multimodal era. A single data dimension (for example, genomic data alone or imaging data alone) can hardly reveal the underlying mechanisms of complex diseases, so heterogeneous data such as gene mutations, medical images (such as CT/MRI), electronic medical records, and pathology reports must be analyzed jointly. However, these data are widely scattered across medical and research institutions, forming severe "data silos" that obstruct cross-domain collaborative research. Existing data sharing relies mainly on copying data or building a centralized data warehouse. Both approaches carry an obvious risk of privacy leakage: even after desensitization, patients can be re-identified through cross-referencing, which violates increasingly strict laws and regulations in China and abroad, such as the GDPR (General Data Protection Regulation) and HIPAA (Health Insurance Portability and Accountability Act).
To address these privacy challenges, the industry has tried several technical paths, each with clear limitations. First, simple encrypted-query schemes usually have to decrypt data before computing on it, so they cannot eliminate the risk of leakage at its root. Second, privacy-preserving computation methods represented by federated learning achieve "the data stays put while the model moves", but they were designed primarily for collaborative model training and struggle to support flexible, ad hoc multidimensional fusion queries (for example, finding patients whose pathological diagnosis is diffuse large B-cell lymphoma, whose immunohistochemistry shows MYC and BCL2 double expression, and whose PET/CT evaluation shows hypermetabolic lesions). More importantly, most current schemes emphasize technology while neglecting governance: they lack a unified, automated, and trustworthy technical governance framework covering the full data lifecycle (authorization, access, computation, and audit), and therefore cannot make query behavior verifiable and traceable or provide sustainable incentives. There is thus a need for an innovative governance method for vertical large-model platforms that truly makes data usable but invisible at the technical foundation, supports complex and flexible multimodal fusion queries, and embeds a rule-based, transparent, and automated governance mechanism.

Disclosure of Invention

The application aims to provide a multimodal biomedical data security fusion query governance method and system that solve the problems of data silos, privacy leakage, insufficient query flexibility, and missing governance mechanisms in the prior art, achieving the integrated governance goals of data never leaving its domain, data being usable but invisible, auditable processes, and incentivized contributions. The first object of the application is to provide a multimodal biomedical data security fusion query governance method.
The first object of the application is achieved by the following technical solution: a multimodal biomedical data security fusion query governance method, comprising: receiving a fusion query request from a querying party, wherein the fusion query request involves at least two heterogeneous biomedical data modalities; verifying the validity of the fusion query request through a blockchain smart contract; parsing the fusion query request that passes the validity verification and decomposing it into a plurality of sub-query tasks based on a preset multimodal metadata ontology model, and distributing the sub-query tasks to the corresponding data holder nodes; executing, by each data holder node within its local privacy-preserving computing environment, the corresponding sub-query task on the locally stored original biomedical data to generate an intermediate result identified by a unified pseudonym identifier, wherein the original biomedical data never leaves its local storage location; performing privacy-preserving data alignment and aggregation on the intermediate results to obtain a fusion query result; and returning the fusion query result to the querying party, and recording key events of the query execution process in the blockchain smart contract to enable verifiable compliance governance. Preferably, the verifying