US-20260128129-A1 - METHOD FOR CONSTRUCTING DRUG RESISTANCE DATABASE, DRUG RESISTANCE TESTING METHOD, APPARATUS, AND DEVICE

Abstract

Disclosed are a method and apparatus for constructing a drug resistance database, a drug resistance testing method and apparatus, and a device. The method comprises: acquiring a preset classification object set of a target drug corresponding to each of at least one classification object dimension; acquiring, for each classification object dimension, an object feature set of the target drug separately corresponding to each preset classification object in the preset classification object set of the classification object dimension; screening, based on an object weight and a sample mutation vector set in each object feature set, the preset classification object set, to obtain a target classification object set; and based on at least one target classification object set, constructing in a drug resistance database a standard drug resistance mutation information set corresponding to the target drug, wherein each classification object dimension comprises a gene dimension and/or a mutation site dimension. Embodiments of the present invention improve the efficiency and accuracy of constructing a drug resistance database.
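The "sample mutation vector" referred to above is defined in claim 1: each entry is a mutation site identifier recording whether a piece of reference drug resistance mutation information appears in a strain sample's mutation information set. The following is a minimal sketch of that encoding for the gene dimension. It assumes, hypothetically, that mutations are named with a `"gene:change"` string scheme and that identifiers are 0/1 integers; the source specifies neither representation, and the names `mutation_vector` and `gene_vector_set` are illustrative only.

```python
from typing import Dict, List, Set


def mutation_vector(sample_mutations: Set[str], reference_mutations: List[str]) -> List[int]:
    """Return one 0/1 mutation site identifier per reference drug resistance mutation."""
    # 1 if the reference mutation is present in the strain sample's mutation set, else 0
    return [1 if ref in sample_mutations else 0 for ref in reference_mutations]


def gene_vector_set(
    samples: Dict[str, Set[str]], reference_mutations: List[str], gene: str
) -> Dict[str, List[int]]:
    """Build the sample mutation vector set for one gene (gene dimension)."""
    # Keep only the reference mutations belonging to this gene
    # (relies on the hypothetical "gene:change" naming scheme)
    gene_refs = [m for m in reference_mutations if m.split(":", 1)[0] == gene]
    # One object mutation vector per strain sample
    return {sid: mutation_vector(muts, gene_refs) for sid, muts in samples.items()}


reference = ["geneA:M1", "geneA:M2", "geneB:M3"]
samples = {"s1": {"geneA:M1", "geneB:M3"}, "s2": {"geneA:M2"}}
print(gene_vector_set(samples, reference, "geneA"))  # {'s1': [1, 0], 's2': [0, 1]}
```

For the mutation site dimension, each reference mutation is itself a classification object, so under this sketch its object mutation vector per sample reduces to the single identifier for that mutation.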

Inventors

  • Yanxiang Chen
  • Letian ZHOU

Assignees

  • GENEMIND BIOSCIENCES COMPANY LIMITED

Dates

Publication Date
2026-05-07
Application Date
2025-12-31
Priority Date
2024-03-01

Claims (17)

  1. A method for constructing a drug resistance database, characterized by comprising: acquiring a preset classification object set of a target drug corresponding to each of at least one classification object dimension; acquiring, for each classification object dimension, an object feature set of the target drug separately corresponding to each preset classification object in the preset classification object set of the classification object dimension; screening, based on an object weight and a sample mutation vector set in each object feature set, the preset classification object set, to obtain a target classification object set; and based on at least one target classification object set, constructing in a drug resistance database a standard drug resistance mutation information set corresponding to the target drug, wherein each classification object dimension comprises a gene dimension and/or a mutation site dimension, the sample mutation vector set comprises at least two object mutation vectors, each object mutation vector comprises at least one mutation site identifier of a strain sample corresponding to the preset classification object, and the mutation site identifier represents whether reference drug resistance mutation information in a reference drug resistance mutation information set exists in a sample mutation information set corresponding to the strain sample.
  2. The method according to claim 1, wherein the screening, based on an object weight and a sample mutation vector set in each object feature set, the preset classification object set, to obtain a target classification object set, comprises: adding a preset classification object having the greatest object weight in the preset classification object set to a reference classification object set as a current classification object, and removing the current classification object from the preset classification object set; training, based on a sample mutation vector set corresponding to each of at least one preset classification object in the reference classification object set, a first initial model to obtain a current first target model; acquiring previous first classification performance of a previous first target model in a previous iteration step; when current first classification performance of the current first target model is better than the previous first classification performance, iteratively performing the step of adding a preset classification object having the greatest object weight in the preset classification object set to the reference classification object set as a current classification object; and using the reference classification object set as the target classification object set until the preset classification object set is empty.
  3. The method according to claim 2, wherein the screening, based on an object weight and a sample mutation vector set in each object feature set, the preset classification object set to obtain a target classification object set further comprises: removing the current classification object from the reference classification object set when the current first classification performance of the current first target model is not better than the previous first classification performance; and iteratively performing the step of adding a preset classification object having the greatest object weight in the preset classification object set to the reference classification object set as a current classification object.
  4. The method according to claim 1, wherein the screening, based on an object weight and a sample mutation vector set in each object feature set, the preset classification object set to obtain a target classification object set comprises: adding a preset classification object having the greatest object weight in the preset classification object set to a current screened classification object set; adding at least one preset classification object that does not exist in the current screened classification object set to the current screened classification object set, to obtain at least one current reference classification object set; and screening, based on each current reference classification object set and at least two sample mutation vector sets, the preset classification object set, to obtain the target classification object set.
  5. The method according to claim 4, wherein the screening, based on each current reference classification object set and at least two sample mutation vector sets, the preset classification object set to obtain the target classification object set comprises: training, for each current reference classification object set, based on a sample mutation vector set corresponding to each of at least one preset classification object in the current reference classification object set, a second initial model to obtain a current second target model; acquiring previous second classification performance of a previous second target model corresponding to a previous screened classification object set in a previous iteration step; determining, based on the previous second classification performance and current second classification performance separately corresponding to at least one current second target model, a next screened classification object set, and using the next screened classification object set as the current screened classification object set; iteratively performing the step of adding at least one preset classification object that does not exist in the current screened classification object set to the current screened classification object set, to obtain at least one current reference classification object set; and using the current screened classification object set as the target classification object set until each current second classification performance is not better than the previous second classification performance.
  6. The method according to claim 1, wherein when the classification object dimension is the gene dimension, the target classification object set is a target drug resistance gene set; accordingly, the acquiring a preset classification object set of a target drug corresponding to each of at least one classification object dimension comprises: when each classification object dimension comprises the gene dimension and the mutation site dimension, based on the target drug resistance gene set corresponding to the gene dimension, performing a filtering operation on the reference drug resistance mutation information set to obtain a preset drug resistance mutation information set of the target drug corresponding to the mutation site dimension.
  7. The method according to claim 1, wherein, when the classification object dimension is the gene dimension, the target classification object set is a target drug resistance gene set, and when the classification object dimension is the mutation site dimension, the target classification object set is a target drug resistance mutation information set; accordingly, the constructing in a drug resistance database, based on at least one target classification object set, a standard drug resistance mutation information set corresponding to the target drug comprises: acquiring, when each classification object dimension comprises a gene dimension and a mutation site dimension having a parallel relationship, for each target drug resistance gene in the target drug resistance gene set corresponding to the gene dimension, a gene drug resistance mutation information set consisting of at least one piece of target drug resistance mutation information corresponding to the target drug resistance gene in the reference drug resistance mutation information set; and based on at least one gene drug resistance mutation information set and the target drug resistance mutation information set corresponding to the mutation site dimension, constructing in the drug resistance database the standard drug resistance mutation information set corresponding to the target drug.
  8. The method according to claim 1, wherein the method further comprises: acquiring sample nucleic acid sequence data separately corresponding to each strain sample; performing, for each strain sample, a mutation processing operation on the sample nucleic acid sequence data corresponding to the strain sample, to obtain a sample mutation information set of the strain sample; screening, based on a drug resistance label separately corresponding to each strain sample, each sample mutation information set, to obtain at least two initial drug resistance mutation information sets; and sequentially performing a union operation and a gene filtering operation on each initial drug resistance mutation information set, to obtain the reference drug resistance mutation information set corresponding to the target drug.
  9. The method according to claim 1, wherein, when the classification object dimension is the gene dimension, the preset classification object set is a preset drug resistance gene set, and the object weight is a gene weight; accordingly, the acquiring an object feature set of the target drug separately corresponding to each preset classification object in the preset classification object set of the classification object dimension comprises: acquiring a sample mutation vector set of the target drug separately corresponding to each preset drug resistance gene in the preset drug resistance gene set of the gene dimension; training, for each preset drug resistance gene, based on the sample mutation vector set corresponding to the preset drug resistance gene, a third initial model, to obtain a trained third target model; determining, based on third classification performance corresponding to the third target model, a gene weight corresponding to the preset drug resistance gene; and adding the sample mutation vector set and the gene weight corresponding to the preset drug resistance gene to an object feature set corresponding to the preset drug resistance gene.
  10. The method according to claim 1, wherein when the classification object dimension is the mutation site dimension, the preset classification object set is a preset drug resistance mutation information set, the object weight is a mutation site weight, and each object mutation vector is a site mutation vector; accordingly, the acquiring an object feature set of the target drug separately corresponding to each preset classification object in the preset classification object set of the classification object dimension comprises: acquiring sample mutation features respectively corresponding to at least two strain samples, wherein each sample mutation feature comprises a site mutation vector of the strain sample corresponding to each piece of reference drug resistance mutation information in the reference drug resistance mutation information set; training, based on each sample mutation feature, a fourth initial model, to obtain a trained fourth target model; using, for each piece of reference drug resistance mutation information in the preset drug resistance mutation information set, a model weight corresponding to the reference drug resistance mutation information in the fourth target model as a mutation site weight; and adding the sample mutation vector set and the mutation site weight corresponding to the reference drug resistance mutation information to an object feature set corresponding to the reference drug resistance mutation information.
  11. A drug resistance testing method, characterized by comprising: acquiring a mutation information set to be tested for a strain to be tested, wherein the mutation information set to be tested comprises at least one piece of mutation information to be tested; acquiring in a drug resistance database a standard drug resistance mutation information set corresponding to a target drug, wherein the standard drug resistance mutation information set comprises at least one piece of standard drug resistance mutation information; and determining, based on overlapping data corresponding to the mutation information set to be tested and the standard drug resistance mutation information set, a target drug resistance result of the strain to be tested against the target drug, wherein the drug resistance database is obtained using the method for constructing a drug resistance database according to claim 1.
  12. The method according to claim 11, wherein the standard drug resistance mutation information set further comprises a mutation score separately corresponding to each piece of standard drug resistance mutation information, and the overlapping data comprises an overlap rate; accordingly, the determining, based on overlapping data corresponding to the mutation information set to be tested and the standard drug resistance mutation information set, a target drug resistance result of the strain to be tested against the target drug comprises: performing a union operation on the mutation information set to be tested and the standard drug resistance mutation information set, to obtain an overlapping mutation information set; determining, based on the mutation score separately corresponding to each piece of standard drug resistance mutation information in the overlapping mutation information set, an overlap rate of the strain to be tested; and determining, based on the overlap rate, the target drug resistance result of the strain to be tested against the target drug.
  13. The method according to claim 11, wherein, before acquiring in a drug resistance database a standard drug resistance mutation information set corresponding to a target drug, the method further comprises: acquiring sample mutation features respectively corresponding to at least two strain samples, wherein each sample mutation feature comprises a site mutation vector of the strain sample separately corresponding to each piece of reference drug resistance mutation information in the reference drug resistance mutation information set; separately training, based on each sample mutation feature, at least two fifth initial models, to obtain at least two trained fifth target models; determining, based on the mutation information set to be tested and the reference drug resistance mutation information set, a mutation feature to be tested corresponding to the strain to be tested; separately inputting the mutation feature to be tested into the at least two fifth target models to obtain predicted drug resistance results respectively outputted by the respective fifth target models; and using, when at least two predicted drug resistance results are the same, the predicted drug resistance results as the target drug resistance result of the strain to be tested against the target drug.
  14. An apparatus for constructing a drug resistance database, characterized by comprising: a preset classification object set acquisition module, configured to acquire a preset classification object set of a target drug corresponding to each of at least one classification object dimension; an object feature set acquisition module, configured to acquire, for each classification object dimension, an object feature set of the target drug separately corresponding to each preset classification object in the preset classification object set of the classification object dimension; a preset classification object set screening module, configured to screen, based on an object weight and a sample mutation vector set in each object feature set, the preset classification object set, to obtain a target classification object set; and a drug resistance database construction module, configured to, based on at least one target classification object set, construct in a drug resistance database a standard drug resistance mutation information set corresponding to the target drug, wherein each classification object dimension comprises a gene dimension and/or a mutation site dimension, the sample mutation vector set comprises at least two object mutation vectors, each object mutation vector comprises at least one mutation site identifier of a strain sample corresponding to the preset classification object, and the mutation site identifier represents whether reference drug resistance mutation information in a reference drug resistance mutation information set exists in a sample mutation information set corresponding to the strain sample.
  15. A drug resistance testing apparatus, characterized by comprising: a mutation information set to be tested acquisition module, configured to acquire a mutation information set to be tested for a strain to be tested, wherein the mutation information set to be tested comprises at least one piece of mutation information to be tested; a standard drug resistance mutation information set acquisition module, configured to acquire in a drug resistance database a standard drug resistance mutation information set corresponding to a target drug, wherein the standard drug resistance mutation information set comprises at least one piece of standard drug resistance mutation information; and a target drug resistance result determination module, configured to determine, based on overlapping data corresponding to the mutation information set to be tested and the standard drug resistance mutation information set, a target drug resistance result of the strain to be tested against the target drug, wherein the drug resistance database is obtained using the method for constructing a drug resistance database according to claim 1.
  16. An electronic device, characterized in that the electronic device comprises: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so as to enable the at least one processor to perform the method for constructing a drug resistance database according to claim 1.
  17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions configured to, when executed by a processor, cause the processor to implement the method for constructing a drug resistance database according to claim 1.
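Claims 2 and 3 describe a weight-ordered forward selection: objects are tried from greatest to smallest object weight, and each is kept only if it improves classification performance. Below is a minimal sketch under stated assumptions. Model training is abstracted behind a hypothetical `evaluate` callback (standing in for "train the first initial model on these objects' sample mutation vector sets and return its classification performance"); the initial baseline of negative infinity, so the first object is always accepted, is also an assumption not stated in the claims.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for training the first initial model on the given
# objects' sample mutation vector sets and returning its classification performance
Evaluate = Callable[[List[str]], float]


def forward_select(weights: Dict[str, float], evaluate: Evaluate) -> List[str]:
    """Weight-ordered forward selection in the spirit of claims 2 and 3."""
    # Preset classification objects, greatest object weight first
    preset = sorted(weights, key=lambda obj: weights[obj], reverse=True)
    reference: List[str] = []
    previous = float("-inf")  # assumption: the first candidate is always accepted
    while preset:
        current = preset.pop(0)  # take and remove the highest-weight object
        reference.append(current)
        performance = evaluate(reference)
        if performance > previous:
            previous = performance  # keep the object; this becomes the new baseline
        else:
            reference.pop()  # claim 3: remove the object that did not help
    return reference  # preset set is empty: the reference set is the target set
```

Because a rejected object is removed before the next step, `previous` always holds the performance of the last accepted model. The claims do not say whether a rejected model's performance should instead become the baseline, so that choice is an assumption of this sketch.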
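Claims 4 and 5 describe a different search: starting from the greatest-weight object, every not-yet-screened object is tried as an addition, and the process continues only while some candidate beats the previous screened set's performance. The sketch below reads "determining ... a next screened classification object set" as keeping the best improving candidate, which is one plausible interpretation rather than the claimed rule. The `evaluate` callback is hypothetical (standing in for training the second initial model), and using the seed set's own performance as the first baseline is likewise an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for training the second initial model on the given
# objects' sample mutation vector sets and returning its classification performance
Evaluate = Callable[[List[str]], float]


def stepwise_select(weights: Dict[str, float], evaluate: Evaluate) -> List[str]:
    """Best-candidate forward search in the spirit of claims 4 and 5."""
    if not weights:
        return []
    # Seed the current screened set with the greatest-weight object
    screened = [max(weights, key=lambda obj: weights[obj])]
    previous = evaluate(screened)  # assumption: the seed set's performance is the first baseline
    while True:
        # One current reference set per object not yet in the screened set
        scored = [(evaluate(screened + [c]), c) for c in weights if c not in screened]
        improving = [(p, c) for p, c in scored if p > previous]
        if not improving:
            return screened  # no candidate beats the previous performance: stop
        previous, best = max(improving, key=lambda pc: pc[0])
        screened.append(best)  # the next screened set becomes the current one
```

Compared with a single weight-ordered pass, this search retrains a model for every remaining candidate in each round, which costs more but does not depend on the weight ordering after the seed object.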
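Claims 6 and 7 combine the gene and mutation site dimensions in two ways. When used sequentially (claim 6), the target drug resistance gene set filters the reference mutations into the site-dimension candidates. When used in parallel (claim 7), per-gene mutation sets are merged with the site-dimension result. A minimal set-based sketch follows. The `"gene:change"` naming scheme and the helper names are hypothetical, and reading "based on" both sources in claim 7 as a plain set union is an assumption.

```python
from typing import List, Set


def gene_of(mutation: str) -> str:
    # Hypothetical "gene:change" naming scheme
    return mutation.split(":", 1)[0]


def site_candidates(reference: List[str], target_genes: Set[str]) -> List[str]:
    """Claim 6 (sequential use): keep reference mutations in the target resistance genes."""
    return [m for m in reference if gene_of(m) in target_genes]


def standard_set(reference: List[str], target_genes: Set[str], target_sites: Set[str]) -> Set[str]:
    """Claim 7 (parallel use): merge gene-derived and site-derived mutation sets."""
    # One gene drug resistance mutation information set per target gene
    per_gene = [{m for m in reference if gene_of(m) == g} for g in target_genes]
    # Assumption: the two sources are combined by set union
    return set().union(*per_gene, target_sites)
```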
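Claim 12 scores a tested strain by an overlap rate computed from the mutation scores of the standard drug resistance mutations it carries. The claim names a "union" operation but calls the result an "overlapping mutation information set"; this sketch assumes the overlapping set means mutations present in both sets (an intersection). It further assumes, hypothetically, that the overlap rate is the score-weighted share of the standard set found in the strain, and that a fixed `threshold` (not given in the source) turns the rate into a result.

```python
from typing import Dict, Iterable


def overlap_rate(test_mutations: Iterable[str], standard_scores: Dict[str, float]) -> float:
    """Score-weighted share of the standard set that is found in the tested strain."""
    # Assumption: "overlapping" means present in both the tested and standard sets
    overlapping = set(test_mutations) & set(standard_scores)
    total = sum(standard_scores.values())
    if total == 0:
        return 0.0
    return sum(standard_scores[m] for m in overlapping) / total


def resistance_call(
    test_mutations: Iterable[str], standard_scores: Dict[str, float], threshold: float = 0.5
) -> str:
    # Hypothetical decision rule: the threshold value is not specified in the source
    rate = overlap_rate(test_mutations, standard_scores)
    return "resistant" if rate >= threshold else "susceptible"
```

Normalizing by the total score is only one reading; if a single high-scoring mutation should suffice to call resistance, a maximum or capped sum over the overlapping scores would be the more natural choice.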
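Claim 13 adds a model-based check: the strain's mutation feature is fed to at least two trained models, and a result is adopted when at least two predictions agree. A minimal sketch follows, with trained models represented as hypothetical callables from a 0/1 feature vector to a label. Returning `None` when no two models agree, or when two agreeing groups tie, is an assumption the claim leaves open.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Set

# Hypothetical model interface: a trained classifier mapping a 0/1 feature vector to a label
Model = Callable[[Sequence[int]], str]


def encode(test_mutations: Set[str], reference_mutations: List[str]) -> List[int]:
    """Mutation feature to be tested: one 0/1 entry per reference mutation."""
    return [1 if m in test_mutations else 0 for m in reference_mutations]


def consensus_call(models: List[Model], feature: Sequence[int]) -> Optional[str]:
    """Return a label predicted by at least two models, or None if there is none."""
    votes = Counter(model(feature) for model in models).most_common()
    if not votes or votes[0][1] < 2:
        return None  # fewer than two models agree
    if len(votes) > 1 and votes[1][1] == votes[0][1]:
        return None  # assumption: a tie between agreeing groups is left undetermined
    return votes[0][0]
```

With three models, for example, a 2-to-1 split yields the majority label, while three different labels yield `None`.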

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2025/078835, filed on Feb. 24, 2025, which claims priority to Chinese Patent Application No. 202410239951.8, filed on Mar. 1, 2024, the entire contents of each of which are hereby incorporated by reference.

TECHNICAL FIELD

The present invention relates to the field of bioinformatics, and in particular to a method for constructing a drug resistance database, and to a drug resistance testing method, apparatus, and device.

BACKGROUND OF THE INVENTION

If a patient misses a dose, takes medication late, or changes or stops medication without authorization during treatment, a pathogenic strain may develop resistance to the therapeutic drug, rendering the original treatment regimen ineffective. A phenotypic resistance test for analyzing the drug resistance of a pathogenic strain often takes several weeks, and waiting for its results before administering medication greatly delays treatment. Rapid testing of the drug resistance of pathogenic strains is therefore valuable for disease prevention and control.

A commonly used method for testing the drug resistance of a pathogenic strain is to acquire mutation site information of the strain by gene sequencing, compare it with the drug resistance mutation information organized in a drug resistance database for each of a plurality of therapeutic drugs, and determine from the comparison results whether the strain is resistant to a given therapeutic drug. The accuracy of this approach therefore depends heavily on the quality of the drug resistance database.
However, current drug resistance databases rely on manual curation, which leads to problems such as untimely updates and incomplete or inaccurate drug resistance mutation information.

SUMMARY

Embodiments of the present invention provide a method and apparatus for constructing a drug resistance database, a drug resistance testing method and apparatus, and a device, to solve the problem that conventional drug resistance databases require manual organization, thereby improving the efficiency, comprehensiveness, and accuracy of drug resistance database construction.

According to an embodiment of the present invention, a method for constructing a drug resistance database is provided, the method comprising:

  • acquiring a preset classification object set of a target drug corresponding to each of at least one classification object dimension;
  • acquiring, for each classification object dimension, an object feature set of the target drug separately corresponding to each preset classification object in the preset classification object set of the classification object dimension;
  • screening, based on an object weight and a sample mutation vector set in each object feature set, the preset classification object set to obtain a target classification object set; and
  • based on at least one target classification object set, constructing in a drug resistance database a standard drug resistance mutation information set corresponding to the target drug,

where each classification object dimension comprises a gene dimension and/or a mutation site dimension, the sample mutation vector set comprises at least two object mutation vectors, each object mutation vector comprises at least one mutation site identifier of a strain sample corresponding to the preset classification object, and the mutation site identifier represents whether reference drug resistance mutation information in a reference drug resistance mutation information set exists in a sample mutation information set corresponding to the strain sample.

According to an embodiment of the present invention, a drug resistance testing method is provided, the method comprising:

  • acquiring a mutation information set to be tested for a strain to be tested, where the mutation information set to be tested comprises at least one piece of mutation information to be tested;
  • acquiring in a drug resistance database a standard drug resistance mutation information set corresponding to a target drug, where the standard drug resistance mutation information set comprises at least one piece of standard drug resistance mutation information; and
  • determining, based on overlapping data corresponding to the mutation information set to be tested and the standard drug resistance mutation information set, a target drug resistance result of the strain to be tested against the target drug,

where the drug resistance database is obtained using the method for constructing a drug resistance database according to any embodiment of the present invention.

According to another embodiment of the present invention, an apparatus for constructing a drug resistance database