EP-4738370-A1 - CLASSIFICATION METHOD, CLASSIFICATION DEVICE, CLASSIFICATION SYSTEM, CLASSIFICATION PROGRAM, AND RECORDING MEDIUM

Abstract

There is provided a classification method including: measuring a base sequence of a nucleic acid molecule in a measurement sample, as a measurement sequence; and classifying the measurement sequence according to a plurality of groups, into each of which base sequences of a plurality of nucleic acid molecules are grouped according to a specific rule, based on a similarity between a representative sequence and the measurement sequence that has been measured in the measuring, each of the representative sequences being set as a base sequence representing one of the respective groups.

Inventors

  • UCHIYAMA, MAKOTO

Assignees

  • ARKRAY, Inc.

Dates

Publication Date
20260506
Application Date
20240522

Claims (13)

  1. A classification method, comprising: measuring a base sequence of a nucleic acid molecule in a measurement sample, as a measurement sequence; and classifying the measurement sequence according to a plurality of groups, into each of which base sequences of a plurality of nucleic acid molecules are grouped according to a specific rule, based on a similarity between a representative sequence and the measurement sequence that has been measured in the measuring, each of the representative sequences being set as a base sequence representing one of the respective groups.
  2. The classification method according to Claim 1, further comprising counting a number of molecules of the measurement sequence that has been classified according to the respective groups.
  3. The classification method according to Claim 2, further comprising: after classifying the measurement sequence, subclassifying the measurement sequence, which has been classified according to the respective groups, according to the respective base sequences included in the groups, wherein, in counting the number of molecules of the measurement sequence, the number of molecules of the measurement sequence that has been subclassified according to the base sequences is counted for each of the base sequences.
  4. The classification method according to Claim 2, further comprising performing specific determination of the measurement sample based on the number of molecules counted in the counting.
  5. The classification method according to Claim 1, wherein the specific rule is based on a similarity of each of the base sequences grouped into the plurality of groups to the representative sequences or a similarity between the base sequences grouped into the plurality of groups.
  6. The classification method according to Claim 1, wherein the representative sequences are set as a longest base sequence among the base sequences included in the groups.
  7. The classification method according to Claim 1, wherein the representative sequences are set as a base sequence having a highest expression level among the base sequences included in the groups.
  8. The classification method according to Claim 1, wherein, in the classifying, the measurement sequence is classified as belonging to a group in which the similarity between the representative sequence and the measurement sequence is equal to or higher than a threshold value or a group in which the similarity between the representative sequence and the measurement sequence is highest.
  9. The classification method according to Claim 1, wherein, in the measuring, a base sequence of a nucleic acid molecule in the measurement sample is measured as a measurement sequence using next-generation sequencing.
  10. A classification device, comprising a processor, wherein the processor is configured to classify a measurement sequence, which has been measured, according to a plurality of groups into each of which base sequences of a plurality of nucleic acid molecules are grouped according to a specific rule, based on a similarity between a representative sequence and the measurement sequence, each of the representative sequences being set as a base sequence representing one of the respective groups.
  11. A classification system comprising: a measurement unit that measures a base sequence of a nucleic acid molecule in a measurement sample as a measurement sequence; and a classifier that classifies the measurement sequence according to a plurality of groups into each of which base sequences of a plurality of nucleic acid molecules are grouped according to a specific rule, based on a similarity between a representative sequence and the measurement sequence that has been measured in the measurement unit, each of the representative sequences being set as a base sequence representing one of the respective groups.
  12. A classification program executable by a computer to perform classification processing comprising classifying a measurement sequence, which has been measured, according to a plurality of groups into each of which base sequences of a plurality of nucleic acid molecules are grouped according to a specific rule, based on a similarity between a representative sequence and the measurement sequence, each of the representative sequences being set as a base sequence representing one of the respective groups.
  13. A non-transitory recording medium storing a classification program, the classification program being executable by a computer to perform classification processing comprising classifying a measurement sequence, which has been measured, according to a plurality of groups into each of which base sequences of a plurality of nucleic acid molecules are grouped according to a specific rule, based on a similarity between a representative sequence and the measurement sequence, each of the representative sequences being set as a base sequence representing one of the respective groups.
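
The flow recited in Claims 1 to 3, 6, and 8 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the similarity measure (`difflib.SequenceMatcher.ratio`), the function names, the way the threshold and highest-similarity criteria of Claim 8 are combined, and the toy sequences are all hypothetical choices made for the example. The representative sequence is taken as the longest member of each group, as in Claim 6.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: difflib ratio in [0, 1]. The claims do not fix one."""
    return SequenceMatcher(None, a, b).ratio()


def representative(members: list[str]) -> str:
    """Representative sequence = longest member of the group (Claim 6)."""
    return max(members, key=len)


def classify(read: str, groups: dict[str, list[str]], threshold: float = 0.0):
    """Compare the read only with each group's representative (Claim 1).

    Returns the group with the highest similarity, or None when that best
    similarity is below the threshold (one possible reading of Claim 8).
    """
    best_group, best_score = None, -1.0
    for name, members in groups.items():
        score = similarity(representative(members), read)
        if score > best_score:
            best_group, best_score = name, score
    return best_group if best_score >= threshold else None


def subclassify(read: str, members: list[str]) -> str:
    """Within the chosen group, pick the most similar member sequence (Claim 3)."""
    return max(members, key=lambda m: similarity(m, read))


def count_reads(reads: list[str], groups: dict[str, list[str]], threshold: float = 0.0):
    """Count classified reads per group (Claim 2) and per member sequence (Claim 3)."""
    group_counts, sequence_counts = Counter(), Counter()
    for read in reads:
        group = classify(read, groups, threshold)
        if group is None:
            continue  # read did not reach the threshold for any group
        group_counts[group] += 1
        sequence_counts[subclassify(read, groups[group])] += 1
    return group_counts, sequence_counts


# Toy data (hypothetical sequences, not taken from the application).
GROUPS = {
    "g1": ["AAAAAAAAAACCCCCCCCCC", "AAAAAAAAAACCCCCCCCC"],
    "g2": ["GGGGGGGGGGTTTTTTTTTT", "GGGGGGGGGGTTTTTTTTT"],
}
READS = ["AAAAAAAAAACCCCCCCCCC", "AAAAAAAAAACCCCCCCCC", "GGGGGGGGGGTTTTTTTTTT"]

group_counts, sequence_counts = count_reads(READS, GROUPS)
```

In this sketch each read is compared with one sequence per group first, and the member-by-member comparison is confined to the single group that was selected.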

Description

Technical Field

The present disclosure relates to a classification method, a classification device, a classification system, a classification program, and a recording medium.

Background Art

In recent years, comprehensive quantitative analysis of nucleic acids, typified by next-generation sequencing (hereinafter sometimes abbreviated as NGS), has been applied to the determination of diseases, including cancer. With the development of comprehensive analysis technologies, the number of analyzable molecular species has increased dramatically. However, as the number of molecular species to be analyzed increases, the calculation amount required for the analysis grows explosively. As a result, serious problems often occur, such as an increase in facility cost and a decrease in throughput, because high-performance computing devices and long analysis times become necessary.

In the related art, in order to prevent such an increase in the calculation amount, a technology has been proposed that reduces the number of molecules to be analyzed by selecting important molecules using statistical analysis or machine learning (refer to Non-Patent Documents 1 and 2).

Non-Patent Document 1: Jin et al., 2017, Clinical Cancer Research, Evaluation of Tumor-Derived Exosomal miRNA as Potential Diagnostic Biomarkers for Early-Stage Non-Small Cell Lung Cancer Using Next-Generation Sequencing

Non-Patent Document 2: Asakura et al., 2020, Communications Biology, A miRNA-based diagnostic model predicts resectable lung cancer in humans with high accuracy

SUMMARY OF INVENTION

Technical Problem

In an analysis process using comprehensive molecular species analysis, the following two steps are examples of processes with a large processing amount (specifically, calculation amount).
Step 1: Classification of target nucleic acid molecules
Step 2: Disease determination/prediction

Here, for example, approximately 2600 types of known microRNAs exist among the microRNAs in human blood, and several millions of microRNA molecules exist in a few µL of a blood sample. Performing the above-described step 1 on several millions of microRNAs therefore means, for example, comparing each of those microRNAs with each of the approximately 2600 known microRNAs in order to classify it. As a result, the processing amount (specifically, the calculation amount) is enormous.

In the method of selecting important molecules using statistical analysis or machine learning (refer to Non-Patent Documents 1 and 2), the molecular species are first detected comprehensively, and the number of molecular species to be analyzed is then reduced. This reduces the processing amount (specifically, the calculation amount) in the subsequent machine learning or statistical analysis, and thus the processing amount in the above-described step 2 can be reduced. However, the methods of Non-Patent Documents 1 and 2 cannot reduce the processing amount in the above-described step 1, so the problems of increased facility cost and decreased throughput, caused by the need for high-performance computing devices and long analysis times, remain.

An object of the present disclosure is to provide a classification method, a classification device, a classification system, a classification program, and a recording medium capable of reducing a processing amount when classifying a base sequence of a nucleic acid molecule.
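
To make the scale of step 1 concrete, the comparison counts can be estimated with simple arithmetic. The sketch below is a rough estimate under stated assumptions, not a figure from the application: "several millions" is taken as 3,000,000 reads, the number of groups (100) is hypothetical, and groups are assumed to be of equal size. Only the figure of approximately 2600 known microRNAs comes from the text above.

```python
def comparisons_flat(n_reads: int, n_known: int) -> int:
    """Every read is compared with every known sequence."""
    return n_reads * n_known


def comparisons_grouped(n_reads: int, n_groups: int, avg_group_size: int) -> int:
    """Every read is compared with one representative per group, then with
    the members of the single group it was assigned to."""
    return n_reads * n_groups + n_reads * avg_group_size


N_READS = 3_000_000   # assumption standing in for "several millions"
N_KNOWN = 2600        # approximate number of known microRNAs (from the text)
N_GROUPS = 100        # hypothetical number of groups
AVG_GROUP_SIZE = N_KNOWN // N_GROUPS  # 26, assuming equally sized groups

flat = comparisons_flat(N_READS, N_KNOWN)
grouped = comparisons_grouped(N_READS, N_GROUPS, AVG_GROUP_SIZE)

print(f"flat:    {flat:,} comparisons")     # flat:    7,800,000,000 comparisons
print(f"grouped: {grouped:,} comparisons")  # grouped: 378,000,000 comparisons
```

Under these assumed numbers, the grouped scheme needs roughly one twentieth of the comparisons; the actual saving depends on how many groups are formed and how unevenly the sequences are distributed among them.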
Solution to Problem

According to an aspect of the present disclosure, there is provided a classification method including: measuring a base sequence of a nucleic acid molecule in a measurement sample, as a measurement sequence; and classifying the measurement sequence according to a plurality of groups, into each of which base sequences of a plurality of nucleic acid molecules are grouped according to a specific rule, based on a similarity between a representative sequence and the measurement sequence that has been measured in the measuring, each of the representative sequences being set as a base sequence representing one of the respective groups.

Advantageous Effects of Invention

According to the present disclosure, it is possible to reduce a processing amount when classifying a base sequence of a nucleic acid molecule.

BRIEF DESCRIPTION OF DRAWINGS

Fig. 1 is a flowchart illustrating an example of each step of a classification method according to the present embodiment.
Fig. 2 is a block diagram illustrating an example of a computer that functions as a classification device according to the present embodiment.
Fig. 3 is a block diagram illustrating an example of a functional configuration of the classification device according to the present embodiment.
Fig. 4 is a list showing 56 types of microRNAs to be classified in the present example.
Fig. 5 is a list showing each group obtained by grouping the 56 types of microRNAs shown in Fig. 4 and a represe