EP-3572805-B1 - ANALYSIS DATA ANALYTICS METHOD AND ANALYSIS DATA ANALYTICS DEVICE

Inventors

TAMAI, Yusuke
KAJIHARA, Shigeki
FUJITA, Shin
AISU, Ryota

Dates

Publication Date: 2026-05-13
Application Date: 2017-01-19

Claims (7)

1. An analytical data analysis method using machine learning of analysis result data (S1) measured by an analyzer (1) which performs analysis of a sample, the analytical data analysis method comprising: generating (S2) a plurality of simulated data by adding a data variation corresponding to a specific variation factor associated with measurement by the analyzer to a plurality of analysis result data within a range in which the result of the discrimination is not reversed when the data variation is added; performing (S3) the machine learning using the plurality of analysis result data and the plurality of generated simulated data; and performing (S5) discrimination using a discrimination criterion obtained through the machine learning; wherein each of the plurality of analysis result data is a spectrum obtained by the analyzer; and the generating of the plurality of simulated data includes: generating the plurality of simulated data (32) by varying a value of an intensity of the spectrum (31) according to a ratio of change of the intensity of the spectrum caused by the sample, generating the plurality of simulated data (41) by giving, to a baseline of the spectrum, a variation corresponding to a variation in the baseline generated at a time of measuring the plurality of analysis result data (40), generating the plurality of simulated data (52) by adding a difference in individual difference data of each of a plurality of analyzers, generating the plurality of simulated data by adding a peak (71a) of an impurity to the spectrum (70) according to the impurity detected at a time of the measurement by the analyzer.
2. The analytical data analysis method according to claim 1, wherein the plurality of simulated data are generated by adding the data variation within a range of variation in the plurality of analysis result data caused by the specific variation factor.
3. The analytical data analysis method according to claim 2, comprising: acquiring the variation in the plurality of analysis result data caused by the specific variation factor; and generating the plurality of simulated data by adding the acquired variation in the plurality of analysis result data caused by the specific variation factor.
4. The analytical data analysis method according to claim 1, wherein the ratio of change of the intensity of the spectrum caused by the sample increases or decreases at a substantially constant rate as a mass of the sample or a wavelength absorbed by the sample increases, and the plurality of simulated data are generated by multiplying the value of the intensity of the spectrum by the ratio (30) of change of the intensity of the spectrum caused by the sample.
5. The analytical data analysis method according to claim 1, wherein the machine learning is performed, using the plurality of simulated data, on the plurality of analysis result data measured by a mass spectrometer that generates a mass spectrum as the analyzer.
6. The analytical data analysis method according to claim 5, wherein the plurality of analysis result data include the mass spectrum of a biological sample collected from a subject, and the performing of the discrimination includes performing cancer discrimination on the plurality of analysis result data of the sample using the discrimination criterion.
7. An analytical data analyzer system comprising: a learning device; and an analytical data analyzer; wherein the learning device is configured to generate simulated data by adding a data variation corresponding to a specific variation factor associated with measurement by another analyzer to analysis result data acquired from the analyzer within a range in which the result of the discrimination is not reversed when the data variation is added and the analysis result data, and to generate a discrimination criterion through machine learning using the simulated data generated and the analysis result data; and the analytical data analyzer includes: a data input that acquires the analysis result data obtained by the analyzer; a storage that stores the discrimination criterion generated by the learning device and a discrimination algorithm for the machine learning; and an arithmetic unit that discriminates the analysis result data acquired by the data input according to the discrimination algorithm using the discrimination criterion; wherein the analysis result data is a spectrum obtained by the analyzer; and the simulated data is configured to be generated by varying a value of an intensity of the spectrum according to a ratio of change of the intensity of the spectrum caused by the sample, to be generated by giving, to a baseline of the spectrum, a variation corresponding to a variation in the baseline generated at a time of measuring the analysis result data, to be generated by adding a difference in individual difference data of each of a plurality of analyzers, to be generated by adding a peak of an impurity to the spectrum according to the impurity detected at a time of the measurement by the analyzer.
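
The four kinds of data variation named in claim 1, together with the linearly changing intensity ratio of claim 4, can be sketched roughly as follows. This is an illustrative sketch in plain Python and not the patented implementation: the function names, the representation of a spectrum as parallel lists of axis values and intensities, the constant baseline offset, the Gaussian shape of the impurity peak, and all numeric values are assumptions made for the example.

```python
import math


def scale_intensity(axis, intensity, slope):
    """Multiply intensities by a ratio that changes at a constant rate along the axis.

    The ratio is 1.0 at the first axis point and changes linearly with the
    axis value (m/z or wavelength). `slope` is a hypothetical parameter.
    """
    x0 = axis[0]
    return [y * (1.0 + slope * (x - x0)) for x, y in zip(axis, intensity)]


def shift_baseline(intensity, offset):
    """Add an offset to mimic a baseline variation (assumed constant here)."""
    return [y + offset for y in intensity]


def add_instrument_difference(intensity, difference):
    """Add a per-point difference between analyzers (assumed measured elsewhere)."""
    if len(intensity) != len(difference):
        raise ValueError("spectrum and difference must have the same length")
    return [y + d for y, d in zip(intensity, difference)]


def add_impurity_peak(axis, intensity, center, height, width):
    """Superimpose an impurity peak; the Gaussian shape is an assumption."""
    return [
        y + height * math.exp(-0.5 * ((x - center) / width) ** 2)
        for x, y in zip(axis, intensity)
    ]


# One measured spectrum expanded into four simulated spectra.
axis = [100.0, 200.0, 300.0]
spectrum = [10.0, 20.0, 30.0]
simulated = [
    scale_intensity(axis, spectrum, slope=0.001),
    shift_baseline(spectrum, offset=0.5),
    add_instrument_difference(spectrum, [0.1, -0.2, 0.3]),
    add_impurity_peak(axis, spectrum, center=200.0, height=5.0, width=10.0),
]

# The measured spectrum and the simulated spectra are used together for learning.
training_set = [spectrum] + simulated
```

In this sketch each variation is applied separately to the same measured spectrum; whether and how variations are combined, and how large they may be, is not specified by the example.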

Description

Technical Field

The present invention relates to an analytical data analysis method, and more particularly, it relates to an analytical data analysis method using machine learning and an analytical data analyzer using machine learning.

Background Art

Conventionally, an analytical data analysis method using machine learning is known. Such an analytical data analysis method is disclosed in Japanese Patent Laid-Open No. 2016-28229, for example.

Japanese Patent Laid-Open No. 2016-28229 discloses an analytical data analysis method for analyzing spectral data using machine learning. In machine learning, it is necessary to perform learning using a large amount of data (a large number of patterns). In Japanese Patent Laid-Open No. 2016-28229, spectral components are thinned out from the spectral data such that the data amount of individual learning data is reduced.

CONLIN A K ET AL, "Data augmentation: an alternative approach to the analysis of spectroscopic data", CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, ELSEVIER SCIENCE PUBLISHERS B.V. AMSTERDAM, NL, (19981214), vol. 44, no. 1-2, discusses an approach for analysis of spectroscopic data.

SAIZ-ABAJO M J ET AL, "Ensemble methods and data augmentation by noise addition applied to the analysis of spectroscopic data", ANALYTICA CHIMICA ACTA, ELSEVIER, AMSTERDAM, NL, vol. 533, no. 2, discusses an approach for analysis of spectroscopic data.

ISIDRO CORTES-CIRIANO ET AL, "Improved Chemical Structure-Activity Modeling Through Data Augmentation", JOURNAL OF CHEMICAL INFORMATION AND MODELING, US, (20151211), vol. 55, no. 12, discusses an approach for augmentation of data.

Prior Art

Patent Document

Patent Document 1: Japanese Patent Laid-Open No. 2016-28229

Summary of the Invention

Problems to be Solved by the Invention

However, in an analytical data analysis method using machine learning such as the analytical data analysis method using machine learning disclosed in Japanese Patent Laid-Open No. 2016-28229, it is difficult to acquire a large amount of data suitable for machine learning (typical data to be discriminated). For example, it is difficult to acquire several thousands of analysis result data of a biological sample. When the amount of data used for machine learning is small, there is a problem that the accuracy of machine learning is easily reduced due to a data variation.

The present invention has been proposed in order to solve the aforementioned problems, and an object of the present invention is to provide an analytical data analysis method and an analytical data analyzer each capable of improving the accuracy of machine learning even when analytical data, in which it is difficult to acquire a large amount of typical data to be discriminated, is discriminated using machine learning.

Means for Solving the Problems

In order to attain the aforementioned object, an analytical data analysis method according to a first aspect of the present invention is defined by claim 1 herein. The method uses machine learning of analysis result data measured by an analyzer, and includes generating a plurality of simulated data in which a data variation has been added to a plurality of analysis result data within a range that does not affect identification, performing the machine learning using the plurality of generated simulated data, and performing discrimination using a discrimination criterion obtained through the machine learning. In the present invention, the "range that does not affect identification" is defined as a range in which the result of the discrimination is not reversed when the data variation is added.

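
The definition above, a range in which the result of the discrimination is not reversed, can be illustrated with a small check. The sketch below is only one possible reading and not the procedure of the patent: the stand-in discriminator `toy_discriminate`, its threshold, and the helper `label_preserving_variants` are hypothetical, and a constant baseline offset is the only data variation considered.

```python
def toy_discriminate(spectrum, threshold=50.0):
    """Hypothetical stand-in discriminator: positive when summed intensity exceeds threshold."""
    return sum(spectrum) > threshold


def label_preserving_variants(spectrum, offsets, classify):
    """Keep only the varied spectra whose discrimination result matches the original."""
    original_label = classify(spectrum)
    kept = []
    for offset in offsets:
        candidate = [y + offset for y in spectrum]
        # Discard a variation that would reverse the discrimination result.
        if classify(candidate) == original_label:
            kept.append(candidate)
    return kept


spectrum = [10.0, 20.0, 15.0]          # summed intensity 45.0, classified negative
offsets = [-1.0, 0.5, 1.0, 3.0]        # the last offset pushes the sum to 54.0
variants = label_preserving_variants(spectrum, offsets, toy_discriminate)
```

Here the offset of 3.0 is rejected because it flips the toy result from negative to positive, while the three smaller offsets are kept as simulated data.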
As described above, the analytical data analysis method according to the first aspect of the present invention includes the generating of the plurality of simulated data by adding the data variation within the range that does not affect discrimination, the performing of the machine learning using the plurality of generated simulated data, and the performing of the discrimination using the discrimination criterion obtained through the machine learning. Accordingly, the plurality of simulated data in which the variation has been added within the range that does not affect identification can be generated. Consequently, the amount of data used for the machine learning can be increased, and thus the accuracy of the machine learning can be improved. Here, in the field of image recognition, it is easy to increase the amount of data by adding a conversion to the acquired image, but in the case of scientific analysis data, it is difficult to identify a range in which the data can be varied. When data is only increased, learning is performed on training data, but there is a possibility that the discrimination accuracy may be decreased due to over-fitting, which is a state in which fit (generalization) to unknown data (data to be discriminated) cannot be established. Therefore, in the aforementioned analytical data analysis method according to the first aspect, the ran