EP-3454058-B1 - MASS SPECTROMETRIC DATA ANALYZER, MASS SPECTROMETRIC DATA ANALYZING PROGRAM AND COMPUTER-IMPLEMENTED MASS SPECTROMETRIC DATA ANALYZING METHOD
Inventors
- YAMADA, YOSHIHIRO
- FUNATSU, SHINJI
- SHIMA, KEISUKE
Dates
- Publication Date
- 20260513
- Application Date
- 20160314
Claims (9)
- A mass spectrometric data analyzer (1) configured to search an element selectable as a marker contributing to separation between a plurality of groups based on mass spectrometric data obtained through mass spectrometry of a plurality of samples each belonging to any one of the plurality of groups, the mass spectrometric data analyzer (1) comprising: a) a peak matrix generator (13) configured to arrange mass-to-charge ratio values of peaks on a mass spectrum in a row or column direction and arrange pieces of information for distinguishing the plurality of samples in the row or column direction based on the mass spectrometric data of given ones of the plurality of samples so as to generate a peak matrix in which signal strength values of the peaks are put as the elements; b) a multivariate analyzer (14) configured to apply a predetermined multivariate analysis to the peak matrix generated by the peak matrix generator and render a multivariate analysis result in a graphical representation, the predetermined multivariate analysis being applied to group the plurality of samples or calculate a distance between the plurality of samples; c) a display processor (19) configured to display the peak matrix and a loading plot as the graphical representation of the multivariate analysis result on a display screen, wherein in the loading plot, plotted points respectively indicate peaks in the peak matrix, and one plotted point corresponds to one peak in the peak matrix; d) a means for selecting (2) one or more plotted points on the graphical representation of the multivariate analysis result; e) a selected peak indicator configured to specify, in response to desired one or more plotted points being designated by a user on the graphical representation of the multivariate analysis result displayed on the display screen using the means for selecting (2), a row or a column in the peak matrix indicative of a peak corresponding to the desired one or more plotted points and 
discriminate the row or the column specified in the peak matrix displayed on the display screen; and characterized in that the mass spectrometric data analyzer additionally comprises f) a peak-to-be-excluded designator configured to allow the user to designate a peak corresponding to the row or the column discriminated on the peak matrix as a marker candidate to be desirably excluded from multivariate analysis targets by checking or unchecking a checkbox (55) associated with the row or the column on the peak matrix displayed on the display screen, and wherein, after one or more peaks are excluded by the peak-to-be-excluded designator (17), the multivariate analyzer (14) is configured to apply the predetermined multivariate analysis to a peak matrix obtained by excluding the peak designated by the peak-to-be-excluded designator and render a multivariate analysis result in a graphical representation.
- The mass spectrometric data analyzer (1) according to claim 1, further comprising a group-to-be-excluded designator (18) configured to allow a user to designate desired one or more than one of the plurality of groups to be desirably excluded from multivariate analysis targets, wherein: the peak matrix generator (13) is configured to generate a peak matrix based on the mass spectrometric data of any samples but samples included in the desired one or more than one of the plurality of groups designated by the group-to-be-excluded designator (18), and the multivariate analyzer (14) is configured to apply a predetermined multivariate analysis to the peak matrix generated after the samples included in the desired one or more than one of the plurality of groups are excluded or a peak matrix obtained by excluding from the peak matrix the peak designated by the peak-to-be-excluded designator (17).
- The mass spectrometric data analyzer (1) according to claim 1 or 2, wherein: the multivariate analysis applied to the peak matrix by the multivariate analyzer is one of PCA and PLS-DA, and the graphical representation displayed on the display screen by the display processor is a loading plot or a score plot.
- A mass spectrometric data analyzing program for use in search of an element selectable as a marker contributing to separation between a plurality of groups based on mass spectrometric data obtained through mass spectrometry of a plurality of samples belonging to any one of the plurality of groups, the mass spectrometric data analyzing program causing a computer to carry out the following steps: a) a peak matrix generating step of arranging mass-to-charge ratio values of peaks on a mass spectrum in a row or column direction and arranging pieces of information for distinguishing the plurality of samples in the row or column direction based on the mass spectrometric data of given ones of the plurality of samples so as to generate a peak matrix in which signal strength values of the peaks are put as the elements; b) a multivariate analyzing step of applying a predetermined multivariate analysis to the peak matrix generated in the peak matrix generating step and of rendering a multivariate analysis result in a graphical representation, the predetermined multivariate analysis being applied to group the plurality of samples or calculate a distance between the plurality of samples; c) a display processing step of displaying the peak matrix and a loading plot as the graphical representation of the multivariate analysis result on a display screen, wherein in the loading plot, plotted points respectively indicate peaks in the peak matrix, and one plotted point corresponds to one peak in the peak matrix; d) a selecting step of selecting one or more plotted points on the graphical representation of the multivariate analysis result; e) a selected peak indicating step of specifying, in response to desired one or more plotted points being designated by a user on the graphical representation of the multivariate analysis result displayed on the display screen in the selecting step, a row or a column in the peak matrix indicative of a peak corresponding to the desired one or more 
plotted points and of discriminating the row or the column specified in the peak matrix displayed on the display screen; characterized in that the mass spectrometric data analyzing program additionally comprises f) a peak-to-be-excluded designating step of allowing the user to designate a peak corresponding to the row or the column discriminated on the peak matrix as a marker candidate to be desirably excluded from multivariate analysis targets by checking or unchecking a checkbox (55) associated with the row or the column on the peak matrix displayed on the display screen; and g) a multivariate analysis reapplying step wherein, after one or more peaks are excluded in the peak-to-be-excluded designating step, the predetermined multivariate analysis is applied to a peak matrix obtained by excluding from the peak matrix a peak designated in the peak-to-be-excluded designating step and a multivariate analysis result is rendered in a graphical representation.
- The mass spectrometric data analyzing program according to claim 4, further causing the computer to carry out a group-to-be-excluded designating step of allowing the user to designate desired one or more than one of the plurality of groups to be desirably excluded from multivariate analysis targets, wherein: the peak matrix generating step generates a peak matrix based on the mass spectrometric data of any samples but samples included in the desired one or more than one of the plurality of groups designated in the group-to-be-excluded designating step, and the multivariate analysis applying step and/or multivariate analysis reapplying step applies a predetermined multivariate analysis to the peak matrix generated after the samples included in the desired one or more than one of the plurality of groups are excluded or a peak matrix obtained by excluding from the peak matrix the peak designated in the peak-to-be-excluded designating step.
- The mass spectrometric data analyzing program according to claim 4 or 5, wherein: the multivariate analysis applied to the peak matrix in the multivariate analysis applying step and/or the multivariate analysis reapplying step is one of PCA and PLS-DA, and the graphical representation displayed on the display screen in the display processing step is a score plot and a loading plot.
- A computer-implemented mass spectrometric data analyzing method of searching for an element selectable as a marker contributing to separation between a plurality of groups based on mass spectrometric data obtained through mass spectrometry of a plurality of samples belonging to any one of the plurality of groups, the method comprising the following steps: a) a peak matrix generating step of arranging mass-to-charge ratio values of peaks on a mass spectrum in a row or column direction and arranging pieces of information for distinguishing the plurality of samples in the row or column direction based on the mass spectrometric data of given ones of the plurality of samples so as to generate a peak matrix in which signal strength values of the peaks are put as the elements; b) a multivariate analysis applying step of applying a predetermined multivariate analysis to the peak matrix generated in the peak matrix generating step and of rendering a multivariate analysis result in a graphical representation, the predetermined multivariate analysis being applied to group the plurality of samples or calculate a distance between the plurality of samples; c) a display processing step of displaying the peak matrix and a loading plot as the graphical representation of the multivariate analysis result on a display screen, wherein in the loading plot, plotted points respectively indicate peaks in the peak matrix, and one plotted point corresponds to one peak in the peak matrix; d) a selecting step of selecting one or more plotted points on the graphical representation of the multivariate analysis result; e) a selected peak indicating step of specifying, in response to desired one or more plotted points being designated by a user on the graphical representation of the multivariate analysis result displayed on the display screen in the selecting step, a row or a column in the peak matrix indicative of a peak corresponding to the desired one or more plotted points and of 
discriminating the row or the column specified in the peak matrix displayed on the display screen; characterized in that the computer-implemented mass spectrometric data analyzing method additionally comprises f) a peak-to-be-excluded designating step of allowing the user to designate a peak corresponding to the row or the column discriminated on the peak matrix as a marker candidate to be desirably excluded from multivariate analysis targets by checking or unchecking a checkbox (55) associated with the row or the column on the peak matrix displayed on the display screen; and g) a multivariate analysis reapplying step wherein, after one or more peaks are excluded in the peak-to-be-excluded designating step, the predetermined multivariate analysis is applied to a peak matrix obtained by excluding a peak designated in the peak-to-be-excluded designating step and a multivariate analysis result is rendered in a graphical representation.
- The computer-implemented mass spectrometric data analyzing method according to claim 7, further comprising a group-to-be-excluded designating step of allowing the user to designate desired one or more than one of the plurality of groups to be desirably excluded from multivariate analysis targets, wherein: the peak matrix generating step generates a peak matrix based on the mass spectrometric data of any samples but samples included in the desired one or more than one of the plurality of groups designated in the group-to-be-excluded designating step, and the multivariate analysis applying step and/or multivariate analysis reapplying step applies a predetermined multivariate analysis to the peak matrix generated after the samples included in the desired one or more than one of the plurality of groups are excluded or a peak matrix obtained by excluding from the peak matrix the peak designated in the peak-to-be-excluded designating step.
- The computer-implemented mass spectrometric data analyzing method according to claim 7 or 8, wherein: the multivariate analysis applied to the peak matrix in the multivariate analysis applying step and/or the multivariate analysis reapplying step is one of PCA and PLS-DA, and the graphical representation displayed on the display screen in the display processing step is a score plot and a loading plot.
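As a rough illustration of the workflow recited in the claims above (peak matrix generation, multivariate analysis, peak exclusion, and re-analysis), the following is a minimal sketch and not the claimed implementation. It assumes that the m/z values are already aligned across samples, that PCA is the chosen analysis, and that only the first principal component is of interest, computed here by plain power iteration. The function names (`build_peak_matrix`, `first_loadings`, `exclude_peaks`, `top_peak`) and all data values are hypothetical, and the set of excluded m/z values stands in for the state of the checkboxes.

```python
import math
from typing import Dict, List, Set, Tuple

def build_peak_matrix(
    peak_lists: Dict[str, Dict[float, float]]
) -> Tuple[List[float], List[str], List[List[float]]]:
    """Rows are m/z values, columns are samples, elements are signal strengths."""
    samples = list(peak_lists)
    mz_values = sorted({mz for peaks in peak_lists.values() for mz in peaks})
    # A peak missing from a sample is filled with 0.0 (an assumption of this sketch).
    matrix = [[peak_lists[s].get(mz, 0.0) for s in samples] for mz in mz_values]
    return mz_values, samples, matrix

def first_loadings(matrix: List[List[float]], iterations: int = 200) -> List[float]:
    """Loading of each row (peak) on the first principal component."""
    # Center every peak across the samples.
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    n = len(centered)
    # Unnormalized covariance between peaks, summed over samples.
    cov = [[sum(a * b for a, b in zip(centered[i], centered[j]))
            for j in range(n)] for i in range(n)]
    # Power iteration from a non-uniform start vector.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def exclude_peaks(
    mz_values: List[float], matrix: List[List[float]], excluded: Set[float]
) -> Tuple[List[float], List[List[float]]]:
    """Drop the rows whose m/z value the user marked for exclusion."""
    kept = [(mz, row) for mz, row in zip(mz_values, matrix) if mz not in excluded]
    return [mz for mz, _ in kept], [row for _, row in kept]

def top_peak(mz_values: List[float], loadings: List[float]) -> float:
    """m/z value of the peak with the largest absolute loading."""
    return max(zip(mz_values, loadings), key=lambda p: abs(p[1]))[0]

# Invented data: two samples in group A and two in group B.
peak_lists = {
    "A1": {100.0: 10.0, 200.0: 5.0, 300.0: 1.0},
    "A2": {100.0: 11.0, 200.0: 6.0, 300.0: 2.0},
    "B1": {100.0: 1.0, 200.0: 5.0, 300.0: 6.0},
    "B2": {100.0: 2.0, 200.0: 6.0, 300.0: 7.0},
}
mz_values, samples, matrix = build_peak_matrix(peak_lists)
first_candidate = top_peak(mz_values, first_loadings(matrix))

# Exclude the first candidate and apply the analysis again.
mz_kept, matrix_kept = exclude_peaks(mz_values, matrix, {first_candidate})
second_candidate = top_peak(mz_kept, first_loadings(matrix_kept))
print(first_candidate, second_candidate)
```

Re-running the analysis on the reduced matrix lets a peak that was overshadowed by a dominant one surface as the next marker candidate, which is the purpose of the exclusion step in this sketch.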
Description
TECHNICAL FIELD
The present invention relates to an apparatus for analyzing mass spectrometric data obtained by mass spectrometry and a computer program for use in the analysis, and more particularly to a mass spectrometric data analyzer and a computer program that are suitably used in differential analysis between a plurality of sample groups.
BACKGROUND ART
Biomarker analysis using mass spectrometry is being studied with the aims of realizing early diagnosis of specific diseases and disorders and verifying therapeutic effects, and some of the results are already used in real clinical settings. For example, if a substance is found that is barely present or not present at all in biological samples, such as blood and urine, collected from healthy subjects but is evidently present in similar biological samples collected from patients affected with a disorder, for example, cancer, the substance may be a promising biomarker candidate. Biomarkers are also used to identify and/or determine species and strains of microorganisms such as bacteria. Such biomarkers are typically searched for by measuring samples derived from two or more groups using a mass spectrometer and performing differential analysis between the groups based on the obtained data.
A conventional process of searching for markers (hereinafter simply referred to as "markers", because they are not necessarily organism-derived markers) using differential analysis is schematically described below (Patent Literature 1, Non-Patent Literatures 1 to 3). In the description below, "G" refers to the total number of groups, and "S" refers to the total number of samples.
[Step 1] First, mass spectrometric data of each sample is obtained and subjected to a peak detecting process to collect peak information, i.e., the mass-to-charge ratio and signal strength values of each peak. The peak information is organized in the form of a peak list per sample.
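Step 1 can be illustrated with a minimal sketch. The text does not specify the peak detecting process, so the example below assumes a simple local-maximum test with a height threshold; the function name `detect_peaks`, the `min_height` parameter, and the spectrum values are all hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (mass-to-charge ratio, signal strength)

def detect_peaks(mz: List[float], intensity: List[float], min_height: float) -> List[Peak]:
    """Return a peak list: local maxima at or above min_height, ordered by m/z."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        # A point is a peak if it rises above its left neighbour and is
        # not exceeded by its right neighbour.
        is_local_max = intensity[i - 1] < intensity[i] >= intensity[i + 1]
        if is_local_max and intensity[i] >= min_height:
            peaks.append((mz[i], intensity[i]))
    return sorted(peaks)

# Invented profile spectrum of a single sample.
mz = [100.0, 100.5, 101.0, 101.5, 102.0, 102.5, 103.0]
intensity = [1.0, 8.0, 2.0, 1.0, 5.0, 0.5, 0.2]
peak_list = detect_peaks(mz, intensity, min_height=3.0)
print(peak_list)
```

Running the same function on every sample would give the S peak lists that the following steps combine.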
The peak list is a list of signal strength values associated with, and arranged in the order of, the mass-to-charge ratio values of the peaks. There are S peak lists in total, which may be divided into G groups.
[Step 2] One peak list, i.e., the peak list of one sample, is defined as a column vector, and all of the peak lists are arranged in the row direction, i.e., the lateral direction, so that the signal strength values for the same mass-to-charge ratio value are lined up in the same row. The table thus obtained is a peak matrix. In the peak matrix, the peak lists are placed together per group in the row direction. The number of columns in the peak matrix is "S", the total number of samples, and the number of rows is "P", the total number of peaks observed across all samples (peaks whose mass-to-charge ratio values coincide within a threshold range are counted as one). Fig. 8 shows an example of the peak matrix. In the illustrated example, the number of groups, "G", is two.
[Step 3] The columns in the peak matrix are regarded as "S" P-dimensional vectors obtained from the "G" groups, and a predetermined multivariate analysis, for example, principal component analysis (PCA) or partial least squares-discriminant analysis (PLS-DA), is applied to them. The degree of separation between groups and the candidate peaks (mass-to-charge ratio values) contributing to that separation are then determined from the obtained multivariate analysis result. When the peak matrix is subjected to PCA or PLS-DA, for example, a score plot or a loading plot may be obtained. The score plot is obtained by projecting the columns in the peak matrix onto a low-dimensional space. The loading plot is a graphical representation of the components used to convert the rows and columns of the peak matrix into the score plot.
Figs. 9 (a) and 9 (b) illustrate examples of the score plot and the loading plot obtained by PCA. In the score plot illustrated in Fig. 9 (a), the points plotted as white circles and as black circles indicate samples of different groups. As illustrated with a dotted line in the figure, the score plot may be useful for visually confirming the degree of separation between different groups. In the loading plot illustrated in Fig. 9 (b), the plotted points respectively indicate the rows, i.e., peaks, in the peak matrix. In the loading plot, points indicative of peaks that differ significantly in signal strength between different groups are typically plotted in a region with greater absolute values on a first axis (lateral axis). Therefore, a loading value with a greater absolute value on the first axis suggests a peak that contributes more to the separation between groups. In the example illustrated in Fig. 9 (b), the two plotted points circled with a dotted line may be peaks contributing to the separation between groups, which are prospective marker candidates. Thus, the loading plot may be useful for identifying any peak contr