EP-4738209-A1 - MACHINE LEARNING PROGRAM, MACHINE LEARNING METHOD, AND INFERENCE PROGRAM
Abstract
A machine learning program for causing a computer to execute processing including: determining a window interval having a peak and a feature related to the window interval based on a spectrum related to partial autocorrelation for each of a plurality of window intervals indicating a variation of an index of time-series data; and performing machine learning of a model that predicts the variation of the index after a first time point from the variation of the index before the first time point using the determined feature.
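As an illustration only, the determination step can be sketched as follows. The sketch assumes that the "spectrum related to partial autocorrelation" is the sample partial autocorrelation function (PACF) over candidate lags, and that a peak is a lag whose PACF value lies outside an approximate 95% band of ±1.96/√n. The function names `pacf_levinson` and `significant_lags`, the Durbin-Levinson estimator, and the band width are assumptions made for this example and are not taken from the application.

```python
import numpy as np


def pacf_levinson(x, max_lag):
    """Sample PACF for lags 0..max_lag via the Durbin-Levinson recursion."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased sample autocovariance, normalised to autocorrelation r[0..max_lag].
    acov = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_lag + 1)])
    r = acov / acov[0]

    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.zeros(0)  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - np.dot(phi_prev, r[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, r[1:k])
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf


def significant_lags(x, max_lag, z=1.96):
    """Lags whose PACF magnitude exceeds the approximate band z / sqrt(n)."""
    pacf = pacf_levinson(x, max_lag)
    band = z / np.sqrt(len(x))
    return [k for k in range(1, max_lag + 1) if abs(pacf[k]) > band]


if __name__ == "__main__":
    # Hypothetical daily series with a weekly dependence: x[t] = 0.8 * x[t-7] + noise.
    rng = np.random.default_rng(0)
    noise = rng.normal(size=2000)
    x = np.zeros(2000)
    for t in range(2000):
        x[t] = noise[t] + (0.8 * x[t - 7] if t >= 7 else 0.0)
    print(significant_lags(x, max_lag=14))
```

Under these assumptions, the lags returned by `significant_lags` would serve as candidate window intervals, and features such as lag values or window statistics would be generated only for those intervals rather than for every possible window.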
Inventors
- TOYOTA, Kodai
- URA, AKIRA
Assignees
- FUJITSU LIMITED
Dates
- Publication Date: 20260506
- Application Date: 20251030
Claims (15)
- A machine learning program for causing a computer to execute processing comprising: determining a window interval having a peak and a feature related to the window interval based on a spectrum related to partial autocorrelation for each of the plurality of window intervals indicating a variation of an index of time-series data; and performing machine learning of a model (17) that predicts the variation of the index after a first time point from the variation of the index before the first time point using the determined feature.
- The machine learning program according to claim 1, wherein the processing of determining the feature includes: a process of selecting a feature having a peak in which a value of a partial autocorrelation function exceeds a statistical confidence interval.
- The machine learning program according to claim 1, the processing further comprising: registering a selected feature selected by performing feature selection on a plurality of types of features extracted based on the variation of the index before the first time point, as a corpus (16) in association with a data group, wherein the processing of determining the feature includes: a process of extracting the selected feature corresponding to a data group having a feature vector most similar to a feature vector extracted from newly input time-series data from the corpus (16) and determining the extracted selected feature as the feature.
- The machine learning program according to claim 3, the processing further comprising: performing the feature selection on each of a plurality of model algorithms, and registering a model algorithm having the highest accuracy in the corpus (16) in association with the data group; and extracting, from the corpus (16), the model algorithm corresponding to a data group having a feature vector most similar to a feature vector extracted from newly input time-series data, wherein the processing of performing machine learning includes: a process of performing machine learning of a model (17) using the model algorithm extracted from the corpus (16).
- The machine learning program according to any one of claims 1 to 4, wherein the processing of determining the feature includes: a process of stabilizing the time-series data, and a process of determining the feature based on the stabilized time-series data.
- A machine learning method, wherein a computer executes processing of: determining a window interval having a peak and a feature related to the window interval based on a spectrum related to partial autocorrelation for each of a plurality of window intervals indicating a variation of an index of time-series data; and performing machine learning of a model (17) that predicts the variation of the index after a first time point from the variation of the index before the first time point using the determined feature.
- The machine learning method according to claim 6, wherein the processing of determining the feature includes: a process of selecting a feature having a peak in which a value of a partial autocorrelation function exceeds a statistical confidence interval.
- The machine learning method according to claim 6, the processing further comprising: registering a selected feature selected by performing feature selection on a plurality of types of features extracted based on the variation of the index before the first time point, as a corpus (16) in association with a data group, and the processing of determining the feature includes: a process of extracting the selected feature corresponding to a data group having a feature vector most similar to a feature vector extracted from newly input time-series data from the corpus (16) and determining the extracted selected feature as the feature.
- The machine learning method according to claim 8, the processing further comprising: performing the feature selection on each of a plurality of model algorithms, and registering a model algorithm having the highest accuracy in the corpus (16) in association with the data group; and extracting, from the corpus (16), the model algorithm corresponding to a data group having a feature vector most similar to a feature vector extracted from newly input time-series data, and the processing of performing machine learning includes: a process of performing machine learning of a model (17) using the model algorithm extracted from the corpus (16).
- The machine learning method according to any one of claims 6 to 9, wherein the processing of determining the feature includes: a process of stabilizing the time-series data, and a process of determining the feature based on the stabilized time-series data.
- An inference program for causing a computer to execute processing comprising: predicting, using a model (17) generated by machine learning, a variation of an index after a first time point from a variation of the index at and before the first time point, the machine learning using a feature related to a window interval having a peak, the feature and the window interval being based on a spectrum related to partial autocorrelation for each of the plurality of window intervals each indicating a variation of an index of time-series data.
- The inference program according to claim 11, wherein the feature is selected from one or more features each having a peak in which a value of a partial autocorrelation function exceeds a statistical confidence interval.
- The inference program according to claim 11, wherein the processing further comprises: registering, as a corpus (16), a selected feature selected by performing feature selection on a plurality of types of features extracted based on the variation of the index before the first time point in association with a data group, and the feature is the selected feature corresponding to a data group having a feature vector most similar to a feature vector extracted from newly input time-series data, the selected feature being extracted from the corpus (16).
- The inference program according to claim 13, wherein the feature selection is performed on each of a plurality of model algorithms, and a model algorithm having the highest accuracy among the plurality of model algorithms is registered in the corpus (16) in association with the data group, the model algorithm corresponding to a data group having a feature vector most similar to a feature vector extracted from newly input time-series data is extracted from the corpus (16), and the processing of performing machine learning includes: a process of performing machine learning of a model (17) using the model algorithm extracted from the corpus (16).
- The inference program according to any one of claims 11 to 14, wherein the feature is based on the time-series data being subjected to stabilization.
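The corpus lookup recited in claims 3, 4, 8, 9, 13, and 14 can be illustrated with a minimal sketch. Each corpus entry associates a data group with a feature vector, the selected features, and the model algorithm that scored the highest accuracy; for newly input time-series data, the entry with the most similar feature vector is retrieved. The claims do not specify the similarity measure, so cosine similarity is an assumption here, as are the names `CorpusEntry` and `lookup_most_similar` and every field name and value used in the example.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class CorpusEntry:
    """One corpus record: a data group and what worked best for it (hypothetical layout)."""
    group_id: str
    feature_vector: np.ndarray
    selected_features: list
    model_algorithm: str


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def lookup_most_similar(corpus, query_vector):
    """Return the entry whose feature vector is most similar to query_vector."""
    if not corpus:
        raise ValueError("corpus is empty")
    q = np.asarray(query_vector, dtype=float)
    return max(corpus, key=lambda e: cosine_similarity(e.feature_vector, q))


if __name__ == "__main__":
    # Hypothetical corpus with two data groups.
    corpus = [
        CorpusEntry("group_1", np.array([1.0, 0.0]), ["lag_7"], "algorithm_A"),
        CorpusEntry("group_2", np.array([0.0, 1.0]), ["window_mean_28"], "algorithm_B"),
    ]
    hit = lookup_most_similar(corpus, [0.1, 0.9])
    print(hit.group_id, hit.selected_features, hit.model_algorithm)
```

In this sketch the retrieved `selected_features` and `model_algorithm` would be reused directly for the new data, which is how a corpus of this kind could avoid repeating feature selection and model comparison for every new dataset.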
Description
[Technical Field]
The embodiments discussed herein are directed to a machine learning program, a machine learning method, and an inference program.
[Background Technique]
For example, sales prediction and demand prediction can be performed by time-series prediction using a time-series prediction model. In order to train a highly accurate time-series prediction model, it is important to design time-series features that describe the characteristics of the time series (a trend, a seasonal characteristic, or the like) well. Here, a time-series feature is calculated using data of a certain period (window) in the past from the prediction time point, and a time-series feature is generated for each period. Neither the selection of the period nor the selection from among the large number of generated time-series features is automated, so feature engineering takes a lot of time.
As a method of selecting features, for example, RFE (Recursive Feature Elimination) is known. In RFE, first, a model is trained using all features. Thereafter, the model is retrained while excluding the feature having the lowest value of an index for evaluating the importance of a feature (such as feature importance). The exclusion of a feature and the retraining of the model are repeated until the total number of features reaches the number designated by the analyst.
In addition, Auto-sklearn, which is one implementation of automated machine learning (AutoML) that automates a part of the machine learning process, is also known. In Auto-sklearn, time-series feature data is extracted and converted into a table, and the converted table data is input to the AutoML to perform prediction.
[Prior Art Reference]
[Patent Document]
[Patent Document 1] Japanese Laid-open Patent Publication No. 2019-159760 A
[Patent Document 2] Japanese National Publication of International Patent Application No. 2023-544011 A
[Patent Document 3] Japanese Laid-open Patent Publication No. 2012-27880 A
[Patent Document 4] US Patent Application Publication No. 2015/0377938 A
[Patent Document 5] US Patent Application Publication No. 2020/0242483 A
[Summary of Invention]
[Problems to be Solved by Invention]
However, in RFE, the model needs to be retrained every time one feature is excluded, and the calculation cost is very high. On the other hand, Auto-sklearn is a tool for table data. Since the information at each time point of the time series is input as a table, trend information such as autocorrelation cannot be taken into account at the time of model selection, and the result does not take into consideration the period of the time-series data.
In one aspect, an object of the present embodiment is to enable output of a highly accurate prediction result in a short time even for unstabilized time-series data.
[Means to Solve Problems]
According to an aspect of the embodiments, a machine learning program causes a computer to execute processing including: determining a window interval having a peak and a feature related to the window interval based on a spectrum related to partial autocorrelation for each of a plurality of window intervals indicating a variation of an index of time-series data; and performing machine learning of a model that predicts the variation of the index after a first time point from the variation of the index before the first time point using the determined feature.
[Effect of Invention]
According to one embodiment, it is possible to output a highly accurate prediction result in a short time even for unstabilized time-series data.
[Brief Description of Drawings]
FIG. 1 is a diagram schematically illustrating a configuration of an information processing system according to an embodiment;
FIG. 2 is a block diagram illustrating a hardware (HW) configuration example of a computer included in the information processing system according to the embodiment;
FIG. 3 is a diagram illustrating an example of a configuration of a corpus in the information processing system according to the embodiment;
FIG. 4 is a diagram illustrating a time-series dataset in the information processing system according to the embodiment;
FIG. 5 is a diagram illustrating the Ljung-Box test;
FIG. 6 is a diagram illustrating a lag feature in the information processing system according to the embodiment;
FIG. 7 is a diagram illustrating a time window feature in the information processing system according to the embodiment;
FIG. 8 is a diagram illustrating an example of a spectrum;
FIG. 9 is a diagram illustrating an example of a trigonometric function feature in the information processing system according to the embodiment;
FIG. 10 is a flowchart illustrating processing of the corpus creation unit of the information processing system according to the embodiment;
FIG. 11 is a flowchart illustrating details of the processing of step A3 of the flowchart illustrated in FIG. 10;
FIG. 12 is a flowchart illustrating details of the processing of step A4 of the flowch