KR-20260067304-A - METHOD AND SYSTEM FOR DATA NORMALIZATION FOR LEARNING FROM TABULAR DATA

Abstract

The present invention relates to a data normalization method and system for learning from tabular data, and provides a B-spline-based data normalization method and system for efficient Deep Neural Network (DNN) training on tabular data.
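As a rough illustration of what a B-spline-based normalization function could look like for a single numerical feature, the sketch below rescales the feature to [0, 1] and then passes it through a clamped B-spline curve whose control values shape the mapping. The function names, the uniform knot placement, and the cubic degree are assumptions made for this example; the record itself does not specify them.

```python
def bspline_basis(i, degree, t, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of the given degree at t."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + degree] - knots[i]
    if left_span > 0.0:
        value += (t - knots[i]) / left_span * bspline_basis(i, degree - 1, t, knots)
    right_span = knots[i + degree + 1] - knots[i + 1]
    if right_span > 0.0:
        value += ((knots[i + degree + 1] - t) / right_span
                  * bspline_basis(i + 1, degree - 1, t, knots))
    return value


def clamped_uniform_knots(n_ctrl, degree):
    """Clamped knot vector on [0, 1] with evenly spaced interior knots (an assumption)."""
    n_inner = n_ctrl - degree - 1
    inner = [(j + 1) / (n_inner + 1) for j in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def normalize_feature(values, ctrl, degree=3):
    """Min-max scale one feature to [0, 1], then map it through a B-spline curve.

    `ctrl` holds the control values of the curve; these are the parameters a
    training procedure would adjust. Requires len(ctrl) >= degree + 1.
    """
    lo, hi = min(values), max(values)
    knots = clamped_uniform_knots(len(ctrl), degree)
    out = []
    for v in values:
        t = (v - lo) / (hi - lo) if hi > lo else 0.0
        t = min(t, 1.0 - 1e-12)  # keep t inside the half-open last knot span
        out.append(sum(c * bspline_basis(i, degree, t, knots)
                       for i, c in enumerate(ctrl)))
    return out


# Hypothetical skewed feature; increasing control values give a monotone mapping
# that stretches the crowded low end of the range.
incomes = [1.0, 2.0, 4.0, 8.0, 100.0]
print(normalize_feature(incomes, [0.0, 0.5, 0.8, 0.9, 0.97, 1.0]))
```

Keeping the control values non-decreasing keeps the curve non-decreasing, so the ordering of samples within the feature is preserved while the local slope, and hence the resolution given to each region of the input range, can vary.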

Inventors

  • 서민국
  • 어문정
  • 심예슬
  • 임우형

Assignees

  • 주식회사 LG 경영개발원

Dates

Publication Date
2026-05-12
Application Date
2025-09-08
Priority Date
2024-11-05

Claims (18)

  1. A computer-implemented data normalization method for learning from tabular data, the method comprising: receiving tabular data comprising a plurality of different features and a plurality of samples having values corresponding to each of the plurality of features; obtaining a plurality of normalized features by applying a curve-parameterized normalization function to each of the plurality of features included in the tabular data; training an artificial neural network using each of the plurality of normalized features; and optimizing the normalization function based on the progress of the learning of the artificial neural network.
  2. The method of claim 1, further comprising applying a min-max transformation to each of the plurality of features included in the tabular data, wherein obtaining the plurality of normalized features comprises applying the normalization function to each of the plurality of features to which the min-max transformation has been applied.
  3. The method of claim 1, wherein training the artificial neural network comprises: calculating, using a preset loss function, a learning loss between a learning target value corresponding to the tabular data and an output value for the plurality of normalized features; and training the artificial neural network so that the calculated learning loss is reduced, and wherein the optimizing comprises optimizing the normalization function based on the calculated learning loss.
  4. The method of claim 3, wherein calculating the learning loss comprises calculating, using the loss function, the learning loss between the learning target value and output values for the plurality of samples, each sample including a value corresponding to each of the plurality of normalized features, and wherein the optimizing comprises optimizing at least one parameter of the normalization function simultaneously with the training of the artificial neural network.
  5. The method of claim 4, wherein the optimizing comprises optimizing the parameters of the normalization function, simultaneously with the training of the artificial neural network, based on an importance specified for each of the plurality of samples, and wherein the importance is determined based on the learning loss calculated through the loss function.
  6. The method of claim 5, further comprising: analyzing, in order to determine the importance of each of the plurality of samples, a learning difficulty of each of the plurality of samples based on the learning loss calculated through the loss function; and specifying the importance of each of the plurality of samples based on the analyzed learning difficulty, wherein the optimizing comprises adjusting the slope of the normalization function according to the importance of each of the plurality of samples.
  7. The method of claim 6, wherein the learning difficulty includes at least one of a first learning difficulty, in which the learning loss satisfies a first criterion, and a second learning difficulty, in which the learning loss satisfies a second criterion, and wherein, in specifying the importance, in order to adjust the slope of the normalization function, a first sample satisfying the first learning difficulty among the plurality of samples is specified as having a first importance, and a second sample satisfying the second learning difficulty among the plurality of samples is specified as having a second importance.
  8. The method of claim 7, wherein the optimizing comprises adjusting the normalization function so that the magnitude of its slope increases for the first sample having the first importance among the plurality of samples.
  9. The method of claim 7, wherein the optimizing comprises adjusting the normalization function so that the magnitude of its slope decreases for the second sample having the second importance among the plurality of samples.
  10. The method of claim 6, wherein the optimizing comprises adjusting, using the loss function, the parameters of the normalization function according to the learning difficulty of each of the plurality of samples, and wherein the loss function is defined to adjust the slope of the normalization function according to the importance of each of the plurality of samples.
  11. The method of claim 10, wherein the importance includes at least one of a first importance and a second importance determined according to the learning loss, and wherein the optimizing comprises adjusting, using the loss function, the parameters of the normalization function so that the magnitude of the slope of the normalization function increases for a first sample having the first importance among the plurality of samples.
  12. The method of claim 10, wherein the optimizing comprises adjusting, using the loss function, the parameters of the normalization function so that the magnitude of the slope of the normalization function decreases for a second sample having the second importance among the plurality of samples.
  13. The method of claim 4, wherein the optimizing comprises updating the parameters of the normalization function based on the learning loss of each of the plurality of samples.
  14. The method of claim 13, wherein, when the parameters of the normalization function are updated, the values of each of the plurality of normalized features are updated.
  15. The method of claim 4, wherein the loss function is defined such that the learning process of the artificial neural network and the parameter optimization process of the normalization function are performed separately.
  16. The method of claim 1, wherein the normalization function is generated for each of the plurality of features included in the tabular data.
  17. A data normalization system for learning from tabular data, the system comprising: a memory configured to store executable instructions; and one or more processors configured to perform operations by executing the instructions, wherein the system is configured to: receive tabular data comprising a plurality of different features and a plurality of samples having values corresponding to each of the plurality of features; obtain a plurality of normalized features by applying a curve-parameterized normalization function to each of the plurality of features included in the tabular data; train an artificial neural network using each of the plurality of normalized features; and optimize the normalization function based on the progress of the learning of the artificial neural network.
  18. A program stored on a computer-readable recording medium and executed by one or more processors in an electronic device, the program comprising instructions for performing: receiving tabular data comprising a plurality of different features and a plurality of samples having values corresponding to each of the plurality of features; obtaining a plurality of normalized features by applying a curve-parameterized normalization function to each of the plurality of features included in the tabular data; training an artificial neural network using each of the plurality of normalized features; and optimizing the normalization function based on the progress of the learning of the artificial neural network.
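The core loop recited in claims 1 to 4 (min-max scaling, a parameterized curve per feature, and updating the curve parameters together with the model using the same learning loss) can be sketched roughly as follows. This is a deliberately simplified illustration and not the claimed implementation: a piecewise-linear curve (a degree-1 B-spline) stands in for the normalization function, a one-feature linear model stands in for the artificial neural network, the loss is mean squared error, and the names `hat_basis` and `train` are hypothetical. The importance-based slope adjustment of claims 5 to 12 depends on formulas that are not reproduced in this record and is not modelled here.

```python
import math


def hat_basis(t, n_ctrl):
    """Degree-1 B-spline (hat) basis values at t in [0, 1] on a uniform grid."""
    pos = t * (n_ctrl - 1)
    j = min(int(pos), n_ctrl - 2)
    frac = pos - j
    basis = [0.0] * n_ctrl
    basis[j] = 1.0 - frac
    basis[j + 1] = frac
    return basis


def train(xs, ys, n_ctrl=8, lr=0.05, epochs=500):
    """Jointly fit curve control values and a linear model y ~ w * z + b.

    Assumes max(xs) > min(xs). Returns the control values, (w, b), and the
    loss recorded at the start of every epoch.
    """
    lo, hi = min(xs), max(xs)
    ts = [(x - lo) / (hi - lo) for x in xs]           # min-max transformation
    bases = [hat_basis(t, n_ctrl) for t in ts]
    ctrl = [i / (n_ctrl - 1) for i in range(n_ctrl)]  # start from the identity curve
    w, b = 1.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        # Normalized feature z = sum_i ctrl[i] * basis_i(t)
        zs = [sum(c * p for c, p in zip(ctrl, basis)) for basis in bases]
        errs = [w * z + b - y for z, y in zip(zs, ys)]
        history.append(sum(e * e for e in errs) / n)  # mean squared error
        # Gradients of the same loss with respect to model and curve parameters
        g_w = 2.0 * sum(e * z for e, z in zip(errs, zs)) / n
        g_b = 2.0 * sum(errs) / n
        g_c = [2.0 * w * sum(e * basis[i] for e, basis in zip(errs, bases)) / n
               for i in range(n_ctrl)]
        # Simultaneous update of the model and the normalization curve
        w -= lr * g_w
        b -= lr * g_b
        ctrl = [c - lr * g for c, g in zip(ctrl, g_c)]
    return ctrl, (w, b), history


# Hypothetical skewed feature whose target is linear in log(x).
xs = [math.exp(3 * i / 49) for i in range(50)]
ys = [math.log(x) / 3 for x in xs]
ctrl, (w, b), history = train(xs, ys)
print(history[0], history[-1])
```

Because the normalized value is linear in the control values, the gradient of the loss with respect to each control value is simply the model's input sensitivity times the corresponding basis value, which is what lets the curve be updated by the same backward pass that updates the model.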

Description

Method and System for Data Normalization for Learning from Tabular Data

The present invention relates to a data normalization method and system for learning from tabular data, and provides a B-spline-based data normalization method and system for efficient Deep Neural Network (DNN) training on tabular data.

Input normalization of numerical features plays a crucial role in improving the performance and training stability of neural networks. This is particularly important for tabular data, which is widely used across many industries. A key characteristic of tabular data is its high heterogeneity: features often have different scales and distributions, so normalization must be tailored to each input feature. Unlike image or text data, tabular data generally has non-uniform input features, and more precise normalization techniques are therefore required to handle diverse feature distributions effectively.

Various normalization techniques have been proposed to date, but they are often specialized for specific input feature distributions and do not take the neural network learning process itself into account; in other words, they are not optimized for neural network training. There is therefore still a need for new normalization methods for efficient neural network training on tabular data.

FIG. 1 is a conceptual diagram illustrating a data normalization system for learning from tabular data according to the present invention.
FIGS. 2a, 2b, and 2c are flowcharts illustrating a data normalization method for learning from tabular data according to the present invention.
FIGS. 3a, 3b, 4, and 5 are conceptual diagrams illustrating a data normalization method for learning from tabular data according to the present invention.
FIGS. 6, 7, and 8 are formulas related to a data normalization method for learning from tabular data according to the present invention.
FIG. 9 is a table showing an example of the learning result of an artificial neural network trained using a data normalization method for learning from tabular data according to the present invention.
FIG. 10 illustrates an example of a block diagram of a computing system in which the present invention can be implemented.
FIG. 11 illustrates an example of a block diagram of a computing device that may be included in a user computing device, a server computing system, and a training computing system, as an embodiment of a computing system in which the present invention can be implemented.
FIG. 12 illustrates an example of a block diagram, from another perspective, of a computing device that is one of the components of a computing system.

Hereinafter, embodiments disclosed in this specification will be described in detail with reference to the attached drawings. Identical or similar components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions thereof will be omitted. The suffixes "module" and "part" used for components in the following description are assigned or used interchangeably solely for ease of drafting the specification and do not have distinct meanings or roles in themselves. Furthermore, in describing the embodiments disclosed in this specification, a detailed description of related prior art will be omitted where it is determined that it could obscure the essence of the embodiments. Additionally, the attached drawings are intended only to facilitate understanding of the embodiments disclosed in this specification; the technical concept disclosed in this specification is not limited by the attached drawings and should be understood to include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used solely to distinguish one component from another. When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, or other components may be present in between. By contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other components are present in between. A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "comprising" or "having" are intended to specify the existence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, ste