US-12626170-B2 - System and method for approximating numerical features via cubic splines and applications thereof
Abstract
The present teaching relates to a method, system, medium, and implementations for approximating a non-linear relationship between a numerical feature and an output of a model. A value of a numerical feature is received and is transformed, via a transform function, into a transformed value within a fixed range. With respect to each of a plurality of basis functions used for approximating the non-linear relationship, a respective basis function value of the basis function is computed based on the transformed value. An approximated value of the non-linear numeric feature is generated based on a sum of the plurality of basis function values, each weighted by a corresponding one of a set of weights obtained via machine learning.
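As a minimal, non-limiting sketch of the flow summarized above (transform into a fixed range, evaluate each basis function, form the weighted sum), the following Python may help. Everything named here is an assumption made for illustration: the function names `transform`, `cubic_bspline`, `basis_values`, and `approximate`, the logistic-style squashing transform, and the choice of uniform cubic B-splines as the basis. None of it is asserted to be the transform or basis actually used by the present teaching.

```python
import math


def transform(z):
    """Map an unbounded value into the fixed range [0, 1].

    Assumption: a logistic squashing function, written with tanh so that
    very large |z| cannot overflow.
    """
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def cubic_bspline(t):
    """Cardinal cubic B-spline centred at 0 with support (-2, 2)."""
    a = abs(t)
    if a < 1.0:
        return 2.0 / 3.0 - a * a + 0.5 * a ** 3
    if a < 2.0:
        return (2.0 - a) ** 3 / 6.0
    return 0.0


def basis_values(u, n_basis):
    """Evaluate n_basis shifted cubic B-splines at u in [0, 1].

    The shifts are chosen so that the basis values sum to 1 on [0, 1].
    """
    if n_basis < 4:
        raise ValueError("this sketch needs at least 4 basis functions")
    s = u * (n_basis - 3)  # stretch [0, 1] onto [0, n_basis - 3]
    return [cubic_bspline(s - (i - 1)) for i in range(n_basis)]


def approximate(z, weights):
    """Weighted sum of basis function values at the transformed value."""
    b = basis_values(transform(z), len(weights))
    return sum(w * bi for w, bi in zip(weights, b))


if __name__ == "__main__":
    # Hypothetical weights; in the present teaching they are learned.
    weights = [0.1, 0.4, 0.9, 0.7, 0.3, 0.2]
    for z in (-5.0, 0.0, 2.5, 40.0):
        print(z, round(approximate(z, weights), 4))
```

Because the shifted B-splines in this sketch sum to one over the fixed range, equal weights reproduce a constant, which is a convenient sanity check on any implementation along these lines.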
Inventors
- Alex Shtoff
Assignees
- YAHOO AD TECH LLC
Dates
- Publication Date: 2026-05-12
- Application Date: 2021-08-26
Claims (20)
- 1 . A method implemented on at least one processor, a memory, and a communication platform for online advertising, comprising: obtaining, via machine learning, based on training data indicative of effectiveness of online advertisements on various websites among different users, a set of weights each of which is associated with one of a plurality of basis functions; receiving a value of a numerical feature including a plurality of features; transforming, via a transform function, the value into a transformed value within a fixed range; with respect to each of the plurality of basis functions for approximating a non-linear relationship between the numerical feature and a prediction of a model associated with online advertising, computing a respective basis function value of the basis function based on the transformed value; retrieving the set of weights; generating an approximated value of the numerical feature to facilitate assessment on which advertisement is to be displayed to which user and on which platform, wherein the approximated value of the numeral feature depends on a sum of the plurality of basis function values weighted respectively by each corresponding one of the set of the weights, wherein the approximated value of the numerical feature also depends on interactions among the plurality of features, and wherein the approximated value of the numerical feature is used to select an online advertisement for display on an online platform; and adjusting, via machine learning and based on a discrepancy between an actual value of the numerical feature included in the training data and the approximated value of the numerical feature, the set of weights for more accurate approximation.
- 2 . The method of claim 1 , wherein the value of the numerical feature is not bounded; and the transformed value is bounded in the fixed range.
- 3 . The method of claim 1 , wherein the plurality of basis functions correspond to spline functions.
- 4 . The method of claim 1 , wherein the transform function is determined based on a cumulative distribution of the numerical feature and modeling thereof.
- 5 . The method of claim 1 , wherein the machine learning to obtain the set of weights is performed by: initializing the set of weights, each of which is to be associated with a respective one of the plurality of basis functions; accessing numerical feature values of the numerical feature from the training data; and with respect to each of the numerical feature values, generating an approximated numerical feature value based on a weighted sum of basis function values, each of which is weighed using a corresponding one of the set of weights and computed using a corresponding one of the plurality of basis functions based on a transformed numerical feature value associated with the numerical feature value, and adjusting the set of weights based on the numerical feature value and the approximated numerical feature value.
- 6 . The method of claim 5 , wherein the step of generating the approximated numerical feature value comprises: transforming the numerical feature value using the transform function to generate the transformed numerical feature value; calculating the respective basis function values of the plurality of basis functions based on the transformed numerical feature value; applying each of the set of weights to a corresponding one of the respective basis function values to generate a weighted basis function value; and computing the approximated numerical feature value based on a sum of the weighted basis function values.
- 7 . The method of claim 1 , further comprising: receiving values of additional features and corresponding additional weights involved in a factorization machine formulation; and computing a value of the factorization machine formulation based on the approximated value of the numerical feature, the values of the additional features, as well as the corresponding additional weights.
- 8 . Machine readable and non-transitory medium having information recorded thereon for online advertising, wherein the information, once read by the machine, causes the machine to perform the following steps: obtaining, via machine learning, based on training data indicative of effectiveness of online advertisements on various websites among different users, a set of weights each of which is associated with one of a plurality of basis functions; receiving a value of a numerical feature including a plurality of features; transforming, via a transform function, the value into a transformed value within a fixed range; with respect to each of the plurality of basis functions for approximating a non-linear relationship between the numerical feature and a prediction of a model associated with online advertising, computing a respective basis function value of the basis function based on the transformed value; retrieving the set of weights; generating an approximated value of the numerical feature to facilitate assessment on which advertisement is to be displayed to which user and on which platform, wherein the approximated value of the numeral feature depends on a sum of the plurality of basis function values weighted respectively by each corresponding one of the set of the weights, wherein the approximated value of the numerical feature also depends on interactions among the plurality of features, and wherein the approximated value of the numerical feature is used to select an online advertisement for display on an online platform; and adjusting, via machine learning and based on a discrepancy between an actual value of the numerical feature included in the training data and the approximated value of the numerical feature, the set of weights for more accurate approximation.
- 9 . The medium of claim 8 , wherein the value of the numerical feature is not bounded; and the transformed value is bounded in the fixed range.
- 10 . The medium of claim 8 , wherein the plurality of basis functions correspond to spline functions.
- 11 . The medium of claim 8 , wherein the transform function is determined based on a cumulative distribution of the numerical feature and modeling thereof.
- 12 . The medium of claim 8 , wherein the machine learning to obtain the set of weights is carried out by: initializing the set of weights, each of which is to be associated with a respective one of the plurality of basis functions; accessing numerical feature values of the numerical feature from the training data; with respect to each of the numerical feature values, generating an approximated numerical feature value based on a weighted sum of basis function values, each of which is weighed using a corresponding one of the set of weights and computed using a corresponding one of the plurality of basis functions based on a transformed numerical feature value associated with the numerical feature value, adjusting the set of weights based on the numerical feature value and the approximated numerical feature value.
- 13 . The medium of claim 12 , wherein the step of generating the approximated numerical feature value comprises: transforming the numerical feature value using the transform function to generate the transformed numerical feature value; calculating the respective basis function values of the plurality of basis functions based on the transformed numerical feature value; applying each of the set of weights to a corresponding one of the respective basis function values to generate a weighted basis function value; computing the approximated numerical feature value based on a sum of the weighted basis function values.
- 14 . The medium of claim 8 , wherein the information, once read by the machine, further causes the machine to perform the step of: receiving values of additional features and corresponding additional weights involved in a factorization machine formulation; computing a value of the factorization machine formulation based on the approximated value of the numerical feature, the values of the additional features, as well as the corresponding additional weights.
- 15 . A system for online advertising, comprising: a machine learning mechanism configured for obtaining, via machine learning, based on training data indicative of effectiveness of online advertisements on various websites among different users, a set of weights each of which is associated with one of a plurality of basis functions; a training data processor configured for receiving a value of a numerical feature including a plurality of features; a data transformation unit configured for transforming, via a transform function, the value into a transformed value within a fixed range; a basis function value generator configured for computing, with respect to each of the plurality of basis functions for approximating a non-linear relationship between the numerical feature and a prediction of a model associated with online advertising, a respective basis function value of the basis function based on the transformed value, wherein the set of weights are used for generating an approximated value of the numerical feature to facilitate assessment on which advertisement is to be displayed to which user and on which platform, wherein the approximated value of the numeral feature depends on a sum of the plurality of basis function values weighted respectively by each corresponding one of the set of the weights, wherein the approximated value of the numerical feature also depends on interactions among the plurality of features, and wherein the approximated value of the numerical feature is used to select an online advertisement for display on an online platform, and the set of weights is adjusted for more accurate approximation via machine learning and based on a discrepancy between an actual value of the numerical feature included in the training data and the approximated value of the numerical feature.
- 16 . The system of claim 15 , wherein the value of the numerical feature is not bounded; and the transformed value is bounded in the fixed range.
- 17 . The system of claim 16 , wherein the transformed function is determined based on a cumulative distribution of the numerical feature and modeling thereof.
- 18 . The system of claim 15 , wherein the plurality of basis functions correspond to spline functions.
- 19 . The system of claim 15 , wherein the machine learning to obtain the set of weights is performed by: initializing the set of weights, each of which is to be associated with a respective one of the plurality of basis functions; accessing numerical feature values of the numerical feature from the training data; with respect to each of the numerical feature values, generating an approximated numerical feature value based on a weighted sum of basis function values, each of which is weighed using a corresponding one of the set of weights and computed using a corresponding one of the plurality of basis functions based on a transformed numerical feature value associated with the numerical feature value, adjusting the set of weights based on the numerical feature value and the approximated numerical feature value.
- 20 . The system of claim 19 , wherein the step of generating the approximated numerical feature value comprises: transforming the numerical feature value using the transform function to generate the transformed numerical feature value; calculating the respective basis function values of the plurality of basis functions based on the transformed numerical feature value; applying each of the set of weights to a corresponding one of the respective basis function values to generate a weighted basis function value; computing the approximated numerical feature value based on a sum of the weighted basis function values.
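For orientation only, the weight-learning steps recited in claims 5-6, 12-13, and 19-20 (initialize the weights, form the weighted sum of basis function values for each training value, adjust the weights from the discrepancy) can be sketched as below. This is a hedged illustration, not the claimed implementation: the squared-error objective, the stochastic gradient update, the learning rate, the logistic-style transform, the uniform cubic B-spline basis, and all function names are assumptions introduced here.

```python
import math


def transform(z):
    # Assumed squashing transform into [0, 1] (not taken from the patent).
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def cubic_bspline(t):
    # Cardinal cubic B-spline centred at 0 with support (-2, 2).
    a = abs(t)
    if a < 1.0:
        return 2.0 / 3.0 - a * a + 0.5 * a ** 3
    if a < 2.0:
        return (2.0 - a) ** 3 / 6.0
    return 0.0


def basis_values(u, n_basis):
    # n_basis shifted B-splines evaluated at u in [0, 1]; needs n_basis >= 4.
    s = u * (n_basis - 3)
    return [cubic_bspline(s - (i - 1)) for i in range(n_basis)]


def approximate(z, weights):
    b = basis_values(transform(z), len(weights))
    return sum(w * bi for w, bi in zip(weights, b))


def sgd_step(weights, z, actual, lr=0.5):
    """One adjustment of the weights from the discrepancy on a single sample."""
    b = basis_values(transform(z), len(weights))
    err = sum(w * bi for w, bi in zip(weights, b)) - actual
    # Gradient of 0.5 * err**2 with respect to weight i is err * b[i].
    return [w - lr * err * bi for w, bi in zip(weights, b)]


def train(samples, n_basis=6, lr=0.5, epochs=200):
    """samples: iterable of (z, actual) pairs (hypothetical training data)."""
    weights = [0.0] * n_basis  # initialization step
    for _ in range(epochs):
        for z, actual in samples:
            weights = sgd_step(weights, z, actual, lr)
    return weights


def mean_squared_error(samples, weights):
    return sum((approximate(z, weights) - a) ** 2 for z, a in samples) / len(samples)


if __name__ == "__main__":
    data = [(0.5 * k, math.sin(0.5 * k)) for k in range(-6, 7)]  # toy data
    learned = train(data)
    print(mean_squared_error(data, [0.0] * 6), mean_squared_error(data, learned))
```

A per-sample update was chosen only because it mirrors the claim wording ("with respect to each of the numerical feature values"); a batch solver would be an equally plausible reading.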
Description
BACKGROUND
1. Technical Field
The present teaching generally relates to data representation in computers. More specifically, the present teaching relates to representation of numerical features in computers.
2. Technical Background
With the development of the Internet and ubiquitous network connections, more and more commercial and social activities are conducted online. To facilitate a more productive online environment, information about different online events is collected and analyzed so that the online environment can be used more effectively. For example, event information may be used to determine the preferences of users in order to display ads more relevant to their interests. Information about users who signed up for certain online services may be analyzed to determine the more active time frames of different demographic groups, so that services or advertisements appropriate for each group may be scheduled in a more meaningful way.
One of the issues in data analytics is how to represent the different types of information embedded in online events of interest. For example, a user may fall within a certain age group, may be active on some online platforms during different time frames of the day, may have clicked a certain number of ads displayed on different online platforms, etc. Big data with such information may be valuable in correlating certain populations with certain preferences in different time periods. Analytics on such collected information aim to reveal these correlations, and machine learning may be applied to learn how to optimize performance given the characteristics captured by such analytics.
FIG. 1A shows an exemplary framework, called a factorization machine, that learns such correlations from collected event data. Different features involved in different events may be collected and represented as feature vectors 100. In this illustration, three types of data are involved: website, gender, and advertiser.
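By way of illustration only, a factorization machine of the kind named above is commonly written as a bias, plus a weighted sum of the features, plus pairwise interaction terms scored by dot products of per-feature vectors. The sketch below assumes that standard second-order form; the function name `fm_predict` and all numeric values are hypothetical and are not taken from FIG. 1A.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorization machine score.

    x  : feature values (e.g. 1.0 for each active categorical feature)
    w0 : bias
    w  : one weight per feature
    v  : one vector per feature; the dot product of v[i] and v[j] scores
         the interaction between feature i and feature j
    """
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


if __name__ == "__main__":
    # Three active features standing in for website, gender, and advertiser.
    x = [1.0, 1.0, 1.0]
    w0 = 0.1
    w = [0.2, 0.3, 0.4]
    v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dimensional vectors
    print(fm_predict(x, w0, w, v))
```

The double loop is written for readability; it makes explicit that every pair of features contributes one interaction term.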
The specific example values for the three types of features are Yahoo.com (website), male (gender), and Disney (advertiser), and they are encoded in matrix 100 as embeddings. The encoded values of the feature vectors for website, gender, and advertiser are sent to subnets 110, 120, and 130, which generate an individual representation for each type of information. Interactions among types of information often play an important role. For example, users in a certain gender group may tend to click on ads from certain advertisers, and possibly even on certain websites. To learn such interactions among different types of information, the representations of these three types of information are fed to the dot product layer 140.
To learn the behavior of a user with respect to advertisements from different advertisers on different websites, both the individual representations of the different types of information (such as website, gender, and advertiser) and the interactions among different feature types may be used to formulate a learning scheme as provided in FIG. 1A. In this formulation, what is learned includes wi, 0<=i<=N, as weights for the different features, where w0 represents a bias, wi is a weight applied to feature xi, and the term xi*xj*<vi, vj>, in which <vi, vj> is the dot product of vi and vj, represents the interaction between feature xi and feature xj.
To capture the patterns embedded in the collected event data, one of the keys is to be able to represent each feature, especially numerical values, accurately. Current approaches to encoding numerical values (such as the age of a user, the number of visits to a site, or the number of past views per category of products) when used in a factorization machine or its variants include both direct numerical feature encoding and binning. Direct numerical feature encoding in a factorization machine can represent only a linear relationship. Given that, for any more complex non-linear relationship, such as the one shown in FIG. 1B, this approach captures the underlying non-linear relationship using only a linear approximation, as shown in FIG. 1C. As such, it cannot accurately characterize the relationship being modeled.
Another common approach to encoding numerical features is binning, which allows learning a non-linear relationship using a step function. This is shown in FIG. 1D. With this approach, the range of valid values of a numerical feature is divided into groups, or bins, and each bin is treated as a categorical feature with two values, 0 and 1. In the example illustrated in FIG. 1D, a non-linear relationship between z and the model's output is represented by a piecewise step function 170 with bin x1 corresponding to range {0<=z<20}, x2 corresponding to range {20<=z<40}, . . . , and x5 corresponding to range {z>200}. In this representation, for a number that needs to be encoded, the bin the number belongs to gets the value 1 while the other bins get the value 0. That is, through binning, the numerical feature is transformed into a set of categorical features with values “0” or “