CN-121981188-A - KAN self-adaptive basis function selection method and device based on causal invariance
Abstract
The application relates to a KAN adaptive basis function selection method and device based on causal invariance, belonging to the technical field of artificial intelligence and machine learning. The method comprises: starting a basis function selection mechanism and initializing a candidate basis function library and screening sub-models; inputting training data and constructing variable-level environment evaluation data; constructing candidate basis functions for each variable, building screening sub-models, and training them independently; calculating the fitting error and environment stability index of each candidate basis function and constructing a joint score; selecting the optimal basis function of each variable according to the joint score; and outputting a mapping configuration of variables to basis function types for subsequent model structure determination or prediction tasks. By combining basis function selection with causal invariance, the method preserves fitting accuracy while markedly improving cross-environment generalization and model interpretability, reduces dependence on manual design, improves modeling efficiency, and is suitable for scenarios requiring high robustness and interpretability, such as computer vision, industrial control, and automatic driving.
Inventors
- ZHANG YONGFENG
- ZHANG KEJIA
- ZHANG XIAOYU
- LIU ZHIHAO
- SU KAI
- XIAO HAOYANG
Assignees
- University of Jinan (济南大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-08
Claims (10)
- 1. A KAN adaptive basis function selection method based on causal invariance, comprising the steps of: Step S1, starting a basis function selection mechanism and initializing a candidate basis function library and screening sub-models, wherein the candidate basis function library comprises at least two different types of basis functions; Step S2, inputting training data and constructing variable-level environment evaluation data according to environment labels, the environment labels being used to identify the data environment to which each sample belongs; Step S3, for each input variable to be modeled, respectively constructing an independent screening sub-model based on each candidate basis function and training it independently under unified training conditions to obtain a fitting result corresponding to each candidate basis function; Step S4, calculating the fitting error of each candidate basis function over the whole sample, calculating an environment stability index based on the error differences across different environments, and constructing a joint score from the fitting error and the environment stability index; Step S5, selecting, for each input variable according to the joint scores, the candidate basis function with the minimum joint score as the optimal basis function of that variable, and outputting a mapping configuration of variables to basis function types; and Step S6, using the mapping configuration for structure construction or prediction tasks of a subsequent Kolmogorov-Arnold network model, so as to construct model branches according to the optimal basis function corresponding to each variable.
- 2. The KAN adaptive basis function selection method based on causal invariance according to claim 1, wherein said step S1 comprises the steps of: Step S11, splitting the input feature matrix into a plurality of one-dimensional variable inputs by columns and establishing an independent variable-by-variable evaluation channel; Step S12, constructing a candidate basis function library containing at least two different types of basis functions, wherein the basis function types comprise at least any two or more of B-spline basis functions, radial basis functions, Fourier basis functions, and piecewise linear basis functions; Step S13, establishing a unified feature mapping interface for any candidate basis function, expanding a one-dimensional variable input into a high-dimensional feature vector; Step S14, constructing a one-dimensional screening sub-model with a uniform structure for each variable and each candidate basis function, the screening sub-model adopting a serial structure of a basis function expansion layer, a hidden layer, and an output layer; and Step S15, initializing the training parameters of all screening sub-models, including the optimizer type, learning rate, and number of training rounds, to ensure that different basis functions are compared under the same conditions.
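The "unified feature mapping interface" of Step S13 can be sketched as a set of expansion functions that all map a one-dimensional variable input to a fixed-width feature vector, keyed by basis type as in the candidate library of Step S12. This is a minimal illustrative sketch, not the patent's implementation; the function names, center grid, and widths are assumptions.

```python
import numpy as np

def rbf_features(x, centers, width=1.0):
    """Gaussian radial basis expansion of a 1-D input array -> (n, k) features."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)        # (n, 1)
    c = np.asarray(centers, dtype=float).reshape(1, -1)  # (1, k)
    return np.exp(-((x - c) ** 2) / (2.0 * width ** 2))  # (n, k)

def fourier_features(x, n_freq=4):
    """Fourier basis expansion: [sin(kx), cos(kx)] for k = 1..n_freq."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    k = np.arange(1, n_freq + 1).reshape(1, -1)
    return np.concatenate([np.sin(k * x), np.cos(k * x)], axis=1)  # (n, 2*n_freq)

# Candidate basis function library (Step S12), each entry exposing the
# same one-variable-in, feature-vector-out interface (Step S13).
BASIS_LIBRARY = {
    "rbf": lambda x: rbf_features(x, centers=np.linspace(-1, 1, 8)),
    "fourier": lambda x: fourier_features(x, n_freq=4),
}
```

Because every candidate exposes the same interface, the downstream screening sub-models can treat the expansion layer as interchangeable.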
- 3. The KAN adaptive basis function selection method based on causal invariance according to claim 1, wherein said step S2 comprises the steps of: Step S21, receiving an original input feature matrix, a corresponding target output vector, and an environment label vector identifying the data environment to which each sample belongs; Step S22, uniformly converting the input feature matrix, the target output vector, and the environment label vector into tensor format; Step S23, determining the set of variable indexes subject to basis function selection, and extracting for each index the corresponding one-dimensional variable input from the input feature matrix; and Step S24, according to the values of the environment label vector, assigning all samples to at least two mutually exclusive environment subsets, and generating corresponding data indexes for each environment subset for separately calculating the model errors under the different environments.
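The environment partition of Step S24 amounts to grouping sample indexes by the environment label value, yielding mutually exclusive index sets. A minimal sketch (variable names are illustrative, not from the patent):

```python
import numpy as np

def split_by_environment(env_labels):
    """Partition sample indexes into mutually exclusive environment subsets.

    Returns {environment value: index array}; the index sets are disjoint
    because each sample carries exactly one environment label.
    """
    env_labels = np.asarray(env_labels)
    return {e: np.flatnonzero(env_labels == e) for e in np.unique(env_labels)}
```

The resulting per-environment index arrays are what later allow the per-environment errors of Step S42 to be computed on the same trained sub-model.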
- 4. The KAN adaptive basis function selection method based on causal invariance according to claim 2, wherein said step S3 comprises the steps of: Step S31, traversing each basis function type in the candidate basis function library for the current variable to be modeled and instantiating the corresponding one-dimensional screening sub-model for that variable, forming the candidate model set of the variable; Step S32, for each screening sub-model in the candidate model set, initializing the parameters of the corresponding basis function expansion layer and the subsequent network layers; Step S33, taking the one-dimensional input data of the variable as the model input and the original target output as the supervision signal, and performing independent iterative training of each screening sub-model under unified training conditions using a preset loss function; Step S34, in each training round, calculating the predicted values by forward propagation, calculating the gradients by backward propagation, and updating the model parameters with the optimizer until a preset number of training rounds or a convergence condition is reached; and Step S35, after training is finished, recording the final prediction results and loss values of each screening sub-model on the training set as the fitting performance of the candidate basis function on the current variable.
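The screening sub-model of Steps S31 to S35 can be sketched as a basis expansion layer followed by a learned output layer. For brevity, this sketch replaces the iterative optimizer training of Steps S33 and S34 with a closed-form ridge fit; the comparison logic (same data, same conditions, record the fit error per candidate basis) is the same. All names and the regularization constant are illustrative assumptions.

```python
import numpy as np

def fit_screening_submodel(expand, x, y, reg=1e-6):
    """Fit y ≈ Phi(x) @ w for one candidate basis; return (weights, predictions, mse).

    `expand` is the candidate basis expansion (the unified interface of
    Step S13); the ridge solve stands in for the gradient-based training
    described in the claim.
    """
    phi = expand(x)                                   # basis function expansion layer
    k = phi.shape[1]
    w = np.linalg.solve(phi.T @ phi + reg * np.eye(k), phi.T @ y)
    pred = phi @ w                                    # final prediction (Step S35)
    mse = float(np.mean((pred - y) ** 2))             # recorded fitting performance
    return w, pred, mse
```

Training each candidate under identical conditions is what makes their recorded errors directly comparable in the subsequent scoring step.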
- 5. The KAN adaptive basis function selection method based on causal invariance according to claim 3, wherein said step S4 comprises the steps of: Step S41, for any variable and any candidate basis function thereof, calculating the whole-sample fitting error based on the prediction outputs of the trained screening sub-model on all samples; Step S42, calculating the environment errors of the screening sub-model on each environment subset according to the at least two mutually exclusive environment subsets of step S24; Step S43, calculating the absolute value of the difference of the environment errors over the at least two mutually exclusive environment subsets, and taking this absolute value as the environment stability index; and Step S44, constructing a joint scoring function and performing a weighted summation of the whole-sample fitting error and the environment stability index to obtain a joint score that comprehensively measures the performance of the candidate basis function.
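Steps S41 to S44 can be sketched as follows: the whole-sample error and the absolute difference of per-environment errors are combined by weighted summation. This sketch assumes exactly two environments and an illustrative weight `lam`; neither is specified numerically in the claim.

```python
import numpy as np

def joint_score(pred, y, env_idx, lam=1.0):
    """Joint score = whole-sample MSE + lam * environment stability index.

    env_idx maps environment value -> sample index array (two environments
    assumed here); the stability index is the absolute difference of the
    two per-environment errors (Step S43).
    """
    err = np.mean((pred - y) ** 2)                                 # Step S41
    env_errs = [np.mean((pred[i] - y[i]) ** 2) for i in env_idx.values()]  # Step S42
    stability = abs(env_errs[0] - env_errs[1])                     # Step S43
    return float(err + lam * stability)                            # Step S44
```

A basis that fits well overall but much better in one environment than the other is penalized, which is how the causal-invariance preference enters the selection.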
- 6. The KAN adaptive basis function selection method based on causal invariance according to claim 1, wherein said step S5 comprises the steps of: Step S51, summarizing the joint scores corresponding to all candidate basis functions of the current variable to be modeled to form a score set; Step S52, comparing the values in the score set and determining the basis function type corresponding to the minimum score value; Step S53, determining that basis function type as the optimal basis function of the variable; Step S54, traversing all variables to be modeled, repeating steps S51 to S53, and determining the optimal basis function of each variable; and Step S55, associating all variables with their corresponding optimal basis function types, generating a configuration result containing the mapping between variable indexes and basis function types, and storing it as a dictionary data structure.
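Steps S51 to S55 reduce to a per-variable argmin over the joint scores, stored as the dictionary data structure the claim describes. A minimal sketch (score values and basis names are illustrative):

```python
def select_optimal_bases(scores):
    """Per-variable argmin over joint scores (Steps S51-S55).

    scores: {variable index: {basis type name: joint score}}
    returns: {variable index: basis type name}, the mapping configuration.
    """
    return {var: min(per_basis, key=per_basis.get)
            for var, per_basis in scores.items()}
```

The returned dictionary is exactly the variable-to-basis-type mapping configuration that Step S6 consumes when building the model.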
- 7. The KAN adaptive basis function selection method based on causal invariance according to any of claims 1 to 6, wherein said step S6 comprises the steps of: Step S61, reading the mapping configuration of variables to basis function types output in step S5; Step S62, when constructing a Kolmogorov-Arnold network model, instantiating a corresponding basis function module for each input variable according to the mapping configuration; Step S63, taking each instantiated per-variable basis function module as an edge function of the model, constructing a KAN model with variable-level adaptive basis functions; and Step S64, performing subsequent model training or prediction tasks with the KAN model with variable-level adaptive basis functions, wherein the edge function of each variable is calculated independently and the node output is the sum of the edge function outputs of all variables.
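The node structure of Steps S63 and S64 can be sketched as a single KAN node whose edges apply each variable's selected basis expansion and learned edge weights, with the node output being the sum of the edge outputs, as stated in the claim. The class name, expansions, and weight values below are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

class AdaptiveKANNode:
    """One KAN node with variable-level adaptive edge functions (Steps S63-S64)."""

    def __init__(self, edge_expansions, edge_weights):
        self.expansions = edge_expansions   # one basis expansion per input variable
        self.weights = edge_weights         # one learned weight vector per variable

    def forward(self, X):
        """X: (n, d) -> (n,): each edge function is computed independently
        on its own variable, and the node sums the edge outputs."""
        out = np.zeros(X.shape[0])
        for j, (expand, w) in enumerate(zip(self.expansions, self.weights)):
            out += expand(X[:, j]) @ w      # edge function of variable j
        return out
```

Because each edge is a univariate function of one input variable, the per-variable contributions remain individually inspectable, which is the interpretability property the application emphasizes.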
- 8. A KAN adaptive basis function selection device based on causal invariance, comprising: an initialization module for starting a basis function selection mechanism and initializing a candidate basis function library and screening sub-models, wherein the candidate basis function library comprises at least two different types of basis functions; a data construction module for inputting training data and constructing variable-level environment evaluation data according to environment labels, wherein the environment labels are used to identify the data environment to which each sample belongs; an independent training module for respectively constructing, for each input variable to be modeled, independent screening sub-models based on each candidate basis function and training them independently under unified training conditions to obtain the fitting result corresponding to each candidate basis function; a joint scoring module for calculating the fitting error of each candidate basis function over the whole sample, calculating an environment stability index based on the error differences across different environments, and constructing a joint score from the fitting error and the environment stability index; an optimal selection module for selecting, for each input variable according to the joint scores, the candidate basis function with the minimum joint score as the optimal basis function of that variable and outputting a mapping configuration of variables to basis function types; and a model construction module for using the mapping configuration for structure construction or prediction tasks of a subsequent Kolmogorov-Arnold network model, so as to construct model branches according to the optimal basis function corresponding to each variable.
- 9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the KAN adaptive basis function selection method based on causal invariance as claimed in any of claims 1 to 7.
- 10. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the KAN adaptive basis function selection method based on causal invariance as claimed in any of claims 1 to 7.
Description
KAN self-adaptive basis function selection method and device based on causal invariance

Technical Field

The application relates to a method and device for Kolmogorov-Arnold network (KAN) adaptive basis function selection based on causal invariance constraints, suitable for deep learning model structure optimization, interpretability enhancement, and improvement of cross-environment generalization. It can be widely applied in scenarios such as computer vision, natural language processing, industrial control, automatic driving, financial analysis, and medical health, and belongs to the technical field of artificial intelligence and machine learning.

Background

With the development of artificial intelligence technology, deep neural networks (Deep Neural Networks, DNNs) have made significant progress in computer vision, natural language processing, time series prediction, and many other fields. However, conventional deep networks generally rely on large numbers of parameters and stacked layers to increase expressive power, which leaves models with several drawbacks: (1) high computational cost — the network structure is complex, the amount of computation is huge, and inference is slow, making real-time application requirements hard to meet; (2) insufficient generalization — over-reliance on large-scale training data makes models prone to overfitting; and (3) lack of interpretability — the internal computation of the model is difficult to understand and explain, which matters especially in domains governed by physical laws or involving high-risk decisions.
To address the problems of structural complexity and parameter redundancy in deep learning models, a neural network architecture based on the Kolmogorov-Arnold representation theorem, the Kolmogorov-Arnold Network (KAN), has been proposed in recent years (see: Liu, Z., et al., "KAN: Kolmogorov-Arnold Networks," arXiv preprint arXiv:2404.19756, 2024). KAN is not a direct construction of the Kolmogorov-Arnold representation theorem; rather, it is a novel neural network inspired by the theorem's idea that a high-dimensional function can be represented by a combination of univariate functions. Unlike traditional MLPs, which use fixed activations at nodes and linear weights on edges, KAN replaces the linear weight of each edge with a learnable univariate function (often parameterized by B-splines), thereby "shifting" the nonlinearity onto the edges, with the nodes performing only summation and aggregation. This design brings two key characteristics: first, interpretability is enhanced through function-level visualization and human-readable interaction; second, on a number of data fitting and partial differential equation solving tasks, a small KAN can attain accuracy equal to or better than that of a larger MLP and exhibits a faster neural scaling law. In current research, a typical KAN model employs a fixed set of functions, such as Gaussian radial basis functions (GRBF), B-spline functions, reflectional switch activation functions, Chebyshev polynomials, Fourier transforms, wavelets, and other polynomial functions, with each input sample processed by a static, fixed function module.
While this design guarantees the theoretical expressive power of the network, it also has disadvantages. First, computational redundancy: owing to the lack of a dynamic function selection mechanism, the model performs a complete calculation over all function modules; even modules that are invalid or contribute very little to the current input sample are still computed and participate in the output. This not only wastes computational resources but also degrades the inference speed and energy consumption of the model. Second, lack of adaptability: in practical applications, different samples or tasks may depend on different feature patterns. For example, in industrial equipment failure prediction, the important feature functions differ across operating states. The existing KAN model, however, follows a unified computation path that cannot be adjusted dynamically according to the input characteristics, so the flexibility of the model's expression is insufficient. Third, insufficient structural interpretability: although KAN has a certain interpretability in theory, the model has no mechanism to explicitly indicate which function modules the current sample mainly depends on, so it is difficult to build a mapping between model decisions and physical mechanisms or business logic. This limits the practical value of the model in high-risk fields such as medical treatment, automatic driving, and finance. In implementing the present invention, the inventors found that at least the above problems exist in the prior art, the root cause being that the design of the existing KAN model is still a