CN-122021739-A - Global gradient optimization method based on score matching and application thereof
Abstract
The invention provides a global gradient optimization method based on score matching, which aims to solve global optimization problems with a deterministic gradient method. The method comprises: a general method for constructing a differentiable hierarchical objective optimization function; a global gradient optimization algorithm based on the hierarchical objective function; and a generative modeling method based on global optimization, in which the general method is applied to the field of generative modeling. Compared with the traditional gradient descent algorithm, the global gradient optimization method based on score matching overcomes the defect of converging only to local optima, and can be used for solving global optimization problems that are non-convex, non-differentiable, discrete, subject to complex constraints, and the like. Compared with the traditional generation paradigm based on probability sampling, the generation paradigm based on global optimization provided by the invention can theoretically obtain better inference results.
Inventors
- LI MING
Assignees
- 李明 (LI MING)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-13
Claims (3)
- 1. A general method for constructing a differentiable hierarchical objective optimization function, for converting any constrained optimization problem into a unified differentiable hierarchical objective optimization function, comprising the steps of: Step one, converting the constrained optimization problem into an unconstrained problem: any constrained optimization problem can be described by formula (1), wherein the objective and constraint functions are given by the particular problem; it can be converted into an unconstrained equivalent problem, as shown in formula (2). Step two, constructing a hierarchical objective optimization function as shown in formula (3), whose perturbation kernels are Gaussian distributions defined by a forward diffusion process and its inverse, the forward diffusion process being as shown in formula (4), wherein the noise scale is a monotonic function of t such that, as t increases from 0 to 1, the data gradually diffuses into pure Gaussian noise. The hierarchical objective optimization function has good optimization properties; the optimization level is given by t: when t = 0 it is equivalent to the original optimization problem; when t > 0 part of the low-frequency information is ignored, and the larger t is, the more low-frequency information is ignored; when t = 1 all frequency information is ignored. Step three, differentiating the hierarchical objective function and deriving its logarithmic gradient, as shown in formula (5). The sampling probability of the desired data is defined as shown in formula (6), wherein C is a normalization constant. Score matching is performed on samples drawn from this distribution, yielding an unbiased estimate of the score. Monte Carlo sampling according to formula (5) then yields the logarithmic gradient of the hierarchical objective function (a minimal code sketch of this estimate is given after the claims).
- 2. A global gradient optimization algorithm based on the hierarchical objective function, characterized by comprising the following steps: Step one, training a whole-domain score matching model: points are sampled at random from the problem domain with probability proportional to the target distribution of formula (6) (equivalent to sampling from that distribution) and normalized to [-1, 1], and score matching is trained over all scales t. This step is optional and applies only to problems whose distribution can be learned with limited model capacity; otherwise the learning capacity of the model may be exceeded. Step two, randomly initializing the solution and initializing t = 1. Step three, fine-tuning the score matching model in a local range: as required by the forward process, points are sampled from the neighborhood of the current solution, resampled with probability proportional to the target distribution, and normalized to [-1, 1]; the score matching model is fine-tuned for the current t only. This step is optional and is aimed only at those hard problems for which step one cannot be performed. Step four, gradient-based optimization: for the current scale t, Monte Carlo sampling according to formula (5), based on the score matching model, yields the logarithmic gradient of the objective function of formula (3); gradient ascent on the solution with this logarithmic gradient is repeated until convergence. The Monte Carlo sampling draws noise samples according to the forward process; to eliminate the effect of random noise as far as possible with a limited sample budget, the samples can be drawn in a mirror-symmetric manner: the first half of the samples are drawn at random, and for each such sample its negation is added to the set. Step five, probability-based exploration: due to symmetry there may be multiple globally optimal solutions, and when the current solution lies in the plateau region between two optimal solutions the gradient of step four may stay close to 0. In this case a one-step probabilistic exploration is performed using formula (7), i.e., one of the nearby optimal solutions is selected probabilistically to break the deadlock, wherein the pre-sampled standard Gaussian noise should remain stable, e.g. constant, throughout the optimization process, and is scaled by a factor that decreases with t. Multiple noise samples can be drawn to form different optimization trajectories, so that different optimal solutions can be explored. This step is optional, only for those cases where there are multiple globally optimal or near-optimal solutions. Step six, reducing t and then repeating steps three, four, five and six until t = 0, at which point the current solution is returned as the optimal solution (see the optimization-loop sketch after the claims).
- 3. A generation modeling method based on global optimization, illustrated with large language modeling (LLM) as a representative, characterized by comprising the following steps: Step one, a score matching training method based on context conditions: generating the best answer for a given context can be regarded as the global optimization problem based on score matching described above, but training a separate score matching model for each context is inefficient. Therefore, a context-conditioned joint score matching model can be trained: context-answer pairs are randomly sampled from the training data; the answer corresponding to a given context is typically unique and can be sampled directly as a single-point distribution (i.e., treated as constant). The answer is a discrete token sequence, which is converted to a continuous embedding space using an encoder model, or directly to a continuous logits space, and score matching is trained over all scales t. Step two, an inference method based on global optimization: for a new context, based on the model trained in step one, the global gradient optimization algorithm based on the hierarchical objective function of claim 2 is executed to obtain the optimal continuous representation, which is then converted from the continuous embedding space back to a discrete token sequence using a decoder model, or recovered directly from the continuous logits space to a discrete token sequence (see the training and inference sketch after the claims).
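The following sketch illustrates the Monte Carlo log-gradient estimate of claim 1, step three. Since the formula images are absent from this text, the exact form of formula (5) is unknown; this is a minimal reading, assuming a forward process x_t = x + sigma(t)·eps with eps ~ N(0, I) and a score model trained on the target distribution of formula (6). The schedule `sigma`, the helper name `hierarchical_log_grad`, and the averaging of scores at perturbed points are illustrative assumptions, not the patent's verbatim procedure.

```python
import torch

def sigma(t: float) -> float:
    # Assumed monotonic noise schedule; the claim only requires
    # monotonicity in t, not this particular log-linear form.
    return 10.0 ** (2.0 * t - 1.0)

def hierarchical_log_grad(score_fn, x, t, n_samples=32):
    # Monte Carlo estimate in the spirit of formula (5): perturb x with
    # Gaussian noise from the forward process of formula (4), evaluate the
    # score model at the perturbed points, and average.  The noise is drawn
    # in mirror-symmetric (antithetic) pairs, per claim 2, step four.
    half = n_samples // 2
    eps = torch.randn(half, *x.shape)
    eps = torch.cat([eps, -eps], dim=0)        # each sample paired with its negation
    x_t = x.unsqueeze(0) + sigma(t) * eps      # perturbed copies of x
    return score_fn(x_t, t).mean(dim=0)        # averaged score estimate

# Analytic stand-in for a trained score model: if p(x) = N(0, I), the
# sigma(t)-smoothed density is N(0, (1 + sigma(t)^2) I), whose score is
# -x / (1 + sigma(t)^2).
toy_score = lambda x_t, t: -x_t / (1.0 + sigma(t) ** 2)
print(hierarchical_log_grad(toy_score, torch.tensor([2.0, -1.0]), t=0.5))
```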
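Next, a minimal sketch of the optimization loop of claim 2 (steps two, four, five, and six), reusing `sigma` and `hierarchical_log_grad` from the sketch above. Steps one and three (whole-domain training and local fine-tuning) are assumed to be folded into `score_fn`. The exploration rule x ← x + gamma_t·z with a fixed pre-sampled z is one reading of formula (7), and the linearly decreasing factor gamma_t = gamma·t is an assumption.

```python
import torch

def global_gradient_optimize(score_fn, dim, levels=8, inner_steps=100,
                             lr=0.05, gamma=0.5, seed=0):
    # Claim 2, step two: random initialization of the solution and of the
    # pre-sampled exploration noise z, which stays constant throughout.
    gen = torch.Generator().manual_seed(seed)
    x = torch.randn(dim, generator=gen)
    z = torch.randn(dim, generator=gen)
    # Claim 2, step six: anneal t from 1 down to 0.
    for t in torch.linspace(1.0, 0.0, levels).tolist():
        # Step four: gradient ascent on the hierarchical objective at scale t.
        for _ in range(inner_steps):
            x = x + lr * hierarchical_log_grad(score_fn, x, t)
        # Step five (optional): one-step probabilistic exploration per
        # formula (7), to escape plateaus between multiple optimal solutions.
        if t > 0.0:
            x = x + gamma * t * z
    return x  # the solution at t = 0 is returned as the optimum

# Toy usage with the analytic score from the previous sketch: the optimum
# should approach the mode of p, here the origin.
print(global_gradient_optimize(toy_score, dim=2))
```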
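Finally, a minimal sketch of the context-conditioned training of claim 3, step one, via standard denoising score matching in a continuous embedding space (reusing `sigma` from the first sketch). The network architecture, dimensions, and the `dsm_step` helper are illustrative assumptions; a real system would condition a large transformer on the context and, for step two, run the claim-2 optimizer on the answer embedding before decoding it back to tokens.

```python
import torch
import torch.nn as nn

class CondScore(nn.Module):
    # Toy context-conditioned score model s(y_t, t, c) for claim 3, step one.
    def __init__(self, d_ctx=32, d_emb=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_ctx + d_emb + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, d_emb))

    def forward(self, y_t, t, c):
        t_col = torch.full((y_t.shape[0], 1), float(t))
        return self.net(torch.cat([y_t, t_col, c], dim=-1))

def dsm_step(model, optim, c, y0, sigma):
    # One denoising-score-matching step: perturb the answer embedding y0
    # with the forward process y_t = y0 + s * eps, and regress the score
    # of q(y_t | y0), which is -eps / s.
    t = torch.rand(()).item()
    s = sigma(t)
    eps = torch.randn_like(y0)
    y_t = y0 + s * eps
    loss = ((model(y_t, t, c) + eps / s) ** 2).mean()
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()

# Toy usage: random (context, answer-embedding) pairs stand in for encoded
# training data.
model = CondScore()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
c = torch.randn(8, 32)    # batch of context embeddings
y0 = torch.randn(8, 64)   # batch of answer embeddings
print(dsm_step(model, optim, c, y0, sigma))
```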
Description
Global gradient optimization method based on score matching and application thereof

Technical Field

The invention relates to the field of gradient-based global optimization, in particular to a global gradient optimization method based on score matching and its application to generative modeling.

Background

Global optimization problems are of very general and key significance in the fields of science, mathematics, engineering and the like, and such problems often exhibit nonlinear, non-convex, non-differentiable, complex-constrained and other characteristics. Traditional deterministic algorithms and stochastic metaheuristics are the two main directions for addressing these challenges. Stochastic metaheuristics remain the first choice for solving global optimization problems. They mainly comprise evolution-based methods such as genetic algorithms, differential evolution and evolution strategies; swarm-based methods such as particle swarm optimization and ant colony algorithms; and physics-based methods such as simulated annealing. Recently, evolutionary coding agents based on large language models (AlphaEvolve) have been used to automatically discover or refine heuristic algorithms, with good results on some complex optimization problems. However, stochastic metaheuristic algorithms generally lack a rigorous mathematical description, are difficult to analyze for theoretical optimality, and, because their optimization power rests on raw sample statistics, suffer from low sampling efficiency on problems of exponential complexity. Conventional deterministic algorithms are often used to solve linear, convex, or other simple optimization problems, with gradient descent (GD) and its variants SGD, Adam, quasi-Newton, etc. being the most mature and common algorithms. However, gradient-based methods tend to fall into local optima and can only be used for continuously differentiable problems with simple convex constraints. The invention provides a global gradient optimization method based on score matching: a global hierarchical objective optimization function is constructed and optimized with exact gradients, and these gradients can be obtained by the score-matching technique. The method overcomes the local-optimum defect of gradient descent and can be used for solving global optimization problems that are non-convex, non-differentiable, discrete, subject to complex constraints, and the like. Score matching combined with stochastic differential equations (SDEs) is the essential core of current diffusion models: score matching models the probability distribution of the data, from which the SDE samples. The other branch of generative modeling, autoregressive modeling, follows a similar paradigm, generating by sampling from the learned data distribution. The invention applies the proposed global optimization method to generative modeling, forming a new paradigm: score matching models the probability distribution of the data, hierarchical gradient descent performs global optimization, and the sample with the highest probability is generated. A generation paradigm based on probability sampling cannot guarantee that better samples are always sampled, and may even sample very bad ones; the generation paradigm based on global optimization provided by the invention solves this problem.
Disclosure of Invention

The invention provides a global gradient optimization method based on score matching, which overcomes the local-optimum defect of gradient descent, can be used for solving global optimization problems that are non-convex, non-differentiable, discrete, subject to complex constraints, and the like, and is applied to generative modeling. The invention is realized by adopting the following technical scheme, described in three parts: 1. A general method for constructing a differentiable hierarchical objective optimization function, used for converting any constrained optimization problem into a unified differentiable hierarchical objective optimization function. The method comprises the following steps: step one, converting the constrained optimization problem into an unconstrained problem, wherein any constrained optimization problem can be described by formula (1), in which the objective and constraint functions are given by the particular problem; it can be converted into an unconstrained equivalent problem as shown in formula (2). Step two, constructing a hierarchical objective optimization function as shown in formula (3), whose perturbation kernels are Gaussian distributions defined by a forward diffusion process and its inverse, the forward diffusion process being as shown in formula (4), wherein the noise scale is a monotonic function of t such that, as t increases from 0 to 1, the data gradually diffuses into pure Gaussian noise. The hierarchical objective optimization function has good optimization properties; the optimization level is given by t: when t = 0 it is equivalent to the original optimization problem; when t > 0 part of the low-frequency information is ignored, and the larger t is, the more low-frequency information is ignored; when t = 1 all frequency information is ignored.
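The formula images referenced as (1)-(4) are not present in this text version of the record. The following LaTeX is a hypothetical reconstruction, consistent with the surrounding description (a penalty-based unconstrained conversion and a Gaussian forward diffusion) but not the patent's verbatim equations; the penalty weights, the exponential form of (3), and the variance-exploding form of (4) are all assumptions.

```latex
% Hypothetical reconstruction -- the original formula images are missing.
\begin{align}
\min_{x}\; & f(x) \quad \text{s.t.}\; g_i(x) \le 0,\; h_j(x) = 0 \tag{1}\\
\min_{x}\; & \tilde f(x) = f(x) + \lambda \sum_i \max\bigl(0,\, g_i(x)\bigr)
             + \mu \sum_j \lvert h_j(x)\rvert \tag{2}\\
\max_{x_t}\; & \log \mathbb{E}_{q(x_0 \mid x_t)}\!\left[ e^{-\tilde f(x_0)} \right] \tag{3}\\
& q(x_t \mid x_0) = \mathcal{N}\!\bigl(x_t;\; x_0,\; \sigma(t)^2 I\bigr),
  \qquad \sigma(0) = 0,\ \ \sigma \text{ monotonically increasing in } t \tag{4}
\end{align}
```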