CN-121998677-A - Zero sample marketing gain modeling method and system based on large language model
Abstract
The invention discloses a zero sample marketing gain modeling method and system based on a large language model, which address the technical bottleneck that traditional gain models cannot effectively estimate individual treatment effects for brand-new marketing interventions because no historical samples exist. The method converts user characteristics and marketing scenario descriptions into natural language prompts, uses a large language model to activate its pre-trained common-sense knowledge of consumer behavior for semantic reasoning, and estimates the user's potential outcomes under intervention and non-intervention respectively through a dual-branch counterfactual prediction framework, from which the individual treatment effect is computed. The method achieves accurate gain estimation without historical data from the target scenario, can be widely applied to cold-start scenarios such as personalized marketing and new product promotion, significantly reduces enterprise A/B testing costs, and raises the intelligence level of marketing decisions.
Inventors
- LI TONGYUN
- FENG JUAN
Assignees
- Tsinghua Shenzhen International Graduate School (清华大学深圳国际研究生院)
Dates
- Publication Date: 20260508
- Application Date: 20260116
Claims (10)
- 1. A zero sample marketing gain modeling method based on a large language model, characterized by comprising the following steps: S1, prompt construction: constructing a natural language prompt containing a counterfactual task instruction based on structured features of a target user and a text description of a target marketing intervention; S2, semantic encoding and representation learning: encoding the natural language prompt with a pre-trained large language model and extracting a joint representation vector that fuses user semantics and intervention semantics; S3, counterfactual prediction: inputting the joint representation vector into a dual-branch prediction network, wherein a first branch of the network outputs a first potential outcome predicted value for the user under the condition of receiving the target marketing intervention, and a second branch outputs a second potential outcome predicted value for the user under the condition of receiving no intervention; and S4, gain calculation and output: calculating the difference between the first potential outcome predicted value and the second potential outcome predicted value as the estimated individual treatment effect of the user for the target marketing intervention, and outputting the estimate.
- 2. The large language model based zero sample marketing gain modeling method of claim 1, wherein in step S1, constructing the natural language prompt containing a counterfactual task instruction specifically comprises: converting the numerical and categorical structured features of the target user into descriptive natural-language fragments; taking the text description of the target marketing intervention as an intervention context; and combining the descriptive natural-language fragments, the intervention context, and a counterfactual task instruction that explicitly requires the model to estimate the user's potential outcomes under intervention and under non-intervention respectively, to form a complete natural language prompt text.
- 3. The large language model based zero sample marketing gain modeling method of claim 1, wherein in step S2, when the pre-trained large language model is used for encoding, a parameter-efficient fine-tuning technique is used to adapt the large language model.
- 4. The large language model based zero sample marketing gain modeling method of claim 1, wherein in step S2, extracting the joint representation vector fusing the user semantics and the intervention semantics specifically comprises: pooling the sequence representation output by the large language model encoder to generate the joint representation vector of fixed dimension.
- 5. The large language model based zero sample marketing gain modeling method of claim 1, further comprising, between step S2 and step S3, a step S2a of representation distribution alignment: during model training, based on the joint representation vectors extracted from the large language model, calculating the distribution distance between intervention-group samples and control-group samples in the representation space, and adding the distribution distance as a regularization term to the loss function being optimized, so as to reduce the distribution difference between the two groups.
- 6. The large language model based zero sample marketing gain modeling method of claim 5, wherein the representation distribution alignment is achieved by optimizing a joint loss function comprising a factual prediction loss term and a representation balance loss term; the factual prediction loss term is calculated only from each sample's actually observed outcome and the corresponding branch's predicted value; the representation balance loss term is calculated from a selected distribution distance metric and serves to draw the representation distribution of the intervention group towards the representation distribution of the control group.
- 7. The large language model based zero sample marketing gain modeling method of claim 6, wherein the distribution distance metric used in the representation balance loss term is the maximum mean discrepancy (MMD).
- 8. The large language model based zero sample marketing gain modeling method of claim 1, wherein in step S3, the two branches of the dual-branch prediction network are neural networks whose parameters are not shared, and both take as input the joint representation vector extracted in step S2; the first branch receives the joint representation vector and outputs the first potential outcome predicted value, and the second branch receives the joint representation vector and outputs the second potential outcome predicted value.
- 9. A large language model based zero sample marketing gain modeling system, comprising: a prompt construction module for constructing a natural language prompt containing a counterfactual task instruction based on the structured features of a target user and the text description of a target marketing intervention; a semantic encoding and representation learning module for encoding the natural language prompt with a pre-trained large language model and extracting a joint representation vector fusing user semantics and intervention semantics; a counterfactual prediction module for inputting the joint representation vector into a dual-branch prediction network, wherein a first branch of the network outputs a first potential outcome predicted value for the user under the condition of receiving the target marketing intervention, and a second branch outputs a second potential outcome predicted value for the user under the condition of receiving no intervention; and a gain calculation and output module for calculating the difference between the first potential outcome predicted value and the second potential outcome predicted value as the estimated individual treatment effect of the user for the target marketing intervention, and outputting the estimate.
- 10. The large language model based zero sample marketing gain modeling system of claim 9, further comprising: a representation distribution alignment module for, during model training, calculating the distribution distance between intervention-group samples and control-group samples in the representation space based on the joint representation vectors extracted from the large language model, and adding the distribution distance as a regularization term to the loss function being optimized, so as to reduce the distribution difference between the two groups.
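For illustration only (not part of the patent text), the following is a minimal sketch of the pipeline described in claims 1, 2, 4, 8 and 9: prompt construction, encoding with a pre-trained language model, mean pooling into a fixed-dimension joint representation, and two non-shared prediction heads whose difference gives the individual treatment effect estimate. The Hugging Face Transformers API, the `bert-base-uncased` stand-in encoder, the head sizes, and the example feature values are all assumptions for illustration.

```python
# Illustrative sketch of claims 1, 2, 4, 8, 9: prompt construction, LM encoding,
# mean pooling, and a dual-branch counterfactual head. Encoder name, head sizes,
# and feature values are assumptions, not specified by the patent.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

def build_prompt(user_features: dict, intervention_text: str) -> str:
    """S1: turn structured user features and the intervention description into a
    natural-language prompt with an explicit counterfactual task instruction."""
    feature_desc = "; ".join(f"{k}: {v}" for k, v in user_features.items())
    return (
        f"User profile: {feature_desc}. "
        f"Marketing intervention: {intervention_text}. "
        "Estimate the user's potential outcome both if the intervention is applied "
        "and if no intervention is applied."
    )

class DualBranchUpliftModel(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased", head_dim: int = 128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # S2: pre-trained LM encoder
        hidden = self.encoder.config.hidden_size
        # S3 / claim 8: two branches with non-shared parameters
        self.head_treated = nn.Sequential(nn.Linear(hidden, head_dim), nn.ReLU(), nn.Linear(head_dim, 1))
        self.head_control = nn.Sequential(nn.Linear(hidden, head_dim), nn.ReLU(), nn.Linear(head_dim, 1))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # S2 / claim 4: mean-pool the token sequence into a fixed-dimension joint representation
        mask = attention_mask.unsqueeze(-1).float()
        rep = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
        y1_hat = self.head_treated(rep)  # first potential outcome (with intervention)
        y0_hat = self.head_control(rep)  # second potential outcome (no intervention)
        return y1_hat, y0_hat, rep

# S4: the individual treatment effect estimate is the difference of the two predictions.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
prompt = build_prompt({"age": 34, "recent_purchases": 5}, "10% discount coupon for new product line")
batch = tokenizer(prompt, return_tensors="pt", truncation=True)
model = DualBranchUpliftModel()
y1_hat, y0_hat, _ = model(batch["input_ids"], batch["attention_mask"])
ite_hat = (y1_hat - y0_hat).item()
```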
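Claim 3 calls for adapting the encoder with a parameter-efficient fine-tuning technique but does not name one. A minimal sketch using LoRA via the Hugging Face `peft` library is shown below; the choice of LoRA, the rank, and the target modules are assumptions for illustration.

```python
# Sketch of parameter-efficient adaptation (claim 3) using LoRA as one possible
# technique; rank, target modules, and LoRA itself are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

encoder = AutoModel.from_pretrained("bert-base-uncased")  # stand-in for the LLM encoder
lora_cfg = LoraConfig(
    r=8,                                # low-rank update dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],  # attention projections in a BERT-style encoder
    task_type="FEATURE_EXTRACTION",
)
encoder = get_peft_model(encoder, lora_cfg)
encoder.print_trainable_parameters()    # only the LoRA adapter weights are trainable
```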
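Claims 5-7 and 10 describe training with a joint loss: a factual prediction loss computed only on the branch whose outcome was actually observed, plus a representation balance term measured by maximum mean discrepancy between intervention-group and control-group representations. A minimal sketch follows, assuming a Gaussian-kernel MMD and a mean-squared-error factual loss; the kernel choice, bandwidth `sigma`, and weight `alpha` are assumptions not specified by the claims.

```python
# Sketch of the joint loss in claims 5-7: factual loss on the observed branch only,
# plus an MMD representation balance term. Kernel, bandwidth, and alpha are assumptions.
import torch
import torch.nn.functional as F

def gaussian_mmd(rep_t: torch.Tensor, rep_c: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between treated and control representations with an RBF kernel."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(rep_t, rep_t).mean() + kernel(rep_c, rep_c).mean() - 2 * kernel(rep_t, rep_c).mean()

def joint_loss(y1_hat, y0_hat, y_obs, treated, rep, alpha: float = 0.1) -> torch.Tensor:
    """Factual loss on the actually observed outcome (claim 6) + alpha * MMD balance term."""
    y_hat = torch.where(treated.bool(), y1_hat.squeeze(-1), y0_hat.squeeze(-1))
    factual = F.mse_loss(y_hat, y_obs)                       # observed branch only
    rep_t, rep_c = rep[treated.bool()], rep[~treated.bool()]
    balance = gaussian_mmd(rep_t, rep_c) if len(rep_t) > 0 and len(rep_c) > 0 else rep.sum() * 0.0
    return factual + alpha * balance
```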
Description
Zero sample marketing gain modeling method and system based on large language model

Technical Field

The invention relates to artificial intelligence and data mining technology, in particular to the application of artificial intelligence in marketing, and specifically to a zero sample marketing gain modeling method and system based on a large language model.

Background

In the digital economy, personalized marketing is critical for enterprises to improve conversion efficiency. Traditional marketing models (e.g., click-through rate prediction) focus only on the "user response" and ignore whether the marketing intervention actually "triggers" a behavioral change. Marketing modeling is therefore moving from "response prediction" to "incremental response estimation," i.e., estimating the individual treatment effect (ITE), also known as gain modeling (uplift modeling). The core goal of gain modeling is to estimate, from observed data, the causal effect of an intervention on an individual's outcome. The formal definition is:

τ(x) = E[Y(1) − Y(0) | X = x]

where Y(1) denotes the individual's potential outcome under the intervention, Y(0) denotes the potential outcome without the intervention, and X is the user covariate vector (e.g., historical behavior, profile attributes). The gain value τ(x) intuitively reflects the expected incremental value of applying the intervention to a particular user x. According to τ(x), users can be divided into four typical groups (as shown in FIG. 1): (1) Persuadables: τ(x) > 0, the target population who respond only when they receive the intervention; (2) Sure Things: users who respond whether or not they receive the intervention, so that intervening on them wastes resources; (3) Lost Causes: users who do not respond whether or not they receive the intervention; (4) Do-Not-Disturbs: τ(x) < 0, users for whom the intervention backfires and provokes a negative reaction.

Existing gain modeling methods fall mainly into three categories. (1) Meta-learner methods, which fit potential outcomes by reusing conventional supervised learning models in different combinations; representative methods include the S-learner, T-learner, and X-learner. These methods are simple to implement and highly compatible, but their performance depends heavily on the expressive power of the base model and the coverage of the samples. (2) Tree-based methods, such as the Uplift Tree, Causal Tree, and Causal Forest, which identify user subgroups sensitive to the intervention by modifying the splitting criterion; they offer good interpretability but limited modeling capacity in high-dimensional, non-linear scenarios. (3) Neural network based methods, such as TARNet and CFRNet, which learn causal representations with deep networks and introduce a distribution alignment mechanism to mitigate selection bias; follow-up work such as EFIN and DESCN further optimizes feature interaction and whole-space modeling to improve estimation stability. However, all of the above methods rely on historical intervention-response data for supervised training, are fundamentally unable to handle unseen intervention types, and lack zero-shot cross-scenario generalization capability.
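As a purely hypothetical illustration of the four groups defined above: given estimated potential outcomes, the sign of the uplift separates Persuadables from Do-Not-Disturbs, while near-zero uplift splits into Sure Things and Lost Causes according to the baseline response. The probability values, the near-zero threshold `eps`, and the 0.5 cut-off below are assumptions, not from the patent.

```python
# Hypothetical example: segmenting users by estimated uplift tau(x) = y1_hat - y0_hat.
# The probabilities, eps, and the 0.5 cut-off are illustrative assumptions only.
def segment(y1_hat: float, y0_hat: float, eps: float = 0.01) -> str:
    tau = y1_hat - y0_hat
    if tau > eps:
        return "Persuadable"       # responds only if intervened
    if tau < -eps:
        return "Do-Not-Disturb"    # intervention backfires
    return "Sure Thing" if y0_hat > 0.5 else "Lost Cause"  # responds (or not) either way

print(segment(0.35, 0.20))  # Persuadable: uplift of +0.15
print(segment(0.90, 0.88))  # Sure Thing: converts regardless of intervention
```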
When facing a brand-new marketing strategy, enterprises often have to run expensive and time-consuming A/B tests to collect data, which leads to high cold-start costs and long lead times. In recent years, large language models (LLMs) have demonstrated significant potential in causal inference tasks. The related research falls into two main directions. (1) Combining causal inference with large models, where the LLM is used for tasks such as variable extraction, causal discovery, and counterfactual generation. For example, the NATURAL framework enables end-to-end estimation of causal effects from unstructured text, Causal Prompting mitigates prompt bias through front-door adjustment, and other studies use the LLM as a causal effect generator or build autonomous causal analysis agents (e.g., Causal-Copilot). However, existing work has focused on average treatment effect estimation or qualitative causal discovery; no research has systematically used LLMs for individual treatment effect estimation, especially in the zero sample marketing gain modeling task. (2) Applying large models to search and recommendation systems, where LLMs significantly improve the cold-start performance and interpretability of recommendation systems through semantic encoding, interest summary generation, retrieval augmentation, and other means. The mainstream paradigms include LLM-to-Rec (using the LLM to enhance features), Rec-to-LLM (translating user behavior into prompts so that recommendations are generated directly), and a native recommendation large model