CN-121996685-A - Wind control feature mining method, device, medium and product based on large model
Abstract
The embodiments of the present application relate to the field of information technology and disclose a large-model-based risk control ("wind control") feature mining method, device, medium and product. The method comprises: in response to a risk control feature mining request, using a large model to generate an initial feature extraction SQL statement involving a sample table and an association table; obtaining the execution plan tree of the initial feature extraction SQL statement in a database, parsing the table-join operation nodes in the execution plan tree, and extracting the hash predicate conditions and filter predicate conditions of those nodes; verifying the initial feature extraction SQL statement against verification rules, the verification including at least a temporal-leakage ("time-travel") prevention check on the table-join operation nodes; if the statement fails verification, generating feedback information from the verification result and inputting it into the large model to drive the model to correct the statement; and determining a target feature extraction SQL statement once a generated statement passes verification, with which risk control feature mining is then performed.
Inventors
- SONG MINGYAO
- LI BIN
- WU WENHUA
- YAN KUN
Assignees
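- Shanghai Shanghu Information Technology Co., Ltd. (上海上湖信息技术有限公司)
- Shanghai Erxu Information Technology Co., Ltd. (上海耳序信息技术有限公司)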
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-29
Claims (10)
- 1. A large-model-based method for mining risk control features, the method comprising: in response to a risk control feature mining request, generating, using a large model, an initial feature extraction SQL statement involving a sample table and an association table; obtaining an execution plan tree of the initial feature extraction SQL statement in a database, parsing the table-join operation nodes in the execution plan tree, and extracting the hash predicate conditions and filter predicate conditions of the table-join operation nodes; when the hash predicate condition is detected to contain a user-dimension identification field, verifying whether the filter predicate condition contains a time field of the association table together with a temporal constraint relative to the reference time of the sample table; if the verification fails, generating feedback information based on the verification result and inputting it into the large model to drive the large model to correct the initial feature extraction SQL statement; and repeating until a generated SQL statement passes the verification, which is determined as the target feature extraction SQL statement.
- 2. The method according to claim 1, wherein verifying, when the hash predicate condition is detected to contain a user-dimension identification field, whether the filter predicate condition contains a time field of the association table and a temporal constraint relative to the reference time of the sample table specifically comprises: parsing the hash predicate condition and detecting whether it contains a user-dimension identification field; in response to detecting that it does, parsing the filter predicate condition and extracting a time comparison expression involving a time field of the association table and the reference time field of the sample table; and checking the comparison operator in the time comparison expression to determine whether it expresses that the time field of the association table is not later than the reference time of the sample table.
- 3. The method according to claim 1, wherein verifying the initial feature extraction SQL statement based on verification rules further comprises: parsing a window function operation node in the execution plan tree and checking whether the partition key list of the window function operation node contains the unique identification field of the sample dimension; and/or parsing an aggregation operation node in the execution plan tree and checking whether the grouping key list of the aggregation operation node contains the unique identification field of the sample dimension; and, if the partition key list or the grouping key list lacks the unique identification field, determining that the verification fails and adding the position information of the corresponding node to the feedback information.
- 4. The method according to any one of claims 1 to 3, wherein generating feedback information based on the verification result and inputting it into the large model comprises: determining the position information of each node that fails the verification, and extracting the data table names, field aliases and join key information involved at that node; matching a corresponding natural language explanation template based on an error type mapping table, the natural language explanation template including a standard correction pattern; and filling the natural language explanation template with the data table names, field aliases and join key information to construct, as the feedback information, a structured prompt containing correction suggestions.
- 5. The method according to claim 1, further comprising, after determining the target feature extraction SQL statement once a generated SQL statement passes the verification: executing the target feature extraction SQL statement to extract feature data, training a risk control model using the feature data, and obtaining a model performance index and a feature importance list of the risk control model; identifying, from the feature importance list, the new features generated in the current mining round and determining their ranking positions; and generating a feature iteration record based on the model performance index and the ranking positions, associating the feature iteration record with the target feature extraction SQL statement, and storing it in a feature iteration record library.
- 6. The method according to claim 1, wherein generating, using a large model and in response to a risk control feature mining request, an initial feature extraction SQL statement involving a sample table and an association table comprises: accessing a feature iteration record library, reading the feature iteration records, and obtaining the model performance evaluation indexes of the features already mined; determining the feature iteration direction for the current round according to the model performance evaluation indexes, the feature iteration direction being either a feature derivation strategy, which logically combines and re-parameterizes the existing feature set, or a feature expansion strategy, which introduces fields from, and switches topics to, data domains not yet covered; and constructing a prompt based on the feature iteration direction to drive the large model to generate an initial feature extraction SQL statement consistent with that direction.
- 7. The method according to claim 1, wherein generating, using a large model and in response to a risk control feature mining request, an initial feature extraction SQL statement involving a sample table and an association table further comprises: in response to the feature mining request, analyzing the semantic features of the request and searching a feature iteration record library for feature iteration records matching those semantic features; extracting, from the matched feature iteration records, the calculation logic templates of historical features and the error type records of historical failed attempts; and injecting the calculation logic templates and error type records into the SQL generation prompt as context knowledge to guide the large model in generating the initial feature extraction SQL statement.
- 8. An electronic device, comprising: one or more processors; and a memory storing computer program instructions that, when executed, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 7.
- 9. A computer-readable medium storing a computer program or instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
- 10. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 7.
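The generate-verify-correct loop of claim 1 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: `generate_sql` stands in for the large-model call and `verify_sql` for the execution-plan checks, and the `max_rounds` cap is an added assumption (the claim itself simply repeats "until" the SQL passes). The stub functions at the bottom are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VerificationResult:
    passed: bool
    errors: List[str] = field(default_factory=list)  # one message per failed check


def mine_feature_sql(
    request: str,
    generate_sql: Callable[[str, Optional[str]], str],
    verify_sql: Callable[[str], VerificationResult],
    max_rounds: int = 5,  # assumed safety cap; not part of the claim
) -> str:
    """Generate SQL with the model, verify it, and feed failures back as a prompt."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        sql = generate_sql(request, feedback)  # large-model call (stubbed here)
        result = verify_sql(sql)               # execution-plan checks (stubbed here)
        if result.passed:
            return sql                         # target feature extraction SQL
        feedback = "\n".join(result.errors)    # drives the next correction round
    raise RuntimeError(f"no SQL passed verification within {max_rounds} rounds")


# Hypothetical stand-ins: the first draft joins on user_id only,
# the corrected draft adds the point-in-time bound.
def fake_llm(request: str, feedback: Optional[str]) -> str:
    base = "SELECT ... FROM samples s JOIN txn t ON s.user_id = t.user_id"
    return base if feedback is None else base + " AND t.txn_time <= s.apply_time"


def fake_verify(sql: str) -> VerificationResult:
    ok = "t.txn_time <= s.apply_time" in sql
    return VerificationResult(ok, [] if ok else ["join on user_id lacks a time bound"])


print(mine_feature_sql("30-day average spend", fake_llm, fake_verify))
```

Bounding the number of rounds keeps a model that never converges from looping forever; on exhaustion the caller gets an explicit error instead of unverified SQL.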
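The temporal check of claim 2 can be sketched over a plan shaped like PostgreSQL's `EXPLAIN (FORMAT JSON)` output, where a hash join reports its equality condition under `Hash Cond`, any residual join predicate under `Join Filter`, and its children under `Plans`. The column names (`user_id`, `t.txn_time`, `s.apply_time`) are assumptions, and the regex-based comparison parser is a simplification; a sturdier checker would walk a parsed expression tree instead of matching text.

```python
import re
from typing import Dict, Iterator, List

# Simple "a.col <op> b.col" comparisons, e.g. "t.txn_time <= s.apply_time".
CMP_RE = re.compile(r"([\w.]+)\s*(<=|>=|<|>)\s*([\w.]+)")


def iter_nodes(node: Dict) -> Iterator[Dict]:
    """Depth-first walk over an EXPLAIN (FORMAT JSON)-style plan dict."""
    yield node
    for child in node.get("Plans", []):
        yield from iter_nodes(child)


def bounded_by_reference(expr: str, assoc_time: str, sample_ref: str) -> bool:
    """True if expr states assoc_time is not later than sample_ref (either orientation)."""
    for left, op, right in CMP_RE.findall(expr):
        if (left, right) == (assoc_time, sample_ref) and op in ("<=", "<"):
            return True
        if (left, right) == (sample_ref, assoc_time) and op in (">=", ">"):
            return True
    return False


def check_user_joins(plan: Dict, user_key: str, assoc_time: str, sample_ref: str) -> List[str]:
    errors = []
    for node in iter_nodes(plan):
        if node.get("Node Type") != "Hash Join":
            continue
        hash_cond = node.get("Hash Cond", "")
        if user_key not in hash_cond:  # naive substring test for the user-dimension key
            continue
        if not bounded_by_reference(node.get("Join Filter", ""), assoc_time, sample_ref):
            errors.append(f"Hash Join on {hash_cond}: no {assoc_time} <= {sample_ref} bound")
    return errors


leaky = {
    "Node Type": "Hash Join",
    "Hash Cond": "(t.user_id = s.user_id)",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "txn", "Alias": "t"},
        {"Node Type": "Hash", "Plans": [
            {"Node Type": "Seq Scan", "Relation Name": "samples", "Alias": "s"}]},
    ],
}
fixed = {**leaky, "Join Filter": "(t.txn_time <= s.apply_time)"}
print(check_user_joins(leaky, "user_id", "t.txn_time", "s.apply_time"))
print(check_user_joins(fixed, "user_id", "t.txn_time", "s.apply_time"))  # -> []
```

Inspecting the plan rather than the SQL text means nested subqueries and CTEs have already been flattened into join nodes by the optimizer, which is the advantage the Background claims over regular-expression matching on the raw query.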
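The sample-grain check of claim 3 can be sketched with the same kind of plan walk. PostgreSQL's JSON plans list an aggregate's grouping columns under `Group Key`; the `Partition Key` field used for window nodes here is a hypothetical name chosen for illustration, as is the sample identifier `order_id`. Node position is reported as a dotted child-index path so it can be copied into the feedback information.

```python
from typing import Dict, Iterator, List, Tuple


def iter_nodes(node: Dict, path: str = "0") -> Iterator[Tuple[str, Dict]]:
    """Yield (position, node); position is a dotted child-index path from the root."""
    yield path, node
    for i, child in enumerate(node.get("Plans", [])):
        yield from iter_nodes(child, f"{path}.{i}")


def has_key(keys: List[str], sample_key: str) -> bool:
    # Accept both bare ("order_id") and qualified ("s.order_id") column names.
    return any(k == sample_key or k.endswith("." + sample_key) for k in keys)


def check_sample_grain(plan: Dict, sample_key: str) -> List[str]:
    """Flag window/aggregate nodes whose keys omit the sample's unique id."""
    errors = []
    for path, node in iter_nodes(plan):
        node_type = node.get("Node Type")
        if node_type == "WindowAgg":
            keys = node.get("Partition Key", [])  # hypothetical field name
        elif node_type == "Aggregate":
            keys = node.get("Group Key", [])
        else:
            continue
        if not has_key(keys, sample_key):
            errors.append(f"{node_type} at plan position {path}: keys {keys} lack {sample_key}")
    return errors


plan = {
    "Node Type": "WindowAgg",
    "Partition Key": ["s.user_id"],  # mixes different loan orders of the same user
    "Plans": [{"Node Type": "Aggregate", "Group Key": ["s.order_id", "s.user_id"]}],
}
print(check_sample_grain(plan, "order_id"))
```

An aggregate with no grouping keys at all is also flagged, since it collapses every sample into a single row.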
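Claim 4's feedback construction amounts to filling a natural-language template selected by error type. The sketch below assumes a hypothetical mapping table with two error types (`TIME_LEAK`, `GRAIN_MISMATCH`) and a hypothetical `Violation` record carrying the node position, table name, alias and key extracted by the plan checks; the wording of the templates is illustrative only.

```python
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical error-type mapping table: explanation plus standard correction.
ERROR_TEMPLATES = {
    "TIME_LEAK": (
        "{node}: table {table} (alias {alias}) is joined on {key} with no time bound, "
        "so rows after the observation point leak into the feature. "
        "Correction: add `{alias}.{time_col} <= {sample_ref}` to the join condition."
    ),
    "GRAIN_MISMATCH": (
        "{node}: the PARTITION BY / GROUP BY keys over {table} (alias {alias}) omit {key}. "
        "Correction: add `{alias}.{key}` to the keys so each sample is computed separately."
    ),
}


@dataclass
class Violation:
    error_type: str
    node: str        # node position information from the plan check
    table: str
    alias: str
    key: str         # join key, or the missing sample key
    time_col: str = ""
    sample_ref: str = ""


def build_feedback(violations: List[Violation]) -> str:
    """Render every violation through its template into one structured prompt."""
    lines = ["The SQL failed verification. Fix every issue below and return the full SQL."]
    for i, v in enumerate(violations, start=1):
        lines.append(f"{i}. " + ERROR_TEMPLATES[v.error_type].format(**asdict(v)))
    return "\n".join(lines)


print(build_feedback([
    Violation("TIME_LEAK", "Hash Join at 0", "txn", "t", "user_id", "txn_time", "s.apply_time"),
]))
```

Keeping the correction pattern in the template, rather than asking the model to diagnose the error itself, gives it a concrete edit to make at a named location.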
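The feature iteration record of claim 5 can be sketched as below. Model training is out of scope here, so the importance scores and the metric value are placeholders supplied by the caller, not reported results. "New features" are assumed to be those absent from a caller-supplied set of previously known features, and a plain list stands in for the feature iteration record library.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class FeatureIterationRecord:
    sql: str                           # target feature extraction SQL
    metrics: Dict[str, float]          # model performance indexes, e.g. {"auc": ...}
    new_feature_ranks: Dict[str, int]  # 1-based importance rank of each new feature


def build_record(sql: str, metrics: Dict[str, float],
                 importance: List[Tuple[str, float]],
                 known_features: Set[str]) -> FeatureIterationRecord:
    """Rank features by importance and keep the ranks of this round's new ones."""
    ranked = sorted(importance, key=lambda item: item[1], reverse=True)
    new_ranks = {name: pos for pos, (name, _) in enumerate(ranked, start=1)
                 if name not in known_features}
    return FeatureIterationRecord(sql, metrics, new_ranks)


record_library: List[FeatureIterationRecord] = []  # stand-in for the record library
rec = build_record(
    sql="SELECT ...",
    metrics={"auc": 0.74},  # illustrative placeholder
    importance=[("age", 0.10), ("avg_spend_30d", 0.40), ("overdue_cnt", 0.30)],
    known_features={"age", "overdue_cnt"},
)
record_library.append(rec)
print(rec.new_feature_ranks)  # {'avg_spend_30d': 1}
```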
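Claim 6 chooses between derivation and expansion from past evaluation indexes but does not state the decision rule. The sketch below assumes one plausible rule: keep deriving while the latest round still improved the metric by at least `min_gain`, otherwise switch to expanding into uncovered domains. The threshold, the metric history and the domain names are all hypothetical.

```python
from typing import List

DERIVE, EXPAND = "derive", "expand"


def choose_direction(metric_history: List[float], min_gain: float = 0.002) -> str:
    """Pick this round's iteration direction from past evaluation scores.

    Assumed rule: derive while the latest round improved the metric by at
    least min_gain; once gains stall, expand into new data domains.
    """
    if len(metric_history) < 2:
        return DERIVE
    gain = metric_history[-1] - metric_history[-2]
    return DERIVE if gain >= min_gain else EXPAND


def build_direction_prompt(direction: str, features: List[str], uncovered: List[str]) -> str:
    if direction == DERIVE:
        return ("Derive new features by logically combining or re-parameterizing "
                "(e.g. changing the time windows of) these features: " + ", ".join(features))
    return "Introduce fields from these data domains not yet covered: " + ", ".join(uncovered)


direction = choose_direction([0.720, 0.731, 0.732])
print(direction)
print(build_direction_prompt(direction, ["avg_spend_30d"], ["device_info", "contacts"]))
```

A plateau-triggered switch is only one option; a scheduler that alternates strategies, or one that weighs feature-importance ranks from the records, would fit the claim equally well.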
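Claim 7's retrieval and prompt injection can be sketched as follows. The claim speaks of matching "semantic features"; this sketch substitutes a plain Jaccard overlap of word tokens as a stand-in for whatever semantic matcher (for example, embedding similarity) a real system would use. `HistoryRecord`, `k` and `min_sim` are hypothetical names and values.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class HistoryRecord:
    description: str     # natural-language description of the feature
    logic_template: str  # reusable calculation logic
    error_types: List[str] = field(default_factory=list)  # failures seen before


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(request: str, library: List[HistoryRecord],
             k: int = 2, min_sim: float = 0.2) -> List[HistoryRecord]:
    """Rank records by Jaccard token overlap (a stand-in for semantic matching)."""
    query = tokens(request)
    scored = []
    for rec in library:
        doc = tokens(rec.description)
        union = query | doc
        sim = len(query & doc) / len(union) if union else 0.0
        if sim >= min_sim:
            scored.append((sim, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:k]]


def build_sql_prompt(request: str, matches: List[HistoryRecord]) -> str:
    """Inject matched logic templates and past error types as context."""
    lines = [f"Task: write feature extraction SQL for: {request}"]
    lines += [f"Reference logic: {m.logic_template}" for m in matches]
    errors = sorted({e for m in matches for e in m.error_types})
    if errors:
        lines.append("Avoid these previously observed errors: " + ", ".join(errors))
    return "\n".join(lines)


library = [
    HistoryRecord("average spend of user over last 30 days",
                  "AVG(amount) over txn rows with apply_time - 30d < txn_time <= apply_time",
                  ["TIME_LEAK"]),
    HistoryRecord("count overdue loans in 90 days", "COUNT(*) over overdue rows", []),
]
matches = retrieve("user average spend last 7 days", library)
print(build_sql_prompt("user average spend last 7 days", matches))
```

Injecting only the top matches keeps the prompt short, which matters given the Background's concern about context-window overflow on ultra-wide tables.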
Description
Wind control feature mining method, device, medium and product based on large model
Technical Field
The present application relates to the field of information technology, and in particular to a large-model-based risk control feature mining method, device, medium and product.
Background
In financial business areas such as credit management, anti-fraud and insurance underwriting, the predictive performance of machine learning models depends to a great extent on the quality of feature engineering. The traditional feature engineering workflow relies heavily on close collaboration between business experts and data engineers: the experts propose hypotheses from experience, such as "calculate the user's average spending over the last 30 days", which are then translated into complex SQL queries and executed in a data warehouse. This manual mode is costly and slow to iterate, and it struggles to keep pace with rapidly changing fraud patterns.
With breakthroughs in large language model (LLM) technology, automatically generating SQL code with large models has become an important way to improve feature mining efficiency. The existing approach feeds a description of the business requirement directly into a large model and generates the feature extraction SQL in a single pass. However, existing large-model-based feature generation faces significant challenges in real industrial deployment, especially in financial scenarios with very strict requirements on temporal logic. First, features must be computed strictly from data available before the observation point, yet existing large models lack a deep understanding of temporal causality, and the SQL they generate often contains hidden "feature leakage" (time-travel) problems.
For example, a multi-table join may use only the user ID and ignore the time constraint, or an incorrectly chosen partition key in a window function may mix data from different loan orders of the same user. Such leakage errors are highly concealed, and traditional detection based on regular-expression text matching can hardly cover complex nested queries. The result is a model that scores well on offline metrics during training but performs far worse in production, bringing serious business risk. Second, data tables in credit scenarios are typically ultra-wide, with hundreds of fields; injecting the complete table schema into a large model can overflow its context window or dilute its attention, making it difficult to generate precise queries. In addition, most existing generation schemes run only once and lack memory of past successes and lessons from failures, so feature engineering knowledge cannot accumulate and evolve automatically.
Disclosure of Invention
The application aims to provide a large-model-based risk control feature mining method, device, medium and product, at least to solve the technical problems in the prior art that, when a large model is used to generate risk control feature SQL, hidden feature leakage errors are difficult to detect and correct automatically, ultra-wide table structures cannot be handled effectively, and the ability to learn continuously is lacking.
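The join-on-user-ID leakage described in the Background can be reproduced with a toy dataset. This is a minimal sketch using Python's built-in `sqlite3` module; the tables `samples` and `txn`, their columns and all rows are invented for illustration. One user has two loan orders, and one transaction falls after the first order's application time.

```python
import sqlite3

SETUP = """
CREATE TABLE samples (order_id INTEGER, user_id INTEGER, apply_time TEXT);
CREATE TABLE txn (user_id INTEGER, amount INTEGER, txn_time TEXT);
INSERT INTO samples VALUES (1, 7, '2024-03-01'), (2, 7, '2024-03-20');
INSERT INTO txn VALUES (7, 100, '2024-02-10'), (7, 200, '2024-02-20'),
                       (7, 900, '2024-03-15');
"""

# Joins on the user ID only: every order sees all of the user's transactions,
# including ones after its own application time.
LEAKY = """
SELECT s.order_id, AVG(t.amount)
FROM samples s JOIN txn t ON s.user_id = t.user_id
GROUP BY s.order_id ORDER BY s.order_id
"""

# Point-in-time join: only transactions in the 30 days up to apply_time.
POINT_IN_TIME = """
SELECT s.order_id, AVG(t.amount)
FROM samples s LEFT JOIN txn t
  ON s.user_id = t.user_id
 AND t.txn_time <= s.apply_time
 AND t.txn_time > date(s.apply_time, '-30 days')
GROUP BY s.order_id ORDER BY s.order_id
"""


def avg_spend(sql: str) -> dict:
    """Run a feature query against a fresh in-memory copy of the toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    result = dict(conn.execute(sql).fetchall())
    conn.close()
    return result


print(avg_spend(LEAKY))          # {1: 400.0, 2: 400.0}
print(avg_spend(POINT_IN_TIME))  # {1: 150.0, 2: 550.0}
```

The leaky query gives both orders the same value because order 1 "sees" the 900 transaction from two weeks after its application; the bounded join yields distinct, causally valid values per order.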
To achieve the above object, some embodiments of the present application provide the following aspects. In a first aspect, some embodiments of the present application provide a large-model-based risk control feature mining method, the method comprising: in response to a risk control feature mining request, generating, using a large model, an initial feature extraction SQL statement involving a sample table and an association table; obtaining an execution plan tree of the initial feature extraction SQL statement in a database, parsing the table-join operation nodes in the execution plan tree, and extracting the hash predicate conditions and filter predicate conditions of the table-join operation nodes; when the hash predicate condition is detected to contain a user-dimension identification field, verifying whether the filter predicate condition contains a time field of the association table together with a temporal constraint relative to the reference time of the sample table; if the verification fails, generating feedback information based on the verification result and inputting it into the large model to drive the large model to correct the initial feature extraction SQL statement; and repeating until a generated SQL statement passes the verification, which is determined as the target feature extraction SQL statement.
In a second aspect, some embodiments of the application also provide an electronic device comprising one or more processors and a memory storing computer program instructions that, when executed, cause the process