WO-2026091612-A1 - METHOD AND APPARATUS FOR CREATING AI MODEL, AND DATABASE SYSTEM
Abstract
The present disclosure relates to the technical field of computers. Provided are a method and apparatus for creating an AI model, and a database system. In the present disclosure, filtering processing is performed on a first attribute set carried in a model creation instruction. In this way, since attributes used for creating an AI model are reduced by means of filtering processing, during the creation of the AI model, the amount of data that a database server needs to transmit to an AI server is reduced, such that the data transmission efficiency is improved, thereby improving the efficiency of AI model creation.
Inventors
- LI, Jingjin
- GAO, Congli
- MA, Wenlong
- REN, Bo
- ZHANG, Wenliang
Assignees
- Huawei Cloud Computing Technologies Co., Ltd. (华为云计算技术有限公司)
Dates
- Publication Date
- 20260507
- Application Date
- 20250627
- Priority Date
- 20241029
Claims (20)
- A method for creating an artificial intelligence (AI) model, characterized in that the method is applied to a database server and comprises: receiving a first model creation instruction carrying a first attribute set, wherein the first attribute set includes multiple attributes, and the first model creation instruction instructs creation of an AI model based on the data that corresponds to the first attribute set and is stored in the database server; filtering the multiple attributes included in the first attribute set, based on the data corresponding to the first attribute set, to obtain a second attribute set; sending the data that corresponds to the second attribute set and is stored in the database server to an AI server, wherein the AI server is configured to create the AI model based on the data corresponding to the second attribute set; and receiving the AI model fed back by the AI server.
- The method according to claim 1, wherein filtering the multiple attributes included in the first attribute set, based on the data corresponding to the first attribute set, to obtain the second attribute set comprises: filtering a target attribute out of the first attribute set to obtain the second attribute set, wherein a distribution characteristic of the data of the target attribute satisfies a distribution filtering condition, and/or an association characteristic between the data of the target attribute and the data of other attributes satisfies an association filtering condition.
- The method according to claim 2, wherein the distribution filtering condition includes at least one of the following: the proportion of null values in the data of the target attribute reaches a first proportion threshold; the number of distinct values in the data of the target attribute is less than a number threshold.
- The method according to claim 2 or 3, characterized in that the association filtering condition includes at least one of the following: the correlation between the data of the target attribute and the data of the other attributes is greater than a first correlation threshold; the correlation between the data of the target attribute and the data of the other attributes is less than a second correlation threshold; the other attributes belong to the first attribute set.
- The method according to any one of claims 1-4, characterized in that the method further comprises: receiving indication information fed back by the AI server after the AI model is created; generating a third attribute set based on the indication information, and recording the third attribute set; receiving a model retraining instruction corresponding to the AI model; sending the data that corresponds to the third attribute set and is stored in the database server to the AI server, wherein the AI server is configured to retrain the AI model based on the data corresponding to the third attribute set; and receiving the retrained AI model from the AI server.
- The method according to any one of claims 1-4, characterized in that the method further comprises: sending a specified quantity of the data corresponding to the first attribute set to the AI server, wherein the AI server is configured to determine indication information based on the specified quantity of the data corresponding to the first attribute set; receiving the indication information fed back by the AI server; generating a third attribute set based on the indication information, and recording the third attribute set; receiving a model retraining instruction corresponding to the AI model; sending the data that corresponds to the third attribute set and is stored in the database server to the AI server, wherein the AI server is configured to retrain the AI model based on the data corresponding to the third attribute set; and receiving the retrained AI model from the AI server.
- The method according to claim 6, wherein, before sending the specified quantity of the data corresponding to the first attribute set to the AI server, the method further comprises: determining that the data corresponding to the first attribute set has changed relative to when the third attribute set was last recorded, and that the change ratio reaches a second ratio threshold.
- The method according to any one of claims 5-7, wherein the indication information includes attributes selected by the AI server for model training, and the third attribute set consists of the attributes selected by the AI server for model training.
- The method according to any one of claims 5-7, wherein the indication information includes an importance value corresponding to each of at least one attribute, and the third attribute set consists of attributes whose importance value is greater than an importance threshold.
- The method according to any one of claims 1-9, characterized in that, after filtering the multiple attributes included in the first attribute set, based on the data corresponding to the first attribute set, to obtain the second attribute set, the method further comprises: generating a second model creation instruction carrying the second attribute set, wherein the second model creation instruction instructs sending of the data corresponding to the second attribute set to the AI server; and sending the data that corresponds to the second attribute set and is stored in the database server to the AI server comprises: in response to the second model creation instruction, sending the data that corresponds to the second attribute set and is stored in the database server to the AI server.
- A database system, characterized in that the database system includes a database server, the database server being configured to: receive a first model creation instruction carrying a first attribute set, wherein the first attribute set includes multiple attributes, and the first model creation instruction instructs creation of an AI model based on the data that corresponds to the first attribute set and is stored in the database server; filter the multiple attributes included in the first attribute set, based on the data corresponding to the first attribute set, to obtain a second attribute set; send the data that corresponds to the second attribute set and is stored in the database server to an AI server, wherein the AI server is configured to create the AI model based on the data corresponding to the second attribute set; and receive the AI model fed back by the AI server.
- The database system according to claim 11, wherein the database server is configured to: filter a target attribute out of the first attribute set to obtain the second attribute set, wherein a distribution characteristic of the data of the target attribute satisfies a distribution filtering condition, and/or an association characteristic between the data of the target attribute and the data of other attributes satisfies an association filtering condition.
- The database system according to claim 12, wherein the distribution filtering condition includes at least one of the following: the proportion of null values in the data of the target attribute reaches a first proportion threshold; the number of distinct values in the data of the target attribute is less than a number threshold.
- The database system according to claim 12 or 13, characterized in that the association filtering condition includes at least one of the following: the correlation between the data of the target attribute and the data of the other attributes is greater than a first correlation threshold; the correlation between the data of the target attribute and the data of the other attributes is less than a second correlation threshold; the other attributes belong to the first attribute set.
- The database system according to any one of claims 11-14, characterized in that the database server is further configured to: receive indication information fed back by the AI server after the AI model is created; generate a third attribute set based on the indication information, and record the third attribute set; receive a model retraining instruction corresponding to the AI model; send the data that corresponds to the third attribute set and is stored in the database server to the AI server, wherein the AI server is configured to retrain the AI model based on the data corresponding to the third attribute set; and receive the retrained AI model from the AI server.
- The database system according to any one of claims 11-14, characterized in that the database server is further configured to: send a specified quantity of the data corresponding to the first attribute set to the AI server, wherein the AI server is configured to determine indication information based on the specified quantity of the data corresponding to the first attribute set; receive the indication information fed back by the AI server; generate a third attribute set based on the indication information, and record the third attribute set; receive a model retraining instruction corresponding to the AI model; send the data that corresponds to the third attribute set and is stored in the database server to the AI server, wherein the AI server is configured to retrain the AI model based on the data corresponding to the third attribute set; and receive the retrained AI model from the AI server.
- The database system according to claim 16, wherein the database server is further configured to: determine that the data corresponding to the first attribute set has changed relative to when the third attribute set was last recorded, and that the change ratio reaches a second ratio threshold.
- The database system according to any one of claims 15-17, wherein the indication information includes attributes selected by the AI server for model training, and the third attribute set consists of the attributes selected by the AI server for model training.
- The database system according to any one of claims 15-17, wherein the indication information includes an importance value corresponding to each of at least one attribute, and the third attribute set consists of attributes whose importance value is greater than an importance threshold.
- The database system according to any one of claims 11-19, wherein the database server is further configured to: generate a second model creation instruction carrying the second attribute set, wherein the second model creation instruction instructs sending of the data corresponding to the second attribute set to the AI server; and the database server is configured to: in response to the second model creation instruction, send the data that corresponds to the second attribute set and is stored in the database server to the AI server.
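A minimal sketch of how the third attribute set recited above might be derived from importance values, and how the change-ratio check might gate a refresh. The function names, the dictionary shape of the indication information, and the row-count definition of the change ratio are assumptions made for illustration; the claims do not specify them.

```python
from typing import Dict, List


def third_attribute_set(importance: Dict[str, float],
                        importance_threshold: float) -> List[str]:
    """Keep the attributes whose importance value is greater than the threshold.

    `importance` is a hypothetical representation of the indication
    information: attribute name -> importance value reported by the AI server.
    """
    return [name for name, score in importance.items()
            if score > importance_threshold]


def change_ratio_reached(changed_rows: int, total_rows: int,
                         ratio_threshold: float) -> bool:
    """Return True when the change ratio reaches the threshold.

    Assumption: the change ratio is measured as changed rows divided by total
    rows since the third attribute set was last recorded.
    """
    if total_rows == 0:
        return False
    return changed_rows / total_rows >= ratio_threshold


if __name__ == "__main__":
    indication = {"age": 0.40, "income": 0.35, "zip_code": 0.02}
    print(third_attribute_set(indication, importance_threshold=0.05))
    print(change_ratio_reached(changed_rows=30, total_rows=100,
                               ratio_threshold=0.2))
```

Under these assumptions, only the recorded third attribute set would need to be read and sent when a retraining instruction arrives.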
Description
Method, apparatus and database system for creating AI models
This application claims priority to Chinese Patent Application No. 202411526539.0, filed on October 29, 2024, entitled "Method, Apparatus and Database System for Creating AI Models", the entire contents of which are incorporated herein by reference.
Technical Field
This disclosure relates to the field of computer technology, and in particular to a method, apparatus and database system for creating AI models.
Background
With the development of artificial intelligence (AI) and database technologies, the integration of AI and databases has become an important trend. Many database systems already integrate AI-related operations, such as creating AI models and using AI models for inference. After receiving an instruction to create an AI model, the database server reads locally the data of the attributes specified in the instruction and transmits that data to the AI server. The AI server trains a model on this data, obtains the AI model, and returns it to the database server. Users can then use this AI model for inference within the database server.
However, creating an AI model generally requires a large amount of training data, so the database server has to send a large amount of data to the AI server, which makes AI model creation inefficient.
Summary of the Invention
This disclosure provides a method, apparatus, and database system for creating AI models, which reduce the amount of data transmitted and improve data transmission efficiency.
In a first aspect, a method for creating an AI model is provided, which is applied to a database server.
The method includes: first, receiving a first model creation instruction carrying a first attribute set; next, filtering the multiple attributes included in the first attribute set, based on the data corresponding to the first attribute set, to obtain a second attribute set; then, sending the data corresponding to the second attribute set stored in the database server to an AI server; and finally, receiving the AI model fed back by the AI server. The first attribute set includes multiple attributes. The first model creation instruction instructs the creation of an AI model based on the data corresponding to the first attribute set stored in the database server. The AI server is used to create the AI model based on the data corresponding to the second attribute set.
In this way, because the filtering reduces the number of attributes used to create the AI model, the database server transmits less data to the AI server during model creation, which improves data transmission efficiency and, in turn, the efficiency of creating the AI model.
In one possible implementation, the filtering can be as follows: a target attribute is filtered out of the first attribute set to obtain the second attribute set, wherein the distribution characteristics of the data of the target attribute satisfy a distribution filtering condition, and/or the association characteristics between the data of the target attribute and the data of other attributes satisfy an association filtering condition. In this way, applying distribution and/or association filtering conditions selectively removes some attributes, which reduces the amount of data transmitted and improves overall system performance while preserving the performance of the AI model.
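The flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the in-memory `table` dictionary, the function names, the default thresholds (0.5 and 2), and the `send_to_ai_server` callback are hypothetical stand-ins. Only the two distribution conditions (null-value proportion and number of different values) are taken from the disclosure; association filtering is omitted.

```python
from typing import Callable, Dict, List, Optional, Sequence

# One column of stored data; None stands in for a database NULL.
Column = Sequence[Optional[float]]


def null_ratio(values: Column) -> float:
    """Proportion of null entries; an empty column counts as fully null."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def distinct_count(values: Column) -> int:
    """Number of different non-null values in the column."""
    return len({v for v in values if v is not None})


def filter_attributes(table: Dict[str, Column], first_set: List[str],
                      null_threshold: float = 0.5,
                      distinct_threshold: int = 2) -> List[str]:
    """Derive the second attribute set from the first attribute set."""
    second_set = []
    for name in first_set:
        values = table[name]
        if null_ratio(values) >= null_threshold:
            continue  # null proportion reaches the first proportion threshold
        if distinct_count(values) < distinct_threshold:
            continue  # too few different values
        second_set.append(name)
    return second_set


def create_model(table: Dict[str, Column], first_set: List[str],
                 send_to_ai_server: Callable[[Dict[str, list]], object]):
    """Filter locally, then send only the data of the second attribute set."""
    second_set = filter_attributes(table, first_set)
    payload = {name: list(table[name]) for name in second_set}
    # The AI server trains on the reduced payload and feeds back the model.
    return send_to_ai_server(payload)
```

For example, with columns `age`, `region_code` (a single constant value), `notes` (mostly null) and `income`, `filter_attributes` would keep only `age` and `income`, so half of the columns never leave the database server.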
In one possible implementation, the distribution filtering condition includes at least one of the following: the proportion of null values in the data of the target attribute reaches a first proportion threshold; the number of different values in the data of the target attribute is less than a number threshold. In this way, the condition "the proportion of null values in the attribute data reaches the first proportion threshold" filters out attributes that contain a large number of null values, and the condition "the number of different values in the attribute data is less than the number threshold" filters out attributes whose values vary too little. Such attributes contribute little to the performance of the AI model during training, so these filtering conditions reduce unnecessary data transmission and improve the system's processing efficiency without affecting the performance of the AI model.
In one possible implementation, the association filtering condition includes at least one of the following: the correlation between the data of the target attribute and the data of other attributes is greater than a first cor