CN-122019154-A - Multi-plane data infrastructure system and bidirectional optimization method based on same


Abstract

The application relates to the technical field of artificial intelligence and discloses a multi-plane data infrastructure system and a bidirectional optimization method based on that system. The system comprises a plurality of functional planes that cooperate in sequence: an edge device plane, a network orchestration plane, a computing power processing plane, a data transaction plane, and a generative intelligence plane, with each functional plane forming a self-optimizing closed loop through bidirectional data flow. The technical solution provides a multi-plane data infrastructure system capable of bidirectional cooperation and self-optimizing iteration across all links.

Inventors

Xie Gaochang
Jia Qingmin
Zhou Xiaomao
Ding Chengcheng
Zhang Huayu
Xie Renchao

Assignees

Purple Mountain Laboratories (紫金山实验室)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-21

Claims (12)

1. A multi-plane data infrastructure system, characterized in that the system comprises a plurality of functional planes that cooperate in sequence, the plurality of functional planes comprising an edge device plane, a network orchestration plane, a computing power processing plane, a data transaction plane, and a generative intelligence plane, each functional plane forming a self-optimizing closed loop through bidirectional data flow, wherein: the edge device plane is configured to collect sensing data from multi-source heterogeneous devices and to adjust collection parameters according to a self-optimization strategy; the network orchestration plane is configured to perform transmission link planning, bandwidth allocation, and transmission priority scheduling for the sensing data, and to output the transmitted sensing data to the computing power processing plane; the computing power processing plane comprises a computing power resource pool encapsulated using virtualization and containerization technologies, and is configured to select computing power units from the computing power resource pool based on service task requirements, preprocess the sensing data, and output standardized preprocessed data; the data transaction plane is configured to implement asset circulation and value distribution for the preprocessed data based on blockchain and smart contracts, and to screen out, through value scoring, high-quality data suited to a GAI model; and the generative intelligence plane comprises a cross-modal knowledge base and a GAI model, and is configured to train the GAI model using the high-quality data, execute an inference task based on the trained GAI model, and reversely generate, based on an inference result and data quality characteristics of the high-quality data, a self-optimization strategy adapted to each functional plane, wherein each functional plane, after being optimized based on its corresponding self-optimization strategy, outputs updated high-quality data that flows back to the generative intelligence plane, so as to form the self-optimizing closed loop.
2. A bi-directional optimization method based on the system of claim 1, wherein the method comprises a forward data flow process and a reverse optimization flow process.
3. The method of claim 2, wherein the forward data flow process comprises: collecting sensing data from multi-source heterogeneous devices through the edge device plane and transmitting the sensing data to the network orchestration plane; performing transmission link planning and resource allocation for the sensing data through the network orchestration plane, and transmitting the sensing data to the computing power processing plane; selecting suitable computing power units from the computing power resource pool of the computing power processing plane, preprocessing the sensing data, outputting standardized preprocessed data, and transmitting the standardized preprocessed data to the data transaction plane; performing value scoring on the preprocessed data through the data transaction plane, screening out high-quality data suited to a GAI model, and transmitting the high-quality data to the generative intelligence plane; and, through the generative intelligence plane, training the GAI model using the high-quality data, executing an inference task based on the trained GAI model, reversely generating a self-optimization strategy adapted to each functional plane based on an inference result and data quality characteristics of the high-quality data, structuring the high-quality data, and storing the structured high-quality data in the cross-modal knowledge base.
4. The method of claim 3, wherein preprocessing the sensing data and outputting standardized preprocessed data comprises: cleaning and denoising the sensing data to remove redundant data and abnormal data; performing semantic-level fusion of sensing data of different modalities using a multi-modal encoder to generate feature vectors of uniform dimension; and performing feature alignment on the feature vectors and outputting the standardized preprocessed data.
5. The method of claim 3, wherein performing value scoring on the preprocessed data through the data transaction plane and screening out high-quality data suited to the GAI model comprises: splitting the preprocessed data into a plurality of data units; obtaining a data usage frequency score, a data quality score, and a model performance contribution score for each data unit; determining a value score for each data unit based on the data usage frequency score, the data quality score, and the model performance contribution score; and selecting the data units whose value scores are greater than a preset score threshold as the high-quality data suited to the GAI model.
6. The method of claim 3, wherein training a GAI model using the high-quality data and executing an inference task based on the trained GAI model comprises: training the GAI model using the high-quality data to obtain a trained GAI model; generating, through the trained GAI model, a first semantic vector for each data unit in the cross-modal knowledge base and a second semantic vector for an input sample, wherein the first semantic vector and the second semantic vector have the same dimension; determining the semantic similarity between the first semantic vector and the second semantic vector, and obtaining a retrieval result by screening on the semantic similarity; and concatenating the input sample with the retrieval result, inputting the concatenated sample into the trained GAI model, and outputting an inference result.
7. The method of claim 3, wherein reversely generating a self-optimization strategy adapted to each functional plane based on the inference result and the data quality characteristics of the high-quality data comprises: obtaining a prediction confidence for an input sample through the trained GAI model; determining an inference confidence change amount between a preset target confidence threshold and the prediction confidence; determining a data coverage gap based on the inference confidence change amount; and generating a self-optimization strategy based on the inference confidence change amount and a summary of the data coverage gap, wherein the self-optimization strategy is used to guide the edge device plane to adjust its acquisition frequency and acquisition range and to guide the network orchestration plane to optimize transmission priority.
8. The method of claim 3, wherein reversely generating a self-optimization strategy adapted to each functional plane based on the inference result and the data quality characteristics of the high-quality data comprises: obtaining a data quality score for each data unit in the high-quality data through the trained GAI model; and generating, based on the data quality scores, a self-optimization strategy for guiding the data transaction plane to adjust its data pricing, ranking, and value screening rules.
9. The method of claim 2, wherein the reverse optimization flow process comprises: executing an inference task through the trained GAI model of the generative intelligence plane, and generating a self-optimization strategy adapted to each functional plane by combining an inference result with data quality characteristics of the high-quality data; adjusting, by each functional plane, its operating parameters based on the corresponding self-optimization strategy; and, after each functional plane has adjusted its operating parameters, re-executing the forward data flow process and outputting updated high-quality data that flows back to the generative intelligence plane, so as to form a self-optimizing closed loop.
10. The method of claim 2, further comprising a synthetic data generation and reflow process comprising: obtaining a scarce seed sample through the generative intelligence plane, wherein the scarce seed sample is obtained either by acquiring real scarce-scenario data from the edge device plane or by retrieving an auxiliary sample from the cross-modal knowledge base; generating, by the trained GAI model, a plurality of synthetic data items based on features of the scarce seed sample; fine-tuning the trained GAI model using the synthetic data, and determining the change in a model performance index before and after fine-tuning; and, where the change is greater than a preset change threshold, judging the synthetic data to be valid data and returning the valid data to the data transaction plane.
11. The method of claim 2, further comprising an objective function optimization step comprising: determining, through the generative intelligence plane and at a preset period, a current data set state and a current model state; determining a data-side resource consumption cost corresponding to the current data set state and a model performance improvement gain corresponding to the current model state; taking the difference between the model performance improvement gain and the data-side resource consumption cost as an objective function, comparing the net gains of multiple periods, and approaching an optimal solution using an iterative optimization strategy; and taking the target data set state and the target model state corresponding to the optimal solution as a system steady-state reference to guide generation of the self-optimization strategy.
12. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the bi-directional optimization method of any one of claims 2 to 11.
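
As an illustration only, the value scoring of claim 5 and the confidence-driven strategy generation of claim 7 can be sketched in Python as follows. The claims do not specify how the three per-unit scores are combined or how the confidence shortfall maps to acquisition or transmission parameters, so the weighted sum, the weights, the `DataUnit` type, the use of the shortfall as a proxy for the data coverage gap, and all function and field names here are assumptions made for the sketch, not part of the claimed method.

```python
from dataclasses import dataclass


@dataclass
class DataUnit:
    """One data unit split out of the preprocessed data (hypothetical type)."""
    unit_id: str
    usage_frequency: float      # data usage frequency score, assumed in [0, 1]
    quality: float              # data quality score, assumed in [0, 1]
    model_contribution: float   # model performance contribution score, assumed in [0, 1]


# Hypothetical weights: the claims do not say how the three scores are combined.
WEIGHTS = (0.2, 0.3, 0.5)


def value_score(unit, weights=WEIGHTS):
    """Combine the three per-unit scores into one value score (assumed weighted sum)."""
    w_freq, w_quality, w_contrib = weights
    return (w_freq * unit.usage_frequency
            + w_quality * unit.quality
            + w_contrib * unit.model_contribution)


def screen_high_quality(units, threshold):
    """Keep the units whose value score is strictly greater than the threshold."""
    return [u for u in units if value_score(u) > threshold]


def confidence_gap_strategy(target_confidence, predicted_confidence):
    """Turn a confidence shortfall into illustrative adjustment hints."""
    # Shortfall of the predicted confidence relative to the preset target.
    gap = max(0.0, target_confidence - predicted_confidence)
    return {
        "confidence_gap": gap,
        # Assumed proxy: the coverage gap is taken to equal the shortfall.
        "data_coverage_gap": gap,
        # Assumed mapping: a larger gap asks for proportionally denser acquisition.
        "acquisition_frequency_scale": 1.0 + gap,
        # Assumed rule: any shortfall raises transmission priority.
        "raise_transmission_priority": gap > 0.0,
    }
```

With the assumed weights, a unit scoring 0.5 on all three dimensions has a value score of about 0.5 and is dropped at a threshold of 0.6, while a unit scoring 1.0 on all three is kept. When the predicted confidence already meets the target, the sketch returns a zero gap and leaves the acquisition frequency scale at 1.0.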

Description

Multi-plane data infrastructure system and bidirectional optimization method based on same

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a multi-plane data infrastructure system and a bidirectional optimization method based on that system.

Background

In recent years, the rapid development of generative artificial intelligence (Generative Artificial Intelligence, GAI) has placed higher demands on data infrastructure, particularly because general-purpose large models and small proprietary models differ significantly in their data requirements. However, as public internet data is exhausted and industry-internal data remains highly dispersed, high-quality, timely data is becoming harder to obtain, which has become a bottleneck for the sustainable development of GAI. Most current data infrastructures were designed for traditional analysis tasks and lack unified support for diverse data, which leads to low training efficiency and unstable inference performance. In addition, existing systems often fail to consider the data lifecycle systematically and neglect the use of model feedback to optimize data management, which limits the full application of GAI.

Therefore, how to construct a multi-plane, bidirectionally collaborative data infrastructure system that addresses the particular requirements of GAI is a technical problem that currently needs to be solved.

Disclosure of Invention

The application provides a multi-plane data infrastructure system and a bidirectional optimization method based on that system, which achieve the technical effects of providing a multi-plane, bidirectionally collaborative data infrastructure system and of realizing full-link bidirectional collaboration and self-optimizing iteration.

To achieve the above purpose, the main technical solutions adopted by the application are as follows.

In a first aspect, an embodiment of the present application provides a multi-plane data infrastructure system. The system includes a plurality of functional planes that cooperate in sequence: an edge device plane, a network orchestration plane, a computing power processing plane, a data transaction plane, and a generative intelligence plane. Each functional plane forms a self-optimizing closed loop through bidirectional data flow, wherein:

the edge device plane is used to collect sensing data from multi-source heterogeneous devices and to adjust collection parameters according to a self-optimization strategy;

the network orchestration plane is used to perform transmission link planning, bandwidth allocation, and transmission priority scheduling for the sensing data, and to output the transmitted sensing data to the computing power processing plane;

the computing power processing plane includes a computing power resource pool encapsulated using virtualization and containerization technologies, and is used to select computing power units from the computing power resource pool based on service task requirements, preprocess the sensing data, and output standardized preprocessed data;

the data transaction plane is used to implement asset circulation and value distribution for the preprocessed data based on blockchain and smart contracts, and to screen out, through value scoring, high-quality data suited to a GAI model;

the generative intelligence plane includes a cross-modal knowledge base and a GAI model, and is used to train the GAI model using the high-quality data, execute an inference task based on the trained GAI model, and reversely generate, based on an inference result and data quality characteristics of the high-quality data, a self-optimization strategy adapted to each functional plane; after being optimized based on its corresponding self-optimization strategy, each functional plane outputs updated high-quality data that flows back to the generative intelligence plane, so as to form the self-optimizing closed loop.

In a second aspect, an embodiment of the present application provides a bidirectional optimization method based on the above system, where the method includes a forward data flow process and a reverse optimization flow process.

In one embodiment, the forward data flow process includes: collecting sensing data from multi-source heterogeneous devices through the edge device plane and transmitting the sensing data to the network orchestration plane; performing transmission link planning and resource allocation for the sensing data through the network orchestration plane, and transmitting the sensing data to the computing power processing plane; selecting suitable computing power units from the computing power resource pool of the computing power processing plane, preprocessing the sensing data, outputting standar