CN-121116271-B - Method and system for automatically generating a low-code-based data analysis dashboard

Abstract

Embodiments of this specification provide a method and a system for automatically generating a low-code-based data analysis dashboard. The method comprises: extracting a dataset identifier and a user intention vector from an input text through a natural language processing module; mapping the user intention vector to a pre-constructed semantic index space; quantifying, based on cosine similarity, the degree of matching between the semantic descriptions of visualization components in the semantic index space and the user intention vector, and obtaining a candidate component set according to the degree of matching; constructing a weighted undirected graph with the components of the candidate component set as nodes and the semantic relevance between components as edge weights, and generating an initial layout topology using a maximum spanning tree algorithm; binding, according to the initial layout topology, a data source field to each candidate component based on the dataset identifier; monitoring user drag events, recording component position offsets, inputting them into a reinforcement learning network, and updating the parameters of a layout policy network; and pushing the optimized layout topology to a client.

Inventors

  • HU LUNLIANG
  • KANG ZHEN
  • FENG YIJUN
  • LIU HU
  • WU ZHENGANG
  • PENG CHENG
  • GAO JU

Assignees

  • 中建材信息技术股份有限公司
  • 中建材信云智联科技有限公司
  • 中建材信云智联科技有限公司北京分公司
  • 中建材信云智联科技(北京)有限公司

Dates

Publication Date
20260512
Application Date
20250820

Claims (10)

  1. A method for automatically generating a low-code-based data analysis dashboard, characterized by comprising the following steps: performing, by a natural language processing module, dependency parsing and entity recognition on an input text, and extracting a dataset identifier and a user intention vector; mapping the user intention vector to a pre-constructed semantic index space, wherein the semantic index space is composed of a plurality of high-dimensional vectors, each corresponding to the semantic description of a visualization component; quantifying, based on cosine similarity, the degree of matching between the semantic descriptions of the visualization components in the semantic index space and the user intention vector, and obtaining a candidate component set according to the degree of matching; constructing a weighted undirected graph with the components of the candidate component set as nodes and the semantic relevance between components as edge weights, and generating an initial layout topology using a maximum spanning tree algorithm; binding, according to the initial layout topology, data source fields to each candidate component based on the dataset identifier, wherein the binding is performed by pattern matching between field types and component input signatures; monitoring user drag events, recording component position offsets, inputting the offsets as training samples into a reinforcement learning network, and updating the parameters of a layout policy network; and serializing the optimized layout topology into a configuration object conforming to a target runtime specification, and pushing the configuration object to a client.
  2. The method according to claim 1, wherein the natural language processing module specifically comprises: a BERT-based pre-trained language model for generating the contextual embedding of the user intention vector; and a BiLSTM-CRF-based entity recognition sub-network for extracting the dataset identifier.
  3. The method according to claim 1, wherein constructing the semantic index space comprises: performing Word2Vec training on the metadata of each visualization component to obtain component-level word vectors; and reducing the word vectors to 512 dimensions by PCA to form the high-dimensional vectors.
  4. The method according to claim 1, wherein the maximum spanning tree algorithm is Kruskal's algorithm, with the sum of component canvas areas serving as a constraint when sets are merged.
  5. The method according to claim 1, wherein the pattern matching is implemented by the following sub-steps: parsing the JSON Schema of the component input signature; searching the data source field list for a subset of fields that satisfies the JSON Schema; and automatically selecting an aggregation function according to the statistical distribution of the fields and generating an SQL fragment.
  6. The method according to claim 1, wherein the reinforcement learning network is a proximal policy optimization (PPO) network, the state space is a graph embedding of the current layout topology, the action space is a discretized grid of component position offsets, and the reward function is a weighted sum of click-through rate and dwell time.
  7. The method according to claim 1, wherein the configuration object is in JSON Patch format and is transmitted as a gzip-compressed stream over a WebSocket channel.
  8. A system for automatically generating a low-code-based data analysis dashboard, comprising: a semantic analysis module configured to perform, through a natural language processing module, dependency parsing and entity recognition on an input text, and to extract a dataset identifier and a user intention vector; a semantic indexing module configured to map the user intention vector to a pre-constructed semantic index space, wherein the semantic index space is composed of a plurality of high-dimensional vectors, each corresponding to the semantic description of a visualization component; a matching module configured to quantify, based on cosine similarity, the degree of matching between the semantic descriptions of the visualization components in the semantic index space and the user intention vector, and to obtain a candidate component set according to the degree of matching; a layout generation module configured to construct a weighted undirected graph with the components of the candidate component set as nodes and the semantic relevance between components as edge weights, and to generate an initial layout topology using a maximum spanning tree algorithm; a data binding module configured to bind, according to the initial layout topology, data source fields to each candidate component based on the dataset identifier, wherein the binding is performed by pattern matching between field types and component input signatures; an interaction optimization module configured to monitor user drag events, record component position offsets, input the offsets as training samples into a reinforcement learning network, and update the parameters of a layout policy network; and a rendering module configured to serialize the optimized layout topology into a configuration object conforming to a target runtime specification and to push the configuration object to a client.
  9. An electronic device, comprising: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the steps of the method for automatically generating a low-code-based data analysis dashboard according to any one of claims 1 to 7.
  10. A storage medium storing computer-executable instructions which, when executed, perform the steps of the method for automatically generating a low-code-based data analysis dashboard according to any one of claims 1 to 7.
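
The layout step of claims 1 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that the area constraint means a merge is skipped whenever the combined canvas area of the two sets would exceed a budget; the claims do not spell out the rule. The names `max_spanning_layout`, `areas`, `edges`, and `max_area`, and the sample data, are hypothetical.

```python
def max_spanning_layout(areas, edges, max_area):
    """Kruskal-style maximum spanning forest over candidate components.

    areas:    {component: canvas area}
    edges:    [(weight, u, v)], weight = semantic relevance between u and v
    max_area: assumed cap on the total canvas area of any merged set
    """
    parent = {c: c for c in areas}
    group_area = dict(areas)  # total area of the set rooted at each key

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    tree = []
    # Maximum spanning tree: visit the heaviest edges first.
    for weight, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge would close a cycle
        if group_area[ru] + group_area[rv] > max_area:
            continue  # assumed area constraint: merged set would be too large
        parent[rv] = ru
        group_area[ru] += group_area[rv]
        tree.append((u, v, weight))
    return tree


# Hypothetical candidate components and relevance scores.
areas = {"kpi": 2, "line": 6, "bar": 6, "table": 8}
edges = [
    (0.9, "kpi", "line"),
    (0.8, "line", "bar"),
    (0.7, "kpi", "bar"),
    (0.4, "bar", "table"),
    (0.3, "line", "table"),
]
print(max_spanning_layout(areas, edges, max_area=24))
print(max_spanning_layout(areas, edges, max_area=16))
```

With a generous budget the result is an ordinary maximum spanning tree. With a tight budget some merges are rejected and the result is a forest, which under this assumption could be read as separate layout regions.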

Description

Method and system for automatically generating a low-code-based data analysis dashboard
Technical Field
This document relates to the field of computer technologies, and in particular to a method and a system for automatically generating a low-code-based data analysis dashboard.
Background
Traditionally, building a data analysis dashboard relies on professional developers to write SQL by hand, configure charts, and repeatedly adjust the layout, which makes development cycles long and costs high. In recent years, low-code platforms have lowered the barrier to entry with drag-and-drop components, but users must still choose chart types, bind data fields, and arrange the layout themselves, so these platforms struggle to meet the dashboard generation needs of non-professional users. The prior art has the following shortcomings: recommended components deviate substantially from user expectations because natural language intent is not understood at a fine granularity; layout optimization relies on static rules and cannot continuously learn from user interaction behavior; and data binding requires fields to be specified manually, so the degree of automation is low.
Disclosure of Invention
One or more embodiments of this specification provide a method for automatically generating a low-code-based data analysis dashboard, comprising: performing, by a natural language processing module, dependency parsing and entity recognition on an input text, and extracting a dataset identifier and a user intention vector; mapping the user intention vector to a pre-constructed semantic index space, wherein the semantic index space is composed of a plurality of high-dimensional vectors, each corresponding to the semantic description of a visualization component; quantifying, based on cosine similarity, the degree of matching between the semantic descriptions of the visualization components in the semantic index space and the user intention vector, and obtaining a candidate component set according to the degree of matching; constructing a weighted undirected graph with the components of the candidate component set as nodes and the semantic relevance between components as edge weights, and generating an initial layout topology using a maximum spanning tree algorithm; binding, according to the initial layout topology, data source fields to each candidate component based on the dataset identifier, wherein the binding is performed by pattern matching between field types and component input signatures; monitoring user drag events, recording component position offsets, inputting the offsets as training samples into a reinforcement learning network, and updating the parameters of a layout policy network; and serializing the optimized layout topology into a configuration object conforming to a target runtime specification, and pushing the configuration object to a client.
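
The matching step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the text does not say how the candidate set is derived from the matching degree, so the sketch assumes a similarity threshold combined with a top-k cutoff. The three-dimensional vectors, the component names, and the parameters `threshold` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def select_candidates(intent, index, threshold=0.5, top_k=3):
    """Score every component against the intent vector and keep the best.

    index maps a component name to the vector of its semantic description.
    Components scoring below `threshold` are dropped; at most `top_k` remain.
    """
    scored = [(name, cosine_similarity(intent, vec)) for name, vec in index.items()]
    scored = [(name, score) for name, score in scored if score >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy semantic index; real vectors would be high-dimensional.
index = {
    "line_chart": [0.9, 0.1, 0.0],
    "bar_chart": [0.6, 0.6, 0.1],
    "pie_chart": [0.0, 0.2, 0.9],
}
intent = [1.0, 0.2, 0.0]
candidates = select_candidates(intent, index)
print([name for name, _ in candidates])
```

In this toy index the intent vector points mostly along the first axis, so `line_chart` ranks first, `bar_chart` second, and `pie_chart` falls below the threshold.
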
Further, the natural language processing module specifically comprises: a BERT-based pre-trained language model for generating the contextual embedding of the user intention vector; and a BiLSTM-CRF-based entity recognition sub-network for extracting the dataset identifier.
Further, constructing the semantic index space comprises: performing Word2Vec training on the metadata of each visualization component to obtain component-level word vectors; and reducing the word vectors to 512 dimensions by PCA to form the high-dimensional vectors.
Further, the maximum spanning tree algorithm is Kruskal's algorithm, with the sum of component canvas areas serving as a constraint when sets are merged.
Further, the pattern matching is implemented by the following sub-steps: parsing the JSON Schema of the component input signature; searching the data source field list for a subset of fields that satisfies the JSON Schema; and automatically selecting an aggregation function according to the statistical distribution of the fields and generating an SQL fragment.
Further, the reinforcement learning network is a proximal policy optimization (PPO) network, the state space is a graph embedding of the current layout topology, the action space is a discretized grid of component position offsets, and the reward function is a weighted sum of click-through rate and dwell time.
Further, the configuration object is in JSON Patch format and is transmitted as a gzip-compressed stream over a WebSocket channel.
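
The last refinement, a JSON Patch configuration object sent as gzip-compressed data, can be sketched with the Python standard library. This is an illustration under assumptions rather than the disclosed implementation: the `/components/<id>` document layout and the helper names are hypothetical, and the WebSocket transport itself is omitted because it would require a third-party library. The operations follow the add, replace, and remove forms of JSON Patch (RFC 6902).

```python
import gzip
import json


def pointer_escape(token):
    """Escape a key for use in a JSON Pointer path (RFC 6901)."""
    return token.replace("~", "~0").replace("/", "~1")


def layout_patch(old, new):
    """Build JSON Patch operations that turn layout `old` into `new`.

    Both layouts map a component id to its position; the /components/<id>
    path layout is an assumption made for this sketch.
    """
    ops = []
    for comp_id, pos in new.items():
        path = "/components/" + pointer_escape(comp_id)
        if comp_id not in old:
            ops.append({"op": "add", "path": path, "value": pos})
        elif old[comp_id] != pos:
            ops.append({"op": "replace", "path": path, "value": pos})
    for comp_id in old:
        if comp_id not in new:
            ops.append({"op": "remove", "path": "/components/" + pointer_escape(comp_id)})
    return ops


def encode_frame(ops):
    """Serialize the patch compactly and gzip it into one binary payload."""
    return gzip.compress(json.dumps(ops, separators=(",", ":")).encode("utf-8"))


def decode_frame(frame):
    """Client-side inverse of encode_frame."""
    return json.loads(gzip.decompress(frame).decode("utf-8"))


old_layout = {"line_chart": {"x": 0, "y": 0}, "bar_chart": {"x": 6, "y": 0}}
new_layout = {
    "line_chart": {"x": 0, "y": 0},
    "bar_chart": {"x": 0, "y": 4},
    "kpi": {"x": 6, "y": 0},
}
ops = layout_patch(old_layout, new_layout)
frame = encode_frame(ops)
print(ops)
print(decode_frame(frame) == ops)
```

Sending only the changed positions keeps each update small. For very short patches the gzip header can outweigh the savings, so compression pays off mainly on larger layouts.
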
One or more embodiments of this specification provide a system for automatically generating a low-code-based data analysis dashboard, comprising: a semantic analysis module configured to perform, through a natural language processing module, dependency parsing and entity recognition on an input text, and to extract a dataset identifier and a user intention vector; a semantic indexing module configured to map the user intention vector to a pre-constructed semantic indexing space, wherein the semantic indexing s