CN-121434158-B - Structured XML standard document intelligent query auxiliary system based on large model


Abstract

The invention relates to the technical field of document query assistance, and in particular to a large-model-based intelligent query assistance system for structured XML standard documents. The system comprises a data receiving module, a document comparison module, a data dividing module, a data analysis module, a text generation module, a data statistics module and a data processing module. The click probability determines whether the processing of each document is qualified. When that processing is abnormal, a preset reference number is adjusted to a corresponding value. The average number of push documents that the text generation module outputs per query determines whether the operation of the text generation module is qualified. When that operation is abnormal, the preset category number used to determine the relevance of a single batch of documents is adjusted to a corresponding value. By analyzing user feedback, the system is optimized according to users' usage behavior, which improves document query efficiency.
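The feedback loop summarized above can be illustrated with a minimal Python sketch. It makes several assumptions. The thresholds (`preset_click_probability`, `preset_push`) and the step formulas are hypothetical, because the source fixes only their direction: the reference-number increase grows as the click probability falls (claim 3), and the category-number increase grows with the average push number (claim 5). `Params` and `feedback_step` are illustrative names. Following claims 3 to 5, the push-number check runs only after the click probability has been judged abnormal.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Params:
    reference_number: int  # high-frequency words kept as characterization words
    category_number: int   # max categories for a batch to count as strongly correlated


def feedback_step(params: Params, click_probability: float, average_push: float,
                  preset_click_probability: float = 0.3,
                  preset_push: float = 10.0) -> Params:
    """One periodic check; thresholds and step sizes are hypothetical."""
    if click_probability > preset_click_probability:
        return params  # processing qualified: keep the current parameters

    # Processing abnormal: the lower the click probability, the larger the increase.
    shortfall = (preset_click_probability - click_probability) / preset_click_probability
    params = replace(params, reference_number=params.reference_number + 1 + round(4 * shortfall))

    if average_push > preset_push:
        # Text generation abnormal: more pushes per query gives a larger increase.
        excess = int((average_push - preset_push) // preset_push)
        params = replace(params, category_number=params.category_number + 1 + excess)
    return params


p = Params(reference_number=3, category_number=5)
print(feedback_step(p, click_probability=0.5, average_push=20))  # Params(reference_number=3, category_number=5)
print(feedback_step(p, click_probability=0.0, average_push=25))  # Params(reference_number=8, category_number=7)
```

A frozen dataclass with `dataclasses.replace` keeps each periodic check side-effect free. This makes it easy to compare parameter sets before and after an adjustment.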

Inventors

ZHANG YONG
WANG YIYI
YU GANG
LI JUAN
ZHANG YANG

Assignees

China National Institute of Standardization (中国标准化研究院)

Dates

Publication Date: 20260505
Application Date: 20251030

Claims (6)

1. A structured XML standard document intelligent query assistance system based on a large model, comprising: a document comparison module, configured to classify the obtained documents based on the result of comparing them and to obtain the number of categories; a data dividing module, connected to the document comparison module and configured to determine the relevance of a received single batch of documents based on its number of categories; a data analysis module, connected to the data dividing module and to the document comparison module respectively, configured to generate index labels for the documents and, when the single batch of documents is determined to be strongly correlated, to adjust the preset reference number used for classifying the documents; a text generation module, configured to generate, based on search data input by a user, a plurality of push documents arranged in descending order of relevance; and a data processing module, connected to the document comparison module and to the data dividing module respectively, configured to determine whether the processing of each document is qualified based on the counted probability that the user clicks the first-ranked push document, to adjust the preset reference number to a corresponding value when the processing of each document is determined to be abnormal, to determine whether the operation of the text generation module is qualified based on the average number of push documents output by the text generation module per query, and to adjust the preset category number used for determining the relevance of a single batch of documents to a corresponding value when the operation of the text generation module is determined to be abnormal; wherein the document comparison module classifies the documents based on the comparison result and obtains the number of categories by: obtaining the high-frequency words of a single document, arranging them in descending order of occurrence frequency, and selecting the preset reference number of high-frequency words as the characterization words of that document; comparing the characterization words of all documents in a single batch of documents received by the data receiving module; and dividing documents with identical characterization words into the same category; the data dividing module determines the relevance of the single batch of documents based on the number of categories of the single batch received by the data receiving module by: if the number of categories is less than or equal to the preset category number, determining that the single batch of documents is strongly correlated, and adjusting the preset reference number used for classifying the documents to a corresponding value based on the number of categories; if the number of categories is greater than the preset category number, determining that the single batch of documents is weakly correlated, and controlling the document comparison module to continue operating with the current operating parameters; the data analysis module generates index labels for all documents based on the relevance of the single batch of documents by: if the single batch of documents is determined to be strongly correlated, adjusting the preset reference number used for classifying the documents to a corresponding value based on the number of documents in the single batch; if the single batch of documents is determined to be weakly correlated, determining each characterization word of a document as an index label of that document; wherein, when the data analysis module adjusts the preset reference number based on the number of documents in the single batch, the magnitude of the increase in the preset reference number is positively correlated with the number of documents in the single batch; after completing the adjustment of the preset reference number, the data analysis module controls the document comparison module to redetermine the characterization words of each document; and the data analysis module determines each newly determined characterization word of a document as an index label of that document.
2. The large-model-based structured XML standard document intelligent query assistance system of claim 1, wherein the text generation module is configured to generate a dedicated search text based on the search data input by the user and to perform a search that generates a plurality of push documents arranged in descending order of relevance characterization value, by: matching the keywords of the dedicated search text against the index labels of the documents in a database, and determining the product of the number of exactly matched keywords and a preset precision coefficient as the first association value of the corresponding retrieval label; performing semantic analysis and fuzzy matching between the keywords of the dedicated search text and the index labels, and determining the product of the number of fuzzily matched keywords and a preset fuzzy coefficient as the second association value of the corresponding retrieval label; determining the sum of the first association value and the second association value as the relevance characterization value of the corresponding retrieval label; and comparing the search data with the index labels, and determining the documents corresponding to index labels whose relevance characterization values are greater than a preset relevance comparison value as push documents.
3. The large-model-based structured XML standard document intelligent query assistance system of claim 2, wherein, when the operating duration of the data receiving module reaches an integer multiple of a preset operating duration, the data processing module determines whether the processing of each document is qualified based on the click probability, by: if the click probability is less than or equal to a preset click probability, determining that the processing of each document is abnormal, adjusting the preset reference number to a corresponding value based on the click probability, and determining whether the operating parameters of the text generation module are qualified based on the number of push documents output by the text generation module; wherein the magnitude of the increase in the preset reference number is inversely related to the click probability.
4. The large-model-based structured XML standard document intelligent query assistance system of claim 3, wherein, if the click probability is greater than the preset click probability, the processing of each document is determined to be qualified, and each document continues to be processed using the current parameters.
5. The large-model-based structured XML standard document intelligent query assistance system of claim 4, wherein the data processing module determines whether the operating parameters of the text generation module are qualified based on the average push number by: if the average push number is greater than a preset push number, determining that the operation of the text generation module is abnormal, and adjusting the preset category number used for determining the relevance of a single batch of documents to a corresponding value based on the average push number; wherein the magnitude of the increase in the preset category number is positively correlated with the average push number.
6. The large-model-based structured XML standard document intelligent query assistance system of claim 5, wherein, after completing the adjustment of the preset category number, the data processing module continues to monitor the click probability and, when the operating duration of the data receiving module again reaches an integer multiple of the preset operating duration, determines whether to modify the operating parameters of the text generation module based on a historical change parameter, by: plotting a probability-time curve from the click probabilities in each preset operating period in the acquired historical data, and determining the slope of the curve at the current time node as the historical change parameter; if the historical change parameter is greater than a preset change parameter, determining that the operating parameters of the text generation module are qualified; if the historical change parameter is less than or equal to the preset change parameter, adjusting the preset precision coefficient to a corresponding value based on the historical change parameter; wherein the magnitude of the increase in the preset precision coefficient is inversely related to the historical change parameter.
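The classification and index-labeling steps of claim 1 can be sketched in Python as follows. This is a minimal sketch under several assumptions: the tokenizer regex, the small `STOPWORDS` set and the increment rule `1 + len(docs) // 10` are all hypothetical. The claim fixes only that the increase in the reference number is positively correlated with batch size. The sketch models only the data analysis module's batch-size adjustment, not the data dividing module's category-based one.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; the source does not say how function words are handled.
STOPWORDS = {"the", "a", "an", "of", "and", "for", "to", "in"}


def characterization_words(text: str, reference_number: int) -> frozenset:
    """The `reference_number` most frequent words of one document."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]
    # most_common sorts by descending count; ties keep first-seen order.
    return frozenset(w for w, _ in Counter(words).most_common(reference_number))


def classify_batch(docs: dict, reference_number: int) -> dict:
    """Documents with identical characterization-word sets share a category."""
    categories: dict = {}
    for doc_id, text in docs.items():
        key = characterization_words(text, reference_number)
        categories.setdefault(key, []).append(doc_id)
    return categories


def index_labels(docs: dict, reference_number: int, preset_category_number: int):
    """Return (labels per document, reference number used, strongly correlated?)."""
    strong = len(classify_batch(docs, reference_number)) <= preset_category_number
    if strong:
        # Hypothetical rule: the increase grows with batch size.
        reference_number += 1 + len(docs) // 10
    # Re-determine characterization words with the (possibly raised) reference number.
    labels = {d: characterization_words(t, reference_number) for d, t in docs.items()}
    return labels, reference_number, strong


batch = {
    "a": "valve pressure valve pressure test",
    "b": "pressure valve valve pressure seal",
    "c": "cable voltage cable voltage insulation",
}
labels, used, strong = index_labels(batch, reference_number=2, preset_category_number=2)
print(strong, used, sorted(labels["a"]))  # True 3 ['pressure', 'test', 'valve']
```

Here documents "a" and "b" share the characterization words {valve, pressure}, so the batch forms two categories and counts as strongly correlated. Raising the reference number then yields finer-grained labels that tell "a" and "b" apart.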
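The relevance characterization value of claim 2 is a weighted count: the precision coefficient times the number of exact matches, plus the fuzzy coefficient times the number of fuzzy matches. It can be sketched in Python as below. The sketch swaps in a technique: the patent's large-model semantic analysis is replaced by character similarity from `difflib.SequenceMatcher`. It also assumes that a keyword already matched exactly is not counted again as a fuzzy match. All default values (`precision_coef`, `fuzzy_coef`, `fuzzy_cutoff`, `comparison_value`) are hypothetical.

```python
from difflib import SequenceMatcher


def relevance_value(keywords: set, labels: set, precision_coef: float = 1.0,
                    fuzzy_coef: float = 0.5, fuzzy_cutoff: float = 0.8) -> float:
    """First association value plus second association value for one document."""
    exact = keywords & labels
    # Stand-in for semantic analysis: character-level similarity to any label.
    fuzzy = {k for k in keywords - exact
             if any(SequenceMatcher(None, k, lab).ratio() >= fuzzy_cutoff for lab in labels)}
    return precision_coef * len(exact) + fuzzy_coef * len(fuzzy)


def push_documents(keywords: set, index: dict, comparison_value: float = 0.4,
                   **coefs) -> list:
    """Documents scoring above the comparison value, highest relevance first."""
    scored = {doc_id: relevance_value(keywords, labels, **coefs)
              for doc_id, labels in index.items()}
    kept = [d for d, s in scored.items() if s > comparison_value]
    return sorted(kept, key=lambda d: (-scored[d], d))  # ties broken by document id


index = {
    "a": {"valve", "pressure", "test"},
    "b": {"valves", "seal"},
    "c": {"cable", "voltage"},
}
print(push_documents({"valve", "pressure"}, index))  # ['a', 'b']
```

Document "b" scores only through the fuzzy match of "valve" against "valves" (similarity 10/11). This shows how the two coefficients trade off exact against approximate hits, which is the lever that claim 6 later adjusts.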
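The historical change parameter of claim 6 is the slope of the click-probability curve at the current time node. A minimal Python sketch follows, assuming a backward difference over the last two operating periods as the slope estimator. The claim does not say how the slope is computed, and a regression over a longer window would be an equally valid choice. The update rule `base_step + gain * (preset_change - slope)` is hypothetical. It only respects the claimed direction: the lower the slope, the larger the increase in the precision coefficient.

```python
def historical_change_parameter(click_probs: list, period: float = 1.0) -> float:
    """Slope of the probability-time curve at the latest node (backward difference)."""
    if len(click_probs) < 2:
        raise ValueError("need click probabilities for at least two periods")
    return (click_probs[-1] - click_probs[-2]) / period


def updated_precision_coef(click_probs: list, precision_coef: float,
                           preset_change: float = 0.0, base_step: float = 0.05,
                           gain: float = 1.0) -> float:
    """Keep the coefficient if clicks are improving enough, otherwise raise it."""
    slope = historical_change_parameter(click_probs)
    if slope > preset_change:
        return precision_coef  # text generation parameters judged qualified
    # Hypothetical rule: a steeper fall in click probability gives a larger increase.
    return precision_coef + base_step + gain * (preset_change - slope)


print(updated_precision_coef([0.2, 0.3], 1.0))  # rising clicks: coefficient unchanged
print(updated_precision_coef([0.3, 0.2], 1.0))  # falling clicks: coefficient raised
```

Raising the precision coefficient gives exact label matches more weight than fuzzy ones in the relevance characterization value. This is how, under these assumptions, a falling click rate pushes the system toward stricter ranking.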

Description

Structured XML standard document intelligent query auxiliary system based on large model

Technical Field

The invention relates to the technical field of document query assistance, and in particular to a large-model-based intelligent query assistance system for structured XML standard documents.

Background

In large enterprises, departments may generate large numbers of structured XML standard documents, such as technical specifications, business process descriptions and project reports. A single department may store many files related to a single event in batches. When strongly related files are queried using only some of the keywords in their file names, a large number of related files is pushed. This makes targeted screening difficult, so users struggle to find the files they need quickly and accurately.

Chinese patent publication No. CN105027115A discloses querying and indexing of documents, including generating a document index from a collection of documents and using it to identify documents that match one or more queries. A tree is generated for each document, with a node corresponding to each object of the document. The nodes of the generated trees are merged or combined to generate the document index, which is itself a tree. In addition, for each node of the index, an inverted index is generated that identifies the tree or trees from which the node originated. Upon receiving a query, the query is first executed against the document index tree. During execution, the appropriate set operations are applied to the inverted indexes associated with the nodes that the query matches. The resulting set identifies documents that may match the query.
The query is then executed on the identified documents. However, this technical solution does not analyze user feedback, so the system cannot be optimized according to users' usage behavior, which limits document query efficiency.

Disclosure of Invention

Therefore, the invention provides a large-model-based intelligent query assistance system for structured XML standard documents. It addresses the problem in the prior art that user feedback is not analyzed, so that the system cannot be optimized according to users' usage behavior and document query efficiency suffers.

To achieve the above object, the present invention provides a structured XML standard document intelligent query auxiliary system based on a large model, comprising: a document comparison module, configured to classify the obtained documents based on the result of comparing them and to obtain the number of categories; a data dividing module, connected to the document comparison module and configured to determine the relevance of a received single batch of documents based on its number of categories; a data analysis module, connected to the data dividing module and to the document comparison module respectively, configured to generate index labels for the documents and, when the single batch of documents is determined to be strongly correlated, to adjust the preset reference number used for classifying the documents; a text generation module, configured to generate, based on search data input by a user, a plurality of push documents arranged in descending order of relevance; and a data processing module, connected to the document comparison module and to the data dividing module respectively, configured to determine whether the processing of each document is qualified based on the counted probability that the user clicks the first-ranked push document, to adjust the preset reference number to a corresponding value when the processing of each document is determined to be abnormal, to determine whether the operation of the text generation module is qualified based on the average number of push documents output by the text generation module per query, and to adjust the preset category number used for determining the relevance of a single batch of documents to a corresponding value when the operation of the text generation module is determined to be abnormal.

Further, the text generation module is configured to generate a dedicated search text based on search data input by a user, and to perform a search that generates a plurality of push documents arranged in descending order of relevance characterization value, including: matching the keywords of the dedicated search text against the index labels of the documents in a database, and determining the product of the number of exactly matched keywords and a preset precision coefficient as the first association value of the corresponding retrieval label; carrying out semantic analysis on keywords of a proprietary search te