CN-122019698-A - Method for automatically replying to government procurement challenge letters based on text features
Abstract
The invention provides a method for automatically replying to government procurement challenge letters based on text features, comprising the following steps. S1, constructing challenged-matter text question-answer pairs: based on accumulated challenge cases, the challenge letters are preprocessed, a keyword dictionary is constructed, topic words are extracted, the challenged matters are mapped to topic subsets through TF-IDF and clustering, and ordered question-answer pairs are finally generated. S2, automatically replying to a challenge letter based on a general-purpose large model: a new challenge letter is converted to text by OCR and decomposed by keyword into a plurality of challenged matters Q, which are replied to one by one; for each matter, topic words are extracted with an LDA model and matched to the corresponding topic word subset, and the reply text is generated automatically. The invention converts historical data into structured knowledge that a machine can understand and use, and accurately guides the large model to generate professional replies, thereby improving the efficiency and quality of challenge letter processing.
Inventors
- TIAN SHENGLI
- OuYang Jinjie
- Liao Yaqiao
- ZHANG SHIBO
- SUN QIN
- WANG SHINA
- ZHU HUAN
- ZHAO JINGYING
- WU YUGE
- LUO HUIJUAN
- WANG HONG
Assignees
- Guangzhou Exchange Group Co., Ltd. (广州交易集团有限公司)
- Guangzhou Government Procurement Center (广州市政府采购中心)
Dates
- Publication Date
- 20260512
- Application Date
- 20251211
Claims (6)
- 1. A method for automatically replying to government procurement challenge letters based on text features, characterized by comprising the following steps: S1, constructing challenged-matter text question-answer pairs: based on accumulated challenge cases, preprocessing the challenge letters, constructing a keyword dictionary, extracting topic words, mapping the challenged matters to topic subsets through TF-IDF and clustering, and finally generating ordered question-answer pairs; S2, automatically replying to a challenge letter based on a general-purpose large model: first, performing OCR text processing on the new challenge letter and decomposing it by keyword into a plurality of challenged matters Q; then, replying to the challenged matters Q one by one by extracting topic words with the LDA model and matching the corresponding topic word subset; finally, selecting 3 to 5 groups of ranked text question-answer pairs in the category of the corresponding topic word subset as the information source for retrieval augmentation of the large model, replying to the new challenged matter based on the general-purpose large model, and automatically generating the reply text.
- 2. The method for automatically replying to government procurement challenge letters based on text features according to claim 1, wherein the specific process of step S1 is as follows: S11, identifying and classifying the text features of the challenged matters in challenge letters: preprocessing a large number of existing challenge letters and challenge reply letters to form a challenge reply case set, converting the case set into text documents by OCR, screening out high-quality documents after preliminary cleaning and Chinese-language analysis, and labeling the major categories of the challenged-matter Q text items and the challenge-reply A text items; S12, constructing representative keywords from all the text keywords in each major-category set, collecting the constructed keywords, and building a keyword dictionary; S13, extracting topic words from the case set text with an LDA model; S14, calculating the TF values and IDF values of the keywords and the topic words, respectively, with the TF-IDF statistical algorithm, and, according to the results, assigning the challenge text to which each keyword belongs to the corresponding topic word to form topic word subsets; S15, performing cluster computation on the challenged matters Q labeled with text features in step S11 using a repeated bisection clustering algorithm, and, according to the results, establishing a mapping between each challenged matter Q and the documents of the topic word subsets; S16, for the challenged matters Q contained in the divided topic word subsets, extracting the reply text of the corresponding challenged matters, numbering and organizing it into text question-answer pairs of challenged matter Q and challenge reply A, and numbering the question-answer pairs and labeling them with basic information.
- 3. The method for automatically replying to government procurement challenge letters based on text features according to claim 2, characterized in that, in step S11, preprocessing the large number of existing challenge letters and challenge reply letters to form the challenge reply case set specifically comprises: S11, cleaning, labeling, and organizing the challenged matters of a large number of existing government procurement challenge letter cases, setting a keyword set by the LDA method, extracting the features of each challenged matter in the challenge letters by topic words, classifying the challenged matters, and partitioning and computing over the challenged matters of the different fine-grained categories; S12, evaluating the relation between the words in the challenged matters by the TF-IDF method, calculating TF-IDF values among the texts of similar challenged matters, calculating from the TF-IDF values a cluster center representing the challenged matters of a specific category, and assigning each individual challenged matter to a cluster with high similarity; S13, performing cluster analysis on the challenged matters Q labeled with text features using the repeated bisection clustering algorithm, and dividing the challenged matters into a plurality of subsets according to the analysis results.
- 4. The method for automatically replying to government procurement challenge letters based on text features according to claim 2, wherein in step S12 the major categories are defined in advance, several high-frequency words with similar meanings or expressions within a major category are merged into one representative keyword, and the challenged matters are pre-classified into the major categories as follows:
- 5. The method for automatically replying to government procurement challenge letters based on text features according to claim 2, characterized in that, in step S16, after the question-answer pairs are numbered and labeled with basic information, the question-answer pairs within the same subset are ranked, wherein the higher the computed cluster relevance value, the higher the ranking.
- 6. The method for automatically replying to government procurement challenge letters based on text features according to claim 2, characterized in that, in step S16, the labeled basic information includes the source project number, the time, and whether the challenged matter was found to be valid.
Description
Method for automatically replying to government procurement challenge letters based on text features
Technical Field
The invention relates to the technical field of text reply generation, and in particular to a method for automatically replying to government procurement challenge letters based on text features.
Background
A text reply generation system is an artificial-intelligence-based technology that generates reply content corresponding to comment text by processing and understanding natural-language comment text. Such systems combine natural language processing (NLP) and computer vision (CV) techniques to simulate human creativity and generate replies automatically. At present, challenge letters in government procurement are interpreted and answered manually. This process suffers from unclear decomposition of the challenged matters and omitted items, the limits of individual knowledge and experience, large subjective variation in reply quality, and inconsistent reply standards for frequently raised matters. A search of the prior art found no report of an automatic reply method specific to government procurement challenge letters, so an automatic reply method based on the text features of government procurement challenge letters needs to be developed.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention provides a method for automatically replying to government procurement challenge letters based on text features. The method converts historical data into structured knowledge that a machine can understand and use, and accurately guides a large model to generate professional replies, thereby improving the efficiency and quality of challenge letter processing.
To achieve the above technical objective, the invention provides a method for automatically replying to government procurement challenge letters based on text features, which specifically comprises the following steps:
S1, constructing challenged-matter text question-answer pairs: based on accumulated challenge cases, preprocessing the challenge letters, constructing a keyword dictionary, extracting topic words, mapping the challenged matters to topic subsets through TF-IDF and clustering, and finally generating ordered question-answer pairs;
S2, automatically replying to a challenge letter based on a general-purpose large model: first, performing OCR text processing on the new challenge letter and decomposing it by keyword into a plurality of challenged matters Q; then, replying to the challenged matters Q one by one by extracting topic words with the LDA model and matching the corresponding topic word subset; finally, selecting 3 to 5 groups of ranked text question-answer pairs in the category of the corresponding topic word subset as the information source for retrieval augmentation of the large model, replying to the new challenged matter based on the general-purpose large model, and automatically generating the reply text.
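As an illustration of the last part of step S2, the following is a minimal Python sketch, not the patented implementation. It assumes the new challenged matter has already been tokenized and matched to a topic label (the LDA matching itself is not shown). It ranks the stored question-answer pairs of that topic subset by TF-IDF cosine similarity to the new matter, which is a stand-in for the ranking that the method stores with each pair. The function names `retrieve_pairs` and `build_prompt`, the record fields, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return ({term: weight} per tokenized document, idf table). Smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (cnt / len(doc)) * idf[t] for t, cnt in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_pairs(query_tokens, qa_pairs, topic, k=3):
    """Pick the k stored Q-A pairs of one topic subset most similar to the query."""
    subset = [p for p in qa_pairs if p["topic"] == topic]
    if not subset:
        return []
    vectors, idf = tfidf_vectors([p["q_tokens"] for p in subset])
    tf = Counter(query_tokens)
    # Terms unseen in the subset carry no weight in this sketch.
    q_vec = {t: (c / len(query_tokens)) * idf[t] for t, c in tf.items() if t in idf}
    scored = sorted(zip(subset, vectors),
                    key=lambda pv: cosine(q_vec, pv[1]), reverse=True)
    return [pair for pair, _ in scored[:k]]


def build_prompt(question, pairs):
    """Assemble retrieved pairs into a retrieval-augmented prompt for a large model."""
    lines = ["Reference cases:"]
    for i, p in enumerate(pairs, 1):
        lines.append(f"{i}. Q: {p['question']}\n   A: {p['answer']}")
    lines.append(f"New challenged matter: {question}")
    lines.append("Draft a reply consistent with the reference cases.")
    return "\n".join(lines)


# Hypothetical sample data; real pairs would come from step S1.
PAIRS = [
    {"id": "QA-001", "topic": "qualification",
     "q_tokens": ["supplier", "qualification", "certificate", "requirement"],
     "question": "The certificate requirement restricts suppliers.",
     "answer": "Sample reply 1."},
    {"id": "QA-002", "topic": "qualification",
     "q_tokens": ["bidder", "business", "license", "scope"],
     "question": "The business license scope is too narrow.",
     "answer": "Sample reply 2."},
    {"id": "QA-003", "topic": "qualification",
     "q_tokens": ["qualification", "requirement", "discriminatory"],
     "question": "The qualification requirement is discriminatory.",
     "answer": "Sample reply 3."},
    {"id": "QA-004", "topic": "scoring",
     "q_tokens": ["scoring", "criteria", "price", "weight"],
     "question": "The price weight in the scoring criteria is unreasonable.",
     "answer": "Sample reply 4."},
    {"id": "QA-005", "topic": "qualification",
     "q_tokens": ["supplier", "financial", "statement", "requirement"],
     "question": "The financial statement requirement excludes new suppliers.",
     "answer": "Sample reply 5."},
]

new_question = "The required certificate is unrelated to supplier qualification."
selected = retrieve_pairs(["supplier", "certificate", "qualification"],
                          PAIRS, "qualification", k=3)
print([p["id"] for p in selected])
print(build_prompt(new_question, selected))
```

The value of `k` would be set between 3 and 5 to match the number of question-answer pairs the method supplies to the large model. The resulting prompt string would then be passed to whichever general-purpose large model is in use; that call is deliberately left out because the source does not name one.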
Preferably, the specific process of step S1 is as follows:
S11, identifying and classifying the text features of the challenged matters in challenge letters: preprocessing a large number of existing challenge letters and challenge reply letters to form a challenge reply case set, converting the case set into text documents by OCR, screening out high-quality documents after preliminary cleaning and Chinese-language analysis, and labeling the major categories of the challenged-matter Q text items and the challenge-reply A text items;
S12, constructing representative keywords from all the text keywords in each major-category set, collecting the constructed keywords, and building a keyword dictionary;
S13, extracting topic words from the case set text with an LDA model;
S14, calculating the TF values and IDF values of the keywords and the topic words, respectively, with the TF-IDF statistical algorithm, and, according to the results, assigning the challenge text to which each keyword belongs to the corresponding topic word to form topic word subsets;
S15, performing cluster computation on the challenged matters Q labeled with text features in step S11 using a repeated bisection clustering algorithm, and, according to the results, establishing a mapping between each challenged matter Q and the documents of the topic word subsets;
S16, for the challenged matters Q contained in the divided topic word subsets, extracting the reply text of the corresponding challenged matters, numbering and organizing it into text question-answer pairs of challenged matter Q and challenge reply A, and numbering the question-answer pairs and labeling them with basic information.
Prefe