CN-122019701-A - Meteorological knowledge question-answering method, system and storage medium

Abstract

The invention provides a meteorological knowledge question-answering method and system. The method comprises the steps of data preprocessing, index construction, initial retrieval, reordering, dynamic fusion weighting and answer generation. In the reordering step, the candidate content corresponding to each candidate index in the preliminary candidate index set obtained by the initial retrieval is input into a pre-trained multi-modal reordering model to obtain a relevance score of each candidate content with respect to the query question, and the scores are normalized and sorted. In the dynamic fusion weighting step, the text modal weight and the image modal weight of the modal layer are calculated from the normalized reordering scores; the calculated modal-layer weights, the preset index-layer weights and the normalized relevance scores are fused to calculate a final composite score for each candidate content, and a number of top-ranked candidate contents are selected according to the composite score. The meteorological knowledge question-answering method and system offer good accuracy, domain expertise and usability.

Inventors

ZHANG HAO
BAI WENLU
ZHENG RONGGUI
ZHU JIA
ZHAO ZHIQIANG
HU XIN
CHU CHENG

Assignees

航天天目(重庆)卫星科技有限公司

Dates

Publication Date: 20260512
Application Date: 20251222

Claims (10)

1. A meteorological knowledge question-answering method, characterized by comprising the following steps: S1, data preprocessing: preprocessing multi-modal knowledge resources in the meteorological field, identifying the text content in the multi-modal knowledge resources and cutting it into paragraphs, and storing non-text content as images with semantic labels; S2, index construction: generating a summary index and a semantic factor index for each piece of preprocessed text content or image data, wherein the summary index represents an abstract of the core theme of the content, and the semantic factor index consists of a plurality of phrase units, each independently carrying a clear semantic meaning; S3, initial retrieval: determining the type of the query question input by the user, selecting a retrieval strategy according to the query question type, and retrieving the summary index and/or the semantic factor index to obtain a preliminary candidate index set; S4, reordering: inputting the candidate content corresponding to each candidate index in the preliminary candidate index set into a pre-trained multi-modal reordering model to obtain a relevance score of each candidate content with respect to the query question, and normalizing and sorting the scores; S5, dynamic fusion weighting: calculating the text modal weight and the image modal weight of the modal layer based on the normalized reordering scores, fusing the calculated modal-layer weights, the preset index-layer weights and the normalized relevance scores to calculate a final composite score for each candidate content, and selecting a number of top-ranked candidate contents according to the composite score; S6, answer generation: taking the selected candidate contents as context, inputting the context together with the user's query question into a visual language model, and generating the final answer.
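The preprocessing of step S1 can be pictured with a minimal Python sketch. It assumes the text has already been recognized (by OCR or a document parser, neither of which the source specifies) and that each non-text element arrives together with its semantic labels. The element format, the record classes TextChunk and ImageItem, and the blank-line paragraph rule are all hypothetical illustrations, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple, Union


@dataclass
class TextChunk:
    source: str  # originating document (hypothetical field)
    text: str    # one paragraph of recognized text


@dataclass
class ImageItem:
    source: str
    path: str                                        # where the stored image lives
    labels: List[str] = field(default_factory=list)  # semantic labels, e.g. chart type


Item = Union[TextChunk, ImageItem]


def cut_paragraphs(source: str, raw_text: str) -> List[TextChunk]:
    """Split recognized text on blank lines, dropping empty fragments."""
    return [TextChunk(source, p.strip()) for p in raw_text.split("\n\n") if p.strip()]


def preprocess(source: str, elements: Sequence[Tuple[str, object]]) -> List[Item]:
    """Turn ("text", str) and ("image", (path, labels)) elements into index-ready items."""
    items: List[Item] = []
    for kind, payload in elements:
        if kind == "text":
            items.extend(cut_paragraphs(source, payload))
        elif kind == "image":
            path, labels = payload
            items.append(ImageItem(source, path, list(labels)))
        else:
            raise ValueError(f"unknown element kind: {kind!r}")
    return items
```

Keeping text and images as separate record types makes the later split into a text candidate set and an image candidate set (D_text and D_img in claim 6) a simple type check.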
2. The method according to claim 1, wherein in step S2 the summary index and the semantic factor index are generated as follows: for text content, a preset text summary template and a text factor template are input into a large language model, and the summary and the semantic factor array are extracted in JSON format; for image content, a preset image summary template and an image factor template are input into the visual language model, and the summary and the semantic factor array of the image description are extracted in JSON format.
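Because claim 2 has the models return the summary and the semantic factor array in JSON, the indexing step needs a tolerant parser for model output. The sketch below assumes hypothetical key names "summary" and "factors" and that the model may surround the JSON object with extra text; the templates themselves are not given in the source and are not reproduced here.

```python
import json
from typing import List, Tuple


def parse_index_json(model_output: str) -> Tuple[str, List[str]]:
    """Extract (summary, factors) from model output that contains one JSON object.

    The keys "summary" and "factors" are assumptions for illustration.
    """
    start = model_output.find("{")
    end = model_output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(model_output[start:end + 1])  # raises JSONDecodeError (a ValueError) if malformed

    summary = str(data.get("summary", "")).strip()
    factors: List[str] = []
    seen = set()
    for raw in data.get("factors", []):
        phrase = str(raw).strip()
        # Drop empty phrases and duplicates while keeping the model's order.
        if phrase and phrase not in seen:
            seen.add(phrase)
            factors.append(phrase)
    return summary, factors
```

Deduplicating the factor phrases matters if each phrase is later stored as its own vector, since repeated phrases would otherwise be matched twice.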
3. The method according to claim 2, wherein step S2 further comprises vectorizing the generated summary index and semantic factor index respectively, and storing them in a vector database.
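A minimal in-memory stand-in for the vector database of claim 3 is sketched below. The hashed bag-of-words embed function is only a placeholder for a real embedding model, and storing each semantic factor phrase as its own vector (scoring a content item by its best-matching phrase) is an assumption; the source states only that both indexes are vectorized and stored.

```python
import hashlib
import math
from typing import Dict, List, Tuple

DIM = 256  # hypothetical embedding size


def embed(text: str) -> List[float]:
    """Toy hashed bag-of-words embedding, L2-normalized (placeholder for a real model)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class VectorStore:
    def __init__(self) -> None:
        # Each record: (content_id, index_type, unit-length vector).
        self.records: List[Tuple[str, str, List[float]]] = []

    def add(self, content_id: str, index_type: str, text: str) -> None:
        self.records.append((content_id, index_type, embed(text)))

    def search(self, query: str, index_type: str, k: int = 5) -> List[Tuple[str, float]]:
        """Cosine similarity search within one index; keeps each content's best match."""
        q = embed(query)
        best: Dict[str, float] = {}
        for cid, itype, vec in self.records:
            if itype != index_type:
                continue
            sim = sum(a * b for a, b in zip(q, vec))  # dot product of unit vectors = cosine
            best[cid] = max(best.get(cid, float("-inf")), sim)
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Usage: call add(content_id, "summary", summary) once per item and add(content_id, "factor", phrase) once per factor phrase, then search either index separately.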
4. The method according to claim 1, wherein in step S3, selecting a retrieval strategy according to the query question type and retrieving the summary index and/or the semantic factor index comprises: for a conceptual query question, retrieving the summary index; for a mechanism query question, retrieving the semantic factor index; and for a comprehensive query question, retrieving the summary index and the semantic factor index simultaneously and fusing the results through a dynamic weighting strategy.
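The routing of claim 4 can be sketched as follows. The source does not say how the query type is determined, so the keyword cues below are purely hypothetical (a deployed system might use a classifier or a language model instead). The search argument is any function returning (content_id, similarity) pairs for one index, and recording a score of 0.0 for an index that was not searched is likewise an assumption.

```python
from typing import Callable, Dict, List, Tuple

SearchFn = Callable[[str, str, int], List[Tuple[str, float]]]

# Hypothetical cue phrases; not taken from the source.
MECHANISM_CUES = ("why", "how does", "cause", "mechanism", "leads to")
CONCEPT_CUES = ("what is", "define", "definition", "meaning of")


def classify_query(query: str) -> str:
    q = query.lower()
    is_mech = any(c in q for c in MECHANISM_CUES)
    is_concept = any(c in q for c in CONCEPT_CUES)
    if is_mech and not is_concept:
        return "mechanism"
    if is_concept and not is_mech:
        return "conceptual"
    return "comprehensive"  # both kinds of cue, or neither


def initial_retrieval(query: str, search: SearchFn, k: int = 10) -> Tuple[str, Dict[str, Dict[str, float]]]:
    """Return the query type and, per candidate, its summary and factor similarity scores."""
    qtype = classify_query(query)
    indexes = {
        "conceptual": ["summary"],
        "mechanism": ["factor"],
        "comprehensive": ["summary", "factor"],
    }[qtype]
    candidates: Dict[str, Dict[str, float]] = {}
    for index_type in indexes:
        for content_id, score in search(query, index_type, k):
            candidates.setdefault(content_id, {"summary": 0.0, "factor": 0.0})[index_type] = score
    return qtype, candidates
```

Keeping both per-index scores for every candidate means the later fusion step (claim 8) can weight them without searching again.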
5. The method according to claim 1, wherein in step S4 the scores of the candidate contents are normalized using min-max normalization.
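Min-max normalization maps each reordering score s to (s - s_min) / (s_max - s_min), so the best candidate scores 1 and the worst 0. The sketch below applies it to raw reranker scores, which are assumed to be given (the reordering model itself is not shown). The handling of the case where all scores are equal, where the formula divides by zero, is an assumption, since the source does not address it.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw relevance scores to [0, 1] with min-max normalization."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Degenerate case (assumption): every candidate is equally relevant.
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def rerank_order(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Normalize, then sort candidates from most to least relevant."""
    norm = min_max_normalize(scores)
    return sorted(norm.items(), key=lambda kv: kv[1], reverse=True)
```

Normalizing per query puts text and image candidates on a common 0 to 1 scale before their scores are summed into the modal weights of claim 6.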
6. The method according to claim 1, wherein in step S5 the dynamic weights of the modal layer are calculated as follows: let M_text be the sum of the normalized reordering scores of the text candidate set D_text and M_img be the sum of the normalized reordering scores of the image candidate set D_img; then the text modal weight is α = M_text/(M_text + M_img) and the image modal weight is β = M_img/(M_text + M_img).
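The modal weights of claim 6 reduce to two sums and a division, with α + β = 1. A minimal sketch follows, assuming a mapping from each candidate to "text" or "image"; the 0.5/0.5 fallback when both sums are zero is an assumption, because the formula is undefined in that case.

```python
from typing import Dict, Tuple


def modal_weights(norm_scores: Dict[str, float], modality: Dict[str, str]) -> Tuple[float, float]:
    """Return (alpha, beta) from normalized reordering scores, following claim 6."""
    m_text = sum(s for cid, s in norm_scores.items() if modality[cid] == "text")   # M_text over D_text
    m_img = sum(s for cid, s in norm_scores.items() if modality[cid] == "image")   # M_img over D_img
    total = m_text + m_img
    if total == 0:
        return 0.5, 0.5  # fallback (assumption): no evidence favours either modality
    return m_text / total, m_img / total
```

For example, normalized scores of 1.0 and 0.5 for two text candidates and 0.5 for one image candidate give M_text = 1.5 and M_img = 0.5, so α = 0.75 and β = 0.25.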
7. The method according to claim 6, wherein in step S5 the summary index weight w_sum and the semantic factor index weight w_fac of the index layer are set as follows: for a conceptual query, the summary index weight w_sum is greater than the semantic factor index weight w_fac; for a mechanism query, the semantic factor index weight w_fac is greater than the summary index weight w_sum; and for a comprehensive query, the summary index weight w_sum equals the semantic factor index weight w_fac.
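Claim 7 fixes only the ordering between w_sum and w_fac for each query type, not their values. The table below uses hypothetical magnitudes (0.7/0.3 and 0.5/0.5) chosen for illustration, and the lookup function checks that whatever values are configured respect the claimed ordering.

```python
from typing import Dict, Tuple

# Hypothetical magnitudes; only the orderings come from claim 7.
INDEX_WEIGHTS: Dict[str, Tuple[float, float]] = {
    "conceptual": (0.7, 0.3),     # w_sum > w_fac
    "mechanism": (0.3, 0.7),      # w_fac > w_sum
    "comprehensive": (0.5, 0.5),  # w_sum == w_fac
}


def index_weights(query_type: str) -> Tuple[float, float]:
    """Return (w_sum, w_fac) for a query type, enforcing the ordering claim 7 requires."""
    w_sum, w_fac = INDEX_WEIGHTS[query_type]
    if query_type == "conceptual" and not w_sum > w_fac:
        raise ValueError("conceptual queries need w_sum > w_fac")
    if query_type == "mechanism" and not w_fac > w_sum:
        raise ValueError("mechanism queries need w_fac > w_sum")
    if query_type == "comprehensive" and w_sum != w_fac:
        raise ValueError("comprehensive queries need w_sum == w_fac")
    return w_sum, w_fac
```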
8. The method according to claim 7, wherein in step S5 the final composite score Score(d) of a candidate content d is calculated as follows: first, the index score of the candidate content d is calculated as Score_index(d) = w_sum × Score(i_sum) + w_fac × Score(i_fac), where Score(i_sum) and Score(i_fac) are the similarity scores of its summary index and its semantic factor index, respectively, in the initial retrieval; then the final composite score is calculated in combination with the modal weights: Score(d) = α × Score_index(d) if the candidate content d is text, and Score(d) = β × Score_index(d) if the candidate content d is an image.
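The two formulas of claim 8 and the top-ranked selection of step S5 can be combined into one function. This sketch follows claim 8 as written, so the normalized reordering scores enter only through α and β. The candidate layout (per-candidate "summary" and "factor" similarities) and the default cut-off top_n = 5 are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def composite_scores(
    candidates: Dict[str, Dict[str, float]],  # cid -> {"summary": Score(i_sum), "factor": Score(i_fac)}
    modality: Dict[str, str],                 # cid -> "text" or "image"
    weights: Tuple[float, float],             # (w_sum, w_fac) for the query type
    alpha: float,
    beta: float,
    top_n: int = 5,                           # hypothetical cut-off
) -> List[Tuple[str, float]]:
    """Compute Score(d) for every candidate and keep the top_n highest."""
    w_sum, w_fac = weights
    scored: List[Tuple[str, float]] = []
    for cid, s in candidates.items():
        score_index = w_sum * s["summary"] + w_fac * s["factor"]  # Score_index(d)
        modal = alpha if modality[cid] == "text" else beta         # modal-layer weight
        scored.append((cid, modal * score_index))                  # Score(d)
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_n]
```

With w_sum = w_fac = 0.5, α = 0.75 and β = 0.25, a text candidate with similarities (0.8, 0.4) scores 0.75 × 0.6 = 0.45, ahead of an image candidate with (0.6, 0.9), which scores 0.25 × 0.75 = 0.1875.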
9. A meteorological knowledge question-answering system, comprising: a data preprocessing module, configured to preprocess multi-modal knowledge resources in the meteorological field, identify the text content in the multi-modal knowledge resources and cut it into paragraphs, and store non-text content as images with semantic labels; an index construction module, configured to generate a summary index and a semantic factor index for each piece of preprocessed text content or image data, wherein the summary index represents an abstract of the core theme of the content, and the semantic factor index consists of a plurality of phrase units, each independently carrying a clear semantic meaning; an initial retrieval module, configured to determine the type of the query question input by the user, select a retrieval strategy according to the query question type, and retrieve the summary index and/or the semantic factor index to obtain a preliminary candidate index set; a reordering module, configured to input the candidate content corresponding to each candidate index in the preliminary candidate index set into a pre-trained multi-modal reordering model, obtain a relevance score of each candidate content with respect to the query question, and normalize and sort the scores; a dynamic fusion weighting module, configured to calculate the text modal weight and the image modal weight of the modal layer based on the normalized reordering scores, fuse the calculated modal-layer weights, the preset index-layer weights and the normalized relevance scores to calculate a final composite score for each candidate content, and select a number of top-ranked candidate contents according to the composite score; and an answer generation module, configured to take the selected candidate contents as context, input the context together with the user's query question into the visual language model, and generate the final answer.
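The answer generation module passes the selected text and image candidates, plus the query, to a visual language model. The sketch below only assembles that input as an ordered list of parts; the part dictionaries, the instruction sentence and the content layout are hypothetical and do not correspond to any particular model's API, which would need its own message format.

```python
from typing import Dict, List


def build_vlm_input(query: str, selected: List[str], contents: Dict[str, Dict]) -> List[Dict[str, str]]:
    """Assemble ranked context and the user query into a generic multimodal part list.

    contents maps a content id to {"kind": "text", "text": ...} or
    {"kind": "image", "path": ..., "labels": [...]} (hypothetical layout).
    """
    parts: List[Dict[str, str]] = [
        {"type": "text", "text": "Answer the meteorological question using only the context below."}
    ]
    for rank, cid in enumerate(selected, start=1):
        item = contents[cid]
        if item["kind"] == "text":
            parts.append({"type": "text", "text": f"[{rank}] {item['text']}"})
        else:
            # Announce the image with its semantic labels, then attach the image itself.
            labels = ", ".join(item.get("labels", []))
            parts.append({"type": "text", "text": f"[{rank}] Image ({labels}):"})
            parts.append({"type": "image", "path": item["path"]})
    parts.append({"type": "text", "text": f"Question: {query}"})
    return parts
```

Numbering the context items lets the generated answer refer back to specific evidence, and keeping the semantic labels next to each image gives the model a textual cue to the image's content.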
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the meteorological knowledge question-answering method according to any one of claims 1 to 8.

Description

Meteorological knowledge question-answering method, system and storage medium

Technical Field

The invention relates to the fields of meteorology and machine learning, and in particular to a meteorological knowledge question-answering method, system and storage medium based on a multi-modal hierarchical index and dynamic fusion weighting.

Background

With the development of meteorological science, the meteorological field has generated and accumulated a large volume of heterogeneous knowledge resources. These include standardized text materials (such as observation reports, technical manuals, research papers and policy specifications) and a large amount of mixed text-and-graphics data (such as statistical charts, cross-section schematic diagrams, manuscripts and formula tables). At present, most meteorological question-answering systems rely mainly on text retrieval, or store images only after simple OCR linearization. This flattens the spatial structure, annotations and mathematical expressions of charts, and frequently causes key values, coordinate relationships and graphical semantics to be omitted or misread, which undermines the professional quality of retrieval and answering. Another common problem is that the index granularity and the retrieval strategy are uniform: text is usually indexed by paragraph or by full document, which makes it difficult to serve the different needs of conceptual queries (the topic must be located quickly), mechanism queries (causes and parameters must be matched precisely) and comprehensive queries (both are required).
In terms of modal fusion, image and text vector representations use different scoring scales, so combining them directly often skews the weighting or loses information; moreover, a static weighting strategy cannot adjust to the query intent and the candidate distribution, so simple questions are answered slowly or key evidence for complex questions is not retrieved.

Disclosure of Invention

The invention aims to provide a meteorological knowledge question-answering method and system that not only preserves the semantics of the internal structure of images, but also adaptively selects the multi-modal retrieval granularity and the modal weights according to the question type and fuses them dynamically, thereby improving the overall accuracy, professional quality and usability of the meteorological knowledge question-answering system.

An embodiment of the invention provides a meteorological knowledge question-answering method comprising the following steps:
S1, data preprocessing: preprocessing multi-modal knowledge resources in the meteorological field, identifying the text content in the multi-modal knowledge resources and cutting it into paragraphs, and storing non-text content as images with semantic labels;
S2, index construction: generating a summary index and a semantic factor index for each piece of preprocessed text content or image data, wherein the summary index represents an abstract of the core theme of the content, and the semantic factor index consists of a plurality of phrase units, each independently carrying a clear semantic meaning;
S3, initial retrieval: determining the type of the query question input by the user, selecting a retrieval strategy according to the query question type, and retrieving the summary index and/or the semantic factor index to obtain a preliminary candidate index set;
S4, reordering: inputting the candidate content corresponding to each candidate index in the preliminary candidate index set into a pre-trained multi-modal reordering model to obtain a relevance score of each candidate content with respect to the query question, and normalizing and sorting the scores;
S5, dynamic fusion weighting: calculating the text modal weight and the image modal weight of the modal layer based on the normalized reordering scores, fusing the calculated modal-layer weights, the preset index-layer weights and the normalized relevance scores to calculate a final composite score for each candidate content, and selecting a number of top-ranked candidate contents according to the composite score;
S6, answer generation: taking the selected candidate contents as context, inputting the context together with the user's query question into a visual language model, and generating the final answer.

In an embodiment of the present invention, in the step, the summary index and the semantic factor index are generated as follows: inputting a preset text summary template and a text factor template into the large language model for the text content, and extracting a summary and a semantic factor array in a format; and inputting a preset image summary template and an image factor te