CN-122019683-A - Patent search formula generation method based on a general large model

Abstract

The invention relates to the technical field of data processing and discloses a patent search formula generation method based on a general large model. The method comprises: acquiring enterprise data, and configuring each retrieval channel, an enterprise internal organization name expansion word list and a technical field expansion word list; calling a first model to extract initial keywords from the natural language search demand data; performing expansion based on the selected query channel and the initial keywords to obtain target channel thesaurus expansion words and target custom thesaurus expansion words; calling a second model to generate a final keyword set; calling a query interface to obtain the search rules of the corresponding query channel based on the selected query channel; calling a third model to generate a patent search formula; and executing the patent search formula. The method addresses the problem that the related art cannot meet the combined requirements of low-threshold operation, high-precision matching, adaptation to customized requirements and dynamic thesaurus updating in patent retrieval.
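
As an illustrative, non-limiting sketch, the data flow summarized above can be pictured as a chain of six stages. The class name `SearchFormulaPipeline` and every callable field below are hypothetical stand-ins introduced only for this sketch (the three models, the thesaurus lookups and the query interface are not specified at this level of detail), and the toy lambdas in the demo are placeholders rather than real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SearchFormulaPipeline:
    """Wires the stages together; every callable is a hypothetical stand-in."""
    extract_keywords: Callable[[str], List[str]]                       # first model
    expand_keywords: Callable[[str, List[str]], List[str]]             # thesaurus lookups
    select_keywords: Callable[[str, List[str], List[str]], List[str]]  # second model
    get_search_rules: Callable[[str], str]                             # query interface
    build_formula: Callable[[str, List[str], str], str]                # third model
    execute_search: Callable[[str, str], List[str]]                    # channel call

    def run(self, demand: str, channel: str) -> Tuple[str, List[str]]:
        # 1. Extract initial keywords from the natural language demand.
        initial = self.extract_keywords(demand)
        # 2. Expand them using the selected channel's and the custom thesaurus.
        expansions = self.expand_keywords(channel, initial)
        # 3. Let the second model settle on the final keyword set.
        final = self.select_keywords(demand, initial, expansions)
        # 4. Fetch the search rules of the selected channel.
        rules = self.get_search_rules(channel)
        # 5. Generate the search formula from demand, keywords and rules.
        formula = self.build_formula(demand, final, rules)
        # 6. Execute the formula and return it together with the results.
        return formula, self.execute_search(channel, formula)


if __name__ == "__main__":
    # Toy stand-ins only; a real system would call models and channel APIs here.
    pipeline = SearchFormulaPipeline(
        extract_keywords=lambda demand: demand.lower().split(),
        expand_keywords=lambda channel, kws: [kw + "s" for kw in kws],
        select_keywords=lambda demand, kws, exp: kws + exp,
        get_search_rules=lambda channel: " OR ",
        build_formula=lambda demand, kws, rules: rules.join(kws),
        execute_search=lambda channel, formula: [],
    )
    print(pipeline.run("lidar mapping", "demo-channel"))
```

Passing the stages in as callables keeps the sketch independent of any particular model or channel; each stand-in can be replaced without touching the order of the steps.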

Inventors

YU JING
WEI LEI
LIU YI
LI QIYU

Assignees

中化数智科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260209

Claims (10)

1. A patent search formula generation method based on a general large model, characterized by comprising the following steps: acquiring enterprise data, and configuring, based on the enterprise data, a channel information table and a search rule table for each search channel, an enterprise internal organization name expansion word table and a technical field expansion word table; receiving natural language retrieval demand data input by a user and a selected query channel; calling a first model to extract initial keywords from the natural language retrieval demand data; performing expansion based on the selected query channel and the initial keywords to obtain target channel thesaurus expansion words and target custom thesaurus expansion words; calling a second model to generate a final keyword set based on the natural language retrieval demand data, the initial keywords, the target channel thesaurus expansion words and the target custom thesaurus expansion words; calling a query interface to acquire the search rule of the corresponding query channel based on the selected query channel, and calling a third model to combine the natural language retrieval demand data, the final keyword set and the search rule to generate a patent search formula; and executing the patent search formula, formatting the search result, and returning the patent search formula and the formatted search result to the user.
2. The method of claim 1, wherein the channel information table includes channel names, channel codes, channel types, calling modes, request addresses, verification information and calling parameters; the channel types are divided into advanced search types and thesaurus vocabulary types; the calling modes include a Hypertext Transfer Protocol (HTTP) calling mode and a WebSocket protocol calling mode; and the search rule table includes the search grammar specifications, field matching rules and logic operator usage rules of each search channel.
3. The method of claim 2, wherein the first model is a generic large model optimized via patent domain corpus pre-training.
4. The method of claim 3, wherein performing expansion based on the selected query channel and the initial keyword to obtain the target channel thesaurus expansion words and the target custom thesaurus expansion words comprises: querying the channel information table by calling the query interface through a code node, and determining whether the query channel is provided with a thesaurus vocabulary query interface; if the query channel is provided with the thesaurus vocabulary query interface, calling the thesaurus vocabulary query interface with the initial keyword as input, acquiring the organization expansion words and the technical field expansion words corresponding to the initial keyword in the channel thesaurus, and taking them as the target channel thesaurus expansion words; calling the query interface through a code node to query whether a custom thesaurus query interface is configured within the enterprise; if the custom thesaurus query interface is configured, calling it with the initial keyword as input, acquiring the organization name expansion words corresponding to the initial keyword from the enterprise internal organization name expansion word table, acquiring the technical field expansion words corresponding to the initial keyword from the enterprise technical field expansion word table, and taking them as the target custom thesaurus expansion words; and deduplicating the target channel thesaurus expansion words and the target custom thesaurus expansion words to obtain the deduplicated target channel thesaurus expansion words and target custom thesaurus expansion words.
5. The method of claim 4, wherein the second model is a large-parameter general large model with a parameter scale of 10 billion, in which a semantic similarity calculation module and a surveying and mapping terminology context understanding module are built; the semantic similarity calculation module uses a cosine similarity algorithm to calculate the degree of association of the target channel thesaurus expansion words and of the target custom thesaurus expansion words with the initial keyword, respectively; and the surveying and mapping terminology context understanding module judges the applicability of the target channel thesaurus expansion words and the target custom thesaurus expansion words in a surveying and mapping archive retrieval scenario by combining the scenario description in the natural language retrieval demand data.
6. The method of claim 5, wherein the third model is a large-parameter general large model with a parameter scale of 20 billion; the third model adapts to the search grammar of different query channels through a search rule transfer learning module; and the search rule transfer learning module converts the search grammar specifications, field matching rules and logic operator usage rules of each query channel into vector representations.
7. The method of claim 6, wherein the first model, the second model, and the third model share an enterprise data knowledge-graph.
8. The method of claim 7, wherein the second model supports user-defined expanded word screening rules that include technical field priority, organization name relevance, and term timeliness.
9. The method of claim 8, wherein calling the query interface to acquire the search rule of the corresponding query channel based on the selected query channel, and calling the third model to combine the natural language retrieval demand data, the final keyword set and the search rule to generate the patent search formula, further comprises: performing, by the third model, a grammar check and a logic check through a search formula validity verification module, wherein the grammar check matches the search formula against the search grammar specification of the corresponding query channel, and the logic check detects whether the nesting of logic operators in the search formula is reasonable; when the verification passes, outputting the patent search formula; and when the verification fails, readjusting the keyword combination and the logical relations of the patent search formula until the verification passes.
10. The method according to claim 9, further comprising: controlling the first model, the second model and the third model to output a processing log, wherein the processing log comprises model calling time, input parameters, intermediate results and output results, and the processing log is stored in association with the generated patent search formula and the search results.
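
As an illustrative, non-limiting sketch of the deduplication step of claim 4 and the cosine-similarity association score of claim 5, the following assumes that each word already has a numeric embedding vector. The function names, the `embeddings` dictionary and the `threshold` value are assumptions made for this sketch only; the claims do not specify how vectors are produced or what cut-off, if any, is applied.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(*word_lists: List[str]) -> List[str]:
    """Merge several expansion word lists, keeping the first occurrence of each word."""
    seen = set()
    merged = []
    for words in word_lists:
        for word in words:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


def screen_expansions(
    keyword_vec: Sequence[float],
    candidates: List[str],
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the claims
) -> List[Tuple[str, float]]:
    """Score each candidate against the initial keyword and keep the close ones."""
    scored = [
        (word, cosine_similarity(keyword_vec, embeddings[word]))
        for word in candidates
        if word in embeddings  # words without a vector are skipped
    ]
    kept = [(word, score) for word, score in scored if score >= threshold]
    # Highest association first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy two-dimensional vectors, invented purely for illustration.
    toy_embeddings = {
        "remote sensing": [1.0, 0.0],
        "lidar": [0.8, 0.6],
        "catering": [0.0, 1.0],
    }
    channel_words = ["lidar", "remote sensing"]
    custom_words = ["remote sensing", "catering"]
    candidates = deduplicate(channel_words, custom_words)
    print(screen_expansions([1.0, 0.0], candidates, toy_embeddings))
```

In this sketch the channel thesaurus words and the custom thesaurus words are merged before scoring, so a word found in both sources is scored only once.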

Description

Patent search formula generation method based on a general large model

Technical Field

The invention relates to the technical field of data processing, and in particular to a patent search formula generation method based on a general large model.

Background

In intellectual property management and technological innovation, patent retrieval is a core step for acquiring technical information, avoiding infringement risk and identifying research and development directions. With the explosive growth of patent data and the ever finer subdivision of technical fields, traditional patent retrieval methods struggle to meet the demand for efficient and accurate retrieval. There are currently two mainstream approaches in the patent retrieval field. The first is the traditional condition-selection retrieval method, which is further divided into simple retrieval and advanced retrieval. Simple retrieval only provides search entries for basic fields such as patent application number, applicant and invention title, and the user initiates a search with a single condition or a combination of a few conditions. It is easy to operate, but its search dimensions are limited and it cannot cover the multi-dimensional, complex search requirements of surveying and mapping archive patents. Advanced retrieval supports multi-field combined search and the writing of search formulas; complex search conditions can be constructed with logic operators, which in theory improves search accuracy. However, its operating threshold is extremely high: ordinary technicians can use it proficiently only after systematic training, which greatly limits search efficiency and adoption. The second approach is retrieval based on artificial intelligence semantic recognition, which semantically encodes patent text with a machine learning model, calculates the similarity between the user's query content and the patent text, and then returns the relevant results.
This approach overcomes the limitation of literal keyword matching in traditional retrieval and can recognize some synonyms and near-synonyms, but it has obvious shortcomings in patent retrieval scenarios: it cannot customize the retrieval logic to the internal business needs of an enterprise. For example, when an enterprise needs to retrieve topographic mapping archive patents related to an internal research and development project A, the model cannot associate the enterprise's internal organization names (for example, that research and development center A corresponds to a laboratory) or its technical field classification (for example, that the technical direction of project A corresponds to a sub-technology under remote sensing data processing), so the search results are disconnected from the actual requirement. In addition, both traditional retrieval and existing semantic retrieval suffer from lagging thesaurus updates and a fragmented search workflow. The links in the search workflow are independent of one another and must be connected manually by the user: the user first converts the natural language requirement into keywords, then writes the search formula according to the channel rules, and finally executes the search and sorts out the results by hand. The operation is cumbersome, and human error easily causes search conditions to be omitted, which further reduces search efficiency and accuracy. In summary, the related art cannot meet the combined requirements of low-threshold operation, high-precision matching, adaptation to customized requirements and dynamic thesaurus updating in patent retrieval.

Disclosure of Invention

In view of the above, the invention provides a patent search formula generation method based on a general large model, to solve the problem that the related art cannot meet the combined requirements of low-threshold operation, high-precision matching, adaptation to customized requirements and dynamic thesaurus updating in patent retrieval. The invention provides a patent search formula generation method based on a general large model, which comprises: acquiring enterprise data, and configuring, based on the enterprise data, a channel information table and a search rule table for each search channel, an enterprise internal organization name expansion word table and a technical field expansion word table; receiving natural language retrieval demand data input by a user and a selected query channel; calling a first model to extract initial keywords from the natural language retrieval demand data; performing expansion based on the selected query channel and the initial keywords to obtain target channel thesaurus expansion words and target custom thesaurus expansion words; calling a second model to generate a final keyword set based on the natural language retrieval demand data, the initial keywords, the target channel thesaurus expansion words and the target custom thesaurus expansion words; calling a ret