CN-122019850-A - Patent recommendation method and system based on multi-mode heterogeneous graph attribute aggregation


Abstract

The invention discloses a patent recommendation method and system based on multi-modal heterogeneous graph attribute aggregation, relating to the technical field of patent information services. The method comprises the following steps:

- acquiring multi-modal data of patents and performing feature fusion to generate multi-modal characterization vectors;
- constructing a heterogeneous graph containing multiple node types, using the multi-modal characterizations as patent node features, and invoking a large language model to infer weighted implicit technical association edges that enhance the graph;
- according to the user's recommendation requirement, quantifying its precision requirement with a Fisher information matrix and using it to dynamically determine the order and range of aggregation;
- performing hierarchical attribute aggregation on target nodes in the heterogeneous graph to generate requirement-adapted patent characterizations;
- constructing a user preference characterization from the user's historical interaction behavior and generating a personalized patent recommendation list by matching.

The method enables deep semantic association mining of patents, requirement-adaptive dynamic recommendation and accurate personalized matching, improving the accuracy, flexibility and interpretability of patent recommendation.

Inventors

JIANG JIANJIAN
DAI QINGYUN
LAI PEIYUAN

Assignees

Guangdong University of Technology (广东工业大学)
Guangdong Polytechnic Normal University (广东技术师范大学)

Dates

Publication Date: 20260512
Application Date: 20251230

Claims (8)

1. A patent recommendation method based on multi-modal heterogeneous graph attribute aggregation, characterized by comprising the following steps: acquiring multi-modal data of patents from a patent database, and performing feature extraction and fusion on the multi-modal data to generate a multi-modal characterization vector for each patent; constructing a heterogeneous graph comprising patent nodes, technical keyword nodes, applicant nodes, inventor nodes and classification nodes, taking the multi-modal characterization vectors as initial features of the patent nodes, invoking a large language model to infer technical semantic relations between patent nodes, generating implicit technical association edges comprising relation types and confidence weights, and thereby enhancing the heterogeneous graph; according to a recommendation requirement of a user, quantifying the precision requirement of the recommendation requirement by using a Fisher information matrix, dynamically determining the order and range of attribute aggregation according to the precision requirement, and performing a hierarchical attribute aggregation operation on a target patent node in the heterogeneous graph to generate a requirement-adapted patent characterization vector; and constructing a user preference characterization based on the user's historical interaction behavior, matching the patent characterization vector with the user preference characterization, and generating and outputting a personalized patent recommendation list.
2. The patent recommendation method based on multi-modal heterogeneous graph attribute aggregation according to claim 1, wherein acquiring multi-modal data of patents from a patent database and performing feature extraction and fusion on the multi-modal data to generate a multi-modal characterization vector for each patent comprises the following steps: performing word segmentation, paragraph segmentation and technical-section identification on the text data, performing key-region detection and structural information extraction on the image data, and performing entity alignment and normalization on the structured data; encoding the titles and abstracts in the preprocessed text data with a language model pre-trained on a patent corpus to obtain technical topic vectors, obtaining long-text structure vectors for the full text of the description through segment-wise attention encoding and chapter-level feature aggregation, and outputting text characterization vectors based on the technical topic vectors and the long-text structure vectors; extracting global and local visual features from the technical drawings in the preprocessed image data with a vision Transformer, fusing the visual and topological features of structured images with a graph neural network to obtain image semantic vectors, and outputting image characterization vectors based on the global and local visual features and the image semantic vectors; applying heterogeneous graph node embedding to the applicants and inventors in the preprocessed structured data to obtain entity vectors, encoding IPC classification numbers with a hierarchical encoding network to obtain classification-level vectors, and outputting structured characterization vectors based on the entity vectors and the classification-level vectors; and computing the semantic association between the text characterization vectors and the image characterization vectors through cross attention, constructing an inter-modal multi-head attention network that dynamically learns complementary weights among the text, image and structured characterization vectors to generate a fused comprehensive feature vector, and mapping the comprehensive feature vector to the multi-modal characterization vector through a patent semantic space projection layer.
3. The patent recommendation method based on multi-modal heterogeneous graph attribute aggregation according to claim 1, wherein constructing a heterogeneous graph comprising patent nodes, technical keyword nodes, applicant nodes, inventor nodes and classification nodes, and taking the multi-modal characterization vector as an initial feature of the patent nodes, comprises: defining heterogeneous graph node types comprising patent nodes, technical keyword nodes, applicant nodes, inventor nodes and classification nodes, and taking the multi-modal characterization vectors as initial features of the patent nodes; based on patent metadata, establishing predefined explicit relation edges between nodes, comprising content attribution relations, rights and creation relations, legal and family relations, and technical diffusion relations, and assigning each explicit relation edge a weight derived from metadata statistics; and adopting a dynamic incremental construction strategy to insert new patent nodes and their associated nodes into the heterogeneous graph, computing similarities from the multi-modal characterization vectors of the patent nodes to provide similarity priors for implicit semantic relation reasoning based on the large language model, and using the structure of the heterogeneous graph as context information for the relation reasoning.
4. The patent recommendation method based on multi-modal heterogeneous graph attribute aggregation according to claim 1, wherein invoking a large language model to infer technical semantic relations between patent nodes, generating implicit technical association edges containing relation types and confidence weights, and enhancing the heterogeneous graph comprises: for each pair of patent nodes to be analyzed, obtaining the core content summaries of the two corresponding patents and partial graph-structure prior information extracted from the heterogeneous graph, and constructing structured context information for relational reasoning; constructing a prompt instruction based on the context information that directs a comparative analysis along four dimensions, namely technical principle similarity, application scenario overlap, technical complementarity and technical evolution relationship; based on the result of the comparative analysis, instructing the large language model to determine whether a technical association of a predefined type exists, the predefined types comprising technical substitution, technical complementation, technical evolution and technical migration relations, and to output a confidence score for each determined technical association; and, for each technical relation that the large language model judges to exist, constructing an implicit technical association edge carrying the relation-type label between the two corresponding patent nodes, taking the confidence score as the initial weight of that edge, and thereby semantically enhancing the heterogeneous graph.
5. The patent recommendation method based on multi-modal heterogeneous graph attribute aggregation according to claim 1, wherein quantifying the precision requirement of the recommendation requirement by using a Fisher information matrix according to the recommendation requirement of the user comprises: acquiring the user's recommendation requirement, performing semantic analysis on it, extracting features along preset semantic dimensions comprising technical focus degree, scope openness and domain-span tolerance, and generating a requirement characterization vector of the recommendation requirement based on the analysis result; modeling the recommendation process as an information processing model whose model parameters comprise the order of attribute aggregation, a distance attenuation coefficient and meta-path selection weights, wherein different recommendation requirements correspond to different expected optimal combinations of model parameters and the output of the information processing model is the conditional probability distribution of the recommendation result; and computing the Fisher information matrix of this conditional probability distribution with respect to the model parameters, and extracting a scalar measure from the Fisher information matrix as the quantified recommendation precision requirement of the recommendation requirement.
6. The patent recommendation method based on multi-modal heterogeneous graph attribute aggregation according to claim 1, wherein dynamically determining the order and range of attribute aggregation according to the precision requirement, performing a hierarchical attribute aggregation operation on a target patent node in the heterogeneous graph, and generating a requirement-adapted patent characterization vector comprises: determining, through a predefined decision map based on the quantified recommendation precision requirement, the maximum aggregation order, the distance attenuation coefficient, the set of enabled heterogeneous meta-paths and the corresponding meta-path weight vector for attribute aggregation; starting from the target patent node, traversing the heterogeneous graph along the enabled heterogeneous meta-paths up to the maximum aggregation order, and sampling to obtain the heterogeneous multi-hop neighbor node set of the target node; for each neighbor node, calculating an aggregation weight from its hop distance to the target node, the type of meta-path connecting them and the semantic attention score between the nodes, the aggregation weight fusing a distance attenuation factor with the meta-path weight vector and being expressed as:
$$\alpha_{ij}=\frac{\exp(e_{ij})}{\sum_{k\in\mathcal{N}(i)}\exp(e_{ik})},\qquad e_{ij}=w_{\phi_{ij}}\,f(d_{ij};\lambda)\,\sigma\!\left(\mathbf{h}_i^{\top}\mathbf{h}_j\right),$$
where $\mathcal{N}(i)$ denotes the heterogeneous multi-hop neighbor node set of the target node $i$; $j\in\mathcal{N}(i)$ denotes a neighbor node; $e_{ij}$ denotes the unnormalized weight score; $w_{\phi_{ij}}$ denotes the entry of the meta-path weight vector for the meta-path $\phi_{ij}$ linking $i$ and $j$; $f(d_{ij};\lambda)$ denotes the distance decay function with the distance-based attenuation coefficient $\lambda$, which decreases as the hop count $d_{ij}$ increases at a rate controlled by $\lambda$; $\sigma(\cdot)$ denotes a nonlinear activation function; and $\mathbf{h}_i$ and $\mathbf{h}_j$ denote the input feature vectors of the target patent node and the neighbor node, respectively; and performing a weighted summation over the target patent node and all neighbor nodes according to the aggregation weights, and generating the requirement-adapted patent characterization vector through a nonlinear transformation.
7. The patent recommendation method based on multi-modal heterogeneous graph attribute aggregation according to claim 1, wherein constructing a user preference characterization based on user historical interaction behavior, matching the patent characterization vector with the user preference characterization, and generating and outputting a personalized patent recommendation list comprises: collecting interaction behavior data between the user and patents, the interaction behavior data comprising explicit feedback behaviors, implicit feedback behaviors, negative and boundary behaviors, and query and filtering behaviors, and assigning a weight to each behavior type according to how strongly it indicates user preference and when it occurred, so as to construct a weighted user-patent interaction sequence; constructing three layers of user preference characterization from the user-patent interaction sequence, namely a long-term static preference characterization, a short-term dynamic interest characterization and an instant session intention characterization, and performing a weighted summation of the three layers with dynamically generated fusion weights to obtain the final user preference characterization; constructing a two-tower deep matching model that takes the patent characterization vector and the user preference characterization as its respective inputs, maps both into a matching space through nonlinear projection layers, and computes a context-aware matching score through dynamic matching; and ranking all candidate patents by matching score to generate the personalized patent recommendation list, and generating an interpretable natural-language recommendation reason for each result based on analysis of the matching score and of the heterogeneous meta-paths.
8. A patent recommendation system based on multi-modal heterogeneous graph attribute aggregation, characterized in that it is used to implement the patent recommendation method based on multi-modal heterogeneous graph attribute aggregation according to any one of claims 1-7, and comprises a multi-modal data acquisition and preprocessing module, a multi-modal characterization learning and fusion module, a heterogeneous graph construction and dynamic maintenance module, a large-language-model-enhanced relation reasoning module, a user requirement analysis and precision quantification module, a dynamic hierarchical attribute aggregation module, and a user preference modeling and matching module; the multi-modal data acquisition and preprocessing module acquires multi-modal patent information from multi-source patent databases and preprocesses the acquired data; the multi-modal characterization learning and fusion module converts the preprocessed multi-modal patent data into semantic vectors and fuses text, image and structured information to generate the multi-modal characterization vectors; the heterogeneous graph construction and dynamic maintenance module constructs, from the patent data, a heterogeneous graph comprising patent nodes, technical keyword nodes, applicant nodes, inventor nodes and classification nodes, and takes the multi-modal characterization vectors as initial features of the patent nodes; the large-language-model-enhanced relation reasoning module invokes a large language model to infer technical semantic relations between patent nodes, generates implicit technical association edges containing relation types and confidence weights, and enhances the heterogeneous graph; the user requirement analysis and precision quantification module quantifies the precision requirement of the user's recommendation requirement by using a Fisher information matrix; the dynamic hierarchical attribute aggregation module dynamically determines the order and range of attribute aggregation according to the precision requirement, performs hierarchical attribute aggregation on target patent nodes in the heterogeneous graph, and generates requirement-adapted patent characterization vectors; and the user preference modeling and matching module constructs a user preference characterization based on the user's historical interaction behavior, matches the patent characterization vectors with the user preference characterization, and generates and outputs a personalized patent recommendation list.
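The fusion step of claim 2 can be pictured with a small sketch. It makes several simplifying assumptions:

- a single attention head with identity query/key/value projections (the claim specifies a multi-head inter-modal network);
- a linear gate that turns modality scores into complementary weights;
- a tanh projection standing in for the patent semantic space projection layer.

The function names (`fuse_modalities`, `cross_attend`) and parameters (`gate`, `w_proj`) are hypothetical. Random vectors replace the outputs of the pre-trained language model, vision Transformer and graph neural network encoders.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    # Column-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cross_attend(text_tokens, image_tokens):
    """Each text token attends over all image tokens (scaled dot product, identity projections)."""
    scale = math.sqrt(len(text_tokens[0]))
    attended = []
    for q in text_tokens:
        attn = softmax([dot(q, k) / scale for k in image_tokens])
        attended.append([sum(a * v[i] for a, v in zip(attn, image_tokens)) for i in range(len(q))])
    return attended


def fuse_modalities(text_tokens, image_tokens, struct_vec, gate, w_proj):
    """Return (multi-modal vector, modality weights) for one patent."""
    # Text vector enriched with image context through cross attention.
    context = mean(cross_attend(text_tokens, image_tokens))
    text_vec = [t + c for t, c in zip(mean(text_tokens), context)]
    modalities = [text_vec, mean(image_tokens), struct_vec]
    # Gate scores -> complementary modality weights that sum to 1.
    weights = softmax([dot(m, gate) for m in modalities])
    fused = [sum(w * m[i] for w, m in zip(weights, modalities)) for i in range(len(gate))]
    # Projection into the patent semantic space: linear map (d x d_out) followed by tanh.
    projected = [math.tanh(dot(fused, col)) for col in zip(*w_proj)]
    return projected, weights


# Usage with random stand-ins for the encoder outputs.
random.seed(0)
d, d_out = 6, 4


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


text = [rand_vec(d) for _ in range(5)]
image = [rand_vec(d) for _ in range(3)]
vec, weights = fuse_modalities(text, image, rand_vec(d), rand_vec(d), [rand_vec(d_out) for _ in range(d)])
```

The gate weights come from a softmax, so they always sum to 1. A patent with an uninformative drawing can therefore shift weight toward its text or structured data. In a trained system, learned projections and multiple heads would replace the identity maps used here.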
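The dynamic incremental construction of claim 3 can be illustrated with a minimal in-memory graph. The sketch rests on these assumptions:

- cosine similarity is the similarity measure;
- the edge-weight statistic is illustrative: unit weight spread over the values of one metadata field;
- the class `HeteroGraph`, the `METADATA_RELATIONS` table, the relation labels and the `kw:`/`ipc:` ID prefixes are hypothetical;
- only a few simple metadata links are modeled, and legal/family and technical-diffusion edges are omitted.

```python
import math
from collections import defaultdict


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


# Hypothetical mapping: metadata field -> (node type, explicit relation label).
METADATA_RELATIONS = {
    "keyword": ("keyword", "has_keyword"),
    "applicant": ("applicant", "applied_by"),
    "inventor": ("inventor", "invented_by"),
    "ipc": ("classification", "classified_as"),
}


class HeteroGraph:
    """Typed nodes and typed, weighted, undirected edges; patent nodes carry feature vectors."""

    def __init__(self):
        self.node_type = {}             # node id -> node type
        self.features = {}              # patent id -> multi-modal characterization vector
        self.edges = defaultdict(dict)  # node id -> {(neighbor id, relation): weight}

    def add_node(self, node_id, node_type, feature=None):
        self.node_type.setdefault(node_id, node_type)
        if feature is not None:
            self.features[node_id] = feature

    def add_edge(self, src, dst, relation, weight):
        # Store both directions so meta-path traversal can start from either end.
        self.edges[src][(dst, relation)] = weight
        self.edges[dst][(src, relation)] = weight

    def insert_patent(self, patent_id, feature, metadata, top_k=3):
        """Incrementally add one patent; return the top-k (patent id, cosine) similarity priors."""
        # Compare against existing patents before inserting, so the node never matches itself.
        priors = sorted(((pid, cosine(feature, f)) for pid, f in self.features.items()),
                        key=lambda item: item[1], reverse=True)[:top_k]
        self.add_node(patent_id, "patent", feature)
        for field, (node_type, relation) in METADATA_RELATIONS.items():
            values = metadata.get(field, [])
            for value in values:
                self.add_node(value, node_type)
                # Assumed statistic: spread unit weight over the values of one field.
                self.add_edge(patent_id, value, relation, 1.0 / len(values))
        return priors


g = HeteroGraph()
g.insert_patent("P1", [1.0, 0.0, 0.2], {"keyword": ["kw:battery"], "ipc": ["ipc:H01M"]})
g.insert_patent("P2", [0.0, 1.0, 0.1], {"applicant": ["app:ACME"]})
priors = g.insert_patent("P3", [0.9, 0.1, 0.2], {"keyword": ["kw:battery"], "inventor": ["inv:Li"]})
```

The returned `priors` list is what would be passed to the large-language-model reasoning step as a similarity prior. Here `P1` ranks first for `P3`.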
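The relation-reasoning loop of claim 4 might be organized as below. This is a sketch under stated assumptions:

- `llm` is any hypothetical callable that takes a prompt string and returns text; no specific model or API is implied;
- the model is asked to answer in JSON;
- the English relation labels are shortened forms of the four predefined types;
- the `min_conf` cut-off is an added assumption, since the claim only says the confidence score becomes the initial edge weight.

```python
import json

RELATION_TYPES = ("substitution", "complementation", "evolution", "migration")
DIMENSIONS = ("technical principle similarity", "application scenario overlap",
              "technical complementarity", "technical evolution relationship")


def build_prompt(summary_a, summary_b, graph_context):
    """Structured relational-reasoning prompt for one pair of patent nodes."""
    return "\n".join([
        "Compare the two patents below.",
        f"Patent A: {summary_a}",
        f"Patent B: {summary_b}",
        f"Graph context: {graph_context}",
        "Analyze them along these dimensions: " + "; ".join(DIMENSIONS) + ".",
        "Then decide which of these technical associations exist: " + ", ".join(RELATION_TYPES) + ".",
        'Reply only with JSON: [{"type": "<relation>", "confidence": <number in [0, 1]>}]',
    ])


def parse_relations(raw):
    """Keep only predefined relation types whose confidence is a number in [0, 1]."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    relations = []
    for item in items:
        if not isinstance(item, dict):
            continue
        rtype, conf = item.get("type"), item.get("confidence")
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if rtype in RELATION_TYPES and is_number and 0.0 <= conf <= 1.0:
            relations.append((rtype, float(conf)))
    return relations


def infer_implicit_edges(pair, summaries, graph_context, llm, min_conf=0.5):
    """Return (a, b, relation, initial weight) edges proposed by `llm` for one node pair."""
    a, b = pair
    raw = llm(build_prompt(summaries[a], summaries[b], graph_context))
    return [(a, b, rtype, conf) for rtype, conf in parse_relations(raw) if conf >= min_conf]


def fake_llm(prompt):
    # Stand-in for a real model call: a fixed answer containing one invalid relation type.
    return '[{"type": "complementation", "confidence": 0.82}, {"type": "rivalry", "confidence": 0.9}]'


edges = infer_implicit_edges(
    ("P1", "P2"),
    {"P1": "Solid-state electrolyte for lithium cells", "P2": "Thermal management for battery packs"},
    "P1 and P2 share IPC class H01M",
    fake_llm,
)
```

The sketch validates the model output against the predefined relation types, which keeps hallucinated labels out of the graph. Here the invented `rivalry` label is dropped.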
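Claim 5 does not spell out the recommendation model, so the following is only a sketch under explicit assumptions:

- The conditional distribution over candidate patents is a softmax of a hypothetical score. The score depends on the aggregation order (relaxed to a real number), the distance attenuation coefficient and per-meta-path weights.
- Gradients of log-probabilities are taken by central finite differences.
- The trace is used as the scalar measure. The log-determinant or the largest eigenvalue would be equally valid choices.

The code does follow the standard definition of the Fisher information matrix, F(θ) = Σ_c p(c|θ) g_c g_cᵀ with g_c = ∇_θ log p(c|θ).

```python
import math


def log_softmax(xs):
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def log_probs(theta, candidates):
    """log p(c | theta) for each candidate patent c.

    theta = [order, decay, w_path_0, w_path_1, ...]; the aggregation order is relaxed to a
    real number so that derivatives exist. Each candidate is (relevance, hop_distance,
    meta_path_index). The scoring rule is hypothetical.
    """
    order, decay, path_weights = theta[0], theta[1], theta[2:]
    scores = []
    for relevance, hops, path in candidates:
        reach = 1.0 / (1.0 + math.exp(-(order - hops + 0.5)))  # soft cut-off beyond `order` hops
        scores.append(relevance * math.exp(-decay * hops) * path_weights[path] * reach)
    return log_softmax(scores)


def fisher_information(theta, candidates, eps=1e-5):
    """F[i][j] = sum_c p_c * dlog p_c/dtheta_i * dlog p_c/dtheta_j, via central differences."""
    n = len(theta)
    probs = [math.exp(lp) for lp in log_probs(theta, candidates)]
    grads = []  # grads[k][c] = d log p_c / d theta_k
    for k in range(n):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        lp_up, lp_down = log_probs(up, candidates), log_probs(down, candidates)
        grads.append([(u - d) / (2 * eps) for u, d in zip(lp_up, lp_down)])
    return [[sum(p * grads[i][c] * grads[j][c] for c, p in enumerate(probs))
             for j in range(n)] for i in range(n)]


def precision_requirement(theta, candidates):
    """Scalar summary of the Fisher information matrix; the trace is one possible choice."""
    fim = fisher_information(theta, candidates)
    return sum(fim[i][i] for i in range(len(fim)))


candidates = [(0.9, 1, 0), (0.7, 2, 1), (0.4, 3, 0), (0.8, 1, 1)]
focused_theta = [1.0, 1.5, 1.0, 0.2]  # hypothetical optimum for a narrow, specific demand
broad_theta = [3.0, 0.3, 0.6, 0.6]    # hypothetical optimum for an exploratory demand
fim = fisher_information(focused_theta, candidates)
focused_precision = precision_requirement(focused_theta, candidates)
broad_precision = precision_requirement(broad_theta, candidates)
```

A larger value means the recommendation distribution reacts more sharply to the aggregation parameters at that operating point. That is one way to read a demand as needing tighter control over aggregation.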
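The aggregation of claim 6 can be sketched as follows. The following choices are assumptions made for illustration:

- the decision-map thresholds and the meta-path names (`P-K-P`, `P-C-P`, `P-A-P`);
- exp(−λ·d) as the distance decay function;
- a sigmoid of a dot product as the semantic attention score;
- softmax normalization and a residual tanh update.

Neighbor sampling by meta-path traversal is taken as already done.

```python
import math

# Hypothetical decision map: minimum precision -> (max order, decay coefficient, meta-path weights).
# Meta-paths: P-K-P (shared keyword), P-C-P (shared classification), P-A-P (shared applicant).
DECISION_MAP = [
    (5.0, (1, 1.5, {"P-K-P": 1.0})),                              # high precision: narrow, shallow
    (1.0, (2, 0.8, {"P-K-P": 1.0, "P-C-P": 0.6})),                 # medium precision
    (0.0, (3, 0.3, {"P-K-P": 0.8, "P-C-P": 0.8, "P-A-P": 0.7})),  # exploratory: broad, deep
]


def aggregation_config(precision):
    for threshold, config in DECISION_MAP:
        if precision >= threshold:
            return config
    return DECISION_MAP[-1][1]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate(target, neighbors, precision):
    """neighbors: (feature, hop_count, meta_path) tuples sampled by meta-path traversal.

    Returns (requirement-adapted vector, normalized aggregation weights of the kept neighbors).
    """
    max_order, decay, path_weights = aggregation_config(precision)
    kept = [(h, d, p) for h, d, p in neighbors if d <= max_order and p in path_weights]
    if not kept:
        return [math.tanh(x) for x in target], []
    # Unnormalized score: meta-path weight * exp(-decay * hops) * sigmoid semantic attention.
    scores = [path_weights[p] * math.exp(-decay * d) * sigmoid(sum(a * b for a, b in zip(target, h)))
              for h, d, p in kept]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Target feature plus weighted neighbor sum, followed by a tanh nonlinearity.
    combined = [t + sum(a * h[i] for a, (h, _, _) in zip(alphas, kept)) for i, t in enumerate(target)]
    return [math.tanh(x) for x in combined], alphas


target = [0.5, -0.2, 0.1]
neighbors = [([0.4, -0.1, 0.0], 1, "P-K-P"), ([0.1, 0.3, 0.2], 2, "P-C-P"), ([-0.3, 0.2, 0.5], 3, "P-A-P")]
focused, w_focused = aggregate(target, neighbors, precision=6.0)
broad, w_broad = aggregate(target, neighbors, precision=0.2)
```

With a high precision value only the one-hop keyword neighbor survives. A low value admits all three neighbors, with weights falling off by hop count and meta-path weight.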
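For claim 7, a compact sketch of the three-layer preference and two-tower scoring follows. These elements are hypothetical choices:

- the behavior weights and the 30-day half-life time decay;
- the rule that generates the fusion weights;
- the tanh projection towers;
- cosine similarity as the matching score.

The names `user_preference`, `recommend` and `BEHAVIOR_WEIGHTS` are illustrative, and the natural-language explanation step is not shown.

```python
import math

# Hypothetical behavior weights; negative feedback pulls the preference away from a patent.
BEHAVIOR_WEIGHTS = {"favorite": 3.0, "download": 2.0, "click": 1.0, "dislike": -2.0}


def weighted_mean(pairs, dim):
    total = sum(abs(w) for w, _ in pairs)
    if total == 0:
        return [0.0] * dim
    return [sum(w * v[i] for w, v in pairs) / total for i in range(dim)]


def user_preference(history, session_vec, dim, half_life=30.0, recent=3):
    """history: list of (patent_vec, behavior, age_days), oldest first."""
    # Behavior weight scaled by an exponential time decay with the given half-life.
    pairs = [(BEHAVIOR_WEIGHTS[b] * 0.5 ** (age / half_life), v) for v, b, age in history]
    long_term = weighted_mean(pairs, dim)
    short_term = weighted_mean(pairs[-recent:], dim)
    session = session_vec if session_vec is not None else [0.0] * dim
    # Hypothetical fusion rule: lean on the session intent when one exists.
    logits = [1.0, 1.0, 2.0 if session_vec is not None else -5.0]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    gates = [e / sum(exps) for e in exps]
    return [sum(g * layer[i] for g, layer in zip(gates, (long_term, short_term, session)))
            for i in range(dim)]


def project(vec, matrix):
    # One tower: linear map (one row per output dimension) followed by tanh.
    return [math.tanh(sum(r * x for r, x in zip(row, vec))) for row in matrix]


def cosine(a, b):
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def recommend(user_vec, patents, w_user, w_patent, k=2):
    u = project(user_vec, w_user)
    scored = [(cosine(u, project(vec, w_patent)), pid) for pid, vec in patents.items()]
    return sorted(scored, reverse=True)[:k]


identity = [[1.0, 0.0], [0.0, 1.0]]
history = [([1.0, 0.0], "favorite", 40.0), ([0.0, 1.0], "dislike", 2.0), ([0.9, 0.1], "click", 1.0)]
user = user_preference(history, session_vec=[0.8, 0.2], dim=2)
patents = {"P1": [1.0, 0.1], "P2": [0.0, 1.0], "P3": [0.6, 0.5]}
top = recommend(user, patents, identity, identity)
```

The `dislike` recorded against the direction of `P2` pulls the preference vector away from it, so `P2` falls out of the top two. In a trained model, the identity matrices would be learned tower weights.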
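The modules of claim 8 can be wired together in many ways. One plausible arrangement, assumed here, splits them into two stages: an offline stage that builds the LLM-enhanced graph, and an online stage that answers a specific request. `PatentRecommender` and its field names are hypothetical. Each module is a pluggable callable, so real implementations could be swapped in; the trivial lambdas below only demonstrate the data flow.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class PatentRecommender:
    """Hypothetical wiring of the seven modules; each field is a pluggable callable."""
    acquire_and_preprocess: Callable[[List[str]], Dict[str, Any]]
    learn_and_fuse: Callable[[Dict[str, Any]], Dict[str, List[float]]]
    build_graph: Callable[[Dict[str, Any], Dict[str, List[float]]], Any]
    enhance_with_llm: Callable[[Any], Any]
    quantify_precision: Callable[[str], float]
    aggregate: Callable[[Any, float], Dict[str, List[float]]]
    match_user: Callable[[str, Dict[str, List[float]]], List[str]]

    def prepare(self, sources):
        # Offline stage: data -> multi-modal vectors -> heterogeneous graph -> LLM-enhanced graph.
        data = self.acquire_and_preprocess(sources)
        vectors = self.learn_and_fuse(data)
        return self.enhance_with_llm(self.build_graph(data, vectors))

    def recommend(self, graph, user_id, demand):
        # Online stage: demand -> precision -> requirement-adapted vectors -> ranked patent ids.
        precision = self.quantify_precision(demand)
        patent_vectors = self.aggregate(graph, precision)
        return self.match_user(user_id, patent_vectors)


def stub_aggregate(graph, precision):
    # Stand-in: scale each patent vector by the precision value.
    return {pid: [v * precision for v in vec] for pid, vec in graph["vectors"].items()}


system = PatentRecommender(
    acquire_and_preprocess=lambda sources: {"ids": sources},
    learn_and_fuse=lambda data: {pid: [1.0, float(i)] for i, pid in enumerate(data["ids"])},
    build_graph=lambda data, vectors: {"vectors": vectors, "edges": []},
    enhance_with_llm=lambda graph: graph,
    quantify_precision=lambda demand: 2.0 if "specific" in demand else 0.5,
    aggregate=stub_aggregate,
    match_user=lambda user, vecs: sorted(vecs, key=lambda pid: -vecs[pid][1]),
)
graph = system.prepare(["P1", "P2", "P3"])
ranking = system.recommend(graph, "u42", "specific solid-state electrolyte")
```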

Description

Technical Field

The invention relates to the technical field of patent information services, in particular to a patent recommendation method and system based on multi-modal heterogeneous graph attribute aggregation.

Background

The patent recommendation method based on multi-modal heterogeneous graph attribute aggregation is proposed in response to the many challenges facing current patent information retrieval and recommendation technology. As the pace of global technological innovation accelerates, the number of patents has grown explosively. Patent data is multi-modal, structurally complex and semantically rich, covering text, images, classification numbers, applicants, citation networks and other information forms. However, most existing patent recommendation systems still rely on conventional information retrieval and recommendation techniques and have clear limitations. The mainstream patent recommendation schemes include retrieval systems based on keyword matching, analysis methods based on citation networks, recommendation based on classification systems, and collaborative filtering. Keyword matching methods generally compute text similarity with algorithms such as TF-IDF and BM25, but they are often limited to literal matching and struggle to capture the technical semantics and innovative substance behind a patent. Citation network analysis builds a technology association graph from the citation relations among patents, but patent citations are generally sparse and lagging, so they cannot fully reflect the deeper connections between technologies.
Methods based on the International Patent Classification (IPC) or the Cooperative Patent Classification (CPC) can organize technical fields at a macroscopic level. However, their classification granularity is coarse, and they give insufficient support to fine-grained technologies or cross-domain technology fusion scenarios. Collaborative filtering depends on the user-patent interaction history, is susceptible to data sparsity and cold start, and has difficulty explaining the technical basis of its recommendations. The shortcomings of the prior art concentrate on the following aspects. First, most systems fail to fully fuse and exploit the multi-modal information of patents, especially the rich technical content in visual data such as technical drawings, chemical structural formulas and circuit diagrams, so information utilization is incomplete. Second, semantic understanding of patent technical content is shallow, lacking in-depth analysis and association mining of technical principles, innovation points and application scenarios. Third, existing methods mostly rely on structured relations between patents and cannot automatically discover technical associations at the semantic level. Fourth, the recommendation process has limited flexibility and personalization: it generally returns results of a fixed scope and cannot dynamically adjust the breadth and depth of recommendation to a user's specific needs. Finally, effectively combining a user's historical behavior with real-time intent to achieve truly personalized patent recommendation remains a common difficulty for current systems.
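To make the "literal matching" limitation of keyword methods concrete, the sketch below scores three toy patent titles against the query "lithium battery". It uses the standard Okapi BM25 formula with a common non-negative idf variant. The values k1 = 1.5 and b = 0.75 are widely used defaults, not values from this document, and the titles are invented for illustration. The second title describes a closely related technology in different words, yet it scores exactly zero.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency per term
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # literal matching: terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores


docs = [
    "lithium battery with solid electrolyte".split(),
    "rechargeable accumulator using solid state ion conductor".split(),
    "image sensor with micro lens array".split(),
]
scores = bm25_scores("lithium battery".split(), docs)
```

The semantically related second document and the unrelated third one are indistinguishable to BM25 (both score 0.0). This is the gap that semantic and multi-modal characterizations aim to close.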
Disclosure of Invention

To solve the above technical problems, the invention provides a patent recommendation method and system based on multi-modal heterogeneous graph attribute aggregation. It addresses the insufficient use of multi-modal information, shallow semantic understanding, rigid recommendation granularity and limited personalization of traditional patent recommendation systems, and improves the accuracy, flexibility and interpretability of patent recommendation. The invention provides a patent recommendation method based on multi-modal heterogeneous graph attribute aggregation, comprising the following steps: acquiring multi-modal data of patents from a patent database, and performing feature extraction and fusion on the multi-modal data to generate a multi-modal characterization vector for each patent; constructing a heterogeneous graph comprising patent nodes, technical keyword nodes, applicant nodes, inventor nodes and classification nodes, taking the multi-modal characterization vectors as initial features of the patent nodes, invoking a large language model to infer technical semantic relations between patent nodes, generating implicit technical association edges comprising relation types and confidence weights, and enhancing