CN-122022471-A - Intelligent risk early warning system and method based on multidimensional information fusion

Abstract

The invention discloses an intelligent risk early warning system and method based on multidimensional information fusion. The system comprises a name and right holder similarity calculation module, a trademark goods and services comparison module, an image feature extraction module, a multidimensional feature database and a comprehensive risk calculation module. By fusing four core dimensions of data (trademark name, right holder information, goods and services classification, and logo image) and adopting a collaborative strategy of multidimensional text comparison, multichannel characterization, multi-agent rule matching and multi-view image feature extraction, the invention accurately judges the similarity risk level of a trademark and provides key technical support for trademark registration examination, infringement monitoring and intellectual property management.

Inventors

CHEN QIYU
YU WENJING
GOU HAO
Qin Feiwei
Wang Chuncong
QIAN XIAOHU
WANG QING
Lv Xuhong
De Qinkun
WU HANQING
CHEN JIAZHOU
ZHENG YIFAN

Assignees

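Zhejiang Provincial Market Supervision and Administration Digital Media Center (浙江省市场监督管理数字传媒中心)
Zhejiang Jinhui Digital Technology Co., Ltd. (浙江金汇数字技术有限公司)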

Dates

Publication Date: 20260512
Application Date: 20260128

Claims (10)

1. An intelligent risk early warning system based on multidimensional information fusion, characterized by comprising a name and right holder similarity calculation module, a trademark goods and services comparison module, an image feature extraction module, a multidimensional feature database and a comprehensive risk calculation module; the name and right holder similarity calculation module is used for performing an in-depth comparison of the trademark name and right holder name of the trademark under test against the base library data in the multidimensional feature database, and calculating the corresponding similarity scores; the trademark goods and services comparison module is used for constructing a two-layer "vector coarse screening, agent fine ranking" processing architecture and making the final similarity judgment on goods and services descriptions; the image feature extraction module performs fine-grained processing on the original trademark image to calculate the trademark image similarity; the multidimensional feature database adopts a hybrid storage architecture of a MySQL relational database and a vector database, and is used for uniformly storing the basic metadata and multidimensional feature vectors of the base library and of the trademark under test, supporting millisecond-level retrieval and multimodal association analysis, wherein the relational database stores the standardized basic metadata and the vector database stores the feature vectors extracted across multiple dimensions; and the comprehensive risk calculation module is used for calculating the final risk score by integrating the image-text hybrid similarity, the dynamic goods coefficient and the right holder difference degree.
2. The intelligent risk early warning system based on multidimensional information fusion according to claim 1, wherein the name and right holder similarity calculation module operates as follows: for trademark names, a dual-track parallel processing mechanism of original text and translation is constructed, in which a translation model generates a translation field while the original text field is retained, the translation model being a neural network translation model that supports multilingual translation; on this basis, the comprehensive similarity of the trademark name is obtained by weighted fusion of the shape similarity based on edit distance and visual hash, the sound similarity based on pinyin or romanized transcription, and the meaning similarity based on multilingual sentence vectors; for right holder names, multichannel features are extracted: the right holder name similarity calculation is divided into three channels, namely full name, short name and short-name pinyin; the string edit distance and visual hash similarity are calculated for each channel, the semantic similarity is calculated with sentence vector encoding, and the comprehensive similarity of the right holder name is finally obtained through multichannel weighting.
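The shape similarity named in this claim can be illustrated with a minimal sketch. It assumes the two ingredients are a normalized Levenshtein similarity and a Hamming-based similarity between fixed-length visual fingerprints, fused by a single coefficient. The function names, the coefficient `alpha` and the fingerprint width `bits` are hypothetical, and the fingerprints are taken as given integers because rendering text to images is outside the scope of the sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def hash_similarity(fp_a: int, fp_b: int, bits: int = 64) -> float:
    """XOR two fingerprints, count differing bits, map to [0, 1]."""
    hamming = bin(fp_a ^ fp_b).count("1")
    return 1.0 - hamming / bits


def shape_similarity(a: str, b: str, fp_a: int, fp_b: int,
                     alpha: float = 0.5, bits: int = 64) -> float:
    """Assumed fusion: alpha * edit similarity + (1 - alpha) * hash similarity."""
    return alpha * edit_similarity(a, b) + (1 - alpha) * hash_similarity(fp_a, fp_b, bits)
```

The linear fusion is only one plausible reading of "fused according to coefficients"; the claims do not state the coefficient values.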
3. The intelligent risk early warning system based on multidimensional information fusion according to claim 1, wherein the trademark goods and services comparison module operates as follows: a two-stage "coarse screening, fine ranking" retrieval mechanism is adopted; first, a semantic vector model performs vectorized encoding and cosine similarity calculation on the goods and services descriptions to complete candidate set recall; then a multi-agent system is constructed whose agent nodes work cooperatively to extract the rule basis from the Similar Goods and Services Classification Table, execute vector retrieval and semantic fine ranking, and integrate the rule information to make the final similarity judgment.
4. The intelligent risk early warning system based on multidimensional information fusion according to claim 1, wherein the image feature extraction module operates as follows: a deep-learning-based image-text separation pipeline is constructed; first, foreground segmentation is performed on the original trademark image of the trademark under test using the energy-minimization-based GrabCut algorithm; second, a deep-learning-based optical character recognition module locates the text regions in the foreground, the text regions are extracted separately as a font view, and the text regions are filled in using an image inpainting algorithm to obtain a pure graphic view; finally, a deep visual neural network extracts high-dimensional feature vectors from the original image view, the font view and the pure graphic view respectively, collectively called multi-view feature vectors, and the trademark image similarity is calculated from these multi-view feature vectors and the multi-view feature vectors of the base library data in the multidimensional feature database.
5. The intelligent risk early warning system based on multidimensional information fusion according to claim 1, wherein the comprehensive risk calculation module operates as follows: first, the basic similarity of the image and text dimensions is calculated by nonlinear fusion: for the trademark image similarity and the trademark name similarity, the image-text hybrid similarity is calculated with a three-term fusion strategy in which the maximum dominates, the arithmetic mean assists and the harmonic mean acts as a floor; second, the dynamic goods coefficient is calculated by dual-path selection: a soft gating coefficient and a circuit-breaker upgrade coefficient are calculated in parallel, and the larger of the two is taken as the dynamic goods coefficient; finally, the right holder difference degree is introduced as an exclusionary factor, and the final risk score is synthesized through multiplication.
6. The intelligent risk early warning system based on multidimensional information fusion according to claim 2, wherein the input data of the name and right holder similarity calculation module are the trademark name and the right holder name of the trademark under test, namely the original text fields; in generating the translation field, (1) proper names, numbers and brackets are replaced with placeholders and backfilled after translation is completed; (2) texts of the same language or of the same length are processed in batches, and the randomness of the output is reduced by adjusting the decoding parameters to ensure consistency; (3) the completeness rate and the placeholder retention rate of the translation are checked; if the check fails, the text is retranslated, and if it still fails after retranslation, the translation field is left empty;
the shape similarity and sound similarity between the trademark name of the trademark under test and each trademark name in the base library data are calculated and fused, wherein the shape similarity is calculated on the standard fields of the trademark names in the trademark under test and the base library data: first, the normalized similarity of the string edit distance is calculated; then the standard fields of the trademark names are rendered into grayscale images by font rasterization, a difference hash is calculated to obtain fixed-length fingerprints, the fingerprints are XORed and the Hamming distance is counted to obtain the visual hash similarity; the two are fused according to coefficients to obtain the shape similarity; the standard fields of the trademark names in the trademark under test and the base library data are romanized into pronunciation strings, and the normalized similarity of the edit distance is calculated on the pronunciation strings to obtain the sound similarity;
the meaning similarity is calculated as follows: the translation fields of the trademark under test and the base library data are each encoded into vectors, and the cosine similarity is calculated and mapped to obtain the translation-based semantic similarity; at the same time, a semantic similarity is calculated directly on the standard fields of the trademark names; the final meaning similarity is determined from these two by a two-way fusion strategy; finally, the shape similarity, sound similarity and meaning similarity are fused according to weights to obtain the comprehensive similarity of the trademark name;
the shape similarity and meaning similarity between the right holder name of the trademark under test and each right holder name in the base library data are calculated and fused, wherein short-name and short-name-pinyin channels are introduced into the calculation of the right holder name shape similarity: first, a short name and a short-name pinyin are generated from the standard field of the right holder name of the trademark under test; extracting the short name comprises removing bracketed segments, removing the region prefix, stripping the organizational-form suffix and, when an industry keyword is present, truncating before the industry keyword to obtain the trade name; the short name is transcribed into pinyin with syllables separated by spaces to generate the short-name pinyin; then the shape similarity is calculated separately for the three channels of full name, short name and short-name pinyin, the channel shape similarity of each channel c being obtained from the normalized string edit-distance similarity of channel c and the visual hash similarity of channel c; the three channels are fused by taking the element-wise maximum to obtain the final shape similarity, namely the maximum of the shape similarities of the full-name channel (full), the short-name channel (short) and the short-name-pinyin channel (py); the meaning similarity of each channel of the right holder name is calculated in the same way as the meaning similarity of the trademark name, and the meaning similarities of the three channels are fused by taking the element-wise maximum to obtain the final meaning similarity; and the final shape similarity and final meaning similarity of the right holder name are fused with equal weights to obtain the comprehensive similarity of the right holder name.
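The placeholder protection and retention check described for the translation step can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: the `<P0>`-style token format, the function names and the use of "non-empty output" as a stand-in for the completeness check are all hypothetical, proper-name detection is omitted (it would need a lexicon), and `translate` is any caller-supplied function.

```python
import re
from typing import Callable, List, Optional, Tuple

# Assumed protected spans: digit runs and ASCII or full-width brackets.
PROTECTED = re.compile(r"\d+|[()\[\]（）]")
SLOT = re.compile(r"<P(\d+)>")


def protect(text: str) -> Tuple[str, List[str]]:
    """Replace protected spans with numbered placeholders."""
    slots: List[str] = []

    def to_slot(match: "re.Match[str]") -> str:
        slots.append(match.group(0))
        return f"<P{len(slots) - 1}>"

    return PROTECTED.sub(to_slot, text), slots


def restore(text: str, slots: List[str]) -> str:
    """Backfill placeholders; unknown placeholder numbers are left untouched."""
    def from_slot(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        return slots[index] if index < len(slots) else match.group(0)

    return SLOT.sub(from_slot, text)


def retention_rate(translated: str, slots: List[str]) -> float:
    """Fraction of placeholders that survived translation."""
    if not slots:
        return 1.0
    kept = sum(1 for i in range(len(slots)) if f"<P{i}>" in translated)
    return kept / len(slots)


def translate_checked(text: str, translate: Callable[[str], str],
                      retries: int = 1) -> Optional[str]:
    """Translate with placeholders; retry on failure, return None if still failing."""
    masked, slots = protect(text)
    for _ in range(retries + 1):
        output = translate(masked)
        if output.strip() and retention_rate(output, slots) == 1.0:
            return restore(output, slots)
    return None  # translation field left empty
```

Returning `None` corresponds to leaving the translation field empty after a failed retranslation, so downstream similarity steps can fall back to the original text field alone.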
7. The intelligent risk early warning system based on multidimensional information fusion according to claim 3, wherein the trademark goods and services comparison module is implemented as follows: first, vectorized recall is performed: a pre-trained semantic vector encoding model converts the goods and services description of the trademark under test into a dense vector representation of its semantic features, cosine similarity is calculated on the dense vector representations, and the Top-K semantic candidates are screened from the base library data; on this basis, a multi-module mechanism of cooperating agents performs similarity checking and risk judgment on the goods and services description of the trademark under test and outputs the corresponding risk level; specifically, first, a semantic fine-ranking agent inputs the goods and services description of the trademark under test together with the candidate set into a large language model for re-ranking and fine adjustment of the similarity scores, yielding a more reliable candidate pool and higher-quality input for the rule comparison stage; then, a rule extraction agent finds, for the goods and services description of the trademark under test and for each of the Top-K semantic candidates, the most relevant rule paragraphs and annotation bases in the Similar Goods and Services Classification Table, and summarizes the corresponding similar-group information and explanations; finally, a rule comparison and judgment agent uses the large language model to synthesize the text of the description under test and of the candidates, their similar-group information and explanations from the Similar Goods and Services Classification Table, and the fine-ranking similarity scores, and outputs the final risk level, the grading being based on whether the goods and services description of the trademark under test and the candidate fall within the same similar group of the Similar Goods and Services Classification Table, together with the corresponding similarity scores.
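The vectorized recall stage can be illustrated with a small sketch that ranks base library entries by cosine similarity and keeps the Top-K. The vectors are assumed to come from some sentence-embedding model, which is not shown; the function names and the dictionary layout are hypothetical, and a production system would query the vector database rather than scan a Python dictionary.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_top_k(query: Sequence[float],
                 library: Dict[str, Sequence[float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Score every library vector against the query and keep the k best."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The returned `(id, score)` pairs would form the candidate set handed to the fine-ranking agent.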
8. The intelligent risk early warning system based on multidimensional information fusion according to claim 1, wherein the multidimensional feature database is constructed as follows: under non-root privileges, a user-level isolated environment named db_test is created through Conda, and a MySQL relational database and a vector database are deployed to form a layered storage architecture of basic metadata and feature vectors, ensuring the security and isolation of data storage; when initializing the base library data, the base library data in a local XLSX source file are first parsed, and the corresponding basic metadata are imported into a preset empty table structure of the MySQL relational database to complete the structured storage of basic information, wherein the basic metadata comprise the standard fields of the trademark name and right holder name, the corresponding translation fields, the goods and services description, and the original image view, font view and pure graphic view of the original trademark image; multidimensional feature extraction is then performed synchronously on the base library data: the trademark name yields multilingual semantic vectors through dual-track "original text + translation" processing and shape, sound and meaning analysis; the right holder name yields multichannel feature vectors through "full name, short name, short-name pinyin" multichannel analysis and shape and meaning feature extraction; and multi-view feature vectors are extracted from the three types of views after foreground segmentation and image-text separation; when early warning data are updated, the storage architecture and deployment environment of the MySQL relational database and vector database used for the base library data are fully reused, and the data of the trademark under test coexist with the base library data in the same database instance: the basic metadata of the trademark under test are parsed and imported into the corresponding MySQL data tables, its feature vectors are extracted at the same time and stored in the vector database, and it is explicitly marked as early warning data by an added data type field; these data are strictly consistent with the base library data in data structure, so that unified storage, classified management and fast retrieval of the data of the trademark under test are achieved through logical partitioning alone.
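The logical partitioning by a data type field can be sketched with a single table shared by base library and early warning records. The sketch uses Python's built-in SQLite as a stand-in for MySQL so that it runs without a server; the table name, column names and the type values `'base'` and `'warning'` are hypothetical and not taken from the claim.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one shared trademark table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE trademark (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            owner TEXT NOT NULL,
            goods_services TEXT,
            data_type TEXT NOT NULL CHECK (data_type IN ('base', 'warning'))
        )
        """
    )
    return conn


def add_trademark(conn: sqlite3.Connection, name: str, owner: str,
                  goods_services: str, data_type: str) -> int:
    """Insert one record; the same structure serves both data types."""
    cursor = conn.execute(
        "INSERT INTO trademark (name, owner, goods_services, data_type) "
        "VALUES (?, ?, ?, ?)",
        (name, owner, goods_services, data_type),
    )
    conn.commit()
    return cursor.lastrowid


def names_by_type(conn: sqlite3.Connection, data_type: str) -> List[str]:
    """Retrieve names for one logical partition, in insertion order."""
    rows = conn.execute(
        "SELECT name FROM trademark WHERE data_type = ? ORDER BY id",
        (data_type,),
    )
    return [row[0] for row in rows]
```

Because both kinds of record share one schema, separating them is a matter of filtering on `data_type` rather than maintaining two storage systems.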
9. The intelligent risk early warning system based on multidimensional information fusion according to claim 5, wherein the risk assessment of the comprehensive risk calculation module comprises three core stages:
first, the basic similarity of the image and text dimensions is calculated by nonlinear fusion: the trademark image dimension and the trademark name dimension are processed by the image feature extraction module and the name and right holder similarity calculation module to obtain two similarity scores, namely the highest trademark image similarity score and the highest trademark name similarity score obtained after comparing the trademark under test with the base library data one by one; three weight coefficients are initialized, and the image-text hybrid similarity is calculated from them by a formula in which the max function extracts the maximum of the similarity results of the trademark image dimension and the trademark name dimension, the mean function calculates their average, and the harmonic-mean function calculates their harmonic mean;
second, the dynamic goods coefficient is calculated by dual-path selection, using a dynamic mechanism in which a soft gating path and a strong-feature circuit-breaker path are calculated in parallel and the preferred result is output; in the soft gating path, a base value corresponding to the reputation of the trademark under test is first predefined, and the soft gating coefficient is obtained by linear interpolation using the base value and the goods and services similarity determined by the risk level; in the circuit-breaker path, the comprehensive similarity of the trademark name dimension is multiplied by a predefined upgrade threshold coefficient to obtain the circuit-breaker upgrade coefficient; the final dynamic goods coefficient is then determined from the two coefficients by a selection formula;
third, the right holder difference degree is introduced to synthesize the final score: after the image-text hybrid similarity and the dynamic goods coefficient are obtained, the right holder difference degree is introduced as the final exclusionary adjustment factor, the three indicators (image-text hybrid similarity, dynamic goods coefficient and right holder difference degree) are combined by multiplication, and the final trademark similarity risk score is calculated as their product.
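The three stages can be put together in a short sketch. Several details here are assumptions rather than claim text: the hybrid similarity is taken to be a weighted sum of the max, mean and harmonic-mean terms; the soft gate interpolates linearly from the base value up to 1 as the goods and services similarity goes from 0 to 1; the selection takes the larger coefficient, as stated in claim 5; and all default weights and coefficients are illustrative values only.

```python
def hybrid_similarity(s_img: float, s_name: float,
                      w_max: float = 0.5, w_mean: float = 0.3,
                      w_harm: float = 0.2) -> float:
    """Assumed weighted sum of max, arithmetic mean and harmonic mean."""
    total = s_img + s_name
    harmonic = 0.0 if total == 0 else 2 * s_img * s_name / total
    return (w_max * max(s_img, s_name)
            + w_mean * (s_img + s_name) / 2
            + w_harm * harmonic)


def soft_gate(base: float, s_goods: float) -> float:
    """Assumed linear interpolation: base at s_goods = 0, 1.0 at s_goods = 1."""
    return base + (1.0 - base) * s_goods


def dynamic_goods_coefficient(base: float, s_goods: float,
                              s_name: float, upgrade_coef: float) -> float:
    """Take the larger of the soft gate and the circuit-breaker upgrade path."""
    circuit_breaker = s_name * upgrade_coef
    return max(soft_gate(base, s_goods), circuit_breaker)


def risk_score(s_img: float, s_name: float, s_goods: float,
               owner_difference: float, base: float = 0.3,
               upgrade_coef: float = 0.9) -> float:
    """Final score: hybrid similarity x dynamic coefficient x owner difference."""
    return (hybrid_similarity(s_img, s_name)
            * dynamic_goods_coefficient(base, s_goods, s_name, upgrade_coef)
            * owner_difference)
```

Under these assumptions, a right holder difference degree of 0 (for example, the same right holder) drives the score to 0 regardless of the other terms, which is the role the claim assigns to the exclusionary factor.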
10. An intelligent risk early warning method based on multidimensional information fusion, characterized by comprising the following steps:
Step 1, constructing a multidimensional feature database: the multidimensional feature database adopts a hybrid storage architecture of a MySQL relational database and a vector database, and is used for uniformly storing the basic metadata and multidimensional feature vectors of the base library and of the trademark under test, supporting millisecond-level retrieval and multimodal association analysis, wherein the relational database stores the standardized basic metadata and the vector database stores the feature vectors extracted across multiple dimensions;
Step 2, performing an in-depth comparison of the trademark name and right holder name of the trademark under test against the base library data in the multidimensional feature database, and calculating the corresponding similarity scores: for trademark names, a dual-track parallel processing mechanism of original text and translation is constructed, in which a translation model generates a translation field while the original text field is retained, the translation model being a neural network translation model that supports multilingual translation; on this basis, the comprehensive similarity of the trademark name is obtained by weighted fusion of the shape similarity based on edit distance and visual hash, the sound similarity based on pinyin or romanized transcription, and the meaning similarity based on multilingual sentence vectors;
Step 3, constructing a two-layer "vector coarse screening, agent fine ranking" processing architecture and making the final similarity judgment on goods and services descriptions: a two-stage "coarse screening, fine ranking" retrieval mechanism is adopted; first, a semantic vector model performs vectorized encoding and cosine similarity calculation on the goods and services descriptions to complete candidate set recall; then a multi-agent system is constructed whose agent nodes work cooperatively to extract the rule basis from the Similar Goods and Services Classification Table, execute vector retrieval and semantic fine ranking, and integrate the rule information to make the final similarity judgment;
Step 4, performing fine-grained processing on the original trademark image to calculate the trademark image similarity: first, foreground segmentation is performed on the original trademark image of the trademark under test using the energy-minimization-based GrabCut algorithm; second, a deep-learning-based optical character recognition module locates the text regions in the foreground, the text regions are extracted separately as a font view, and the text regions are filled in using an image inpainting algorithm to obtain a pure graphic view; finally, a deep visual neural network extracts high-dimensional feature vectors from the original image view, the font view and the pure graphic view respectively, and the trademark image similarity is calculated from the resulting multi-view feature vectors and the multi-view feature vectors of the base library data in the multidimensional feature database;
Step 5, calculating the final risk score by integrating the image-text hybrid similarity, the dynamic goods coefficient and the right holder difference degree: first, the basic similarity of the image and text dimensions is calculated by nonlinear fusion: for the trademark image similarity and the trademark name similarity, the image-text hybrid similarity is calculated with a three-term fusion strategy in which the maximum dominates, the arithmetic mean assists and the harmonic mean acts as a floor; second, the dynamic goods coefficient is calculated by dual-path selection: a soft gating coefficient and a circuit-breaker upgrade coefficient are calculated in parallel, and the larger of the two is taken as the dynamic goods coefficient; finally, the right holder difference degree is introduced as an exclusionary factor, and the final risk score is synthesized through multiplication.

Description

Intelligent risk early warning system and method based on multidimensional information fusion

Technical Field

The invention belongs to the technical field of trademark risk early warning within the intellectual property field, and in particular relates to an intelligent risk early warning system and method based on multidimensional information fusion. It fuses four core dimensions of data (trademark name, right holder information, goods and services classification, and logo image) and adopts a collaborative strategy of multidimensional text comparison, multichannel characterization, multi-agent rule matching and multi-view image feature extraction to accurately judge the similarity risk level of a trademark, providing key technical support for trademark registration examination, infringement monitoring and intellectual property management.

Background

With the deepening of global economic integration and the growing brand awareness of market entities, the trademark has become an important carrier of an enterprise's core intellectual property and brand value, and the number of trademark registration applications is growing explosively. Amid massive volumes of trademark data and a complex competitive environment, trademark registration examination, infringement monitoring and full-life-cycle risk management face serious challenges. Existing trademark risk prediction technology mainly serves examiners or agencies, and its core aim is to maintain the order of market competition and protect the legitimate rights and interests of right holders. However, in the face of increasingly complex new risk scenarios such as cross-class associated infringement, malicious preemptive registration, confusion between identical names in different classes, and non-standard goods and services descriptions, traditional technical means are stretched thin and struggle to meet the practical requirements of high accuracy and intelligence.

The prior art has obvious limitations in single-dimension feature extraction and comparison. In the text dimension, traditional methods mostly calculate similarity from the string edit distance (Levenshtein distance) or simple word vector matching. They usually attend only to the overlap of character forms and ignore polyphonic characters, visually similar characters and deeper semantic associations, making it difficult to identify hidden infringement risks between marks written with different characters. In the image dimension, existing methods based on traditional computer vision (CV) feature extraction algorithms (such as SIFT and SURF) or shallow neural networks are mostly limited to comparing the overall outline or color distribution of the image. They lack the ability to separate out the distinctive part of a combined word-and-device trademark, and struggle to capture subtle visual infringement such as slight changes to a font design style or partial copying of a graphic. In addition, the lack of an association analysis mechanism for the right holder dimension is a major weakness: existing systems tend to view each applicant in isolation and, without a knowledge graph, cannot identify the defensive cross-class portfolio of a single entity or malicious trademark hoarding by affiliated companies, so that normal defensive registrations are misjudged as infringement, or infringement risks involving complex affiliated entities are missed.

In the core step of comparing goods and services items, the bottleneck of the prior art is particularly prominent. Traditional goods and services similarity judgment relies mainly on fixed code matching or rule-based keyword matching against the Similar Goods and Services Classification Table. In actual business, however, goods and services descriptions are increasingly generalized, and non-standard expressions, industry colloquialisms, emerging concepts and semantically ambiguous descriptions often appear. Traditional natural language processing (NLP) techniques, lacking context awareness, often produce ambiguous results when processing such short texts. For example, simple keyword matching cannot distinguish the contextual difference between "apple (fruit)" and "apple (electronic product)", and has difficulty coping with matching deviations caused by missing core attributes or synonym substitution (such as "smart terminal" and "mobile phone"). The prior art generally lacks a mature large language model (LLM) enabling mechanism and fails to use the vast knowledge base and strong logical reasoning capability of large models (such as chain-of-thought analysis) to extract the core elements of goods (function, purpose and target consumers), so that robustness is seriously insufficient when complex semantic scenes are processed, and misjudgment rate and missed judgment rate are h