CN-121996718-A - Intellectual property data acquisition method and system based on multi-source data fusion

Abstract

The invention relates to the technical field of intellectual property data processing and discloses an intellectual property data acquisition method and system based on multi-source data fusion. The method comprises: acquiring an original information set, and clustering and grouping the original information set to obtain a classified data set; performing feature verification and matching analysis on the classified data set to obtain a feature matching score; if the feature matching score is higher than a preset analysis threshold, performing structural analysis on the classified data set to obtain a preliminary analysis field; performing standardized mapping on the preliminary analysis field to obtain unified format data; performing association value mining on the unified format data to obtain high-value association information; and fusing the high-value association information into a preset integral data set to obtain a target intellectual property data set. The method enables efficient integration and in-depth value mining of multi-source heterogeneous data, and significantly improves the acquisition precision and standardization level of intellectual property data.

Inventors

YANG LIUQING

Assignees

Shenzhen Feifan Network Technology Co., Ltd. (深圳市霏凡网络科技有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-18

Claims (8)

1. An intellectual property data collection method based on multi-source data fusion, characterized by comprising the following steps: acquiring an original information set, and performing clustering and grouping on the original information set to obtain a classified data set; performing feature verification and matching analysis on the classified data set to obtain a feature matching score; if the feature matching score is higher than a preset analysis threshold, performing structural analysis on the classified data set to obtain a preliminary analysis field; performing standardized mapping on the preliminary analysis field to obtain unified format data; performing association value mining on the unified format data to obtain high-value association information; and fusing the high-value association information into a preset integral data set to obtain a target intellectual property data set.
2. The intellectual property data collection method based on multi-source data fusion according to claim 1, wherein acquiring an original information set, and performing clustering and grouping on the original information set to obtain a classified data set, comprises: acquiring the original information set, and performing data cleaning and de-duplication on the original information set to obtain organized information data; performing source feature extraction and grouping on the organized information data to obtain grouped source feature data; and performing anomaly isolation and reclassification on the grouped source feature data to obtain the classified data set.
3. The intellectual property data collection method based on multi-source data fusion according to claim 1, wherein performing feature verification and matching analysis on the classified data set to obtain a feature matching score comprises: performing structural feature extraction and normalization on the classified data set to obtain normalized feature data; extracting an update frequency parameter from the classified data set; performing similarity comparison between the normalized feature data and a preset feature library to obtain a preliminary matching result; if the preliminary matching result is lower than a preset matching threshold, performing adaptive weight adjustment on the normalized feature data by using the update frequency parameter to obtain corrected feature data; and performing a secondary matching calculation on the corrected feature data to obtain the feature matching score.
4. The intellectual property data collection method based on multi-source data fusion according to claim 1, wherein, if the feature matching score is higher than a preset analysis threshold, performing structural analysis on the classified data set to obtain a preliminary analysis field comprises: if the feature matching score is higher than the preset analysis threshold, performing element type separation on the classified data set to obtain a text element set; performing field positioning and attribute detection on the text element set to obtain field attribute parameters; if the field attribute parameters do not meet a preset length threshold, performing boundary expansion and neighborhood stitching on the text element set to obtain an optimized field position; and capturing content from the text element set according to the optimized field position to obtain the preliminary analysis field.
5. The intellectual property data collection method based on multi-source data fusion according to claim 1, wherein performing standardized mapping on the preliminary analysis field to obtain unified format data comprises: performing structure alignment analysis between the preliminary analysis field and a preset standard template to obtain a field offset parameter; if the field offset parameter exceeds a preset offset threshold, performing field distribution reconstruction on the preliminary analysis field to obtain a reconstructed field distribution; performing template mapping and logical consistency verification on the reconstructed field distribution to obtain verification result data; and performing format normalization according to the verification result data to obtain the unified format data.
6. The intellectual property data collection method based on multi-source data fusion according to claim 1, wherein performing association value mining on the unified format data to obtain high-value association information comprises: performing distribution feature analysis on the unified format data to obtain a field association strength; if the field association strength is lower than a preset strength threshold, performing feature dimension expansion and complementation on the unified format data to obtain a reorganized association pattern; performing logic verification on the reorganized association pattern to obtain a verified hidden pattern; and performing matching analysis between the verified hidden pattern and a preset key information feature mapping table to obtain the high-value association information.
7. The intellectual property data collection method based on multi-source data fusion according to claim 1, wherein fusing the high-value association information into a preset integral data set to obtain a target intellectual property data set comprises: performing redundancy analysis and quantitative calculation on the high-value association information and the preset integral data set to obtain a redundancy statistical index; if the redundancy statistical index exceeds a preset quantity threshold, performing de-duplication fusion and conflict resolution on the high-value association information to obtain a refined classification result; and performing a data merging operation according to the refined classification result to obtain the target intellectual property data set.
8. An intellectual property data collection system based on multi-source data fusion, comprising: a data acquisition and grouping module, configured to acquire an original information set, and to cluster and group the original information set to obtain a classified data set; a feature matching module, configured to perform feature verification and matching analysis on the classified data set to obtain a feature matching score; a structure analysis module, configured to perform structural analysis on the classified data set to obtain a preliminary analysis field if the feature matching score is higher than a preset analysis threshold; a format unification module, configured to perform standardized mapping on the preliminary analysis field to obtain unified format data; a value mining module, configured to perform association value mining on the unified format data to obtain high-value association information; and a data fusion module, configured to fuse the high-value association information into a preset integral data set to obtain a target intellectual property data set.
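
The two-stage matching of claim 3 can be illustrated with a minimal Python sketch. The claims do not specify a similarity measure or a weighting rule, so everything concrete here is an assumption made for illustration: cosine similarity stands in for the similarity comparison, the feature names are invented, `volatile_keys` is a hypothetical list of features to be down-weighted, and the damping rule `1 / (1 + update_frequency)` is only one plausible way to use the update frequency parameter. The sketch also assumes that the preliminary result is returned unchanged when it already reaches the matching threshold, a case the claim leaves open.

```python
import math
from typing import Dict, Iterable

Vector = Dict[str, float]


def unit(vec: Vector) -> Vector:
    """Scale a feature vector to unit length (one possible normalization)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_similarity(vec: Vector, library: Dict[str, Vector]) -> float:
    """Highest similarity between vec and any reference vector in the library."""
    return max((cosine(vec, ref) for ref in library.values()), default=0.0)


def feature_matching_score(
    raw: Vector,
    update_frequency: float,
    library: Dict[str, Vector],
    volatile_keys: Iterable[str],
    match_threshold: float = 0.8,
) -> float:
    """Preliminary match, then an adjusted second match if the first is too low."""
    volatile = set(volatile_keys)
    normalized = unit(raw)
    preliminary = best_similarity(normalized, library)
    if preliminary >= match_threshold:
        # Assumption: a sufficient preliminary result is used as the score.
        return preliminary
    # Assumed adjustment rule: the more often a source updates, the less
    # weight its volatile features receive before the secondary match.
    damping = 1.0 / (1.0 + update_frequency)
    corrected = unit(
        {k: v * (damping if k in volatile else 1.0) for k, v in normalized.items()}
    )
    return best_similarity(corrected, library)


# Hypothetical feature library and input, for illustration only.
library = {"patent": {"has_claims": 1.0, "has_ipc_code": 1.0}}
noisy = {"has_claims": 1.0, "has_ipc_code": 1.0, "page_noise": 2.0}
score = feature_matching_score(noisy, 9.0, library, ["page_noise"])
print(round(score, 2))
```

In this toy input the preliminary similarity falls below the threshold because of the `page_noise` feature, so the second pass damps that feature and rematches. A real system would need a documented feature library and a validated weighting rule.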

Description

Intellectual property data acquisition method and system based on multi-source data fusion

Technical Field

The invention relates to the technical field of intellectual property data processing, and in particular to an intellectual property data acquisition method and system based on multi-source data fusion.

Background

At present, intellectual property data has become a core strategic resource for enterprise technical innovation and market competition. Faced with massive volumes of patent documents, trademark information, scientific journals and publicly available network data, efficiently acquiring and integrating valuable information from scattered, heterogeneous and dynamically changing data sources has become a key direction for applying big data acquisition and processing technology in the intellectual property field. Accurate and comprehensive intellectual property data is not only the basis for an enterprise's technical layout, but also an important basis for avoiding infringement risks and mining potential technical information.

In the prior art, the acquisition of intellectual property data relies primarily on calls to a single database interface or on simple web crawler tools driven by preset keywords. Such methods typically store the text, image and form data obtained from different sources directly, and subsequent processing is often limited to basic deduplication or format conversion, without in-depth analysis of, or adaptation to, the features of multi-source data. For example, when processing patent documents containing complex formulas or engineering drawings, existing systems often cannot effectively parse unstructured data, and they lack intelligent fusion and verification mechanisms when data from different sources conflict. The prior art therefore suffers from the technical problem of low acquisition and analysis efficiency for multi-source heterogeneous data.

Disclosure of Invention

The invention provides an intellectual property data acquisition method and system based on multi-source data fusion, to solve the technical problem in the prior art of low acquisition and analysis efficiency for multi-source heterogeneous data.

To solve the above technical problem, in a first aspect, the present invention provides an intellectual property data collection method based on multi-source data fusion, comprising: acquiring an original information set, and performing clustering and grouping on the original information set to obtain a classified data set; performing feature verification and matching analysis on the classified data set to obtain a feature matching score; if the feature matching score is higher than a preset analysis threshold, performing structural analysis on the classified data set to obtain a preliminary analysis field; performing standardized mapping on the preliminary analysis field to obtain unified format data; performing association value mining on the unified format data to obtain high-value association information; and fusing the high-value association information into a preset integral data set to obtain a target intellectual property data set.
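
The control flow of the method can be sketched in Python as a chain of stages gated by the analysis threshold. This is a minimal illustration under stated assumptions, not the disclosed implementation: every function name is hypothetical, the stages are passed in as plain callables, clustering is assumed to yield several groups that are scored independently, and groups that do not exceed the threshold are simply skipped because the text only describes the branch where the score is higher. The toy stages at the bottom (grouping by source, scoring by title coverage, and so on) are invented purely to make the sketch runnable.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def run_pipeline(
    raw_items: Iterable[Record],
    cluster: Callable[[Iterable[Record]], List[List[Record]]],
    score: Callable[[List[Record]], float],
    parse: Callable[[List[Record]], List[str]],
    standardize: Callable[[List[str]], List[str]],
    mine: Callable[[List[str]], List[str]],
    fuse: Callable[[List[str], List[str]], List[str]],
    base_dataset: List[str],
    analysis_threshold: float,
) -> List[str]:
    """Run each group through the stages only if its score exceeds the threshold."""
    target = list(base_dataset)
    for group in cluster(raw_items):
        if score(group) <= analysis_threshold:
            continue  # assumption: low-scoring groups are left unprocessed
        fields = parse(group)            # structural analysis
        unified = standardize(fields)    # standardized mapping
        high_value = mine(unified)       # association value mining
        target = fuse(high_value, target)  # fusion into the data set
    return target


# Toy stages, all hypothetical.
def cluster_by_source(items: Iterable[Record]) -> List[List[Record]]:
    groups: Dict[str, List[Record]] = {}
    for item in items:
        groups.setdefault(item.get("source", "unknown"), []).append(item)
    return list(groups.values())


def title_coverage(group: List[Record]) -> float:
    return sum(1 for item in group if item.get("title")) / len(group)


def extract_titles(group: List[Record]) -> List[str]:
    return [item["title"] for item in group if item.get("title")]


def tidy(titles: List[str]) -> List[str]:
    return [" ".join(t.split()).lower() for t in titles]


def keep_method_titles(titles: List[str]) -> List[str]:
    return [t for t in titles if "method" in t]


def append_unique(new_items: List[str], dataset: List[str]) -> List[str]:
    return dataset + [t for t in new_items if t not in dataset]


raw = [
    {"source": "patent_office", "title": "  Data Fusion   Method "},
    {"source": "patent_office", "title": "Sensor Housing"},
    {"source": "web", "title": ""},
    {"source": "web", "body": "unparsed page"},
]
result = run_pipeline(
    raw, cluster_by_source, title_coverage, extract_titles, tidy,
    keep_method_titles, append_unique,
    base_dataset=["existing record"], analysis_threshold=0.5,
)
print(result)
```

Passing the stages as callables keeps the gate logic separate from any particular clustering, parsing or mining technique, which the text leaves unspecified at this level.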

In a second aspect, the present invention provides an intellectual property data collection system based on multi-source data fusion, comprising: a data acquisition and grouping module, configured to acquire an original information set, and to cluster and group the original information set to obtain a classified data set; a feature matching module, configured to perform feature verification and matching analysis on the classified data set to obtain a feature matching score; a structure analysis module, configured to perform structural analysis on the classified data set to obtain a preliminary analysis field if the feature matching score is higher than a preset analysis threshold; a format unification module, configured to perform standardized mapping on the preliminary analysis field to obtain unified format data; a value mining module, configured to perform association value mining on the unified format data to obtain high-value association information; and a data fusion module, configured to fuse the high-value association information into a preset integral data set to obtain a target intellectual property data set.

Compared with the prior art, the invention has the following beneficial effects:

(1) The big data processing and format normalization mechanism for multi-source heterogeneous data effectively resolves the disordered structure of intellectual property data and the format incompatibility between different sources (such as text and images), accurately converts unstructured information into a unified standard format, and lays a high-quality data foundation for subsequent analysis.

(2) By carrying out association value mining on the unified format data and utilizing an association r