CN-122019613-A - Scientific and technological achievement supply and demand matching method and system based on index fingerprinting and dynamic bitmap indexing
Abstract
The invention discloses a scientific and technological achievement supply and demand matching method and system based on index fingerprinting and dynamic bitmap indexing, belonging to the technical field of information retrieval and text data processing. It addresses the low matching efficiency, low accuracy, and poor interpretability caused by vaguely expressed enterprise demands and fragmented descriptions of achievement indexes. The method proceeds as follows: first, index terms in achievement and demand texts are normalized to generate binary index fingerprints, and a dynamic bitmap index over the achievements is constructed; second, each demand is parsed to generate a demand fingerprint and quantitative constraint information; third, candidates are preliminarily screened through multi-strategy hybrid recall and then pruned by coverage-based dynamic gating; finally, candidates are scored and ranked by combining semantic similarity with index satisfaction, and interpretable matching results are output. Fingerprint-based bitmap indexing enables rapid preliminary screening, and multi-strategy fusion improves docking efficiency and the reliability of results, thereby overcoming the shortcomings of existing methods in matching efficiency, accuracy, and interpretability.
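As a rough illustration of the fingerprinting and coverage ideas in the abstract, the following Python sketch represents a K-bit fingerprint as an integer bit mask. The identifier names, the `BIT_MAP_V1` mapping, and the coverage formula (the share of demanded bits that an achievement also sets) are illustrative assumptions, not definitions taken from the patent.

```python
from typing import Dict, Iterable

# Hypothetical version-1 mapping from standard index identifiers to bit positions.
BIT_MAP_V1: Dict[str, int] = {
    "tensile_strength": 0,
    "operating_temperature": 1,
    "energy_density": 2,
    "cycle_life": 3,
}


def make_fingerprint(index_ids: Iterable[str], bit_map: Dict[str, int]) -> int:
    """Set one bit per recognised index identifier; unknown identifiers are skipped."""
    fp = 0
    for index_id in index_ids:
        pos = bit_map.get(index_id)
        if pos is not None:
            fp |= 1 << pos
    return fp


def coverage(achievement_fp: int, demand_fp: int) -> float:
    """Share of demanded index bits that the achievement also sets (assumed definition)."""
    demanded = bin(demand_fp).count("1")
    if demanded == 0:
        return 0.0
    return bin(achievement_fp & demand_fp).count("1") / demanded
```

Under these assumptions, a single bitwise AND and a bit count are enough to compare an achievement with a demand, which is what makes the preliminary screening cheap.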
Inventors
- WU MIN
- LUO HUI
- LIU YOU
- LUO LIANGLIANG
- YANG TINGQI
- HUANG DECHANG
- XIONG HUILING
Assignees
- Jiangxi Provincial Science and Technology Affairs Center (江西省科技事务中心)
Dates
- Publication Date
- 20260512
- Application Date
- 20260205
Claims (9)
- 1. A scientific and technological achievement supply and demand matching method based on index fingerprinting and dynamic bitmap indexing, characterized by comprising the following steps: S1, performing index term normalization on scientific and technological achievement texts, generating K-bit binary achievement fingerprints based on a standard index identifier set, maintaining a versioned mapping from index identifiers to fingerprint bits, and constructing a compressed bitmap index library from the achievement fingerprints; S2, parsing the enterprise demand text to extract demand index identifiers and quantization constraint information, and generating a K-bit binary demand fingerprint based on the standard index identifier set; S3, performing semantic retrieval and keyword retrieval in parallel to obtain a first candidate set, performing bit-operation screening on the compressed bitmap index library based on the demand fingerprint to obtain a second candidate set, and merging and de-duplicating the two to form an initial candidate set; S4, calculating the index coverage between the achievement fingerprint of each candidate scientific and technological achievement in the initial candidate set and the demand fingerprint, and screening based on a dynamic gating threshold to obtain a gated candidate set; S5, obtaining the index realization values of the candidate scientific and technological achievements in the gated candidate set, calculating a comprehensive score by combining semantic similarity and index satisfaction, sorting by the comprehensive score, and outputting matching results and interpretable matching information.
- 2. The method of claim 1, wherein the index term normalization in step S1 comprises: pre-classifying the technical field of the scientific and technological achievement text before term mapping; based on the pre-classification result, calling the term-index association knowledge base of the corresponding field to perform disambiguation; calculating a mapping confidence for each candidate term-index mapping; and setting a confidence threshold T, wherein when the confidence of a candidate mapping is lower than the threshold T, the corresponding fingerprint bit is not set, and the candidate term is instead marked as a candidate matching item for subsequent gap filling by keyword retrieval or for proofreading.
- 3. The method of claim 1, wherein the K-bit binary achievement fingerprint comprises core technical bits and extension bits, and dynamic extension is achieved through a versioned bit mapping mechanism: when a standard index identifier is newly added, a new fingerprint bit is added among the extension bits, the versioned mapping is updated, and the compressed bitmap index library is incrementally updated only for the newly added fingerprint bit; the compressed bitmap index library builds one compressed bitmap for each of the K bits of the achievement fingerprint, each compressed bitmap records the set of scientific and technological achievement identifiers whose corresponding bit is 1, and the compressed bitmaps adopt the Roaring Bitmap structure.
- 4. The method of claim 1, wherein the quantization constraint information in step S2 employs a generalized constraint description specification whose operators include at least three comparison operators (their symbols are missing from the source text) and an interval constraint, and which further supports "minimize/maximize" class constraints as a sorting preference.
- 5. The method of claim 1, wherein the bit-operation screening in step S3 comprises: dividing the demand index identifiers into a mandatory index set and a preferred index set; performing an intersection (AND) operation on the compressed bitmaps corresponding to the mandatory index set to obtain a mandatory candidate set; performing a union (OR) operation on the compressed bitmaps corresponding to the preferred index set to obtain a scoring candidate set; and obtaining the second candidate set based on the mandatory candidate set and the scoring candidate set; and wherein, during merging and de-duplication, when the first candidate set and the second candidate set contain the same scientific and technological achievement, the record that satisfies the constraints of the mandatory candidate set is retained.
- 6. The method of claim 1, wherein in step S4 the dynamic gating threshold is adaptively determined according to the coverage distribution of the candidate set, by: taking a preset quantile of the coverage distribution; setting the threshold from the relation between the coverage mean and standard deviation; or, when the size of the initial candidate set exceeds a preset threshold, estimating the coverage distribution by sampling.
- 7. The method of claim 1, wherein in step S5 the index realization values and unit information of the candidate scientific and technological achievements are stored in a structure keyed by the achievement identifier and the index identifier; the index satisfaction is calculated with a monotonic gain function, including one of Sigmoid form; the comprehensive score is calculated with a gain-risk separation model; and structured interpretable matching information is output with the matching result, the interpretable matching information comprising coverage, the set of missing indexes, index difference information, baseline confidence, and the evidence segment information corresponding to each index realization value.
- 8. The method of claim 7, wherein the baseline value is inferred from bucketed statistics over industry, enterprise scale, and technical field; the baseline confidence decreases as the dispersion within a bucket increases, increases as the sample size increases, and is calculated from the bucket variance, mean, and sample size; and when the sample size of a bucket is smaller than a preset minimum, the statistical scope is widened to perform fallback inference.
- 9. A scientific and technological achievement supply and demand matching system based on index fingerprinting and dynamic bitmap indexing, characterized by comprising: a fingerprint and index construction module, used for performing index term normalization on scientific and technological achievement texts to generate K-bit binary achievement fingerprints, and for constructing a compressed bitmap index library based on the achievement fingerprints; a demand processing module, used for parsing the enterprise demand text to generate a K-bit binary demand fingerprint and quantization constraint information; a hybrid recall and pruning module, used for performing semantic retrieval and keyword retrieval in parallel to obtain a first candidate set, performing bit-operation screening on the compressed bitmap index library based on the demand fingerprint to obtain a second candidate set, merging and de-duplicating the candidate sets to form an initial candidate set, and performing gating pruning based on coverage and a dynamic threshold to obtain a gated candidate set; and a fusion output module, used for obtaining the index realization values of the candidate scientific and technological achievements in the gated candidate set, calculating a comprehensive score by fusing semantic similarity and index satisfaction, sorting, and outputting matching results and structured interpretable information.
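The bit-operation recall of claim 5, the quantile option for the gating threshold in claim 6, and the Sigmoid-form satisfaction of claim 7 might be sketched in Python as follows. This is a minimal sketch under stated assumptions: plain Python sets stand in for Roaring Bitmaps, and the rule for combining mandatory and preferred hits, the coverage formula, and the `scale` parameter are all hypothetical choices that the claims leave open.

```python
import math
from typing import Dict, Iterable, List, Set


def build_bitmap_index(fingerprints: Dict[str, int], k: int) -> List[Set[str]]:
    """For each of the K bits, collect the ids of achievements whose bit is 1."""
    index: List[Set[str]] = [set() for _ in range(k)]
    for achievement_id, fp in fingerprints.items():
        for bit in range(k):
            if (fp >> bit) & 1:
                index[bit].add(achievement_id)
    return index


def bitmap_recall(index: List[Set[str]], must_bits: Iterable[int],
                  preferred_bits: Iterable[int]) -> Set[str]:
    """AND the mandatory bitmaps; if none are mandatory, OR the preferred ones.

    The fallback rule is an assumption made for this sketch.
    """
    must = list(must_bits)
    if must:
        result = set(index[must[0]])
        for bit in must[1:]:
            result &= index[bit]
        return result
    preferred: Set[str] = set()
    for bit in preferred_bits:
        preferred |= index[bit]
    return preferred


def coverage(achievement_fp: int, demand_fp: int) -> float:
    """Share of demanded bits also set by the achievement (assumed definition)."""
    demanded = bin(demand_fp).count("1")
    return bin(achievement_fp & demand_fp).count("1") / demanded if demanded else 0.0


def gate_by_quantile(coverages: Dict[str, float], q: float) -> Set[str]:
    """Keep candidates whose coverage reaches the q-quantile of the candidate set."""
    if not coverages:
        return set()
    ordered = sorted(coverages.values())
    threshold = ordered[int(q * (len(ordered) - 1))]
    return {cid for cid, cov in coverages.items() if cov >= threshold}


def sigmoid_satisfaction(value: float, target: float, scale: float) -> float:
    """Monotonic gain in (0, 1) for a 'larger is better' index; 0.5 at the target."""
    z = max(-60.0, min(60.0, (value - target) / scale))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))
```

Because the threshold is taken from the candidates' own coverage distribution, the gate tightens when most candidates cover the demand well and loosens when few do. A production system would replace the sets with compressed bitmaps so that the AND and OR steps run on compact containers.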
Description
Scientific and technological achievement supply and demand matching method and system based on index fingerprinting and dynamic bitmap indexing
Technical Field
The invention relates to the technical field of information retrieval and text data processing, and in particular to a scientific and technological achievement supply and demand matching method and system based on index fingerprinting and dynamic bitmap indexing.
Background
Against the strategic background of deeply integrating technological innovation with industrial innovation and developing new quality productive forces, building a demand-oriented scientific research ecosystem has become a core task in strengthening the principal role of enterprises in innovation. Accurate docking of technology supply and demand is the key link between high-quality technology supply and industrial application, and a prerequisite for technological innovation to play its supporting and leading role and for industries to achieve multiplied growth. If the matching mechanism fails, large amounts of scientific research resources sit idle, the long-standing disconnect between science and technology and industry (the "two skins" problem) worsens, and the improvement of regional core competitiveness is seriously constrained. Existing matching approaches have evolved from manual screening to keyword retrieval platforms and semantic retrieval schemes, but still face efficiency bottlenecks. Although AI-based schemes have been introduced, they often reduce matching to text similarity calculation and cannot bridge the semantic gap between colloquially expressed enterprise demands and specialized achievement descriptions.
Noisy expression leads to non-standard training data; technical indexes lack unified dimensions and are difficult to quantify and compare; and matching models lack interpretability and robustness, which introduces uncertainty risk into enterprise decisions.
A paper on a two-stage technology supply and demand matching method based on text matching and configuration optimization (journal: informatics) first performs technology text matching and then applies configuration optimization to form a two-stage matching framework; validated in a real technology market scenario, it improves the feasibility of supply and demand docking to some extent. However, that framework mainly addresses excessive matching results and complex macro-level screening and decision making; it does not focus on unified definitions, unit conversion, or hard-threshold verification of the quantifiable indexes in enterprise technical requirements, nor on a rapid filtering mechanism for massive libraries. To address this, the present method uses a pipeline of index normalization, fingerprinting, bitmap indexing, gating pruning, and interpretable scoring that moves the retrieval, verification, and explanation of hard indexes forward into the recall and gating stages, thereby meeting enterprises' need for aligned and quantitatively comparable technical indexes.
The patent "intelligent scientific and technological achievement recommending method based on feature clustering and similarity calculation matching" (publication number: CN 119760221A) proposes extracting key data and text semantic information from scientific and technological achievements, performing field identification and constructing cluster feature words, and making recommendations in combination with user browsing and demand analysis, which can improve the coverage and efficiency of achievement recommendation.
However, the core of that scheme is still driven by clustering and similarity/preference, and the rigid index constraints in enterprise demands lack a uniform structured expression and a verifiable satisfaction judgment. To address this, the invention maps achievements and demands into a standardized index space, generates K-bit binary fingerprints, eliminates candidates whose index dimensions are not covered at the recall stage by means of the compressed bitmap index, bitwise operations, and coverage gating, and fuses semantic similarity with index satisfaction to output a ranking and an explanation report, thereby balancing the reliability of hard constraints, retrieval efficiency, and interpretability.
Therefore, there is a need for a supply and demand matching method and system that unifies index definitions and rapidly screens massive numbers of achievements, so as to obtain more accurate and interpretable matching results under complex semantic differences and index constraints.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a scientific and technological achievement supply and demand matching method and system based on index fingerprinting and dynamic bitmap indexing, which solve the problems that in the prior art, enterprise demands are difficult to express