CN-121979932-A - Aviation safety knowledge hybrid retrieval system based on a Xinchuang (information technology application innovation) distributed database

CN121979932A

Abstract

The invention belongs to the technical field of aviation safety information processing and intelligent retrieval, and in particular relates to an aviation safety knowledge hybrid retrieval system based on a Xinchuang (information technology application innovation) distributed database. It addresses the inability of the prior art to meet the accuracy, efficiency and reliability requirements of aviation safety information processing. First, relying on spread-spectrum and Feiteng Xinchuang servers and an OceanBase distributed database, together with an integrated storage engine and a physical collaborative distribution module, the invention builds a storage base that fuses multiple data types. Second, through the cooperation of the two engines of the intelligent engine layer, it combines a domain-specific embedding model with an optimized reciprocal rank fusion (RRF) algorithm, and a finite state machine with a positive-negative counter algorithm, to achieve accurate semantic retrieval, offline data management and incremental synchronization over weak networks. Third, the service layer applies standardized APIs and a multi-tenant isolation strategy to support integration with multi-terminal systems and third-party systems. The invention significantly improves the accuracy, efficiency and reliability of aviation safety information processing.

Inventors

  • ZENG RUIQI
  • LI SISI
  • FAN YANJIE
  • ZHOU XIN
  • LI YUFEI
  • YAO BINGYU
  • LI JIAXIN
  • YANG LEI
  • YANG SHI
  • SU JIANFEI
  • WANG ZHONGXING
  • LI ZEMING
  • Lv Keyuan
  • LIU WEIREN
  • ZHANG ZIHONG

Assignees

  • 南航科技(广东横琴)有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-04-07

Claims (10)

  1. An aviation safety knowledge hybrid retrieval system based on a Xinchuang (information technology application innovation) distributed database, characterized in that the system comprises: a data storage and computing layer, deployed in a distributed database cluster conforming to Xinchuang standards and comprising an integrated storage engine and a physical collaborative distribution module; the integrated storage engine is configured to receive structured business data and unstructured text data in the aviation safety field, store the structured business data together with the semantic vector data generated from the unstructured text data in an integrated, fused manner, and synchronously build a B+ tree index for the structured business data and an HNSW index for the semantic vector data; the physical collaborative distribution module is configured to use the table group mechanism of the distributed database to co-locate the entity table and the relationship table of a knowledge graph, so that each entity record and the relationship records that start from that entity reside on the same physical storage node; an intelligent engine layer comprising a hybrid search engine and a conflict coordination engine; the hybrid search engine is configured to respond to a query request by searching the B+ tree index and the HNSW index in parallel and fusing and ranking the two sets of results with a reciprocal rank fusion algorithm to generate a final search result list; the conflict coordination engine is configured to manage offline data operations on the mobile terminal, perform incremental data synchronization after the network recovers, and resolve data conflicts based on predefined business rules; and an application service layer configured to provide a data service interface, receive external requests, dispatch them to the intelligent engine layer, and package and return the processing result of the intelligent engine layer.
  2. The aviation safety knowledge hybrid retrieval system based on a Xinchuang distributed database according to claim 1, wherein the integrated fused storage is implemented by the following steps: atomically writing the write requests for the structured business data and the semantic vector data into a MemTable; when the MemTable is dumped, constructing hybrid row-column microblocks in the generated SSTable file, wherein the structured business data uses columnar storage with dictionary or run-length encoding compression and the semantic vector data uses a contiguous memory layout; and, during SSTable file generation, building the B+ tree index for the structured business data and the HNSW index for the semantic vector data in parallel, and persisting the HNSW index fragment as an index block of the file.
  3. The aviation safety knowledge hybrid retrieval system based on a Xinchuang distributed database according to claim 1, wherein the physical collaborative distribution storage is performed as follows: based on statistical analysis of historical query patterns over the aviation safety knowledge graph, determining that the high-frequency access path is dominated by retrieving the associations that start from an entity, and, according to that access path, setting the partition key of the relationship table to the starting-entity identifier; creating a table group in the distributed database and adding the entity table and the relationship table to the same table group; and, through the constraint mechanism of the table group, forcing the entity table and the relationship table to be physically distributed according to the result of a single hash function that takes the entity identifier as input, so that any entity record and all relationship records starting from it reside on the same storage node.
  4. The system of claim 1, wherein the hybrid search engine comprises a query optimizer configured with a vector-aware cost model; the cost model quantitatively estimates the resource consumption of two execution paths, namely filtering by scalar index first and then computing vector distances, and performing vector approximate nearest neighbor search first and then verifying the scalar conditions, and automatically selects the path with the lower estimated consumption to generate the query execution plan.
  5. The aviation safety knowledge hybrid retrieval system based on a Xinchuang distributed database according to claim 4, wherein the final search result list is generated as follows: receiving the query request, parsing the natural language text in the request, performing word segmentation to extract a keyword set, and calling a domain-specific embedding model to convert the natural language text into a high-dimensional semantic query vector; searching the full-text index associated with the B+ tree index using the keyword set to obtain a first candidate list based on keyword matching together with its first relevance scores, and performing approximate nearest neighbor search in the HNSW index using the semantic query vector to obtain a second candidate list based on semantic similarity together with its second relevance distances; computing a fusion score for each document with a reciprocal rank fusion algorithm according to the document's rank in the first candidate list and in the second candidate list, wherein the smoothing constant used by the reciprocal rank fusion algorithm is a value pre-optimized by grid search on a query test set from the aviation safety field; and sorting all candidate documents in descending order of fusion score, selecting the top K documents, and attaching a match-type identifier and knowledge graph association information to each of them to form the final search result list.
  6. The system of claim 1, wherein the predefined business rules in the conflict coordination engine include a finite state machine arbitration matrix for handling state-type conflicts; the arbitration matrix defines whether the system performs an accept, reject or fork operation when a conflict arises between a specific combination of server-side business state and client-submitted state, wherein a fork operation saves the client's modified content as a new associated record instead of overwriting the original data.
  7. The aviation safety knowledge hybrid retrieval system based on a Xinchuang distributed database according to claim 1, wherein the predefined business rules in the conflict coordination engine further comprise a dedicated algorithm for handling conflicts caused by concurrent modification of numeric fields, the dedicated algorithm being based on a positive-negative counter (PN-Counter).
  8. The aviation safety knowledge hybrid retrieval system based on a Xinchuang distributed database according to claim 5, wherein the semantic vector data is generated from the unstructured text data as follows: performing first-stage pre-training of a base language model on a Chinese corpus from the aviation safety field using an entity-aware masking algorithm, wherein the entity-aware masking algorithm selects complete entity spans in the text with a preset probability and replaces all tokens of each selected span with mask tokens; constructing triplet training samples from the corpus, each triplet comprising an anchor text, a positive sample text and a hard negative sample text, wherein the hard negative is obtained by using a sparse retrieval algorithm to retrieve from the corpus a document whose lexical similarity to the anchor text falls within a preset interval; fine-tuning the model obtained from the first-stage pre-training with a weighted InfoNCE (information noise-contrastive estimation) loss function that assigns a higher penalty weight to the hard negatives, so that the model distinguishes texts that are lexically similar but semantically different, thereby obtaining the domain-specific embedding model; and applying the domain-specific embedding model to the unstructured text data to generate the corresponding semantic vector data.
  9. The aviation safety knowledge hybrid retrieval system based on a Xinchuang distributed database according to claim 5 or 8, wherein the smoothing constant used by the reciprocal rank fusion algorithm is determined by the following domain-adaptive parameter optimization method: constructing a golden test set for the aviation safety field, the test set comprising a plurality of typical queries and their sets of reference answer documents; generating a plurality of candidate values of the smoothing constant at a fixed step within a preset numerical interval; evaluating each candidate value on the golden test set by computing the mean normalized discounted cumulative gain (NDCG) of the search results; and selecting the candidate value with the highest mean NDCG as the smoothing constant finally adopted by the algorithm.
  10. The aviation safety knowledge hybrid retrieval system based on a Xinchuang distributed database according to claim 7, wherein the positive-negative counter algorithm comprises the following steps: maintaining, for each numeric field, a separate increment record and decrement record, each keyed by device identifier and holding as its value the accumulated amount of that operation performed by the device on the field; when merging data, for each device identifier, taking the maximum of the increment values in the server-side version and the client-side version as the merged increment value, and likewise the maximum of the decrement values as the merged decrement value; and, when the field value is read, computing the sum of the increment values of all devices minus the sum of the decrement values of all devices as the final consistent value of the field.
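
The positive-negative counter steps of claim 10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `PNCounter`, its method names and the device identifiers are hypothetical, and the sketch assumes each device only ever increases its own accumulated totals.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PNCounter:
    """Hypothetical PN-Counter for one numeric field, keyed by device identifier."""
    increments: Dict[str, int] = field(default_factory=dict)
    decrements: Dict[str, int] = field(default_factory=dict)

    def add(self, device_id: str, amount: int) -> None:
        """Accumulate a change made on one device; negative amounts are decrements."""
        if amount >= 0:
            self.increments[device_id] = self.increments.get(device_id, 0) + amount
        else:
            self.decrements[device_id] = self.decrements.get(device_id, 0) - amount

    def merge(self, other: "PNCounter") -> "PNCounter":
        """Per device, keep the larger accumulated value from the two versions."""
        def merge_max(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
            return {d: max(a.get(d, 0), b.get(d, 0)) for d in a.keys() | b.keys()}

        return PNCounter(
            merge_max(self.increments, other.increments),
            merge_max(self.decrements, other.decrements),
        )

    def value(self) -> int:
        """Sum of all increments minus sum of all decrements."""
        return sum(self.increments.values()) - sum(self.decrements.values())


# Example: the client edits offline while the server receives another device's change.
server = PNCounter()
server.add("tablet-1", 3)
client = PNCounter(dict(server.increments), dict(server.decrements))
client.add("tablet-1", 2)   # offline increment on the client copy
server.add("phone-2", -1)   # concurrent decrement recorded on the server
merged = server.merge(client)
print(merged.value())       # 5 - 1 = 4
```

Because the merge takes a per-device maximum, it gives the same result regardless of merge order and can be repeated safely, which is why neither the offline increment nor the concurrent decrement is lost.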

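The fusion step of claim 5 and the smoothing-constant search of claim 9 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, NDCG is computed with binary relevance (an assumption, since the claims do not specify the grading), and the default `k=60` is just a commonly used starting value rather than the tuned value the claims describe.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def rrf_fuse(rankings: Sequence[Sequence[str]], k: float = 60.0) -> List[Tuple[str, float]]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def ndcg_at(ranked: Sequence[str], relevant: Set[str], top_k: int) -> float:
    """NDCG@top_k with binary relevance (assumption made for this sketch)."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, doc_id in enumerate(ranked[:top_k], start=1)
        if doc_id in relevant
    )
    ideal = sum(1.0 / math.log2(pos + 1) for pos in range(1, min(len(relevant), top_k) + 1))
    return dcg / ideal if ideal else 0.0


def grid_search_k(
    test_set: Sequence[Tuple[Sequence[str], Sequence[str], Set[str]]],
    candidates: Sequence[float],
    top_k: int = 10,
) -> Tuple[Optional[float], float]:
    """Pick the smoothing constant with the highest mean NDCG on the test set.

    Each test case is (keyword_ranking, vector_ranking, relevant_doc_ids).
    """
    if not test_set:
        raise ValueError("test_set must not be empty")
    best_k: Optional[float] = None
    best_score = -1.0
    for k in candidates:
        mean_ndcg = sum(
            ndcg_at([doc for doc, _ in rrf_fuse([kw, vec], k)], relevant, top_k)
            for kw, vec, relevant in test_set
        ) / len(test_set)
        if mean_ndcg > best_score:
            best_k, best_score = k, mean_ndcg
    return best_k, best_score


# Example: fuse a keyword ranking with a vector ranking (document ids are made up).
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]], k=60)
print([doc for doc, _ in fused])
```

Working on ranks rather than raw scores avoids having to normalize keyword relevance scores against vector distances, which are on different scales.
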
Description

Aviation safety knowledge hybrid retrieval system based on a Xinchuang distributed database

Technical Field

The invention belongs to the technical field of information processing, and in particular relates to an aviation safety knowledge hybrid retrieval system based on a Xinchuang (information technology application innovation) distributed database.

Background

Information management systems in the aviation safety field have long used a siloed ("chimney-style") layered architecture. Structured data (such as flight numbers, occurrence times and ADREP classification codes) is stored in a traditional relational database (such as Oracle 11g or MySQL 5.7), while unstructured data (such as legal documents and accident investigation reports) is stored on a NAS file server, with an inverted index built by a separate full-text search engine (such as Elasticsearch or Solr) to support queries. The mobile terminal acts as a thin client of the Web terminal: it depends entirely on a live network connection, can cache only a small number of browsing records when offline, and lacks a complete bidirectional synchronization mechanism.
This architecture has the following core defects. First, retrieval depends on exact keyword matching: it cannot handle synonyms in the aviation field (such as heavy flight and heavy flight) and struggles with ambiguous terms (such as the physical and managerial senses of "attitude"). Second, the architecture with separated storage relies on ETL tools for data synchronization, which introduces delays ranging from seconds to minutes; cross-system transactional consistency is hard to guarantee, and dirty reads and dangling references are likely to occur. Third, traditional databases struggle to process high-dimensional vector data and multi-hop association queries over a knowledge graph efficiently, and query response time grows markedly as data volume increases. Fourth, the mobile terminal has no offline data processing mechanism, so it cannot operate normally in weak-network environments, and the simple overwrite strategy used during synchronization easily causes data loss. As aviation safety research demands greater comprehensiveness, accuracy and intelligence, the existing architecture can no longer meet the reliability and efficiency requirements of key business scenarios, and an integrated, highly collaborative and highly adaptable technical solution is needed.
Disclosure of Invention

The prior art suffers from insufficient recall and precision, delayed data synchronization and poor consistency, inefficient processing of vector data and knowledge graph association queries, and the lack of an offline processing mechanism on the mobile terminal, and therefore cannot meet the accuracy, efficiency and reliability requirements of aviation safety information processing. To solve these problems, the invention provides an aviation safety knowledge hybrid retrieval system based on a Xinchuang distributed database, comprising: a data storage and computing layer, deployed in a distributed database cluster conforming to Xinchuang standards and comprising an integrated storage engine and a physical collaborative distribution module; the integrated storage engine is configured to receive structured business data and unstructured text data in the aviation safety field, store the structured business data together with the semantic vector data generated from the unstructured text data in an integrated, fused manner, and synchronously build a B+ tree index for the structured business data and an HNSW index for the semantic vector data; the physical collaborative distribution module is configured to use the table group mechanism of the distributed database to co-locate the entity table and the relationship table of a knowledge graph, so that each entity record and the relationship records that start from that entity reside on the same physical storage node; an intelligent engine layer comprising a hybrid search engine and a conflict coordination engine; the hybrid search engine is configured to respond to a query request by searching the B+ tree index and the HNSW index in parallel and fusing and ranking the two sets of results with a reciprocal rank fusion algorithm to generate a final search result list; the conflict coordination engine is configured to manage offline data operations on the mobile terminal, perform incremental data synchronization after the network recovers, and resolve data conflicts based on predefined business rules; and an application service layer configured to provide a data service interface, receive external requests, dispatch them to the intelligent engine layer, and package and return the processing result