CN-121766418-B - Automatic revision method and system of rail transit regulation system based on knowledge base and LLM

Abstract

The application discloses an automatic revision method and system for rail transit regulations based on a knowledge base and a large language model (LLM), and relates to the technical field of rail transit regulation revision. The method first constructs a regulation knowledge base: each original document is parsed into plain text, sliced, vectorized and stored, and the related IDs are associated and recorded. The method then receives the revision object uploaded by the user together with the user's selection of the original document, parses and slices the revision object to generate a first array, identifies the paragraphs to be revised in that array with a first large language model, and aggregates them into a set of content to be revised. Similar candidate sets are then retrieved within the locked knowledge base; after cleaning and standardization, they are input together with the revision content into a second large language model, which generates revision results in a standard format, and the results are aggregated. Finally, the revisions are applied automatically according to preset rules and stored. By using large models for identification, the application solves the problem that traditional character-level comparison cannot recognize wording adjustments that leave the semantics unchanged, or large-scale logical reorganization of passages, and it automates the regulation revision process.

Inventors

TANG ZHONGJUN
LU XINYU
YANG SHIYUAN
WEI TINGHU
LI ZHE
SONG HONGXIA
LI YUEZONG
GUO HAITAO
ZHANG FAN
YI YONGSHENG

Assignees

Chengdu Yunda Technology Co., Ltd. (成都运达科技股份有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2026-03-05

Claims (7)

1. An automatic revision method for rail transit regulations based on a knowledge base and an LLM, characterized by comprising the following steps:
S1, constructing a regulation knowledge base and recording a regulation knowledge base ID, parsing an original document into plain text, and then cutting the original document into a plurality of small slices according to clauses, paragraphs or semantic rules;
S2, receiving a revision object uploaded by a user, and acquiring the regulation knowledge base ID and the original document ID corresponding to the original document selected by the user;
S3, looping over all candidate paragraphs in the first array, inputting the candidate paragraphs into a first large language model during the loop, and identifying and extracting the paragraphs that need revision through the first large language model to generate a second array;
S4, performing vector retrieval for each piece of revision content in the set of content to be revised, within the scope of the knowledge base locked in step S2, and returning a first candidate set of the K entries most similar to that piece of revision content; cleaning the data in all the first candidate sets and retaining the candidate content whose original document ID matches the one acquired in step S2, to obtain a second candidate set;
S5, automatically updating and revising all target revision results in the third array based on preset document revision rules, and storing the target revision results.
2. The automatic revision method for rail transit regulations based on a knowledge base and an LLM according to claim 1, wherein in step S1, if the original document contains a picture, the picture is stored in the form of a web page link.
3. The automatic revision method for rail transit regulations based on a knowledge base and an LLM according to claim 1, wherein an exception retry mechanism is further introduced in step S3, the exception retry mechanism being used for checking whether the second array returned by the first large language model is valid; if the returned second array is invalid, a retry is triggered and a new second array is generated by the first large language model, wherein the number of retries does not exceed a preset maximum.
4. The automatic revision method for rail transit regulations based on a knowledge base and an LLM according to claim 1, wherein the output format that constrains all target revision results in step S4 comprises a type, a text, an original document ID and a paragraph ID; the type is one of addition, modification and deletion; the text comprises the original regulation content and the modified content; if the type is deletion, the modified content is empty; and if the type is addition, the paragraph ID is -1, otherwise the original document ID.
5. The automatic revision method for rail transit regulations based on a knowledge base and an LLM according to claim 4, wherein the preset document revision rules in step S5 comprise: a decision rule for modification, under which the type is modification, the modified content is not equal to the original regulation content, and the original document ID and the paragraph ID are correct; a decision rule for deletion, under which the type is deletion, the original regulation content is not empty, the modified content is empty, and the original document ID and the paragraph ID are correct; and a decision rule for addition, under which the type is addition, the original regulation content is empty, the modified content is not empty, the original document ID is correct, and the paragraph ID is equal to -1.
6. The automatic revision method for rail transit regulations based on a knowledge base and an LLM according to claim 1, wherein step S5 further includes a manual checking and confirmation step: after the update and revision, all target revision results in the third array are checked manually and a person confirms whether to revise; if the revision is confirmed, the original content is replaced with the new content and stored.
7. An automatic revision system for rail transit regulations based on a knowledge base and an LLM, used to carry out the automatic revision method for rail transit regulations based on a knowledge base and an LLM according to any one of claims 1-6, comprising: a knowledge base construction module, used for parsing, slicing and vectorizing the original document, constructing the regulation knowledge base, and associating the regulation knowledge base ID, the original document ID and the small slice IDs; a document preprocessing module, used for receiving the revision object and the selection of the original document, parsing and slicing the revision object, and generating a first array; a modification identification module, used for identifying and extracting, through the first large language model, the paragraphs that need revision among all candidate paragraphs in the first array, verifying the validity of the result through the exception retry mechanism, and aggregating the results into a set of content to be revised; a revision decision module, used for obtaining a candidate set through vector retrieval, inputting it into a second large language model after screening and standardization, and generating a third array in a specified format; and an automatic execution module, used for automatically updating and revising the target revision results in the third array according to the preset document revision rules, checking the revision results, and finally storing them.
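
As a non-limiting illustration, the output format of claim 4 and the decision rules of claim 5 can be sketched in Python as follows. The names Revision, is_valid, selected_document_id and known_paragraph_ids are hypothetical and do not appear in the claims. The claims also do not define what makes an ID "correct"; the sketch assumes it means that the document ID matches the document selected in step S2 and that the paragraph ID is one of that document's known paragraph IDs.

```python
from dataclasses import dataclass

# Hypothetical labels for the three revision types named in claim 4.
ADD, MODIFY, DELETE = "add", "modify", "delete"


@dataclass
class Revision:
    kind: str            # the "type" field: ADD, MODIFY or DELETE
    original_text: str   # original regulation content ("" when empty)
    modified_text: str   # modified content ("" when empty)
    document_id: str     # original document ID
    paragraph_id: int    # -1 for additions


def is_valid(rev, selected_document_id, known_paragraph_ids):
    """Return True if rev satisfies the decision rule for its type.

    Assumption: an ID is "correct" when the document ID equals the
    selected document and the paragraph ID is a known paragraph ID.
    """
    if rev.document_id != selected_document_id:
        return False
    if rev.kind == MODIFY:
        return (rev.modified_text != rev.original_text
                and rev.paragraph_id in known_paragraph_ids)
    if rev.kind == DELETE:
        return (rev.original_text != ""
                and rev.modified_text == ""
                and rev.paragraph_id in known_paragraph_ids)
    if rev.kind == ADD:
        return (rev.original_text == ""
                and rev.modified_text != ""
                and rev.paragraph_id == -1)
    return False  # unknown type: reject
```

Under these assumptions, a record that fails its rule would simply not be applied automatically; how such records are handled is not specified in the claims.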

Description

Automatic revision method and system of rail transit regulation system based on knowledge base and LLM

Technical Field

The invention relates to the technical field of rail transit regulation revision, and in particular to an automatic revision method and system for rail transit regulations based on a knowledge base and an LLM.

Background

In the rail transit industry, regulations are an important basis for ensuring operational safety and standardizing operating procedures. They are large in volume; they are updated quickly, being revised frequently as technology is upgraded, operating strategies are adjusted or higher-level policies change; and they are subject to strict requirements, since an error or omission in any clause may create a safety hazard. The traditional revision approach, in which documents are compared and modified manually in a Word editor or similar tool, makes versions hard to trace, is inefficient, and makes it difficult to record the modified content in structured form or to trace it later. Existing recognition of regulation changes typically uses comparison tools based on character differences (e.g., a diff algorithm) or simple keyword matching.
These methods have significant drawbacks:
1) Low revision efficiency: manual comparison takes a long time, and the cost of manual revision rises markedly when the body of regulations is large;
2) Omissions, erroneous changes and similar problems occur easily, so the consistency and accuracy of revisions are difficult to guarantee;
3) Lack of knowledge association: modified content cannot be automatically located at the specific clause in the original large regulation knowledge base;
4) Operations are not automated: after the differences are identified, the deletion, insertion or replacement operations still have to be performed manually.
Therefore, there is a need for a technical solution that can automatically identify the points of modification in the regulations and complete the revision, so as to improve revision efficiency and reliability.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides an automatic revision method and system for rail transit regulations based on a knowledge base and an LLM.
The aim of the invention is achieved by the following technical solution:
In a first aspect, the application discloses an automatic revision method for rail transit regulations based on a knowledge base and an LLM, comprising the following steps:
S1, constructing a regulation knowledge base and recording a regulation knowledge base ID, parsing an original document into plain text, and then cutting the original document into a plurality of small slices according to clauses, paragraphs or semantic rules;
S2, receiving a revision object uploaded by a user, and acquiring the regulation knowledge base ID and the original document ID corresponding to the original document selected by the user;
S3, looping over all candidate paragraphs in the first array, inputting the candidate paragraphs into a first large language model during the loop, and identifying and extracting the paragraphs that need revision through the first large language model to generate a second array;
S4, performing vector retrieval for each piece of revision content in the set of content to be revised, within the scope of the knowledge base locked in step S2, and returning a first candidate set of the K entries most similar to that piece of revision content; cleaning the data in all the first candidate sets and retaining the candidate content whose original document ID matches the one acquired in step S2, to obtain a second candidate set;
S5, automatically updating and revising all target revision results in the third array based on preset document revision rules, and storing the target revision results.
Based on the first aspect, in step S1, if the original document contains a picture, the picture is stored in the form of a web page link.
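
As a non-limiting illustration, the retrieval and cleaning in step S4 can be sketched in Python as follows: the K most similar slices are retrieved first, and only those belonging to the selected original document are then kept. The names Slice, cosine and retrieve_candidates are hypothetical. The use of cosine similarity over an in-memory list is an assumption made for the sketch; the text specifies neither the similarity measure nor the vector store.

```python
import math
from typing import NamedTuple


class Slice(NamedTuple):
    slice_id: int
    document_id: str
    text: str
    vector: tuple  # embedding assumed to be computed elsewhere


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(query_vector, knowledge_base, document_id, k):
    """Return the second candidate set for one piece of revision content."""
    # First candidate set: the K slices most similar to the query.
    ranked = sorted(knowledge_base,
                    key=lambda s: cosine(query_vector, s.vector),
                    reverse=True)
    first_candidates = ranked[:k]
    # Second candidate set: keep only slices from the selected document.
    return [s for s in first_candidates if s.document_id == document_id]


# Toy knowledge base with two documents (illustrative data only).
kb = [
    Slice(1, "docA", "clause 1", (1.0, 0.0)),
    Slice(2, "docB", "clause 7", (0.9, 0.1)),
    Slice(3, "docA", "clause 2", (0.0, 1.0)),
]
second_candidates = retrieve_candidates((1.0, 0.0), kb, "docA", k=2)
```

Because the filter runs after the top-K cut, as in the order given in step S4, the second candidate set may contain fewer than K entries.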
Based on the first aspect, an exception retry mechanism is further introduced in step S3 to check whether the second array returned by the first large language model is valid. If the returned second array is invalid, a retry is triggered and a new second array is generated by the first large language model; the number of retries does not exceed a preset maximum.
Based on the first aspect, the output format that constrains all target revision results in step S4 comprises a type, a text, an original document ID and a paragraph ID. The type is one of addition, modification and deletion; the text comprises the original regulation content and the modified content; if the type is deletion, the modified content is empty; if the type is addition, the paragraph ID is -1, otherwise the original document ID. Based on th