CN-122021638-A - Method and system for extracting and analyzing transcript text based on large language model

Abstract

The invention belongs to, but is not limited to, the technical field of artificial intelligence, and particularly relates to a method and a system for extracting and analyzing transcript text based on a large language model (LLM). The method comprises the following steps: S1, case establishment and transcript upload; S2, single-transcript element extraction; S3, element merging analysis; S4, element merging and relationship adjustment; S5, relationship graph updating; S6, contradiction analysis; S7, case detail generation; and S8, result display. The invention automates transcript analysis end to end, significantly improves processing efficiency and accuracy, supports element fusion and contradiction analysis across multiple transcripts, and improves the completeness and reliability of case analysis. Because the LLM understands semantic context, its extraction and analysis precision is better than that of traditional rule-based or statistical methods. The method is also readily extensible and suits analysis scenarios for various types of cases.

Inventors

XU CHUNMEI
YOU JIANYOU
JIANG LIANGLIANG
GAN CHENJIANG

Assignees

南威软件股份有限公司

Dates

Publication Date: 20260512
Application Date: 20251223

Claims (10)

1. A transcript information processing method integrating an artificial intelligence algorithm, characterized by comprising the following steps: step 1, establishing a case and receiving at least one transcript file; step 2, extracting persons, articles, events, times and places from a single transcript through a language model; step 3, inputting the elements of the current transcript together with the existing elements of the case into the language model to obtain mergeable elements; step 4, executing a merging operation on the mergeable elements and adjusting their association relationships; step 5, writing the merged elements into a relationship graph of the case; step 6, identifying person contradictions, article contradictions and event contradictions in the relationship graph through the language model; and step 7, generating case details from all transcripts of the case through the language model.
2. The method of claim 1, wherein in step 1, multiple consecutive uploads of transcripts are supported, and the uploaded transcripts are automatically associated with the same case.
3. The method of claim 1, wherein the sources of contradiction are categorized in step 6, the categories comprising contradictions between the transcripts of multiple persons and contradictions between multiple transcripts of a single person.
4. The method of claim 1, wherein the merging operation in step 4 includes generating a unified element identifier and redirecting all relationships that point to the original elements to the unified element.
5. The method of claim 1, wherein in step 7, if case details already exist, the original case details and the latest transcript content are input into the language model together to obtain updated case details.
6. A transcript information processing system for implementing the method of claim 1, comprising: a case establishment and transcript upload module, used for establishing cases and receiving transcript files; an element extraction module, used for calling the language model to extract persons, articles, events, times, places and association relationships from a transcript; a merging analysis module, used for judging whether an element of the current transcript and an existing element belong to the same entity; a relationship adjustment module, used for executing element merging and relationship adjustment; a graph updating module, used for writing the elements into the relationship graph; a contradiction analysis module, used for detecting contradictions in the relationship graph; and a detail generation module, used for generating case details.
7. The system of claim 6, wherein the contradiction analysis module reads all elements and relationships in the relationship graph, identifies person contradictions, article contradictions and event contradictions through the language model, and distinguishes contradictions between multiple persons from contradictions between multiple transcripts of a single person.
8. The system of claim 6, wherein the element extraction module is configured to output structured element data from a single transcript and generate a set of associations.
9. A transcript information processing electronic device comprising a processor and a memory, the memory storing a program executable on said processor, the program when executed implementing the method steps of claim 1.
10. The electronic device of claim 9, wherein the electronic device includes a display interface for interactively displaying a case timeline, a relationship map, case details, and contradictory information.
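
The merging operation of claim 4 (a unified element identifier plus redirection of relationships) can be illustrated with a minimal Python sketch. The `Graph` class, the `merge_elements` function, the attribute names and the sample data are all hypothetical and chosen only for illustration; the patent does not specify a data model or a graph database API.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class Graph:
    # Hypothetical in-memory stand-in for the case relationship graph.
    # elements: element id -> attribute dict
    # relations: list of (source_id, label, target_id) tuples
    elements: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)


def merge_elements(graph, ids_to_merge, unified_id=None):
    """Merge several element nodes into one and redirect their relations."""
    unified_id = unified_id or f"elem-{uuid.uuid4().hex[:8]}"

    # Build the unified node, keeping every original name as an alias.
    merged = {"aliases": []}
    for old_id in ids_to_merge:
        attrs = graph.elements.pop(old_id)
        merged.setdefault("type", attrs.get("type"))
        merged["aliases"].append(attrs.get("name"))
    graph.elements[unified_id] = merged

    # Redirect every relation that pointed to an original element.
    old = set(ids_to_merge)
    redirected = []
    for src, label, dst in graph.relations:
        src = unified_id if src in old else src
        dst = unified_id if dst in old else dst
        rel = (src, label, dst)
        if rel not in redirected:  # drop duplicates created by the merge
            redirected.append(rel)
    graph.relations = redirected
    return unified_id


# Illustrative data: two mentions of the same person in different transcripts.
g = Graph(
    elements={
        "p1": {"type": "person", "name": "Zhang San"},
        "p2": {"type": "person", "name": "Mr. Zhang"},
        "e1": {"type": "event", "name": "handover of the package"},
    },
    relations=[("p1", "participated_in", "e1"), ("p2", "participated_in", "e1")],
)
uid = merge_elements(g, ["p1", "p2"], unified_id="person-1")
```

In this sketch the two person nodes collapse into one node, and the two relations that become identical after redirection are kept only once, which is one possible way to keep the graph consistent after a merge.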

Description

Method and system for extracting and analyzing transcript text based on large language model

Technical Field

The invention belongs to, but is not limited to, the technical field of artificial intelligence, and particularly relates to a method and a system for extracting and analyzing transcript text based on a large language model.

Background

At present, the analysis of case transcripts mostly depends on manual reading and collation, or on traditional natural language processing techniques such as rule-based keyword matching and entity recognition. These methods can generally extract only limited types of information, have difficulty automatically identifying complex relationships among elements, and cannot effectively integrate information from multiple transcripts or perform contradiction analysis.

In view of the above, the technical problems to be solved in the prior art are: (1) manual analysis is inefficient, easily influenced by subjective factors, and poorly consistent; (2) traditional automatic methods cannot understand contextual semantics, so the accuracy of element and relationship extraction is low; (3) identical elements across multiple transcripts cannot be automatically identified and merged; (4) the capability to automatically discover and analyze contradictory descriptions is lacking.

Related prior art, such as the method disclosed in US20200218744A1 (titled "Extracting entity relations from semi-structured information"), determines relationships between entities by extracting feature vectors from unstructured portions of a record, weighting them based on topic vectors of the structured portions, and classifying with a machine learning model.
However, the prior art still has the following technical problems. On the one hand, the identified entity relationships are limited to a single document or record, which makes it difficult to merge entities and adjust association relationships across multiple transcripts and across the time dimension. On the other hand, the types of contradiction (such as person contradictions and event contradictions) and their sources are not clearly identified, and neither contradiction analysis nor dynamic case detail generation based on a language model is implemented, so the completeness of the overall case knowledge graph and its ability to be updated dynamically over time are insufficient.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method and a system, integrating an AI algorithm, for extracting and analyzing transcript text based on a large language model, so as to automate and intelligently analyze transcript content, including element extraction, relationship construction, element merging, contradiction identification and case detail generation.

The invention is realized as follows. The method for extracting and analyzing transcript text based on a large language model integrating an AI algorithm comprises the following steps: S1, case establishment and transcript upload; S2, single-transcript element extraction; S3, element merging analysis; S4, element merging and relationship adjustment; S5, relationship graph updating; S6, contradiction analysis; S7, case detail generation; S8, result display.

Further, S1 specifically comprises: creating a case through a management module, and supporting multiple uploads of multiple transcript files, which the system automatically records and associates with the same case.
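
As a rough illustration of S1, the following Python sketch shows one way a management module could create a case and associate transcripts uploaded in several batches with that same case. The `CaseManager`, `Case` and `Transcript` names, their fields, and the file names are assumptions made for this example; the patent does not describe the module's interface or storage.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Transcript:
    transcript_id: int
    filename: str
    text: str


@dataclass
class Case:
    case_id: int
    title: str
    transcripts: list = field(default_factory=list)


class CaseManager:
    """Hypothetical management module: creates cases and records uploads."""

    def __init__(self):
        self._cases = {}
        self._case_ids = count(1)
        self._transcript_ids = count(1)

    def create_case(self, title):
        case = Case(next(self._case_ids), title)
        self._cases[case.case_id] = case
        return case

    def upload(self, case_id, files):
        """Attach one or more (filename, text) pairs to an existing case."""
        case = self._cases[case_id]
        for filename, text in files:
            transcript = Transcript(next(self._transcript_ids), filename, text)
            case.transcripts.append(transcript)
        return case


# Two separate uploads end up associated with the same case.
manager = CaseManager()
case = manager.create_case("Example case")
manager.upload(case.case_id, [("transcript_a.txt", "first statement")])
manager.upload(
    case.case_id,
    [("transcript_b.txt", "second statement"), ("transcript_c.txt", "third statement")],
)
```

Keeping the transcripts on the case object in upload order is a design choice of this sketch; it gives later steps (element extraction, case detail generation) a single place to read all transcripts of a case.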
Further, S2 specifically comprises: obtaining the content of a single transcript, passing it to the LLM for analysis, and extracting the following elements: persons, articles, events, times, places, and the association relationships between elements.

Further, S3 specifically comprises: reading the existing element data of the case from the relationship graph, passing it to the LLM together with the element data extracted from the current transcript, having the LLM judge which elements are the same entity, and outputting a list of mergeable elements.

Further, S4 specifically comprises: merging the mergeable elements through code logic, unifying them into a single element node, and adjusting all association relationships so that they point to the merged element, thereby ensuring the consistency of the graph.

Further, S5 specifically comprises: integrating the merged elements, the newly added elements and the original elements into structured data, and updating the structured data into the relationship graph database of the case.

Further, S6 comprises: reading all elements and relationships in the relationship graph and passing them to the LLM for contradiction analysis, to identify article contradictions, person contradictions and event contradictions. The contradiction