CN-121996801-A - Legal element layering extraction method and system based on large model and knowledge graph

Abstract

The invention relates to the technical field of artificial intelligence and judicial big data, and in particular provides a method and system for layered extraction of legal elements based on a large model and a knowledge graph. The method comprises: obtaining the basic information and fact text of a case; automatically matching a confusable crime group Gi according to the basic information and the fact text; reading the four elements and the element metadata of the confusable crime group Gi from a knowledge graph KG; merging the elements of multiple crimes and modeling them with unified fields to generate an annotated confusable crime group template; constructing a prompt and calling a large language model (LLM) to perform inference so as to extract the case elements for the confusable crime group template; and parsing and verifying the LLM output against the case elements, then generating and storing a confusable crime group element list.
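The matching, knowledge-graph reading, and template-merging steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the mapping `CHARGE_TO_GROUP`, the toy graph `KG`, the functions `match_group` and `build_template`, and every field name and candidate value are hypothetical placeholders and carry no legal authority.

```python
import json

# Hypothetical mapping [crime name -> confusable crime group] and group membership.
CHARGE_TO_GROUP = {"fraud": "G1", "contract fraud": "G1"}
GROUP_MEMBERS = {"G1": ["fraud", "contract fraud"]}

# Toy knowledge graph: crime -> element -> field -> {content, note}.
# Values are illustrative placeholders, not legal definitions.
KG = {
    "fraud": {
        "objective aspect": {
            "harmful act": {
                "content": ["fabricating facts", "concealing the truth"],
                "note": "Deceptive conduct that induces the victim to hand over property.",
            },
        },
    },
    "contract fraud": {
        "objective aspect": {
            "harmful act": {
                "content": ["fabricating facts", "signing a sham contract"],
                "note": "Deception carried out while signing or performing a contract.",
            },
        },
    },
}


def match_group(charge):
    """Return the group id for a crime name, or None when nothing matches."""
    return CHARGE_TO_GROUP.get(charge)


def build_template(group_id):
    """Merge the fields of every crime in the group into one annotated template."""
    template = {}
    for charge in GROUP_MEMBERS[group_id]:
        for element, fields in KG[charge].items():
            for field, leaf in fields.items():
                slot = template.setdefault(element, {}).setdefault(
                    field, {"candidates": {}, "description": []}
                )
                # Keep a separate, de-duplicated candidate list for each crime.
                slot["candidates"][charge] = list(dict.fromkeys(leaf["content"]))
                # Collect each distinct note once as the field description.
                if leaf["note"] not in slot["description"]:
                    slot["description"].append(leaf["note"])
    return template


template = build_template(match_group("fraud"))
print(json.dumps(template, ensure_ascii=False, indent=2))
```

In this sketch a shared field such as "harmful act" appears once in the template while each crime keeps its own candidate list, which is one plausible reading of "unified fields" with per-crime candidate sets.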

Inventors

WANG FANG
ZHANG XU
LIU JIHUI
Qu Cunquan

Assignees

Shandong University (山东大学)

Dates

Publication Date: 20260508
Application Date: 20260409

Claims (9)

1. A method for hierarchically extracting legal elements based on a large model and a knowledge graph, characterized by comprising the following steps: Step 1, obtaining basic information and fact text of a case; Step 2, automatically matching a confusable crime group Gi according to the basic information and the fact text; Step 3, reading the four elements and the element metadata of the confusable crime group Gi through a knowledge graph KG; Step 4, merging the elements of multiple crimes and modeling them with unified fields to generate an annotated confusable crime group template; Step 5, constructing a prompt and calling a large language model (LLM) to perform inference so as to extract the case elements for the confusable crime group template; and Step 6, parsing and verifying the LLM output according to the case elements, and generating and storing a confusable crime group element list.
2. The method according to claim 1, wherein the basic information in Step 1 includes the case number (case_no), the crime name of the case (charge), the crime facts of the case (FD), and the court's opinion (hold_that).
3. The method according to claim 2, wherein Step 2 comprises: automatically determining the confusable crime group Gi corresponding to the case from the crime name of the case, or from a preliminary coarse-grained classification result, by means of the mapping relation [crime name -> confusable crime group]; and, if no confusable crime group is matched, ending the flow directly or entering another processing path.
4. The method according to claim 3, wherein Step 3 comprises: reading, through the knowledge graph KG, the four elements and the element metadata corresponding to each crime in the confusable crime group Gi, wherein the content read comprises the candidate value set (content) and the annotation information (note), and the element metadata comprise the essence of the act, the harmful act, the characteristics of the act, the knowledge elements, and the completion standard; the knowledge graph KG is organized as follows: the root node is the criminal law knowledge graph, and its child nodes are specific crimes; under each crime node, the crime is decomposed into the crime object, the objective aspect, the crime subject, the subjective aspect, and the completion standard; and the leaf node of each element records: content, the candidate value set of the current element under the current crime; and note, annotation information on the content of the current element, which supplements the legal meaning and the boundaries of application that the content expresses only briefly, and which is used to generate the description fields in the confusable crime group template.
5. The method according to claim 4, wherein Step 4 comprises: recursively merging the element structures of the crimes; horizontally aligning and de-duplicating the essence of the act, the harmful act, the characteristics of the act, the knowledge elements, and the completion standard by means of a multi-crime element merging and field alignment algorithm, while retaining, for each field, a separate candidate value set for each crime, thereby forming a unified, multi-level element metadata structure; and generating a JSON template from the element metadata and automatically adding annotation information to each field in the template, to obtain an annotated confusable crime group template for large language model inference.
6. The method according to claim 5, wherein Step 5 comprises: taking explanatory text, the annotated confusable crime group template, and the crime facts FD of the case as input; submitting the input to the large language model; requesting that elements be filled in for each crime in the confusable crime group according to the template structure; and restricting field values to the candidate value sets given in the template annotations.
7. The method according to claim 6, wherein Step 6 comprises: using a regular expression to extract the JSON segment from, or strip the markup symbols from, the output text of the LLM; calling a JSON parsing function to convert the result into a dictionary structure; checking the parsed result for field completeness and for the validity of candidate values; and organizing the checked result into a confusable crime group element list, which is stored for subsequent analysis and modeling.
8. The method according to claim 7, further comprising a downstream application after Step 6, wherein the confusable crime group element list is provided to a conviction/characterization auxiliary analysis module, a similar-case retrieval module, and a crime prediction or sentencing prediction model, and is used, respectively, to judge which crime's elements the case satisfies, to retrieve cases by element similarity, and as input features.
9. A system for hierarchically extracting legal elements based on a large model and a knowledge graph, the system comprising: a data preprocessing module, used for acquiring basic information and fact text of a case; a confusable crime group automatic matching module, used for automatically matching a confusable crime group Gi according to the basic information and the fact text; a knowledge graph management and field metadata extraction module, used for storing a criminal law knowledge graph KG and for reading the four elements and the element metadata of the confusable crime group Gi through the knowledge graph KG; a multi-crime element merging and confusable crime group template module, used for merging the elements of multiple crimes, modeling them with unified fields, and generating an annotated confusable crime group template; a large language model prompt construction and inference module, used for constructing a prompt and calling a large language model (LLM) to perform inference so as to extract the case elements for the confusable crime group template; and a parsing output and validity verification module, used for parsing and verifying the LLM output according to the case elements, and for generating and storing a confusable crime group element list.
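The prompt construction of Step 5 and the parsing and checking of Step 6 might look roughly like the sketch below. It is an assumption-laden illustration only: `CANDIDATES`, `build_prompt`, `extract_json`, `validate`, and the sample model output are all hypothetical, the candidate sets are simplified to one set per field, and no real LLM API is called.

```python
import json
import re

# Hypothetical candidate value sets, as they might be read from template annotations.
CANDIDATES = {
    "harmful act": {"fabricating facts", "concealing the truth", "signing a sham contract"},
    "purpose": {"illegal possession", "none"},
}


def build_prompt(instructions, template, facts):
    """Concatenate explanatory text, the annotated template, and the case facts."""
    body = json.dumps(template, ensure_ascii=False)
    return f"{instructions}\n\nTemplate:\n{body}\n\nFacts:\n{facts}"


def extract_json(text):
    """Pull the outermost {...} span out of free-form model output and parse it.

    The greedy match runs from the first '{' to the last '}', so surrounding
    prose and code-fence markers are left behind.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))


def validate(result, charges):
    """Check field completeness and candidate-value validity for each crime."""
    errors = []
    for charge in charges:
        filled = result.get(charge)
        if not isinstance(filled, dict):
            errors.append(f"{charge}: missing")
            continue
        for field, allowed in CANDIDATES.items():
            if field not in filled:
                errors.append(f"{charge}.{field}: missing")
            elif filled[field] not in allowed:
                errors.append(f"{charge}.{field}: value not in candidate set")
    return errors


prompt = build_prompt("Fill every field from its candidate set.", {}, "(case facts)")

FENCE = "`" * 3  # code-fence marker that models often wrap around JSON
sample_output = (
    "Here is the result:\n" + FENCE + "json\n"
    '{"fraud": {"harmful act": "fabricating facts", "purpose": "illegal possession"},'
    ' "contract fraud": {"harmful act": "forging seals"}}\n' + FENCE
)
parsed = extract_json(sample_output)
errors = validate(parsed, ["fraud", "contract fraud"])
print(errors)
```

A result with an empty error list would then be stored as the element list; how rejected outputs are handled (retry, manual review, discard) is not specified in the claims and is left open here.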

Description

Legal element layering extraction method and system based on large model and knowledge graph

Technical Field

The invention relates to the technical field of artificial intelligence and judicial big data, and in particular to a method and system for layered extraction of legal elements based on a large model and a knowledge graph.

Background

With the advancement of systems such as the online publication of judgment documents and judicial disclosure, a large number of criminal judgment documents have accumulated in judicial practice. These documents are mostly written in natural language; they are loosely structured, lengthy, and their information is scattered. Structured processing has therefore become the basis for judicial big data applications. In criminal law theory and trial practice, the determination of a crime is generally analyzed around the four elements of criminal law: the crime object, namely the legal interest protected by criminal law and the object infringed upon; the objective aspect, including the essence of the act, the harmful act, the harmful result, the mode of constitution, the characteristics of the act, the stage of the act, and the like; the crime subject, including the type of subject, the age of responsibility, the capacity for responsibility, and the like; and the subjective aspect, including the form of culpability, the criminal purpose, the knowledge elements, the motive, the mental state, and the like. In a specific case, the elements of several crimes often intersect and overlap, particularly for typical confusable crimes such as fraud, contract fraud, duty embezzlement, and theft. Such crimes share common features in the harmful act, the mode of constitution, whether a contractual relationship exists, the crime object, the criminal purpose, and the like, while differing in their key constitutive elements.
Once a case is mischaracterized, the conviction and sentencing of the party concerned are directly affected, and judicial credibility suffers.

In the prior art:

(1) Traditional extraction methods based on rules and keywords extract fields such as defendant information, the crime name, the amount involved, and time from a document by means of manually written rules, regular expressions, keyword dictionaries, and the like; for complex elements such as the description of the act and subjective culpability, they generally rely on keyword triggering and manual experience for a preliminary judgment. For example, in "Research and implementation of relation extraction technology between knowledge elements oriented to legal text", entities are recognized on the basis of rules and dictionaries, and relations are extracted by iterative semi-supervised template matching. However, such methods have the following defects: ① rules are costly to write and maintain and depend heavily on the experience of domain experts, and large-scale adjustment is needed once laws or adjudication standards change; ② recognition of complex sentence patterns and implicit elements (such as "knowledge" and "the purpose of illegal possession") is limited, and long-distance dependencies and implicit expressions are difficult to handle; ③ it is difficult to express naturally the multi-level, multi-dimensional structural information of the four elements of criminal law and of confusable crime groups; ④ extraction is generally performed around a single given crime only, so elements cannot be filled in and compared in parallel for several confusable crime groups on the same case.

(2) Sequence labeling and classification methods based on deep learning use models such as CRF, BiLSTM, and BERT to perform entity recognition, element classification, and sentence-level label prediction on legal documents.
For example, the JLB-BiLSTM-CRF model is proposed in "Technical study of criminal legal knowledge graph construction", in which BERT is used to enhance representations for entity recognition, and "Generative named entity recognition framework for Chinese legal domain" explores the use of a sequence-to-sequence generative framework for legal entity recognition. Such schemes have the following problems: ① conventional models adopt a flat label system, which can hardly carry the hierarchical structure of four elements, sub-elements, and specific crimes directly; ② the model output is usually a label ID or a brief category, and one or more rounds of rule-based and normalization processing are needed to map it onto a unified, comparable element set; ③ when several confusable crimes are involved, labels or models must be designed for each crime or for a small range of crimes, which hinders comparison and transfer under a unified framework; ④ the corresponding labeling cost is high, and once a new crime name and a new element system are introduced, the new crimin