CN-121980292-A - Policy consistency assessment method and system based on large model and iterative clustering

Abstract

The application discloses a policy consistency assessment method and system based on a large model and iterative clustering. The method comprises: obtaining multiple policy texts; representing them as semantic vectors; generating moderately sized candidate policy clusters from the text vectors with an iterative density clustering algorithm that dynamically adjusts its clustering radius; extracting structured elements from the texts in each cluster; and inputting these elements into a pre-trained large language model, which performs multidimensional consistency reasoning to generate an assessment result. By combining iterative clustering with large-model reasoning, the scheme automates policy consistency assessment. Semantic vectorization and adaptive clustering narrow the analysis to texts that potentially conflict; the deep semantic understanding of the large model is then used to compare and analyze them, and an assessment report is generated automatically. The method overcomes the low efficiency and insufficient coverage of traditional manual assessment and enables efficient, objective consistency analysis of massive volumes of multi-source policy documents.

Inventors

NIU JUNYU
XU YINGXIAO
Zheng Canhan
JI XIANGYI
LI ZEZHEN
ZHOU RUIXIANG
BAI JINGJING
ZHENG QIAOFEI

Assignees

Fudan University (复旦大学)

Dates

Publication Date: 20260505
Application Date: 20260408

Claims (10)

1. A policy consistency assessment method based on a large model and iterative clustering, characterized by comprising the following steps: S1, acquiring multiple policy texts from a target policy library; S2, performing semantic vectorization on the policy texts to obtain corresponding text vectors; S3, clustering the policy texts based on the text vectors with an iterative density clustering algorithm to generate a plurality of candidate policy clusters, wherein the iterative density clustering algorithm dynamically adjusts a clustering radius parameter so that policy texts whose text vector distribution satisfies a preset density condition, and whose cluster size falls within a preset number interval, are grouped into the same candidate policy cluster; S4, extracting, for each candidate policy cluster, structured elements of the policy texts it contains; S5, inputting the structured elements into a pre-trained large language model to perform multidimensional consistency reasoning, and generating a consistency assessment result for the candidate policy cluster.
2. The method according to claim 1, further comprising, prior to step S2: slicing each policy text according to its text structure marks to obtain a plurality of policy text slices; wherein step S2 specifically comprises performing semantic vectorization on each policy text slice to obtain a corresponding slice vector.
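Claim 2 does not say which structure marks are used. A minimal sketch of the slicing step, assuming that each chapter, section, or article starts on its own line with a heading such as "第一条" or "Article 1"; the pattern `STRUCTURE_MARK` and the function `slice_policy_text` are hypothetical names:

```python
import re

# Hypothetical structure marks: Chinese chapter/section/article headings
# ("第…章", "第…节", "第…条") or their English renderings. Anchoring at the
# start of a line avoids splitting on in-text references like "see Article 1".
STRUCTURE_MARK = re.compile(
    r"^(?=(?:第[一二三四五六七八九十百千零\d]+[章节条]|Chapter\s+\d+|Article\s+\d+))",
    re.MULTILINE,
)


def slice_policy_text(text: str) -> list[str]:
    """Split one policy text into slices, each beginning at a structure mark.

    Any preamble before the first mark is kept as its own slice.
    """
    pieces = STRUCTURE_MARK.split(text)  # zero-width split (Python 3.7+)
    # Drop the empty fragment produced when the text starts with a mark.
    return [p.strip() for p in pieces if p.strip()]
```

For the input "General provisions.\nArticle 1 Scope.\nArticle 2 See Article 1 for details.\n" this yields three slices; the cross-reference inside Article 2 is not treated as a boundary because it is not at the start of a line.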
3. The method according to claim 2, characterized in that, in step S2, the text vector or slice vector is generated using a pre-trained language model based on the Transformer architecture.
4. The method according to claim 1, wherein, in step S3, the iterative density clustering algorithm is an iterative DBSCAN algorithm, and dynamically adjusting the clustering radius parameter specifically comprises: S31, setting an initial clustering radius, a minimum cluster size, a maximum cluster size, and a radius adjustment step; S32, performing DBSCAN clustering with the current clustering radius to obtain a preliminary clustering result; S33, determining whether the preliminary clustering result contains a policy cluster whose number of texts exceeds the maximum cluster size; if so, reducing the current clustering radius and returning to step S32; if not, outputting, as the candidate policy clusters, those policy clusters in the current clustering result whose number of texts is not smaller than the minimum cluster size.
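The loop in steps S31 to S33 can be illustrated with a self-contained sketch. It assumes cosine distance between vectors and a DBSCAN core-point threshold `min_pts`; the claim fixes only the radius, the two cluster-size bounds, and the step, so `min_pts` is an assumption, as is the lower bound `eps_floor`, added here only to guarantee termination. All function names are hypothetical, and this O(n²) DBSCAN is for illustration; a real system would use an indexed implementation.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def dbscan(vectors, eps, min_pts):
    """Plain DBSCAN: returns one label per vector, -1 meaning noise."""
    n = len(vectors)
    labels = [None] * n

    def region(i):
        # The eps-neighbourhood includes the point itself.
        return [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
        cluster_id += 1
    return labels


def iterative_dbscan(vectors, eps, min_size, max_size, step, min_pts=2, eps_floor=1e-6):
    """S31-S33: shrink eps until no cluster exceeds max_size.

    Returns (final eps, clusters as lists of vector indices).
    """
    while True:
        labels = dbscan(vectors, eps, min_pts)  # S32
        clusters = {}
        for idx, label in enumerate(labels):
            if label != -1:
                clusters.setdefault(label, []).append(idx)
        oversized = any(len(m) > max_size for m in clusters.values())
        if oversized and eps - step > eps_floor:  # S33, "yes" branch
            eps -= step
            continue
        # S33, "no" branch: keep clusters with at least min_size texts. The
        # max_size filter only matters if the eps_floor guard ended the loop.
        return eps, [m for m in clusters.values() if min_size <= len(m) <= max_size]
```

For example, with 2-D unit vectors at angles 0°, 2°, 4°, 6°, 30°, 32°, 34°, and 90°, a starting radius of 0.1 first merges the first seven points into one oversized cluster; after one step of 0.05 the result is two candidate clusters, `[0, 1, 2, 3]` and `[4, 5, 6]`, and the 90° point remains noise.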
5. The method according to claim 1, wherein, in step S4, extracting the structured elements comprises: extracting metadata of the policy text, the metadata comprising at least one of the issuing authority, the effective date, and the scope of application; and extracting, from the semantics of the policy text, at least one of key terms, policy intents, and policy entity relationships.
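As a rough illustration of the element record in claim 5, the sketch below holds the three metadata fields and the three semantic fields in one dataclass and fills the metadata with simple rules. It assumes each policy is a dict with hypothetical keys `issuer`, `scope`, and `body`, takes the first date found in the body as the effective date (a simplification), and matches key terms against a hypothetical domain glossary. Intents and entity relations are left empty, on the assumption that a model would supply them.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Matches "2024-01-01" as well as "2024年1月1日".
DATE_PATTERN = re.compile(r"(\d{4})[-年](\d{1,2})[-月](\d{1,2})")


@dataclass
class StructuredElements:
    # Metadata (first part of claim 5)
    issuer: Optional[str] = None
    effective_date: Optional[str] = None
    scope: Optional[str] = None
    # Semantic elements (second part of claim 5)
    key_terms: list = field(default_factory=list)
    intents: list = field(default_factory=list)
    relations: list = field(default_factory=list)  # (subject, relation, object)


def extract_elements(record: dict, glossary: set) -> StructuredElements:
    """Rule-based extraction of metadata and glossary key terms.

    Intents and relations stay empty; in practice an information-extraction
    model or an LLM call would fill them.
    """
    body = record.get("body", "")
    date = None
    match = DATE_PATTERN.search(body)  # simplification: first date in body
    if match:
        y, m, d = match.groups()
        date = f"{y}-{int(m):02d}-{int(d):02d}"
    lowered = body.lower()
    terms = sorted(t for t in glossary if t.lower() in lowered)
    return StructuredElements(
        issuer=record.get("issuer"),
        effective_date=date,
        scope=record.get("scope"),
        key_terms=terms,
    )
```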
6. The method according to claim 1, wherein, in step S5, the multidimensional consistency reasoning comprises: comparing, based on the structured elements, the consistency among the policy texts in at least one of the dimensions of logical contradiction, coverage gaps, and redundant duplication.
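The patent does not disclose its prompt. One possible way to hand a cluster's structured elements and the selected dimensions to a large language model is sketched below; the dimension keys, their one-line descriptions, the prompt wording, and the requested JSON keys are all assumptions made for illustration.

```python
import json

# Hypothetical keys and descriptions for the three dimensions in claim 6.
DIMENSIONS = {
    "logical_contradiction": "provisions that cannot both be satisfied",
    "coverage_gap": "situations or subjects that none of the texts address",
    "redundant_duplication": "provisions that restate each other without adding content",
}


def build_consistency_prompt(elements, dimensions=tuple(DIMENSIONS)):
    """Assemble one LLM prompt for a candidate policy cluster.

    `elements` is a list of JSON-serialisable dicts of structured elements,
    one per policy text; `dimensions` selects at least one key of DIMENSIONS.
    """
    if not dimensions:
        raise ValueError("at least one dimension is required")
    lines = [
        "You are a policy analyst. Compare the policy texts below for consistency.",
        "",
    ]
    for i, elem in enumerate(elements, start=1):
        lines.append(f"[Policy {i}] {json.dumps(elem, ensure_ascii=False, sort_keys=True)}")
    lines += ["", "Check the following dimensions:"]
    for key in dimensions:
        lines.append(f"- {key}: {DIMENSIONS[key]}")
    lines += [
        "",
        'Reply with one JSON object with keys "overall_score" (0-100), '
        '"conflicts", "complements" and "suggestions".',
    ]
    return "\n".join(lines)
```

Serialising each element record as sorted JSON keeps the prompt deterministic for a given cluster, which makes model outputs easier to compare across runs.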
7. The method according to claim 1, wherein, in step S5, the consistency assessment result includes at least one of an overall consistency score, a list of conflict points, a list of complementary points, and revision suggestions.
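The four result items of claim 7 can be held in a small record parsed from the model's reply. A minimal sketch, assuming the model was asked for a single JSON object whose keys are hypothetical (`overall_score` on a 0 to 100 scale, `conflicts`, `complements`, `suggestions`); text around the object is ignored and an out-of-range score is clamped.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ConsistencyResult:
    overall_score: float                              # overall consistency score
    conflicts: list = field(default_factory=list)     # conflict point list
    complements: list = field(default_factory=list)   # complementary point list
    suggestions: list = field(default_factory=list)   # revision suggestions


def parse_consistency_reply(reply: str) -> ConsistencyResult:
    """Extract the JSON object from a model reply and normalise it.

    Raises ValueError if no object is present and KeyError if the score
    is missing, rather than silently inventing a score.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start:end + 1])
    score = min(max(float(data["overall_score"]), 0.0), 100.0)
    return ConsistencyResult(
        overall_score=score,
        conflicts=list(data.get("conflicts", [])),
        complements=list(data.get("complements", [])),
        suggestions=list(data.get("suggestions", [])),
    )
```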
8. A policy consistency assessment system based on a large model and iterative clustering, comprising: a policy text acquisition module, configured to acquire multiple policy texts from a target policy library; a vectorization module, configured to perform semantic vectorization on the policy texts to obtain corresponding text vectors; an iterative clustering module, configured to cluster the policy texts based on the text vectors with an iterative density clustering algorithm to generate a plurality of candidate policy clusters, wherein the iterative density clustering algorithm dynamically adjusts a clustering radius parameter so that policy texts whose text vector distribution satisfies a preset density condition, and whose cluster size falls within a preset number interval, are grouped into the same candidate policy cluster; an element extraction module, configured to extract, for each candidate policy cluster, structured elements of the policy texts it contains; and a consistency reasoning module, configured to input the structured elements into a pre-trained large language model to perform multidimensional consistency reasoning and generate a consistency assessment result for the candidate policy cluster.
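The five modules of claim 8 correspond one-to-one to steps S1 to S5. A minimal composition sketch, assuming each module can be modelled as a plain callable; the class name `PolicyConsistencySystem` and its field names are hypothetical, and in the usage check simple stubs stand in for the real vectorizer, clusterer, extractor, and LLM call.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PolicyConsistencySystem:
    acquire: Callable[[], list[str]]                         # acquisition module (S1)
    vectorize: Callable[[list[str]], list[list[float]]]      # vectorization module (S2)
    cluster: Callable[[list[list[float]]], list[list[int]]]  # iterative clustering module (S3)
    extract: Callable[[str], dict]                           # element extraction module (S4)
    reason: Callable[[list[dict]], Any]                      # consistency reasoning module (S5)

    def run(self) -> list[Any]:
        """Run S1 to S5 and return one assessment result per candidate cluster."""
        texts = self.acquire()
        vectors = self.vectorize(texts)
        results = []
        for members in self.cluster(vectors):  # members are indices into texts
            elements = [self.extract(texts[i]) for i in members]
            results.append(self.reason(elements))
        return results
```

Injecting the modules as callables keeps each one replaceable; for instance, the embedding model behind `vectorize` can be swapped without touching the clustering or reasoning modules.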
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1 to 7.

Description

Policy consistency assessment method and system based on large model and iterative clustering
Technical Field
The invention belongs to the technical field of public policy analysis, and in particular relates to a policy consistency assessment method and system based on a large model and iterative clustering.
Background
Policy consistency assessment is a key step in ensuring that public policies at all levels and in all sectors are coordinated and compatible in their objectives, instruments, and implementation logic, so that together they form a coherent governing force. Traditional assessment relies mainly on manual comparison by domain experts or on after-the-fact judgment based on cases that have already occurred, and suffers from low coverage, strong subjectivity, and severe lag.
In recent years, natural language processing techniques such as TF-IDF and BERT have been combined with clustering methods such as K-means and DBSCAN to generate candidate policy clusters and thereby improve assessment efficiency. However, in specialized fields such as taxation, general-purpose word vectors do not characterize professional terminology accurately enough, so similarity within clusters varies markedly; traditional clustering methods are also easily disturbed by noise and outliers, which degrades cluster quality. Moreover, at the consistency judgment stage, existing research has not yet established an intelligent assessment framework that deeply fuses domain knowledge with the reasoning capability of large language models, which makes efficient, accurate, and interpretable automated analysis difficult. In particular, when facing massive, multi-source, cross-level policy texts, the prior art struggles to systematically identify hidden conflicts and logical contradictions and to assess them intelligently.
Disclosure of Invention
(I) Object of the Invention
To overcome the above deficiencies, the invention aims to provide a policy consistency assessment method and system based on a large model and iterative clustering. It addresses two groups of problems in existing policy consistency assessment methods: at the candidate policy cluster generation stage, inaccurate characterization of professional semantics and unstable clustering quality; and at the consistency judgment stage, low assessment efficiency, insufficient accuracy, and difficulty of large-scale application caused by the lack of an intelligent framework integrating domain knowledge with large-model reasoning.
(II) Technical Solution
To achieve the above object, the application provides the following technical solution: a policy consistency assessment method based on a large model and iterative clustering, comprising the following steps: S1, acquiring multiple policy texts from a target policy library; S2, performing semantic vectorization on the policy texts to obtain corresponding text vectors; S3, clustering the policy texts based on the text vectors with an iterative density clustering algorithm to generate a plurality of candidate policy clusters, wherein the iterative density clustering algorithm dynamically adjusts a clustering radius parameter so that policy texts whose text vector distribution satisfies a preset density condition, and whose cluster size falls within a preset number interval, are grouped into the same candidate policy cluster; S4, extracting, for each candidate policy cluster, structured elements of the policy texts it contains; S5, inputting the structured elements into a pre-trained large language model to perform multidimensional consistency reasoning, and generating a consistency assessment result for the candidate policy cluster.
By combining iterative density clustering with the intelligent reasoning of a large language model, an efficient, automated policy consistency assessment workflow is constructed. First, semantic vectorization converts unstructured policy texts into machine-understandable numerical representations, laying the foundation for subsequent analysis. Next, the iterative density clustering method adaptively generates candidate policy clusters that are moderate in size and semantically compact, effectively narrowing the scope of potential inconsistencies and markedly improving screening efficiency and accuracy. Finally, relying on the deep semantic understanding and multidimensional reasoning capabilities of the large language model, the structured elements of the texts in each policy cluster are compared and logically analyzed, and a detailed consistency assessment report is generated automatically. The whole scheme not only solves the problems of strong subjectivity, insufficient coverage, and severe lag in traditional manual assessment, but also overcomes the defects of inaccurate professional semantic characteri