CN-120449885-B - Multi-language contract intelligent comparison method, system, equipment and storage medium

Abstract

The invention discloses a multi-language contract intelligent comparison method, system, equipment and storage medium. In the technical scheme, a first language version contract and a second language version contract are preprocessed to obtain a first text content and a second text content; wrongly written words in the first text content and the second text content are corrected to obtain a first corrected text and a second corrected text; all clauses in the first corrected text and the second corrected text are traversed, and the overall semantic similarity, the multi-level semantic similarity and the global matching degree between two clauses are calculated; a similarity score between the two clauses is calculated from the overall semantic similarity, the multi-level semantic similarity and the global matching degree; whether a clause is a problem clause is judged according to the similarity score; and the problem category of each problem clause is identified and a corresponding correction suggestion is generated according to that category. The invention realizes automatic comparison of multi-language contracts and ensures their consistency and accuracy.

Inventors

WU XIAOMING
Wu Jienan
YU PING
XU YING

Assignees

广州工程技术职业学院

Dates

Publication Date: 20260508
Application Date: 20250424

Claims (8)

1. A multi-language contract intelligent comparison method, wherein the multi-language contract comprises at least a first language version contract and a second language version contract, the second language version contract being a translation of the first language version contract into a second language, characterized in that the method comprises the following steps:
preprocessing the first language version contract to obtain first text content, and preprocessing the second language version contract to obtain second text content;
performing wrongly written word correction on the first text content to obtain a first corrected text, and performing wrongly written word correction on the second text content to obtain a second corrected text;
calculating the overall semantic similarity and the multi-level semantic similarity between a first clause in the first corrected text and a second clause in the second corrected text, wherein the second clause is the translation of the first clause into the second language, and calculating the global matching degree between the first corrected text and the second corrected text;
calculating a similarity score between the first clause and the second clause according to the overall semantic similarity, the multi-level semantic similarity and the global matching degree, and judging whether the second clause is a problem clause according to the similarity score;
identifying a problem category of the problem clause, and generating a corresponding correction suggestion according to the problem category;
wherein calculating the multi-level semantic similarity between the first clause and the second clause comprises: calculating word similarity, phrase similarity and sentence similarity between the first clause and the second clause; and calculating the average of the word similarity, the phrase similarity and the sentence similarity to obtain the multi-level semantic similarity;
wherein performing wrongly written word correction on text content to obtain a corrected text, the corrected text being the first corrected text when the text content is the first text content and the second corrected text when the text content is the second text content, comprises: calculating the word similarity between a word of the text content and a word in an industry word stock by using a first similarity calculation formula (the formula and its symbols are not reproduced in this text), wherein w represents a word of the text content, t represents a word in the industry word stock, and the formula is defined in terms of the edit distance between word w and word t, the maximum of the string lengths of word w and word t, the cosine similarity between the word vector of word w and the word vector of word t, the cosine similarity between the embedded vector of word w and the embedded vector of word t, a first weight, a second weight and a third weight, and yields the similarity between word w and word t; and, when the word similarity is greater than a correction threshold, replacing the word of the text content with the word in the industry word stock to obtain the corrected text.
2. The multi-language contract intelligent comparison method of claim 1, wherein calculating the word similarity between the first clause and the second clause comprises:
converting each word of the first clause into a first word vector to obtain a first word vector set;
converting each word of the second clause into a second word vector to obtain a second word vector set;
calculating the cosine similarity between each word vector in the first word vector set and all word vectors in the second word vector set, taking the maximum cosine similarity as the word vector similarity of that word, and calculating the mean of the word vector similarities to obtain the word similarity of the first clause and the second clause;
and wherein calculating the phrase similarity between the first clause and the second clause comprises:
converting each phrase of the first clause into a weighted first phrase vector to obtain a first phrase vector set;
converting each phrase of the second clause into a weighted second phrase vector to obtain a second phrase vector set;
calculating the cosine similarity between each phrase vector in the first phrase vector set and all phrase vectors in the second phrase vector set, taking the maximum cosine similarity as the phrase vector similarity of that phrase, and calculating the mean of the phrase vector similarities to obtain the phrase similarity of the first clause and the second clause.
3. The multi-language contract intelligent comparison method of claim 1, wherein calculating the global matching degree between the first corrected text and the second corrected text comprises:
converting each sentence in the first corrected text into a first sentence vector, and performing max pooling over all the first sentence vectors to obtain the maximum value of each dimension, wherein the maximum values of all dimensions form a first key semantic feature;
converting each sentence in the second corrected text into a second sentence vector, and performing max pooling over all the second sentence vectors to obtain the maximum value of each dimension, wherein the maximum values of all dimensions form a second key semantic feature;
and calculating the cosine similarity between the first key semantic feature and the second key semantic feature to obtain the global matching degree.
4. The multi-language contract intelligent comparison method according to claim 1, wherein preprocessing a contract to obtain text content, the text content being the first text content when the contract is the first language version contract and the second text content when the contract is the second language version contract, comprises:
performing image preprocessing on a scanned copy of the contract to obtain a corresponding image;
performing layout analysis and region segmentation on the image to obtain a text region, a table region and a picture region, wherein the picture region comprises pictures, headers and footers, invalid characters, official stamps, signatures and the like;
converting the text region into text, identifying the clause levels of the text by using a regular expression, identifying the title keywords of the text at each identified clause level by using an NLP model to obtain a semantic title for each clause level, generating a tree structure of the text according to the parent-child relations between clause levels, and generating a hierarchical text according to the tree structure and the semantic titles of the clause levels;
identifying the row, column and cell structure of the table in the table region, and reconstructing the table structure;
and merging the hierarchical text and the corresponding table structure to obtain the text content.
5. The multi-language contract intelligent comparison method of claim 1, wherein identifying the problem category of the problem clause comprises:
judging whether the word similarity between the first clause and the second clause of the problem clause is lower than both the phrase similarity and the sentence similarity, and whether the overall semantic similarity is lower than a preset threshold, and if so, judging that the problem category is translation deviation;
extracting keywords of the first clause and the second clause of the problem clause, comparing the keywords of the second clause with the keywords of the first clause, and if the second clause lacks keywords of the first clause, judging that the problem category is content deletion in the second clause;
checking, by using an industry legal dictionary, whether different legal terms are used in the first clause and the second clause, and if so, judging that the problem category is term change;
and performing dependency syntax analysis on the first clause and the second clause, judging whether the grammatical relation structures of the first clause and the second clause differ, and if so, judging that the problem category is grammar adjustment.
6. A multi-language contract intelligent comparison system, wherein the multi-language contract comprises at least a first language version contract and a second language version contract, the second language version contract being a translation of the first language version contract into a second language, characterized in that the system comprises:
a preprocessing module, configured to preprocess the first language version contract to obtain first text content and to preprocess the second language version contract to obtain second text content;
a correction module, configured to perform wrongly written word correction on the first text content to obtain a first corrected text and on the second text content to obtain a second corrected text;
a calculation module, configured to calculate the overall semantic similarity and the multi-level semantic similarity between a first clause in the first corrected text and a second clause in the second corrected text, wherein the second clause is the translation of the first clause into the second language, and to calculate the global matching degree between the first corrected text and the second corrected text;
a calculation and judgment module, configured to calculate a similarity score between the first clause and the second clause according to the overall semantic similarity, the multi-level semantic similarity and the global matching degree, and to judge whether the second clause is a problem clause according to the similarity score;
a suggestion generation module, configured to identify the problem category of the problem clause and to generate a corresponding correction suggestion according to the problem category;
wherein calculating the multi-level semantic similarity between the first clause and the second clause comprises: calculating word similarity, phrase similarity and sentence similarity between the first clause and the second clause; and calculating the average of the word similarity, the phrase similarity and the sentence similarity to obtain the multi-level semantic similarity;
wherein performing wrongly written word correction on text content to obtain a corrected text, the corrected text being the first corrected text when the text content is the first text content and the second corrected text when the text content is the second text content, comprises: calculating the word similarity between a word of the text content and a word in an industry word stock by using a first similarity calculation formula (the formula and its symbols are not reproduced in this text), wherein w represents a word of the text content, t represents a word in the industry word stock, and the formula is defined in terms of the edit distance between word w and word t, the maximum of the string lengths of word w and word t, the cosine similarity between the word vector of word w and the word vector of word t, the cosine similarity between the embedded vector of word w and the embedded vector of word t, a first weight, a second weight and a third weight, and yields the similarity between word w and word t; and, when the word similarity is greater than a correction threshold, replacing the word of the text content with the word in the industry word stock to obtain the corrected text.
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1-5 when executing the computer program.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-5.
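
The first similarity calculation formula of claims 1 and 6 is not reproduced in this text, but the quantities it uses are listed there. The following is a minimal Python sketch of the correction step under stated assumptions: the edit distance is normalized as 1 - d / max_len, and the three terms are combined as a weighted sum. Neither assumption is confirmed by the source. The weights, the threshold of 0.8, the choice of the best-scoring candidate, and the callables `word_vec` and `embed_vec` are hypothetical.

```python
import math
from typing import Callable, Iterable, Sequence

Vector = Sequence[float]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def word_similarity(
    w: str,
    t: str,
    word_vec: Callable[[str], Vector],
    embed_vec: Callable[[str], Vector],
    alpha: float = 0.4,  # first weight (hypothetical value)
    beta: float = 0.3,   # second weight (hypothetical value)
    gamma: float = 0.3,  # third weight (hypothetical value)
) -> float:
    """Assumed form: weighted sum of edit similarity and two cosine terms."""
    max_len = max(len(w), len(t))
    edit_sim = 1.0 - edit_distance(w, t) / max_len if max_len else 1.0
    return (
        alpha * edit_sim
        + beta * cosine(word_vec(w), word_vec(t))
        + gamma * cosine(embed_vec(w), embed_vec(t))
    )


def correct_word(
    w: str,
    lexicon: Iterable[str],
    word_vec: Callable[[str], Vector],
    embed_vec: Callable[[str], Vector],
    threshold: float = 0.8,  # correction threshold (hypothetical value)
) -> str:
    """Replace w with the best lexicon word whose similarity exceeds the threshold."""
    best, best_sim = w, threshold
    for t in lexicon:
        sim = word_similarity(w, t, word_vec, embed_vec)
        if sim > best_sim:
            best, best_sim = t, sim
    return best
```

In practice `word_vec` and `embed_vec` would be backed by two different vector models; the claims distinguish a "word vector" from an "embedded vector" without saying which models produce them.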

Description

Multi-language contract intelligent comparison method, system, equipment and storage medium

Technical Field

The invention belongs to the technical field of office automation, and particularly relates to a multi-language contract intelligent comparison method, system, equipment and storage medium.

Background

In cross-border business collaboration, enterprises often need to sign multi-language contracts, and therefore need to compare the different language versions of a contract to ensure that their content is consistent. However, manual input during drafting may introduce wrongly written words, which reduces the accuracy of contract comparison, and translation errors may cause the different language versions to deviate semantically. In addition, contracts are traditionally compared by hand, which is time-consuming and labor-intensive.

Disclosure of Invention

The invention aims to provide a multi-language contract intelligent comparison method, system, equipment and storage medium that can realize automatic comparison of multi-language contracts and ensure their consistency and accuracy.
The first aspect of the invention provides a multi-language contract intelligent comparison method, wherein the multi-language contract comprises at least a first language version contract and a second language version contract, the second language version contract being a translation of the first language version contract into a second language, and the method comprises the following steps: preprocessing the first language version contract to obtain first text content, and preprocessing the second language version contract to obtain second text content; performing wrongly written word correction on the first text content to obtain a first corrected text, and performing wrongly written word correction on the second text content to obtain a second corrected text; calculating the overall semantic similarity and the multi-level semantic similarity between a first clause in the first corrected text and a second clause in the second corrected text, wherein the second clause is the translation of the first clause into the second language, and calculating the global matching degree between the first corrected text and the second corrected text; calculating a similarity score between the first clause and the second clause according to the overall semantic similarity, the multi-level semantic similarity and the global matching degree, and judging whether the second clause is a problem clause according to the similarity score; and identifying the problem category of the problem clause, and generating a corresponding correction suggestion according to the problem category.
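The scoring step can be sketched in Python as follows. The global matching degree follows claim 3 (max pooling over sentence vectors, then cosine similarity). The way the three quantities are combined into a similarity score is not given in this passage, so the equal weights, the threshold of 0.75, and the rule that a score below the threshold marks a problem clause are assumptions made only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def max_pool(sentence_vectors: List[Vector]) -> List[float]:
    """Element-wise maximum over all sentence vectors: one value per dimension."""
    if not sentence_vectors:
        raise ValueError("at least one sentence vector is required")
    return [max(dim) for dim in zip(*sentence_vectors)]


def global_matching_degree(first: List[Vector], second: List[Vector]) -> float:
    """Cosine similarity between the max-pooled features of the two texts."""
    return cosine(max_pool(first), max_pool(second))


def similarity_score(
    overall: float,
    multi_level: float,
    global_match: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),  # assumed
) -> float:
    """Assumed combination: weighted sum of the three similarity measures."""
    w1, w2, w3 = weights
    return w1 * overall + w2 * multi_level + w3 * global_match


def is_problem_clause(score: float, threshold: float = 0.75) -> bool:
    """Assumed decision rule: a low score flags the second clause as a problem."""
    return score < threshold
```

The sentence vectors of both language versions are assumed to come from one shared multilingual embedding space; otherwise the cosine between the two pooled features would not be meaningful.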
In some embodiments, calculating the multi-level semantic similarity between the first clause and the second clause comprises: calculating word similarity, phrase similarity and sentence similarity between the first clause and the second clause; and calculating the average of the word similarity, the phrase similarity and the sentence similarity to obtain the multi-level semantic similarity.
In some embodiments, calculating the word similarity between the first clause and the second clause comprises: converting each word of the first clause into a first word vector to obtain a first word vector set; converting each word of the second clause into a second word vector to obtain a second word vector set; calculating the cosine similarity between each word vector in the first word vector set and all word vectors in the second word vector set, taking the maximum cosine similarity as the word vector similarity of that word, and calculating the mean of the word vector similarities to obtain the word similarity of the first clause and the second clause. Calculating the phrase similarity between the first clause and the second clause comprises: converting each phrase of the first clause into a weighted first phrase vector to obtain a first phrase vector set; converting each phrase of the second clause into a weighted second phrase vector to obtain a second phrase vector set; and calculating the cosine similarity between each phrase vector in the first phrase vector set and all phrase vectors in the second phrase vector set, taking the maximum cosine similarity as the phrase vector similarity of that phrase, and calculating the mean of the phrase vector similarities to obtain the phrase similarity of the first clause and the second clause.
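The word-level and phrase-level calculations above share one pattern: for each vector of the first clause, take the maximum cosine similarity against the second clause's vectors, then average. A minimal Python sketch follows. It assumes the vectors already exist in a shared cross-lingual space and that phrase vectors arrive already weighted, since the weighting scheme is not specified here. The function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def max_cosine_mean(first: List[Vector], second: List[Vector]) -> float:
    """Mean, over the first set, of each vector's best cosine match in the second set."""
    if not first or not second:
        return 0.0  # assumption: an empty clause has no similarity
    best = [max(cosine(u, v) for v in second) for u in first]
    return sum(best) / len(best)


def multi_level_similarity(word_sim: float, phrase_sim: float, sentence_sim: float) -> float:
    """Average of the word, phrase and sentence similarities."""
    return (word_sim + phrase_sim + sentence_sim) / 3.0


def clause_multi_level_similarity(
    first_words: List[Vector],
    second_words: List[Vector],
    first_phrases: List[Vector],
    second_phrases: List[Vector],
    first_sentence: Vector,
    second_sentence: Vector,
) -> float:
    """Combine the three levels; the sentence level is assumed to be a plain cosine."""
    word_sim = max_cosine_mean(first_words, second_words)
    phrase_sim = max_cosine_mean(first_phrases, second_phrases)
    sentence_sim = cosine(first_sentence, second_sentence)
    return multi_level_similarity(word_sim, phrase_sim, sentence_sim)
```

Note that the measure as described is asymmetric: it iterates over the first clause only, so words present in the second clause but absent from the first do not lower the score.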
In some embodiments, calculating the global matching degree between the first corrected text and the second corrected text comprises: converting each sentence in the first corrected text into a first sentence vector, and performing max pooling over all the first sentence vectors to obtain the maximum value of each dimension, wherein the maximum values of all dimensions form a first key sema