CN-115905535-B - Contract classification method, system and related equipment based on deep learning
Abstract
The invention discloses a deep learning-based contract classification method, system, and related equipment. The method comprises: obtaining the contract title of a contract to be classified and, from that title, obtaining a title category through a trained title classification model, the title category being any one of a plurality of preset categories; selecting a target content classification model from a plurality of preset trained content classification models according to the title category, each trained content classification model corresponding to one preset category; and obtaining the contract content of the contract to be classified and, from that content, obtaining the contract category corresponding to the contract to be classified through the target content classification model. The title classification model and the content classification models are deep learning-based neural network models. Compared with the prior art, the scheme helps improve the accuracy of contract classification.
Inventors
- XUE ZIQIANG
- YANG MIN
- LEI YU
Assignees
- 中国科学院深圳先进技术研究院
- 深圳得理科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20221130
Claims (7)
- 1. A deep learning-based contract classification method, the method comprising: acquiring a contract title of a contract to be classified, and acquiring a title category through a trained title classification model according to the contract title, wherein the title category is any one of a plurality of preset categories; selecting a target content classification model from a plurality of preset trained content classification models according to the title category, wherein each trained content classification model corresponds to one preset category; acquiring the contract content of the contract to be classified, and acquiring the contract category corresponding to the contract to be classified through the target content classification model according to the contract content; wherein the title classification model and the content classification model are deep learning-based neural network models; wherein acquiring the title category comprises: acquiring the contract title of the contract to be classified; performing feature extraction on the contract title through a preset feature extractor to obtain a title feature vector corresponding to the contract title; and inputting the title feature vector into the trained title classification model to obtain the title category output by the title classification model; wherein acquiring the contract category comprises: acquiring the contract content of the contract to be classified; extracting, through the preset feature extractor, a content feature vector corresponding to the contract content; inputting the content feature vector into the target content classification model to obtain the probability of each category through a normalized exponential (softmax) function; and taking the predicted contract category with the maximum probability, as output by the target content classification model, as the contract category corresponding to the contract to be classified; and wherein the content feature vector is formed by concatenating a first content feature sub-vector and a second content feature sub-vector, and extracting the content feature vector comprises: processing the contract content according to a preset preprocessing operation to obtain preprocessed content, the preprocessing operation comprising deleting stop words; dividing the preprocessed content into a first part of content and a second part of content according to a preset contract division word-count threshold, wherein when the total word count of the preprocessed content does not exceed the contract division word-count threshold, the first part of content comprises all of the preprocessed content and the second part of content is empty, and when the total word count of the preprocessed content exceeds the contract division word-count threshold, the first part of content comprises the leading words of the preprocessed content up to the contract division word-count threshold and the second part of content comprises all of the preprocessed content other than the first part of content; performing feature extraction on the first part of content through the preset feature extractor to obtain the first content feature sub-vector; and obtaining the second content feature sub-vector according to the second part of content.
- 2. The deep learning-based contract classification method according to claim 1, wherein when the second part of content is not empty, performing feature extraction on the second part of content through the preset feature extractor to obtain the second content feature sub-vector comprises: dividing the second part of content into sentences to obtain a plurality of sentences to be processed; calculating the term frequency-inverse document frequency (TF-IDF) corresponding to each sentence to be processed; obtaining a target sentence number, and selecting from the plurality of sentences to be processed that number of sentences with the highest TF-IDF to form a target paragraph; and inputting the target paragraph into the preset feature extractor and performing feature extraction on the target paragraph through the preset feature extractor to obtain the second content feature sub-vector.
- 3. The deep learning-based contract classification method according to claim 1 or 2, wherein the title classification model is trained according to the following steps: inputting the training contract title feature vectors in title training data into the title classification model, and acquiring, through the title classification model, the training title categories corresponding to the training contract title feature vectors, wherein the title training data comprises a plurality of contract title data groups, and each contract title data group comprises a training contract title feature vector and the labeled title category corresponding to that training contract title feature vector; and adjusting the model parameters of the title classification model according to the training title categories and the labeled title categories, and repeating the step of inputting the training contract title feature vectors in the title training data into the title classification model until a first preset training condition is met, so as to obtain the trained title classification model.
- 4. The deep learning-based contract classification method according to claim 3, wherein each of the content classification models is trained using different content training data, and any one of the content classification models is trained according to the following steps: acquiring the preset category corresponding to the content classification model, and selecting, according to the preset category, one group of content training data from a plurality of preset groups of content training data as the target content training data corresponding to the content classification model, wherein the target content training data comprises a plurality of target contract content data groups, each target contract content data group comprises a training contract content feature vector and the labeled contract category corresponding to that training contract content feature vector, and the title category of the contract corresponding to each training contract content feature vector is the same as the preset category corresponding to the content classification model; inputting the training contract content feature vectors in the target content training data into the content classification model, and acquiring, through the content classification model, the training contract categories corresponding to the target content training data; and adjusting the model parameters of the content classification model according to the training contract categories and the labeled contract categories, and repeating the step of inputting the training contract content feature vectors in the target content training data into the content classification model until a second preset training condition is met, so as to obtain the trained content classification model.
- 5. A deep learning-based contract classification system, the system comprising: a title classification module, configured to acquire a contract title of a contract to be classified and to acquire a title category through a trained title classification model according to the contract title, wherein the title category is any one of a plurality of preset categories; a content classification model selection module, configured to select a target content classification model from a plurality of preset trained content classification models according to the title category, wherein each trained content classification model corresponds to one preset category; and a contract classification module, configured to acquire the contract content of the contract to be classified and to acquire the contract category corresponding to the contract to be classified through the target content classification model according to the contract content; wherein the title classification model and the content classification model are deep learning-based neural network models; wherein the title classification module is specifically configured to acquire the contract title of the contract to be classified, perform feature extraction on the contract title through a preset feature extractor to obtain a title feature vector corresponding to the contract title, and input the title feature vector into the trained title classification model to obtain the title category output by the title classification model; wherein the contract classification module is specifically configured to acquire the contract content of the contract to be classified, extract through the preset feature extractor a content feature vector corresponding to the contract content, input the content feature vector into the target content classification model to obtain the probability of each category through a normalized exponential (softmax) function, and take the predicted contract category with the maximum probability, as output by the target content classification model, as the contract category corresponding to the contract to be classified; and wherein the content feature vector is formed by concatenating a first content feature sub-vector and a second content feature sub-vector, and the contract classification module is further specifically configured to: process the contract content according to a preset preprocessing operation to obtain preprocessed content, the preprocessing operation comprising deleting stop words; divide the preprocessed content into a first part of content and a second part of content according to a preset contract division word-count threshold, wherein when the total word count of the preprocessed content does not exceed the contract division word-count threshold, the first part of content comprises all of the preprocessed content and the second part of content is empty, and when the total word count of the preprocessed content exceeds the contract division word-count threshold, the first part of content comprises the leading words of the preprocessed content up to the contract division word-count threshold and the second part of content comprises all of the preprocessed content other than the first part of content; perform feature extraction on the first part of content through the preset feature extractor to obtain the first content feature sub-vector; and obtain the second content feature sub-vector according to the second part of content.
- 6. An intelligent terminal comprising a memory, a processor, and a deep learning based contract classification program stored on the memory and executable on the processor, the deep learning based contract classification program when executed by the processor implementing the deep learning based contract classification method steps of any of claims 1-4.
- 7. A computer readable storage medium, wherein a deep learning based contract classification program is stored on the computer readable storage medium, and the deep learning based contract classification program, when executed by a processor, implements the deep learning based contract classification method steps of any of claims 1-4.
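The content-splitting and sentence-selection steps recited in claims 1 and 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the stop-word list, the whitespace tokenizer, the punctuation-based sentence splitter, and every function name are hypothetical. The claims also do not say how a per-sentence TF-IDF value is computed, so the sketch assumes each sentence is treated as one document and scored by the sum of its words' TF-IDF weights.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a full, language-specific list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}
PUNCT = ".,;:!?"


def preprocess(text):
    """Tokenize on whitespace (an assumption) and delete stop words."""
    return [t for t in text.split() if t.strip(PUNCT).lower() not in STOP_WORDS]


def split_content(tokens, threshold):
    """First part: the leading `threshold` words. Second part: the rest (may be empty)."""
    return tokens[:threshold], tokens[threshold:]


def split_sentences(tokens):
    """Divide a token list into sentences at sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?;])\s+", " ".join(tokens))
    return [p for p in parts if p]


def sentence_tfidf_scores(sentences):
    """Score each sentence as the sum of TF-IDF over its words (assumed scoring rule)."""
    docs = [[w.strip(PUNCT).lower() for w in s.split()] for s in sentences]
    docs = [[w for w in d if w] for d in docs]
    n = len(docs)
    doc_freq = Counter(w for d in docs for w in set(d))
    scores = []
    for d in docs:
        if not d:
            scores.append(0.0)
            continue
        tf = Counter(d)
        scores.append(sum((c / len(d)) * math.log(n / doc_freq[w]) for w, c in tf.items()))
    return scores


def select_target_paragraph(second_part, target_count):
    """Keep the `target_count` highest-scoring sentences, in their original order."""
    sentences = split_sentences(second_part)
    scores = sentence_tfidf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:target_count]
    return " ".join(sentences[i] for i in sorted(top))
```

Keeping the selected sentences in their original order is a design choice of this sketch; the claims only require that the highest-scoring sentences form the target paragraph, which is then passed to the feature extractor.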
Description
Contract classification method, system and related equipment based on deep learning
Technical Field
The invention relates to the technical field of text classification, and in particular to a deep learning-based contract classification method, system, and related equipment.
Background
With social and economic development, the demand for signing contracts keeps growing, and proper contract management is receiving increasing attention. Managing contracts requires classifying them. In the prior art, a text classification model is generally used to classify contracts: the entire content of the contract is input into the text classification model, which then classifies it.
The problem with the prior art is that, when a text classification model classifies the content of a whole contract, all of the content is processed in the same way, and the different influence that different parts of the contract have on its classification is not considered. This hinders improvement of the accuracy of contract classification.
Accordingly, there is a need for improvement and development in the art.
Disclosure of Invention
The main object of the invention is to provide a deep learning-based contract classification method, system, and related equipment, so as to solve the prior-art problem that, when a text classification model is used to classify the content of a whole contract, the different influence of different parts of the contract on contract classification is not considered, which hinders improvement of the accuracy of contract classification.
In order to achieve the above object, a first aspect of the present invention provides a deep learning-based contract classification method, which includes: acquiring a contract title of a contract to be classified, and acquiring a title category through a trained title classification model according to the contract title, wherein the title category is any one of a plurality of preset categories; selecting a target content classification model from a plurality of preset trained content classification models according to the title category, wherein each trained content classification model corresponds to one preset category; and acquiring the contract content of the contract to be classified, and acquiring the contract category corresponding to the contract to be classified through the target content classification model according to the contract content; wherein the title classification model and the content classification model are deep learning-based neural network models.
Optionally, acquiring the contract title of the contract to be classified and acquiring the title category through the trained title classification model according to the contract title includes: acquiring the contract title of the contract to be classified; performing feature extraction on the contract title through a preset feature extractor to obtain a title feature vector corresponding to the contract title; and inputting the title feature vector into the trained title classification model to obtain the title category output by the title classification model.
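The two-stage flow described above (classify the title, select the content model for that title category, then classify the content) can be sketched as follows. This is only an illustration under stated assumptions: the patent's models are deep neural networks and its feature extractor is not specified here, so the sketch substitutes a toy word-count extractor and a single linear layer with softmax. The vocabulary, category names, weights, and all identifiers are hypothetical.

```python
import math

# Hypothetical vocabulary for the toy word-count feature extractor.
VOCAB = ["lease", "loan", "apartment", "equipment", "mortgage", "credit"]


def extract_features(text):
    """Toy stand-in for the preset feature extractor: word counts over VOCAB."""
    words = text.lower().split()
    return [float(words.count(v)) for v in VOCAB]


def softmax(logits):
    """Normalized exponential function, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearSoftmaxClassifier:
    """Toy stand-in for a trained neural classifier: one linear layer plus softmax."""

    def __init__(self, labels, weights):
        self.labels = labels
        self.weights = weights  # one weight row per label

    def predict(self, features):
        logits = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        probs = softmax(logits)
        # The category with the maximum probability is the prediction.
        return self.labels[probs.index(max(probs))]


def classify_contract(title, content, title_model, content_models):
    """Stage 1: title category. Stage 2: route to that category's content model."""
    title_category = title_model.predict(extract_features(title))
    target_model = content_models[title_category]  # one content model per preset category
    return title_category, target_model.predict(extract_features(content))


# Hand-set weights stand in for trained parameters (illustrative only).
title_model = LinearSoftmaxClassifier(
    ["lease", "loan"], [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]]
)
content_models = {
    "lease": LinearSoftmaxClassifier(
        ["housing lease", "equipment lease"], [[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
    ),
    "loan": LinearSoftmaxClassifier(
        ["mortgage loan", "credit loan"], [[0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]]
    ),
}

result = classify_contract(
    "Apartment Lease Agreement",
    "the lessor rents the apartment to the lessee",
    title_model,
    content_models,
)
print(result)
```

Holding the content models in a dictionary keyed by title category mirrors the one-model-per-preset-category correspondence, so model selection reduces to a lookup.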
Optionally, acquiring the contract content of the contract to be classified and acquiring, according to the contract content, the contract category corresponding to the contract to be classified through the target content classification model includes: acquiring the contract content of the contract to be classified; extracting, through the preset feature extractor, a content feature vector corresponding to the contract content; and inputting the content feature vector into the target content classification model to obtain the contract category, output by the target content classification model, corresponding to the contract to be classified.
Optionally, the content feature vector is formed by concatenating a first content feature sub-vector and a second content feature sub-vector, and extracting the content feature vector corresponding to the contract content through the preset feature extractor includes: processing the contract content according to a preset preprocessing operation to obtain preprocessed content, wherein the preprocessing operation includes deleting stop words; and dividing the preprocessed content into a first part of content and a second part of content according to a preset contract division word-count threshold, wherein when the total word count of the preprocessed content does not exceed the contract division word-count threshold, the first part of content comprises all of the preprocessed content and the second part of content is empty, and when the