CN-121997307-A - Enterprise document access right determining method and system
Abstract
The invention provides a method and a system for determining access rights to enterprise documents. The method comprises: obtaining query statements from a preset history period and obtaining authority rules; constructing a labeled sample set based on the query statements; constructing a primary document semantic analysis model based on the labeled sample set; constructing a representative sample set by using the primary document semantic analysis model; constructing a target sample set according to the representative sample set and the labeled sample set; training the primary document semantic analysis model again based on the target sample set to obtain a document semantic analysis model; obtaining a query statement of a current employee, inputting it into the document semantic analysis model, and outputting a probability distribution of each authority attribute; and determining whether the current employee has access rights according to the authority rules and the probability distribution of each authority attribute. Through a layer-by-layer screening mechanism, the invention constructs a high-precision document semantic analysis model at minimal labeling cost, thereby improving the accuracy of enterprise document permission control.
Inventors
- ZHU TAO
- WANG XUEJUN
- DU KAI
Assignees
- 四川易软信息技术有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260408
Claims (10)
- 1. A method for determining access rights to an enterprise document, comprising: acquiring query statements from a preset history period and acquiring authority rules, wherein each authority rule comprises a subject job level, an operation type, a document security level and a corresponding access authority result; constructing a labeled sample set based on the query statements, constructing a primary document semantic analysis model based on the labeled sample set, and constructing a representative sample set by using the primary document semantic analysis model; constructing a target sample set according to the representative sample set and the labeled sample set, and training the primary document semantic analysis model again based on the target sample set to obtain a document semantic analysis model, wherein the document semantic analysis model is used for outputting a probability distribution of each authority attribute of a query statement, and the authority attributes comprise an operation type attribute, a document type attribute and a document security attribute; and acquiring a query statement of a current employee, inputting the query statement into the document semantic analysis model, outputting the probability distribution of each authority attribute, and determining whether the current employee has access rights according to the authority rules and the probability distribution of each authority attribute.
- 2. The method of claim 1, wherein constructing the labeled sample set based on the query statements, constructing the primary document semantic analysis model based on the labeled sample set, and constructing the representative sample set by using the primary document semantic analysis model comprises: randomly selecting a preset number of query statements from all the query statements as samples, and acquiring labeling information corresponding to each sample to form the labeled sample set, wherein the labeling information comprises probability distributions of the authority attributes, each probability distribution giving the probabilities of the different categories under the corresponding authority attribute; and training a convolutional neural network model with the labeled sample set to obtain the primary document semantic analysis model, inputting unlabeled query statements into the primary document semantic analysis model as unlabeled samples to obtain predicted probability distributions of the authority attributes of each unlabeled sample, and screening the unlabeled samples based on those predicted probability distributions to obtain the representative sample set.
- 3. The method for determining access rights of an enterprise document according to claim 2, wherein screening the unlabeled samples based on the predicted probability distributions of the authority attributes of the unlabeled samples to obtain the representative sample set comprises: for each unlabeled sample, calculating the prediction entropy on each authority attribute according to a prediction entropy formula (the formula and its symbol definitions are not reproduced in this text); adding the prediction entropies on all the authority attributes to obtain a sample value of each unlabeled sample, sorting the unlabeled samples by sample value from high to low, and selecting the first K unlabeled samples as candidate preliminary screening samples; and clustering the candidate preliminary screening samples to obtain a plurality of candidate preliminary screening sample clusters, counting the number of candidate preliminary screening samples contained in each cluster, sorting the clusters by that number from large to small, selecting the first N clusters as target clusters, and screening the candidate preliminary screening samples based on the target clusters to obtain the representative sample set.
- 4. The method for determining access rights of an enterprise document according to claim 3, wherein screening the candidate preliminary screening samples based on the target clusters to obtain the representative sample set comprises: for each candidate preliminary screening sample that does not belong to any target cluster, calculating its similarity to each candidate preliminary screening sample in a target cluster, and taking the maximum similarity as the number-class similarity between that sample and the target cluster; if the number-class similarity is greater than a preset first similarity threshold and smaller than a preset second similarity threshold, marking the sample as a peripheral sample corresponding to the target cluster; collecting the peripheral samples corresponding to each target cluster together with the candidate preliminary screening samples contained in each target cluster, and de-duplicating the collection to obtain a preliminary screening sample set; calculating semantic features, syntactic features and keyword features of each preliminary screening sample in the preliminary screening sample set; calculating, from those features, the semantic similarity, syntactic similarity and keyword similarity of every pair of preliminary screening samples to form a semantic similarity matrix, a syntactic similarity matrix and a keyword similarity matrix; and obtaining the representative sample set based on the semantic similarity matrix, the syntactic similarity matrix and the keyword similarity matrix.
- 5. The method for determining access rights to an enterprise document according to claim 4, wherein obtaining the representative sample set based on the semantic similarity matrix, the syntactic similarity matrix and the keyword similarity matrix comprises: calculating the information entropy of each of the semantic similarity matrix, the syntactic similarity matrix and the keyword similarity matrix, calculating the reciprocal of each of the three information entropies, summing the three reciprocals to obtain a reciprocal sum, and dividing the reciprocal corresponding to each matrix by the reciprocal sum to obtain the weights of the semantic similarity, the syntactic similarity and the keyword similarity in the comprehensive similarity calculation; and placing all the preliminary screening samples into a first candidate pool, initializing an empty set as a representative queue, calculating, for each preliminary screening sample in the first candidate pool, the sum of its comprehensive similarities with all other preliminary screening samples in the first candidate pool, wherein the comprehensive similarity between two preliminary screening samples is the weighted sum of their semantic similarity, syntactic similarity and keyword similarity according to the weights, and obtaining the representative sample set based on the sums of comprehensive similarity and the first candidate pool.
- 6. An enterprise document access rights determination system, comprising: an acquisition module, used for acquiring query statements from a preset history period and acquiring authority rules, wherein each authority rule comprises a subject job level, an operation type, a document security level and a corresponding access authority result; a construction module, used for constructing a labeled sample set based on the query statements, constructing a primary document semantic analysis model based on the labeled sample set, and constructing a representative sample set by using the primary document semantic analysis model, and further used for constructing a target sample set according to the representative sample set and the labeled sample set, and training the primary document semantic analysis model again based on the target sample set to obtain a document semantic analysis model, wherein the document semantic analysis model is used for outputting a probability distribution of each authority attribute of a query statement, and the authority attributes comprise an operation type attribute, a document type attribute and a document security attribute; and a determining module, used for acquiring a query statement of a current employee, inputting the query statement into the document semantic analysis model, outputting the probability distribution of each authority attribute, and determining whether the current employee has access rights according to the authority rules and the probability distribution of each authority attribute.
- 7. The enterprise document access rights determination system of claim 6, wherein the construction module comprises: a labeling unit, used for randomly selecting a preset number of query statements from all the query statements as samples, and acquiring labeling information corresponding to each sample to form the labeled sample set, wherein the labeling information comprises probability distributions of the authority attributes, each probability distribution giving the probabilities of the different categories under the corresponding authority attribute; and a screening unit, used for training a convolutional neural network model with the labeled sample set to obtain the primary document semantic analysis model, inputting unlabeled query statements into the primary document semantic analysis model as unlabeled samples to obtain predicted probability distributions of the authority attributes of each unlabeled sample, and screening the unlabeled samples based on those predicted probability distributions to obtain the representative sample set.
- 8. The enterprise document access rights determination system of claim 7, wherein the screening unit comprises: a sorting unit, used for calculating, for each unlabeled sample, the prediction entropy on each authority attribute according to a prediction entropy formula (the formula and its symbol definitions are not reproduced in this text), adding the prediction entropies on all the authority attributes to obtain a sample value of each unlabeled sample, sorting the unlabeled samples by sample value from high to low, and selecting the first K unlabeled samples as candidate preliminary screening samples; and a clustering unit, used for clustering the candidate preliminary screening samples to obtain a plurality of candidate preliminary screening sample clusters, counting the number of candidate preliminary screening samples contained in each cluster, sorting the clusters by that number from large to small, selecting the first N clusters as target clusters, and screening the candidate preliminary screening samples based on the target clusters to obtain the representative sample set.
- 9. The system for determining access rights to an enterprise document according to claim 8, wherein the clustering unit comprises: a first calculation unit, used for calculating, for each candidate preliminary screening sample that does not belong to any target cluster, its similarity to each candidate preliminary screening sample in a target cluster, taking the maximum similarity as the number-class similarity between that sample and the target cluster, and, if the number-class similarity is greater than a preset first similarity threshold and smaller than a preset second similarity threshold, marking the sample as a peripheral sample corresponding to the target cluster; and a second calculation unit, used for collecting the peripheral samples corresponding to each target cluster together with the candidate preliminary screening samples contained in each target cluster, de-duplicating the collection to obtain a preliminary screening sample set, calculating semantic features, syntactic features and keyword features of each preliminary screening sample in the preliminary screening sample set, calculating, from those features, the semantic similarity, syntactic similarity and keyword similarity of every pair of preliminary screening samples to form a semantic similarity matrix, a syntactic similarity matrix and a keyword similarity matrix, and obtaining the representative sample set based on the semantic similarity matrix, the syntactic similarity matrix and the keyword similarity matrix.
- 10. The enterprise document access rights determination system of claim 9, wherein the second calculation unit comprises: a third calculation unit, used for calculating the information entropy of each of the semantic similarity matrix, the syntactic similarity matrix and the keyword similarity matrix, calculating the reciprocal of each of the three information entropies, summing the three reciprocals to obtain a reciprocal sum, and dividing the reciprocal corresponding to each matrix by the reciprocal sum to obtain the weights of the semantic similarity, the syntactic similarity and the keyword similarity in the comprehensive similarity calculation; and a fourth calculation unit, used for placing all the preliminary screening samples into a first candidate pool, initializing an empty set as a representative queue, calculating, for each preliminary screening sample in the first candidate pool, the sum of its comprehensive similarities with all other preliminary screening samples in the first candidate pool, wherein the comprehensive similarity between two preliminary screening samples is the weighted sum of their semantic similarity, syntactic similarity and keyword similarity according to the weights, and obtaining the representative sample set based on the sums of comprehensive similarity and the first candidate pool.
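The entropy-reciprocal weighting recited in claims 5 and 10 can be illustrated with a minimal Python sketch. The claims do not say how the information entropy of a similarity matrix is computed, so the sketch assumes one plausible definition: the positive entries are normalized to sum to 1 and their Shannon entropy (natural logarithm) is taken. All function names are hypothetical and the matrices are made-up example data.

```python
import math


def matrix_entropy(matrix):
    """Shannon entropy of a similarity matrix.

    Assumption (not stated in the claims): positive entries are
    normalized to sum to 1 and treated as a probability distribution.
    """
    values = [v for row in matrix for v in row if v > 0]
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values)


def entropy_weights(semantic, syntactic, keyword):
    """Weight of each similarity type: its entropy reciprocal over the reciprocal sum."""
    entropies = [matrix_entropy(m) for m in (semantic, syntactic, keyword)]
    if any(h <= 0 for h in entropies):
        raise ValueError("reciprocal is undefined for a zero-entropy matrix")
    reciprocals = [1.0 / h for h in entropies]
    reciprocal_sum = sum(reciprocals)
    return [r / reciprocal_sum for r in reciprocals]


def combined_similarity(semantic, syntactic, keyword, weights):
    """Comprehensive similarity: weighted sum of the three similarities per sample pair."""
    n = len(semantic)
    return [
        [
            weights[0] * semantic[i][j]
            + weights[1] * syntactic[i][j]
            + weights[2] * keyword[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def similarity_sums(combined):
    """For each sample, sum its comprehensive similarity with all OTHER samples."""
    return [sum(row) - row[i] for i, row in enumerate(combined)]


# Made-up 3x3 similarity matrices for three preliminary screening samples.
semantic = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
syntactic = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
keyword = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]

weights = entropy_weights(semantic, syntactic, keyword)
combined = combined_similarity(semantic, syntactic, keyword, weights)
sums = similarity_sums(combined)
```

Under this assumed entropy definition, a matrix whose similarity mass is concentrated in fewer entries has lower entropy and therefore receives a larger weight. How the claims use the resulting sums to move samples from the first candidate pool into the representative queue is not stated in claims 5 and 10, so the sketch stops at the sums.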
Description
Enterprise document access right determining method and system
Technical Field
The invention relates to the technical field of enterprise management, and in particular to a method and a system for determining access rights to enterprise documents.
Background
As enterprise informatization continues to deepen, a large number of enterprise documents involving sensitive information such as trade secrets, technical data and financial data are stored in document management systems. Quickly retrieving required documents through natural language queries has become an important way for employees to improve efficiency in their daily work. However, ensuring that employees can access only the documents within their rights, and thereby preventing the leakage of sensitive information, is a core security challenge for enterprise document management systems.
Disclosure of Invention
The invention aims to provide a method and a system for determining access rights to enterprise documents, so as to solve the above problem.
In order to achieve the above object, the present application provides the following technical solutions. In a first aspect, an embodiment of the present application provides a method for determining access rights to an enterprise document, the method comprising: acquiring query statements from a preset history period and acquiring authority rules, wherein each authority rule comprises a subject job level, an operation type, a document security level and a corresponding access authority result; constructing a labeled sample set based on the query statements, constructing a primary document semantic analysis model based on the labeled sample set, and constructing a representative sample set by using the primary document semantic analysis model; constructing a target sample set according to the representative sample set and the labeled sample set, and training the primary document semantic analysis model again based on the target sample set to obtain a document semantic analysis model, wherein the document semantic analysis model is used for outputting a probability distribution of each authority attribute of a query statement, and the authority attributes comprise an operation type attribute, a document type attribute and a document security attribute; and acquiring a query statement of a current employee, inputting the query statement into the document semantic analysis model, outputting the probability distribution of each authority attribute, and determining whether the current employee has access rights according to the authority rules and the probability distribution of each authority attribute.
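The final determination step can be illustrated with a minimal Python sketch. This passage does not specify how the probability distributions are combined with the authority rules, so the sketch assumes the simplest reading: take the most probable category for the operation type and document security attributes, then look up the rule keyed by the employee's job level, that operation type and that security level. The dictionary keys, category names, rule table and deny-by-default behavior are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical rule table:
# (subject job level, operation type, document security level) -> access result.
Rules = Dict[Tuple[str, str, str], bool]
Distributions = Dict[str, Dict[str, float]]


def most_likely(distribution: Dict[str, float]) -> str:
    """Return the category with the highest predicted probability."""
    return max(distribution, key=distribution.get)


def has_access(job_level: str, attribute_probs: Distributions, rules: Rules) -> bool:
    """Decide access from the model output and the authority rules.

    Assumptions: argmax per attribute, and deny when no rule matches.
    The document type attribute is not part of the rule key described
    in the text, so it is not used here.
    """
    operation = most_likely(attribute_probs["operation_type"])
    security = most_likely(attribute_probs["document_security"])
    return rules.get((job_level, operation, security), False)


# Made-up rules and a made-up model output for one query statement.
rules = {
    ("manager", "read", "confidential"): True,
    ("staff", "read", "confidential"): False,
    ("staff", "read", "public"): True,
}
probs = {
    "operation_type": {"read": 0.9, "edit": 0.1},
    "document_type": {"contract": 0.7, "report": 0.3},
    "document_security": {"public": 0.2, "confidential": 0.8},
}

manager_allowed = has_access("manager", probs, rules)
staff_allowed = has_access("staff", probs, rules)
```

Denying when no rule matches is a conservative design choice for this illustration. An implementation could instead apply confidence thresholds to the probabilities before the rule lookup, which the text here neither requires nor excludes.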
In a second aspect, the present application provides an enterprise document access rights determination system, the system comprising: an acquisition module, used for acquiring query statements from a preset history period and acquiring authority rules, wherein each authority rule comprises a subject job level, an operation type, a document security level and a corresponding access authority result; a construction module, used for constructing a labeled sample set based on the query statements, constructing a primary document semantic analysis model based on the labeled sample set, and constructing a representative sample set by using the primary document semantic analysis model, and further used for constructing a target sample set according to the representative sample set and the labeled sample set, and training the primary document semantic analysis model again based on the target sample set to obtain a document semantic analysis model, wherein the document semantic analysis model is used for outputting a probability distribution of each authority attribute of a query statement, and the authority attributes comprise an operation type attribute, a document type attribute and a document security attribute; and a determining module, used for acquiring a query statement of a current employee, inputting the query statement into the document semantic analysis model, outputting the probability distribution of each authority attribute, and determining whether the current employee has access rights according to the authority rules and the probability distribution of each authority attribute.
In a third aspect, the present application provides an enterprise document access rights determination device comprising a memory and a processor. The processor is configured to implement the steps of the above enterprise document access rights determination method when executing a computer program.
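The first screening stage performed when building the representative sample set ranks unlabeled query statements by the sum of their prediction entropies over the authority attributes and keeps the first K. A minimal Python sketch follows. The prediction entropy formula is not reproduced in this text, so the sketch assumes standard Shannon entropy with the natural logarithm; the function names, attribute keys and example data are hypothetical.

```python
import math


def prediction_entropy(distribution):
    """Shannon entropy of one predicted distribution (assumed formula)."""
    return -sum(p * math.log(p) for p in distribution if p > 0)


def sample_value(attribute_distributions):
    """Sum of prediction entropies over all authority attributes of one sample."""
    return sum(prediction_entropy(d) for d in attribute_distributions.values())


def select_top_k(unlabeled, k):
    """Return the k queries with the highest sample value, highest first."""
    ranked = sorted(
        unlabeled.items(),
        key=lambda item: sample_value(item[1]),
        reverse=True,
    )
    return [query for query, _ in ranked[:k]]


# Made-up model outputs: per query, one distribution per authority attribute.
unlabeled = {
    "open the Q3 budget": {
        "operation_type": [0.98, 0.02],
        "document_type": [0.95, 0.05],
        "document_security": [0.97, 0.03],
    },
    "can I change the supplier file": {
        "operation_type": [0.5, 0.5],
        "document_type": [0.4, 0.6],
        "document_security": [0.5, 0.5],
    },
    "show me the handbook": {
        "operation_type": [0.9, 0.1],
        "document_type": [0.7, 0.3],
        "document_security": [0.8, 0.2],
    },
}

candidates = select_top_k(unlabeled, k=2)
```

Queries on which the primary model is least certain score highest, so they are the ones passed on to the clustering stage as candidate preliminary screening samples.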
In a fourth aspect, the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the enterprise document access rights determination method described above. The beneficial effects of the invention are as follows: 1. According to the method, firstly, the prediction entropy of each unlabeled sample on three authority attributes of an ope