CN-122023047-A - Risk identification method, device, equipment and medium for enterprise reimbursement data
Abstract
The application provides a risk identification method, device, equipment and medium for enterprise reimbursement data, relating to the technical field of enterprise data processing. The method comprises: obtaining reimbursement data and reimbursement expenditure associated data, to be approved, of a target enterprise; constructing a target reimbursement subgraph based on the reimbursement data and the reimbursement expenditure associated data; extracting features of the target reimbursement subgraph to obtain a plurality of feature vectors; inputting the feature vectors and the target reimbursement subgraph into a pre-trained risk identification model to obtain a risk identification result; performing rule checking and time-series anomaly analysis, respectively, on the reimbursement data and the reimbursement expenditure associated data to obtain a risk score; and determining a comprehensive risk identification result for the reimbursement data and the reimbursement expenditure associated data based on the risk identification result and the risk score. The application improves the accuracy and comprehensiveness of risk identification for enterprise reimbursement data.
Inventors
- Ma Chunchuo
- YU DEMING
- WANG WEIDONG
Assignees
- 北京合思汇智信息技术有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260203
Claims (10)
- 1. A method for risk identification of enterprise reimbursement data, the method comprising: acquiring reimbursement data and reimbursement expenditure associated data, to be approved, of a target enterprise; constructing a target reimbursement subgraph based on the reimbursement data and the reimbursement expenditure associated data; extracting features of the target reimbursement subgraph to obtain a plurality of feature vectors; inputting the plurality of feature vectors and the target reimbursement subgraph into a pre-trained risk identification model to obtain a risk identification result; performing rule checking and time-series anomaly analysis, respectively, on the reimbursement data and the reimbursement expenditure associated data to obtain a risk score; and determining a comprehensive risk identification result for the reimbursement data and the reimbursement expenditure associated data based on the risk identification result and the risk score.
- 2. The method of claim 1, wherein the reimbursement data comprises a reimbursement form; and constructing the target reimbursement subgraph based on the reimbursement data and the reimbursement expenditure associated data comprises: performing, according to a configured entity type template, entity extraction on the reimbursement data and the reimbursement expenditure associated data using a pre-trained entity recognition model to obtain an entity set, wherein the entity set comprises a reimbursement form node; analyzing the reimbursement data and the reimbursement expenditure associated data to obtain the relationship types between different entities in the entity set; performing entity alignment on the different entities in the entity set to obtain an aligned entity set and the relationship types between different entities in the aligned entity set; updating a configured reimbursement heterogeneous association graph network by taking each entity in the aligned entity set as a node and, according to the relationship types between different entities in the aligned entity set, adding edges between the nodes of the corresponding entities; and extracting the target reimbursement subgraph from the updated reimbursement heterogeneous association graph network, centered on the reimbursement form node.
- 3. The method of claim 1, wherein extracting features of the target reimbursement subgraph to obtain a plurality of feature vectors comprises: extracting features of different nodes in the target reimbursement subgraph to obtain the plurality of feature vectors.
- 4. The method of claim 2, wherein the risk identification model comprises: an input layer, configured to receive the plurality of feature vectors and the target reimbursement subgraph, wherein the feature vectors are node feature vectors; a node-level attention layer, configured to analyze the topology of the target reimbursement subgraph, extract the connection relationships between different nodes in the target reimbursement subgraph, and construct a subgraph adjacency matrix; a feature aggregation layer, configured to aggregate the node feature vector of each node according to the set of direct neighbor nodes and the set of indirect neighbor nodes of that node in the target reimbursement subgraph, to obtain preliminarily aggregated node features; a multi-layer risk propagation layer, configured to perform multi-order neighborhood aggregation on the preliminarily aggregated node features to obtain a target feature vector for each node in the target reimbursement subgraph; and a risk probability output layer, configured to determine the risk probability of the reimbursement form to be approved according to the target feature vectors of the nodes in the target reimbursement subgraph, and to take that risk probability as the risk identification result.
- 5. The method of claim 1, wherein performing rule checking and time-series anomaly analysis, respectively, on the reimbursement data and the reimbursement expenditure associated data to obtain a risk score comprises: inputting the reimbursement data and the reimbursement expenditure associated data into a pre-trained rule recognition engine to obtain a first risk score; inputting the reimbursement data and the reimbursement expenditure associated data into a pre-trained reimbursement baseline time-series analysis model to obtain a second risk score; and calculating the risk score according to the first risk score and the second risk score.
- 6. The method of claim 4, wherein determining the comprehensive risk identification result for the reimbursement data and the reimbursement expenditure associated data based on the risk identification result and the risk score comprises: mapping the risk probability of the reimbursement form to be approved to a third risk score; computing a weighted sum of the third risk score and the risk score to obtain a comprehensive risk score; matching, from a configured lookup table of risk scores and risk levels, the target risk level corresponding to the comprehensive risk score; and generating the comprehensive risk identification result based on the comprehensive risk score and the target risk level.
- 7. The method of claim 6, wherein after determining the comprehensive risk identification result for the reimbursement data and the reimbursement expenditure associated data, the method further comprises: matching, from a configured lookup table of risk levels and reimbursement form approval strategies, the target approval strategy corresponding to the target risk level; and automatically performing approval processing on the reimbursement form based on the target approval strategy.
- 8. A risk identification device for enterprise reimbursement data, the device comprising: an acquisition unit, configured to acquire reimbursement data and reimbursement expenditure associated data, to be approved, of a target enterprise; a construction unit, configured to construct a target reimbursement subgraph based on the reimbursement data and the reimbursement expenditure associated data; an extraction unit, configured to extract features of the target reimbursement subgraph to obtain a plurality of feature vectors; an identification unit, configured to input the plurality of feature vectors and the target reimbursement subgraph into a pre-trained risk identification model to obtain a risk identification result; an analysis unit, configured to perform rule checking and time-series anomaly analysis, respectively, on the reimbursement data and the reimbursement expenditure associated data to obtain a risk score; and a determining unit, configured to determine a comprehensive risk identification result for the reimbursement data and the reimbursement expenditure associated data based on the risk identification result and the risk score.
- 9. An electronic device, characterized in that the electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method of any of claims 1-7 when executing the program stored in the memory.
- 10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-7.
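As an illustration only, the score fusion described in claims 5 to 7 can be sketched in Python as follows. The claims do not give any numeric values, so the 0-100 score scale, the averaging of the first and second risk scores, the linear mapping from probability to the third risk score, the weights, the level thresholds and the strategy names are all hypothetical placeholders chosen for this sketch, not values taken from the application.

```python
from dataclasses import dataclass

# Hypothetical lookup tables: the application only says such tables are
# "configured"; the thresholds, level names and strategies are assumptions.
RISK_LEVELS = [(80.0, "high"), (50.0, "medium"), (0.0, "low")]  # (min score, level)
APPROVAL_STRATEGIES = {
    "low": "auto_approve",
    "medium": "manual_review",
    "high": "reject_and_escalate",
}


@dataclass
class FusionResult:
    composite_score: float
    risk_level: str
    strategy: str


def combine_rule_and_timeseries(first_score: float, second_score: float) -> float:
    """Claim 5: derive one risk score from the rule and time-series scores.

    Assumption: a plain average. The claim only states that the risk score
    is calculated "according to" the two scores.
    """
    return (first_score + second_score) / 2.0


def fuse(risk_probability: float, risk_score: float,
         w_model: float = 0.6, w_rules: float = 0.4) -> FusionResult:
    """Claims 6-7: map probability to a score, weight, then look up tables."""
    if not 0.0 <= risk_probability <= 1.0:
        raise ValueError("risk_probability must be in [0, 1]")
    if not 0.0 <= risk_score <= 100.0:
        raise ValueError("risk_score must be in [0, 100]")
    # Assumed linear mapping of the model probability onto the 0-100 scale.
    third_score = risk_probability * 100.0
    # Weighted sum of the model-derived score and the rule/time-series score.
    composite = w_model * third_score + w_rules * risk_score
    # Pick the first table row whose lower bound the composite score reaches.
    level = next(name for floor, name in RISK_LEVELS if composite >= floor)
    return FusionResult(composite, level, APPROVAL_STRATEGIES[level])


if __name__ == "__main__":
    score = combine_rule_and_timeseries(first_score=70.0, second_score=50.0)
    print(fuse(risk_probability=0.9, risk_score=score))
```

Keeping the two lookup tables as plain data mirrors the "configured" tables in claims 6 and 7: thresholds and strategies could be changed without touching the fusion logic.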
Description
Risk identification method, device, equipment and medium for enterprise reimbursement data
Technical Field
The application relates to the technical field of enterprise data processing, and in particular to a risk identification method, device, equipment and medium for enterprise reimbursement data.
Background
As enterprises grow in scale and diversify their business, expense control for employee reimbursements, corporate payments and the like faces great challenges. According to statistics, the losses that large enterprises suffer from false reimbursement, duplicate reimbursement and compliance risks account for 5-10% of their total cost each year. Existing approval of reimbursement data relies mainly on manual spot checks and rule-based system approval. Manual spot checks are inefficient and narrow in coverage, and they struggle to uncover related fraud that spans departments or time periods. Rule-based system approval is too rigid and is easily bypassed by employees, for example by splitting bills. As a result, risks in enterprise reimbursement data cannot be identified comprehensively and accurately.
Disclosure of Invention
The embodiments of the application aim to provide a risk identification method, device, equipment and medium for enterprise reimbursement data, so as to solve the above problems in the prior art and achieve comprehensive and accurate risk identification of enterprise reimbursement data.
In a first aspect, a risk identification method for enterprise reimbursement data is provided, and the method may include: acquiring reimbursement data and reimbursement expenditure associated data, to be approved, of a target enterprise; constructing a target reimbursement subgraph based on the reimbursement data and the reimbursement expenditure associated data; extracting features of the target reimbursement subgraph to obtain a plurality of feature vectors; inputting the plurality of feature vectors and the target reimbursement subgraph into a pre-trained risk identification model to obtain a risk identification result; performing rule checking and time-series anomaly analysis, respectively, on the reimbursement data and the reimbursement expenditure associated data to obtain a risk score; and determining a comprehensive risk identification result for the reimbursement data and the reimbursement expenditure associated data based on the risk identification result and the risk score.
In an alternative implementation, the reimbursement data includes a reimbursement form; constructing the target reimbursement subgraph based on the reimbursement data and the reimbursement expenditure associated data includes: performing, according to a configured entity type template, entity extraction on the reimbursement data and the reimbursement expenditure associated data using a pre-trained entity recognition model to obtain an entity set, wherein the entity set includes a reimbursement form node; analyzing the reimbursement data and the reimbursement expenditure associated data to obtain the relationship types between different entities in the entity set; performing entity alignment on the different entities in the entity set to obtain an aligned entity set and the relationship types between different entities in the aligned entity set; updating a configured reimbursement heterogeneous association graph network by taking each entity in the aligned entity set as a node and, according to the relationship types between different entities in the aligned entity set, adding edges between the nodes of the corresponding entities; and extracting the target reimbursement subgraph from the updated reimbursement heterogeneous association graph network, centered on the reimbursement form node.
In an alternative implementation, extracting features of the target reimbursement subgraph to obtain a plurality of feature vectors includes: extracting features of different nodes in the target reimbursement subgraph to obtain the plurality of feature vectors.
In an alternative implementation, the risk identification model includes: an input layer, configured to receive the plurality of feature vectors and the target reimbursement subgraph, wherein the feature vectors are node feature vectors; a node-level attention layer, configured to analyze the topology of the target reimbursement subgraph, extract the connection relationships between different nodes in the target reimbursement subgraph, and construct a subgraph adjacency matrix; a feature aggregation layer, configured to aggregate the node feature vector of each node according to the set of direct neighbor nodes and the set of indirect neighbor nodes of that node in the target reimbursement subgraph, to obtain preliminarily aggregated node features; a multi-layer risk propagation layer, configured to perform a multi-order neighborhood aggregation ope