
CN-122023044-A - Enterprise taxpayer associated tax risk assessment method based on a ripple set


Abstract

The invention discloses a ripple set-based method for assessing the associated tax risk of enterprise taxpayers, comprising the steps of 1) constructing a tax risk knowledge graph database, 2) selecting seed nodes to generate candidate subgraphs, and 3) identifying high tax risk subgraphs. By introducing a knowledge graph that fuses multiple relation types, the method improves the ability to recognize tax risks hidden within groups of enterprise taxpayers. The ripple set and the threshold mechanism enable efficient subgraph analysis on large-scale graphs, and matching against typical subgraph patterns reduces the dependence of associated tax risk assessment on static rules and subjective experience. The method is stated to outperform existing methods in accuracy, scalability and computational efficiency, and to have high application and popularization value.

Inventors

  • ZHAO XIA
  • FAN YINGDAN
  • YANG JIE
  • HUANG BOWEN
  • ZHANG TAO
  • YAN XIAO
  • YU ZHONGZHONG

Assignees

  • Beijing Technology and Business University (北京工商大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-31

Claims (4)

  1. A method for assessing the associated tax risk of enterprise taxpayers based on a ripple set, comprising the following steps:
     A. Constructing a tax risk knowledge graph database, specifically:
        A1. Defining the tax risk knowledge graph G = {E, R}, where E and R denote the entity set and the relation set respectively, E contains 2 entity types (natural persons and enterprise taxpayers), and R contains the 8 relation types in Table 1;
        A2. Extracting investment information, shareholding information, payment information and transaction information from the tax management system, and constructing the relations between enterprise taxpayers, between enterprise taxpayers and natural persons, and between natural persons, each recorded in the form relation(subject, object);
        A3. Extracting the attribute data corresponding to each relation from the tax management system and recording it as the attribute values of that relation(subject, object);
     B. Selecting seed nodes to generate a candidate subgraph G_s, specifically:
        B1. Selecting a seed node u, where the main reference characteristics are: associated enterprise taxpayers whose transactions show significant tax differences; associated enterprise taxpayers whose associated transaction amount accounts for a high proportion of their total transactions; and associated enterprise taxpayers with frequent large fund transfers;
        B2. Calculating the weight coefficient of each attribute of the 4 relations; taking the transaction relation as an example, calculating the weight coefficients ω_j (j = 1, 2) of its 2 attributes, namely the number of transactions and the transaction amount;
        B3. Diffusing the seed node u layer by layer with a graph search method to generate the ripple set S of u, specifically:
           B3.1 Defining the layer-0 associated node set of the seed node u;
           B3.2 Constructing the layer-1 candidate associated node set of the seed node u, specifically:
              B3.2.1 Defining the layer-1 associated node set, consisting of the L nodes u_1, u_2, ..., u_L associated with u;
              B3.2.2 Calculating the original relation strength between each single node in the set and the node u; taking the node u_1 as an example:
                 B3.2.2.1 The number of relations between u and u_1 is a, with a ≤ 4;
                 B3.2.2.2 Calculating the relation strength of each relation between u and u_1; taking the transaction relation as an example, the standardized relation attribute values between u and u_1 are y_11 and y_12, and the relation strength of the transaction relation between u and u_1 is calculated with formula (5) from the weight coefficients ω_j (j = 1, 2) of the 2 attributes obtained in step B2;
                 B3.2.2.3 Based on statistical analysis of historical inspection cases and expert evaluation, setting the importance weights β = [β_1, β_2, β_3, β_4] of the different relation types in tax risk propagation using Saaty's 1-9 scale method, where the weights reflect the relative importance of each relation type in risk conduction;
                 B3.2.2.4 Obtaining the relation strength of each relation between u and u_1 by the method of step B3.2.2.2, and calculating the original relation strength W_{u,u_1} between u and u_1 according to the weights β of the 4 relations;
              B3.2.3 Following the method of step B3.2.2, calculating the original relation strength W_{u,u_i} (i = 1, 2, ..., L) between each node in the layer-1 associated node set and u;
              B3.2.4 Standardizing the original relation strengths (W_{u,u_1}, W_{u,u_2}, ..., W_{u,u_L}) between the L associated nodes and u using formula (6);
              B3.2.5 Expressing the standardized relation strength between each node and the seed node u as a triple of head node, r_i and tail node, where the value of r_i is the standardized relation strength w_{u,u_i} between node u and node u_i;
              B3.2.6 Selecting the associated nodes whose standardized relation strength w_{u,u_i} is not less than d, where d is a configurable parameter, and forming the nodes that meet the condition into the layer-1 candidate associated node set of L1 nodes (L1 ≤ L);
           B3.3 Constructing the layer-1 ripple subset S_1 of the seed node u as a set of L1 triples belonging to G, where h and h_i denote head nodes, r_i denotes the standardized relation strength w_{u,u_i} between node u and node u_i, t and t_i denote tail nodes, i denotes the sequence number of a node, and G denotes the tax risk assessment knowledge graph;
           B3.4 Taking each node u_i (i = 1, 2, ..., L1) in the current outermost ripple subset of u as a temporary seed node, and constructing the layer-1 ripple subset of each temporary seed node according to the method of steps B3.1, B3.2 and B3.3;
           B3.5 Merging the layer-1 ripple subsets of the temporary seed nodes into the new ripple subset S_2 of u, the number of triples in which is denoted L2;
           B3.6 Following the methods of steps B3.4 and B3.5, iteratively expanding to generate the ripple subsets S_k (k = 3, 4, ..., H) of the subsequent levels of the seed node u, specifically:
              B3.6.1 Following the method of step B3.4, taking the tail node of each triple in the current outermost ripple subset S_{k-1} as a temporary seed node u_i, and constructing the layer-1 ripple subset of each temporary seed node;
              B3.6.2 Following the method of step B3.5, merging the layer-1 ripple subsets of all temporary seed nodes to generate the new outermost ripple subset S_k of u;
           B3.7 Combining the ripple subsets of all layers into the ripple set S of the seed node u, where S_k denotes the ripple subset of the k-th layer and H is the maximum number of ripple layers;
        B4. Fusing and recombining the triples (h, r, t) of each layer's ripple subset in the ripple set S to generate the candidate subgraph G_s of the seed node;
     C. Identifying high-risk tax avoidance subgraphs, specifically:
        C1. Constructing a library of high-risk tax avoidance patterns, specifically:
           C1.1 Based on historical audit cases and tax expert knowledge, defining a set of typical high-risk tax avoidance patterns P = {p_1, p_2, ..., p_r}, where r is the total number of tax avoidance patterns, specifically:
              C1.1.1 Each tax avoidance pattern p_j (1 ≤ j ≤ r) is represented by a subgraph with specific node roles, topological structure and association relations;
              C1.1.2 Taking pattern p_1 as an example, the nodes P01, P02, C01, C02, C03, C04 and C05 form a tax avoidance pattern for an associated transaction, as shown in FIG. 3, in which: P01 and P02 are natural persons with a social relation; C01, C02, C03, C04 and C05 are enterprise nodes; CO_owner(P01, C01), CC_shareholding(C03, C05) and CO_owner(P02, C02) are present; CC_transactions(C05, C04) is present and the tax rate difference involved is significant; the risk path meets the requirements of multi-layer penetration and hierarchical depth, and E meets the requirement of being the tail node of a plurality of risk paths;
           C1.2 Extracting node attributes and topological structure features for each tax avoidance pattern p_j (1 ≤ j ≤ r) and generating the corresponding pattern feature vector Fp_j; taking p_1 as an example, the feature vector Fp_1 contains the following core dimensions:
              C1.2.1 The specific relation types of enterprise taxpayers that affect the associated tax risk;
              C1.2.2 The tax rate difference threshold for the nodes at both ends of the specific relation type, where T_C05 and T_C04 denote the tax rates of the head node C05 and the tail node C04;
              C1.2.3 The associated transaction amount ratio threshold α for the nodes at both ends of the specific relation type, where Re_C05 denotes the total income of node C05, Re_CC denotes the total income from transactions between node C04 and node C05, and α is 0 if no transactions exist between the nodes; finally yielding the feature vector Fp_1, in which type denotes the relation type;
        C2. Starting from enterprise taxpayer nodes, identifying candidate risk subgraphs G_ce;
        C3. Starting from natural person nodes, identifying candidate risk subgraphs G_cr;
        C4. Calculating the feature similarity between the tax avoidance patterns and the candidate risk subgraphs G_ce and G_cr, and identifying high tax risk subgraphs, specifically:
           C4.1 Calculating the feature similarity between each candidate risk subgraph G_ce, G_cr and each tax avoidance pattern p_j (1 ≤ j ≤ r), specifically:
              C4.1.1 Extracting the feature vectors F_Gce and F_Gcr of the candidate risk subgraphs G_ce and G_cr using the method of step C1.2;
              C4.1.2 Calculating the feature similarity between the tax avoidance pattern p_j (j = 1, 2, ..., r) and the candidate subgraphs G_ce and G_cr using formulas (7) and (8), respectively;
           C4.2 If sim(p_j, G_ce) ≥ θ_s or sim(p_j, G_cr) ≥ θ_s, judging G_ce or G_cr to be a high tax risk subgraph, where θ_s is a configurable parameter.
  2. The method for assessing the associated tax risk of enterprise taxpayers based on a ripple set as set forth in claim 1, wherein step A1 comprises the following specific steps: A1.1 defining the entity types of the tax risk knowledge graph G, namely enterprise taxpayers and natural persons; A1.2 defining 8 relation types among enterprise taxpayers and between enterprise taxpayers and natural persons, and extracting data from the system database to construct relation instances; A1.3 defining the attributes of the relation types between entities.
  3. The method for evaluating tax risk of enterprise taxpayers based on the knowledge graph as set forth in claim 1, wherein the step of identifying the candidate risk subgraph G_ce starting from enterprise taxpayer nodes comprises the following specific steps: C2.1 searching the candidate subgraph G_s for 2 enterprise taxpayer nodes a and b that have a transaction relation; C2.2 taking node a and node b as starting points, tracking backwards against the ripple propagation direction in the candidate subgraph to find the actual controller nodes c and d corresponding to node a and node b respectively, extracting the relation type, tax rate difference and transaction amount ratio of node a and node b, and obtaining the feature vector F_Gce of the candidate subgraph G_s; C2.3 if a social relation exists between the actual controller nodes c and d and the threshold condition of pattern p_j is met, marking the subgraph as a candidate risk subgraph G_ce.
  4. The method for evaluating tax risk of enterprise taxpayers based on the knowledge graph as set forth in claim 1, wherein the step of identifying the candidate risk subgraph G_cr starting from natural person nodes comprises the following specific steps: C3.1 searching the candidate subgraph G_s for a natural person node a and a natural person node b that have a social relation; C3.2 taking node a and node b as starting points, tracking along the ripple propagation direction in the candidate subgraph to find the enterprise taxpayer nodes c and d controlled by node a and node b respectively, extracting the relation type, tax rate difference and transaction amount ratio of these nodes, and obtaining the feature vector F_Gcr of the candidate subgraph G_s; C3.3 detecting whether the enterprise taxpayer nodes c and d have a transaction relation; if such a relation exists and the threshold condition of pattern p_j is met, marking the subgraph as a candidate risk subgraph G_cr.
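
The layer-by-layer diffusion of steps B3.1 to B3.7 and the fusion of step B4 in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Formulas (5) and (6) are not reproduced in this text, so the sketch assumes a weighted sum for the relation strength and division by the per-seed maximum for standardization; it also assumes that nodes already reached are skipped. All function names, the dictionary layout and the sample values are hypothetical.

```python
def relation_strength(attr_values, omega):
    # ASSUMPTION: a weighted sum stands in for formula (5), which is not given here.
    return sum(w * y for w, y in zip(omega, attr_values))


def raw_strength(relations, omega_by_type, beta):
    # ASSUMPTION: the original strength W combines per-relation strengths
    # linearly with the importance weights beta (step B3.2.2.4).
    return sum(
        beta[rtype] * relation_strength(values, omega_by_type[rtype])
        for rtype, values in relations.items()
    )


def ripple_set(graph, seed, omega_by_type, beta, d, max_layers):
    """Return the ripple subsets [S_1, ..., S_k] as lists of (head, w, tail)."""
    layers = []
    frontier = [seed]   # temporary seed nodes of the current layer
    visited = {seed}    # ASSUMPTION: do not re-enter nodes already reached
    for _ in range(max_layers):
        layer = []
        for head in frontier:
            raw = {
                tail: raw_strength(rels, omega_by_type, beta)
                for tail, rels in graph.get(head, {}).items()
            }
            top = max(raw.values(), default=0.0)
            if top <= 0.0:
                continue
            for tail, value in raw.items():
                w = value / top  # ASSUMPTION: max scaling stands in for formula (6)
                if w >= d and tail not in visited:  # threshold d of step B3.2.6
                    layer.append((head, w, tail))
                    visited.add(tail)
        if not layer:
            break
        layers.append(layer)
        frontier = [tail for _, _, tail in layer]
    return layers


def candidate_subgraph(layers):
    # Step B4: fuse the triples of all layers into one edge map.
    return {(head, tail): w for layer in layers for head, w, tail in layer}


# Hypothetical sample data: node -> neighbour -> relation type -> standardized attributes.
graph = {
    "u": {"c1": {"transaction": [0.8, 0.6]}, "c2": {"transaction": [0.1, 0.2]}},
    "c1": {"c3": {"shareholding": [0.5]}},
}
omega_by_type = {"transaction": [0.5, 0.5], "shareholding": [1.0]}
beta = {"transaction": 0.6, "shareholding": 0.4}

layers = ripple_set(graph, "u", omega_by_type, beta, d=0.5, max_layers=2)
subgraph = candidate_subgraph(layers)
```

With these sample values, c2 is filtered out because its standardized strength falls below d = 0.5, so the candidate subgraph contains only the edges u to c1 and c1 to c3. Raising max_layers or lowering d widens the ripple, which is the trade-off the threshold mechanism is meant to control.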

Description

Enterprise taxpayer associated tax risk assessment method based on a ripple set

Technical Field

The invention relates to the field of tax risk management for enterprise taxpayers, in particular to a ripple set-based method for assessing the associated tax risk of enterprise taxpayers, and belongs to the intersection of artificial intelligence and tax risk assessment management.

Background

With the development of the digital economy and the increasing complexity of enterprise operating models, concealed tax planning and even tax evasion carried out through associated transactions, cross-regional layouts, shell companies and similar means are on the rise. Traditional tax risk assessment methods typically rely on static financial data and have difficulty identifying and quantifying tax risks that evolve dynamically across multiple subjects. More advanced methods are therefore needed to improve the accuracy and efficiency of tax risk assessment, and developing them remains a challenge for researchers in the field. Researchers in industry and academia have applied graph computation and artificial intelligence techniques to associated risk analysis.
Ji Peng et al. propose a public opinion early warning and risk propagation analysis method, system, equipment and storage medium. It constructs a knowledge graph containing basic enterprise information, focuses on analyzing unstructured public opinion text from outside the enterprise, and automatically mines the nodes associated with high-risk nodes, but it does not consider enterprise internal management data. Zhang Ze et al. propose an entity associated risk assessment method that focuses on equity relations and combines basic entity information, business transactions, financial conditions and graph theory indexes such as degree and betweenness centrality to analyze enterprises and identify key entities at risk, but it ignores the influence of association relations such as transactions and cooperation on associated risk assessment. Xu Ming et al. propose an artificial intelligence-based individual stock risk early warning method and system that calculates individual stock risk with a multi-level attenuation mechanism and multiple adjustment factors according to the risk conduction characteristics of the market, but it is inadequate for identifying tax avoidance patterns conducted across multiple subjects.

Disclosure of Invention

The invention discloses a ripple set-based method for assessing the associated tax risk of enterprise taxpayers. The method constructs an enterprise tax risk knowledge graph, defines 8 entity relation types and an association relation strength matrix, generates candidate subgraphs with a ripple set algorithm, and identifies high tax risk subgraphs using the tax characteristics and structural characteristics of the subgraphs. The method comprises the steps of 1) constructing a tax risk knowledge graph database, 2) selecting seed nodes to generate candidate subgraphs, and 3) identifying high tax risk subgraphs. Specifically, the method of the present invention comprises the following steps:

A. Constructing a tax risk knowledge graph database, specifically:
A1. Defining the tax risk knowledge graph G = {E, R}, where E and R denote the entity set and the relation set respectively, E contains 2 entity types (natural persons and enterprise taxpayers), and R contains the 8 relation types in Table 1;
A1.1 Defining the entity types of the tax risk knowledge graph G, namely enterprise taxpayers and natural persons;
A1.2 Defining 8 relation types among enterprise taxpayers and between enterprise taxpayers and natural persons, as shown in Table 1:
TABLE 1 Relation types
where CC denotes a relation between enterprise taxpayer entities, CO denotes a relation between a natural person entity and an enterprise taxpayer entity, OO denotes a relation between natural person entities, A and B denote enterprise taxpayer entities, and C and D denote natural persons;
A1.3 Defining the attributes of the relation types between entities, as shown in Table 2:
TABLE 2 Relation type attributes
A2. Extracting investment information, shareholding information, payment information and transaction information from the tax management system, and constructing the relations between enterprise taxpayers, between enterprise taxpayers and natural persons, and between natural persons, each recorded in the form relation(subject, object);
A3. Extracting the attribute data corresponding to each relation from the tax management system and recording it as the attribute values of that relation(subject, object);
B. Selecting seed nodes to generate a candidate subgraph G_s, specifically:
B1. Selecting a seed node u, where the main reference elements comprise the following characteristics:
- transactions with significant tax differences