
CN-121998412-A - Intelligent recognition method and system for vendor violation based on Monte Carlo method


Abstract

The invention discloses an intelligent recognition method and system for vendor violations based on the Monte Carlo method, relating to the technical field of data processing. The method comprises: preprocessing an original bid record data set to obtain an analysis bid record data set; performing multidimensional feature extraction with a multidimensional analysis feature mechanism and supplier feature quantification with a supplier relationship network to obtain an initial feature engineering data set; performing time-series pattern analysis with a multi-granularity time analysis mechanism and fusing the result with the initial feature engineering data set to obtain an enhanced multidimensional feature engineering data set; constructing an abnormal behavior portrait with a Monte Carlo simulation model; constructing a multi-level data view, simulating the multi-level data view and obtaining a quantified comprehensive risk score; evaluating and identifying with a multidimensional risk assessment and intelligent identification terminal to obtain an integrity comprehensive risk score; and grading the suppliers based on a preset threshold and generating a violation adjustment report. The invention provides differentiated schemes for suppliers with different risk levels, making management and control more scientific and better targeted.

Inventors

  • KONG SI
  • ZHANG YI
  • ZHOU BEIBEI
  • NIU ZHIKANG
  • WANG WEI
  • TIAN DANNI
  • WANG HUADONG
  • WANG YOU
  • LI LINGXIAO
  • WANG JINCANG

Assignees

  • 华电集团北京燃料物流有限公司

Dates

Publication Date
2026-05-08
Application Date
2025-12-31

Claims (10)

  1. An intelligent recognition method for vendor violations based on the Monte Carlo method, characterized by comprising the following steps: S1, acquiring an original bid record data set and preprocessing it to obtain an analysis bid record data set; S2, performing multidimensional feature extraction on the analysis bid record data set by using a multidimensional analysis feature mechanism, and performing supplier feature quantification on the extracted multidimensional feature result by using a supplier relationship network to obtain an initial feature engineering data set; S3, performing time-series pattern analysis on the analysis bid record data set by using a multi-granularity time analysis mechanism, and fusing the time-series distribution analysis result with the initial feature engineering data set to obtain an enhanced multidimensional feature engineering data set; S4, constructing a Monte Carlo simulation model, and performing identification on the enhanced multidimensional feature engineering data set by using the Monte Carlo simulation model to construct an abnormal behavior portrait; S5, identifying and screening the abnormal behavior portrait, constructing a multi-level data view, and performing test simulation on the multi-level data view by using the Monte Carlo simulation model to obtain a randomization scheme of supplier conditions; S6, evaluating and identifying the quantified comprehensive risk score by using a multidimensional risk assessment and intelligent identification terminal, integrating the evaluation and identification results to obtain an integrity comprehensive risk score, grading the suppliers based on a preset threshold and the integrity comprehensive risk score, and generating a violation adjustment report, thereby identifying supplier violations.
  2. The intelligent recognition method for vendor violations based on the Monte Carlo method according to claim 1, wherein acquiring and preprocessing the original bid record data set to obtain the analysis bid record data set comprises: S11, generating the original bid record data set based on bid identifier, supplier identifier, quotation amount and bidding time; S12, handling missing values in the original bid record data set with a layered cleaning strategy, marking records with any missing field as invalid data, and recording the reasons for and number of deletions to obtain a preliminary cleaned data set; S13, checking the quotation amount field in the preliminary cleaned data set with multi-level anomaly detection, determining a dynamically adjusted threshold based on the seasonal fluctuation characteristics of coal prices and an improved box-plot method, computing a modified Z-score that uses the median in place of the mean and the median absolute deviation in place of the standard deviation, and marking observations whose absolute modified Z-score exceeds a preset threshold as outliers; S14, performing entity normalization on inconsistent supplier names by using a fuzzy matching algorithm based on edit distance together with a business knowledge base, to obtain bid record intermediate data with uniform supplier identifiers; S15, integrating the outliers, the data quality report and the bid record intermediate data into a star-schema data warehouse to form the analysis bid record data set.
  3. The intelligent recognition method for vendor violations based on the Monte Carlo method according to claim 1, wherein performing multidimensional feature extraction on the analysis bid record data set by using a multidimensional analysis feature mechanism, and performing supplier feature quantification on the extracted multidimensional feature result by using a supplier relationship network to obtain an initial feature engineering data set, comprises: S21, constructing the multidimensional analysis feature mechanism based on quotation difference features, relationship features, time-series features and derived features; S22, performing feature extraction on the analysis bid record data set by using the multidimensional analysis feature mechanism, converting the feature extraction result into a series of behavior information carriers, and generating a multidimensional feature set of individual supplier behavior based on the information carriers; S23, constructing a supplier relationship network with time-series weights based on co-bidding records, and identifying potential associated groups in the supplier relationship network by using a community detection algorithm to obtain community structure features; S24, performing network association quantification and enhanced fusion on the multidimensional feature set by using the supplier relationship network and the community structure features to obtain the initial feature engineering data set.
  4. The intelligent recognition method for vendor violations based on the Monte Carlo method according to claim 3, wherein constructing a supplier relationship network with time-series weights based on co-bidding records, and identifying potential associated groups in the supplier relationship network by using a community detection algorithm to obtain community structure features, comprises: S231, applying a multi-run consensus strategy to the community detection algorithm, and running the algorithm independently a preset number of times with different random seeds; S232, based on the community partition results of the preset number of independent runs, counting how often each pair of supplier nodes is assigned to the same community, and constructing a consensus matrix of the supplier relationship network to obtain a consensus matrix data set; S233, performing community structure detection again with the consensus matrix data set as input, eliminating the random deviation of a single algorithm run, and obtaining a supplier community partition result; S234, based on the supplier community partition result, extracting supplier nodes with high betweenness centrality, and obtaining the competition health of the supplier network.
  5. The intelligent recognition method for vendor violations based on the Monte Carlo method according to claim 1, wherein performing time-series pattern analysis on the analysis bid record data set by using a multi-granularity time analysis mechanism, and fusing the time-series distribution analysis result with the initial feature engineering data set to obtain the enhanced multidimensional feature engineering data set, comprises: S31, constructing the multi-granularity time analysis mechanism based on macroscopic periodicity analysis, mesoscopic aggregation analysis and microscopic sequence dependency analysis; S32, performing macroscopic periodicity analysis on the analysis bid record data set by using the multi-granularity time analysis mechanism, and identifying abnormal regularities of bidding behavior on seasonal and periodic time scales to obtain macroscopic periodicity features; S33, performing mesoscopic aggregation analysis on the analysis bid record data set by using the multi-granularity time analysis mechanism, and testing the distribution pattern of quotation events on the time axis against a point process model to obtain time aggregation features; S34, performing microscopic sequence dependency analysis on the analysis bid record data set by using the multi-granularity time analysis mechanism, and analyzing the time intervals between consecutive bidding events, the synchrony of quotation sequences and the pattern of rotating winners to obtain behavior sequence features; S35, integrating and quantifying the macroscopic periodicity features, the time aggregation features and the behavior sequence features to obtain the time-series distribution analysis result; S36, fusing the time-series distribution analysis result with the initial feature engineering data set to obtain the enhanced multidimensional feature engineering data set.
  6. The intelligent recognition method for vendor violations based on the Monte Carlo method according to claim 1, wherein constructing a Monte Carlo simulation model, and performing identification on the enhanced multidimensional feature engineering data set by using the Monte Carlo simulation model to construct an abnormal behavior portrait, comprises: S41, constructing the Monte Carlo simulation model based on overall frequency testing, supplier pair analysis, conditional pattern testing and time-series pattern analysis, in combination with a randomized inverse scenario; S42, performing large-scale simulation with a randomization strategy that preserves the scale structure to construct a null distribution of the frequency of one-yuan quotation differences, calculating the percentile rank of the observed frequency, effect size indicators and nonparametric confidence intervals, and performing statistical comparison and evaluation to obtain macroscopic test evidence; S43, screening supplier pairs with statistical significance, executing a pair-specific conditional randomization test, and obtaining combination-level test evidence based on multiple comparison correction and a risk assessment that fuses statistical significance, effect size and behavior persistence; S44, performing layered Monte Carlo simulation and statistical significance evaluation on basic business conditions, complex interaction conditions and dynamic market conditions to obtain contextualized test evidence; S45, constructing a random benchmark of periodic, aggregation and sequence-dependent time patterns by using Monte Carlo simulation, and performing statistical tests to obtain time-series test evidence; S46, performing cross-validation and evidence fusion on the enhanced multidimensional feature engineering data set based on the macroscopic test evidence, the combination-level test evidence, the contextualized test evidence and the time-series test evidence, and constructing a complete abnormal behavior portrait.
  7. The intelligent recognition method for vendor violations based on the Monte Carlo method according to claim 6, wherein constructing a random benchmark of periodic, aggregation and sequence-dependent time patterns by using Monte Carlo simulation, and performing statistical tests to obtain time-series test evidence, comprises: S451, constructing a time-series model based on dynamic condition testing, and analyzing the correlation between the frequency of one-yuan quotation differences and market indices by using the time-series model; S452, decomposing the sequence of one-yuan quotation difference events into a trend component, a periodic component and a residual component by using time-series decomposition, performing spectral analysis on the periodic component to identify significant periods, generating event sequences with random time distribution by using Monte Carlo simulation, comparing the difference in periodic strength between the actual sequence and the random sequences, and evaluating the statistical significance of the periodicity; S453, treating each one-yuan quotation difference event as an event point on the time axis by using point process analysis, calculating the inter-event time distribution of the actual event sequence and comparing it with random sequences generated by a homogeneous Poisson process; S454, analyzing the degree of aggregation on multiple time scales by using the L function, constructing a confidence envelope of the K function by using Monte Carlo simulation, and judging whether the actual K function value exceeds the envelope, so as to test for significant temporal aggregation.
  8. The intelligent recognition method for vendor violations based on the Monte Carlo method according to claim 1, wherein identifying and screening the abnormal behavior portrait, constructing a multi-level data view, performing test simulation on the multi-level data view by using the Monte Carlo simulation model to obtain a randomization scheme of supplier conditions, and evaluating and grading the randomization scheme to obtain a quantified comprehensive risk score, comprises: S51, analyzing the global distribution of co-bidding counts in the abnormal behavior portrait by using an adaptive threshold algorithm to determine a score threshold, and combining co-bidding strength, time coverage, quotation correlation, bid complementarity and other multidimensional screening criteria to obtain a set of qualified supplier pairs; S52, based on the supplier pair set, extracting the basic information, both parties' quotation details, competitive environment information and business background information of the bid sections in which both suppliers participated, and combining the chronologically ordered co-bidding event sequence, the multidimensional features of each event point and the time interval sequence with context data on each party's performance in the other bid sections in which it participated separately, to obtain the multi-level data view; S53, performing a randomization test on the multi-level data view by using constrained randomization in the Monte Carlo simulation model, and constructing a null distribution to obtain the randomization scheme of supplier conditions; S54, comparing the actually observed frequency of one-yuan quotation differences with the null distribution generated by simulation, calculating the p-value, effect size and relative risk ratio, analyzing confidence intervals, performing multiple comparison correction with hierarchical false discovery rate control, and testing the influence of the randomization scheme on the result through sensitivity analysis to obtain a statistical significance evaluation result; S55, based on the statistical significance evaluation result, fusing the multidimensional indicators of behavior pattern risk, time consistency and network relationship risk, calculating the quantified comprehensive risk score with a weighted formula, and dividing supplier pairs into high, medium and low risk levels according to a preset threshold to obtain the quantified comprehensive risk score.
  9. The intelligent recognition method for vendor violations based on the Monte Carlo method according to claim 1, wherein evaluating and identifying the quantified comprehensive risk score by using the multidimensional risk assessment and intelligent identification terminal, integrating the evaluation and identification results to obtain an integrity comprehensive risk score, grading the suppliers based on a preset threshold and the integrity comprehensive risk score, and generating a violation adjustment report to identify supplier violations, comprises: S61, evaluating and identifying the quantified comprehensive risk scores and the corresponding supplier-related data by using the multidimensional risk assessment and intelligent identification terminal to obtain an evaluation input data set; S62, based on the multidimensional evaluation framework of the terminal, performing statistical significance dimension evaluation, and, by combining indicators such as the p-value, effect size and confidence interval width obtained from the Monte Carlo test, mapping the p-value to a statistical risk score through a conversion function to obtain a statistical risk score result; S63, identifying abnormal patterns in quotation time synchrony, regularities in the trailing digits of quotation amounts and quotation adjustment consistency by using an unsupervised anomaly detection algorithm, and quantifying the abnormal patterns into behavioral anomaly indices to obtain a behavior pattern risk score result; S64, identifying the starting point, duration and evolution trend of the abnormal patterns by using time-series analysis and change-point detection, and evaluating the stability of the abnormal patterns and any evasion behavior to obtain a time consistency risk score result; S65, calculating node centrality and community structure indicators based on the supplier relationship network, and evaluating the supplier association strength and the connections to high-risk suppliers to obtain a network relationship risk score result; S66, integrating the statistical risk score result, the behavior pattern risk score result, the time consistency risk score result and the network relationship risk score result by using a weighted fusion model, and setting dynamic weights according to business importance to obtain the integrity comprehensive risk score; S67, dividing the suppliers or supplier pairs into three risk levels of high, medium and low based on a preset threshold, generating the violation adjustment report, and obtaining the supplier violation recognition result based on the violation adjustment report.
  10. An intelligent recognition system for vendor violations based on the Monte Carlo method, configured to implement the intelligent recognition method for vendor violations based on the Monte Carlo method according to any one of claims 1 to 9, comprising: a bid data preprocessing module, configured to acquire an original bid record data set and preprocess it to obtain an analysis bid record data set; a supplier feature quantification and extraction module, configured to perform multidimensional feature extraction on the analysis bid record data set by using a multidimensional analysis feature mechanism, and to perform supplier feature quantification on the extracted multidimensional feature result by using a supplier relationship network to obtain an initial feature engineering data set; a multi-granularity time-series feature fusion module, configured to perform time-series pattern analysis on the analysis bid record data set by using a multi-granularity time analysis mechanism, and to fuse the time-series distribution analysis result with the initial feature engineering data set to obtain an enhanced multidimensional feature engineering data set; a Monte Carlo simulation model analysis module, configured to construct a Monte Carlo simulation model and to perform identification on the enhanced multidimensional feature engineering data set by using the Monte Carlo simulation model to construct an abnormal behavior portrait; a supplier module, configured to identify and screen the abnormal behavior portrait, construct a multi-level data view, and perform test simulation on the multi-level data view by using the Monte Carlo simulation model to obtain a randomization scheme of supplier conditions; and a multidimensional risk level determination module, configured to evaluate and identify the quantified comprehensive risk score by using the multidimensional risk assessment and intelligent identification terminal, integrate the evaluation and identification results to obtain an integrity comprehensive risk score, grade the suppliers based on a preset threshold and the integrity comprehensive risk score, and generate a violation adjustment report, thereby identifying supplier violations.
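
As an illustration of the outlier-marking step S13 in claim 2, the following is a minimal sketch of a modified Z-score that uses the median in place of the mean and the median absolute deviation (MAD) in place of the standard deviation. The function names, the sample quotations, the scaling constant 0.6745 and the cutoff 3.5 are assumptions chosen for illustration (the latter two are commonly used conventions); the claims only speak of a "preset threshold", and the seasonal threshold adjustment and box-plot step are not modeled here.

```python
from statistics import median


def modified_z_scores(values):
    """Return the median/MAD-based modified Z-score of each value."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half the values equal the median.
        # This sketch returns zeros; a real pipeline would need a fallback.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so that scores are comparable to ordinary
    # Z-scores when the data are roughly normal (assumed convention).
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Mark observations whose absolute modified Z-score exceeds the threshold."""
    return [abs(z) > threshold for z in modified_z_scores(values)]


# Hypothetical quotation amounts (yuan per tonne); the last one is extreme.
quotes = [498.0, 502.0, 500.0, 505.0, 495.0, 501.0, 950.0]
flags = flag_outliers(quotes)
```

Because the median and the MAD are barely moved by a single extreme quotation, the extreme value does not inflate its own reference scale, which is the usual motivation for preferring this score over the mean and standard deviation.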

Description

Intelligent recognition method and system for vendor violation based on Monte Carlo method

Technical Field

The invention relates to the technical field of data processing, in particular to a method and a system for intelligently identifying supplier violations based on the Monte Carlo method.

Background

In procurement bidding, suppliers may collude with one another, for example through bid rigging and collusive bidding. One common approach is to manipulate quotations so that a particular supplier is sure to win the bid. A quotation difference of exactly one yuan may be an intentional strategy: for example, one supplier quotes 100 yuan and the other quotes 101 yuan, so that on the surface the competition looks fierce, while in fact the suppliers are colluding. Because the difference is extremely small, the bid result is typically not affected (for example, under lowest-price award the lower-priced supplier still wins), but the pattern may be used to disguise collusion such as taking turns to win or protecting a particular supplier. The Monte Carlo method is a statistical simulation method based on random sampling that is used to evaluate the probability that an event occurs. In this scenario, a large number of random quotation scenarios are simulated, the frequency with which one-yuan quotation differences occur under random conditions is observed, and the actually observed frequency is compared with the simulated frequency. If the actual frequency is significantly higher than the simulated frequency, the one-yuan quotation differences are unlikely to have arisen by chance, which suggests that a violation may exist.
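
The comparison of observed and simulated frequencies described above can be sketched as follows. This is a minimal illustration, not the patented model: it assumes integer quotations in yuan and a deliberately simple null model in which the two suppliers quote independently and uniformly within a per-tender price range. The function names, the sample data and the null model itself are hypothetical.

```python
import random


def count_one_yuan_gaps(pairs):
    """Count tenders in which the two quotations differ by exactly one yuan."""
    return sum(1 for a, b in pairs if abs(a - b) == 1)


def monte_carlo_p_value(observed_pairs, ranges, n_sims=10_000, seed=0):
    """Estimate how often random quoting yields at least the observed count.

    Assumed null model: in each tender both suppliers draw an integer
    quotation independently and uniformly from that tender's (lo, hi) range.
    """
    rng = random.Random(seed)
    observed = count_one_yuan_gaps(observed_pairs)
    at_least = 0
    for _ in range(n_sims):
        simulated = [(rng.randint(lo, hi), rng.randint(lo, hi)) for lo, hi in ranges]
        if count_one_yuan_gaps(simulated) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (1 + at_least) / (1 + n_sims)


# Hypothetical quotations of one supplier pair across six tenders,
# with an assumed plausible price range for each tender.
observed_pairs = [(100, 101), (250, 249), (180, 181), (300, 340), (120, 121), (410, 409)]
ranges = [(90, 110), (230, 270), (170, 190), (290, 350), (110, 130), (400, 420)]
observed, p_value = monte_carlo_p_value(observed_pairs, ranges)
```

A small estimated p-value means that the observed number of one-yuan differences is rarely reached under the assumed random quoting, which is the signal the background describes. The conclusion depends on how realistic the null model is.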
The existing similar scheme uses a traditional hypothesis test, the chi-square goodness-of-fit test, to test whether the observed distribution of quotation differences conforms to a uniform distribution. The steps are as follows. Establish the hypotheses: the null hypothesis (H0) is that the distribution of quotation differences does not differ significantly from the uniform distribution; the alternative hypothesis (H1) is that the distribution of quotation differences differs significantly from the uniform distribution. Prepare the data: collect the quotation difference data, i.e. the quotation differences between any two suppliers in all bids. Determine the range of the quotation differences and divide it into intervals. Calculate the theoretical frequencies: under the uniform distribution, the expected frequency of each interval = total frequency / number of intervals. Calculate the chi-square statistic of each interval: (observed frequency - expected frequency)^2 / expected frequency. Calculate the total chi-square statistic by adding up the chi-square statistics of all intervals. Determine the degrees of freedom: for a uniform distribution, the degrees of freedom equal the number of intervals minus 1. Calculate the p-value: p-value = 1 - chi2.cdf(chi_square_static, degrees_of_freedom). Make the decision: if the p-value is less than the significance level, the null hypothesis is rejected and the distribution of quotation differences is considered to differ significantly from the theoretical distribution, i.e. the quotation differences are not uniformly distributed and may have been manipulated.
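
The steps of this chi-square goodness-of-fit test can be sketched in plain Python as follows. The interval counts are hypothetical, and `chi2_sf` is an illustrative helper that evaluates 1 - CDF by a series expansion so that the sketch needs only the standard library; in practice a statistics library such as SciPy would normally be used for the p-value.

```python
import math


def chi2_sf(x, dof):
    """Survival function (1 - CDF) of the chi-square distribution.

    Uses the series for the regularized lower incomplete gamma function;
    adequate for the moderate statistics of this sketch.
    """
    if x <= 0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:
        n += 1
        term *= half_x / (a + n)
        total += term
    log_prefactor = -half_x + a * math.log(half_x) - math.lgamma(a + 1.0)
    cdf = math.exp(log_prefactor) * total
    return max(0.0, 1.0 - cdf)


def chi_square_uniform_test(observed_counts):
    """Goodness-of-fit test of interval counts against a uniform distribution."""
    k = len(observed_counts)
    expected = sum(observed_counts) / k  # total frequency / number of intervals
    statistic = sum((o - expected) ** 2 / expected for o in observed_counts)
    dof = k - 1
    return statistic, dof, chi2_sf(statistic, dof)


# Hypothetical counts of quotation differences in five intervals; the first
# interval is assumed to hold the differences of exactly one yuan.
counts = [30, 12, 9, 11, 8]
statistic, dof, p_value = chi_square_uniform_test(counts)
```

With these hypothetical counts the expected frequency is 14 per interval, and the excess in the first interval dominates the statistic, so the p-value falls well below common significance levels.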
The existing similar scheme, which uses the traditional chi-square goodness-of-fit test, has the following defects. It is sensitive to distributional assumptions: the chi-square test depends on a specified theoretical distribution. If the specified theoretical distribution does not match the distribution that would arise under genuinely random conditions, the test result may be unreliable. For example, quotation differences in bidding may not be uniformly distributed in nature, in which case using a uniform distribution as the theoretical distribution may lead to erroneous conclusions. A sufficient sample size is required: the chi-square test requires that the expected frequency of each interval not be too small. If the sample size is insufficient, intervals may have to be merged, and merging intervals may mask important details. For example, the one-yuan quotation difference is the quantity of interest, but if its expected frequency is less than 5, it has to be merged with other intervals, so that it can no longer be tested separately whether the one-yuan quotation difference is abnormal. Complex collusion patterns may go undetected: the chi-square test is a test of the overall distribution and may be insensitive to specific patterns, especially when such patterns make up a small share of the overall distribution. For example, if collusion occurs only in a few bids while most quotation differences are random, the chi-square test may fail to find these few but important anomalies. The sensitivity to extremes is low: the chi-square test groups data, the specific