
CN-121981787-A - Power transmission and transformation project cost prediction method and system based on data mining


Abstract

The invention discloses a data-mining-based method and system for predicting the cost of power transmission and transformation projects, in the technical field of project cost prediction. The method collects multi-dimensional data covering project attributes, environmental factors and historical costs and builds a project database; fills missing values and preprocesses the data; calculates a correlation coefficient matrix; determines the number of principal components by principal component analysis; calculates the principal component loading matrix and extracts a candidate key field set; applies partial correlation analysis, cross-validation and regression significance tests to the candidate fields and the project cost data to identify the key factors that significantly influence cost; and builds a project cost prediction model on those key factors to predict the cost of new projects. The invention addresses the heavy reliance of traditional estimation on experience and the low utilization of available data, enables systematic identification of key cost factors and cost prediction, and supports investment decisions and cost control in power grid construction.

Inventors

  • GU ZHIHONG
  • SHI ZHUOPENG
  • ZHAO HAIBO
  • AN XIANGYONG
  • XING XIAOXIA
  • WEN WEI
  • FAN XIYING
  • LIU ZHUO

Assignees

  • State Grid Shanxi Electric Power Company Economic and Technical Research Institute (国网山西省电力有限公司经济技术研究院)

Dates

Publication Date
2026-05-05
Application Date
2025-12-04

Claims (10)

  1. A data-mining-based cost prediction method for power transmission and transformation projects, characterized by comprising the following steps: acquiring basic information on historical power transmission and transformation projects and establishing a project basic information base; acquiring environmental factor information related to power transmission and transformation projects, establishing an environmental factor information base, and associating it with the project basic information base to form a project database that supports joint environment-cost analysis; extracting project cost fields and related influencing-factor fields from the project database, preprocessing them, obtaining the correlation between each influencing factor and project cost, and constructing a cost correlation coefficient matrix from the results; computing the eigenvalues and eigenvectors of the cost correlation coefficient matrix, obtaining the eigenvalue contribution rates and cumulative contribution rates to form a candidate principal component set, deriving a cost sensitivity score from the eigenvalue contribution rates of the candidate principal components, and retaining the principal components that satisfy a dual criterion, namely a preset cumulative contribution rate threshold and a cost sensitivity threshold, to form a cost principal component set; constructing a loading matrix from the cost principal component set and extracting key fields to form a candidate key field set; and acquiring historical cost data for power transmission and transformation projects, performing correlation analysis and regression testing on the candidate key field set against the historical cost data, identifying the key factors that significantly influence cost to form a cost key factor set, and constructing a project cost prediction model from the cost key factor set.
  2. The data-mining-based cost prediction method for power transmission and transformation projects according to claim 1, wherein acquiring environmental factor information related to power transmission and transformation projects, establishing an environmental factor information base, and associating it with the project basic information base to form a project database that supports joint environment-cost analysis comprises: obtaining basic information on historical projects from the completion documents, cost audit documents and bidding records of power transmission and transformation projects, the basic information including cost data related to project scale, equipment category, construction period, construction process and labor cost; within the acquired basic information, designating fields directly related to the total project cost as project cost fields and fields indirectly related to it as related influencing-factor fields; archiving the project cost fields and related influencing-factor fields by project number to establish the project basic information base, and configuring a field index for it; extracting environmental factor data, including geological conditions, climate parameters, terrain characteristics and natural-hazard risk parameters of the construction area, from geological data, meteorological monitoring records and site survey results for the area where the project is located; storing the environmental factor data in the environmental factor information base and configuring an association identifier corresponding to the project number; setting environment-cost association priority rules based on the environmental factor information base, whereby: when the geological condition is a specific foundation type and the climate parameters meet specific annual weather indicators, cost fields of the foundation construction cost and anti-icing equipment cost classes are associated first, the specific foundation types being soft soil, sand-gravel soil and rock foundations, and the specific annual weather indicators being that the annual mean air temperature, the extreme minimum air temperature and the frozen soil depth reach preset thresholds; and when the terrain is of a specific terrain type and the natural-hazard risk parameter of the construction area reaches a specific occurrence probability, cost fields of the earthwork excavation cost and slope protection cost classes are associated first, the specific terrain types including hilly, mountainous, valley, plateau and similar terrain, and the specific occurrence probability being the probability at which the historical frequency of landslide, debris flow and flood hazards reaches a preset threshold; according to the priority rules, using the project number as the primary association key to match the environmental factor data in the environmental factor information base with the project cost fields and related influencing-factor fields in the project basic information base; during matching, when an environmental factor satisfies a priority rule, performing field-level mapping to establish one-to-many or many-to-one mapping relations among the environmental factor data, the project cost fields and the related influencing-factor fields; and, based on the mapping relations, assigning association weights to the environmental factors, establishing association pairs of environmental factors with project cost fields and related influencing-factor fields, and integrating the association pairs into a project database that supports joint environment-cost analysis.
  3. The data-mining-based cost prediction method for power transmission and transformation projects according to claim 1, wherein extracting the project cost fields and related influencing-factor fields from the project database, preprocessing them, calculating the correlation between each influencing factor and project cost, and constructing the cost correlation coefficient matrix from the results comprises: extracting project cost fields and related influencing-factor fields from the project database and filling missing values with an interpolation method chosen according to each field's attributes to obtain filled data; standardizing data of different dimensions and additionally applying conditionally triggered normalization to fields strongly affected by the external environment to obtain preprocessed data; calculating, from the preprocessed data, both the Pearson correlation coefficient and the Spearman rank correlation coefficient between each related influencing factor and project cost to obtain a multidimensional correlation result; fusing the multidimensional correlation results by weighting to form a composite correlation score; sorting all related influencing factors in descending order of composite correlation score to obtain an influencing-factor sensitivity sequence; and constructing the cost correlation coefficient matrix from the sensitivity sequence, with rows representing the related influencing factors and columns representing the corresponding correlation coefficients and weight information.
  4. The data-mining-based cost prediction method for power transmission and transformation projects according to claim 1, wherein computing the eigenvalues and eigenvectors of the cost correlation coefficient matrix, obtaining the eigenvalue contribution rates and cumulative contribution rates to form a candidate principal component set, deriving a cost sensitivity score from the eigenvalue contribution rates of the candidate principal component set, and retaining the principal components that satisfy the dual criterion of a preset cumulative contribution rate threshold and a preset cost sensitivity threshold to form the cost principal component set specifically comprises: sorting the eigenvalues in descending order to obtain an eigenvalue sequence; computing the ratio of each eigenvalue to the sum of all eigenvalues as its eigenvalue contribution rate; accumulating the contribution rates to obtain the cumulative contribution rate and plotting the cumulative contribution rate curve; identifying the inflection point at which the curve changes from rising rapidly to rising gradually and, combining it with the preset cumulative contribution rate threshold, selecting the number of components that satisfies both the inflection-point and threshold conditions as the candidate principal component set; fusing the eigenvalue contribution rate of each candidate principal component with the corresponding weight in the cost correlation coefficient matrix to calculate a cost sensitivity score; and retaining the principal components whose cost sensitivity score is greater than or equal to the preset cost sensitivity threshold as the final cost principal component set; wherein the cumulative contribution rate is calculated as $\eta_k = \sum_{i=1}^{k} \lambda_i \big/ \sum_{i=1}^{n} \lambda_i$, where $\eta_k$ is the cumulative contribution rate of the first $k$ eigenvalues, $\lambda_i$ is the $i$-th eigenvalue, and $n$ is the total number of eigenvalues; and the cost sensitivity score is calculated as $S_j = \alpha c_j + \beta w_j$, where $S_j$ is the cost sensitivity score of the $j$-th candidate principal component, $c_j$ is its eigenvalue contribution rate, $w_j$ is the weight of the influencing factors corresponding to the $j$-th candidate principal component in the cost correlation coefficient matrix, and $\alpha$ and $\beta$ are the respective fusion weights.
  5. The data-mining-based cost prediction method for power transmission and transformation projects according to claim 1, wherein constructing a loading matrix from the cost principal component set, extracting key fields and forming a candidate key field set comprises: extracting the eigenvector corresponding to each principal component in the cost principal component set and using it to weight the preprocessed project cost fields and related influencing-factor fields, obtaining the combined value of each principal component over the influencing factors to form a principal component matrix; calculating, from the principal component matrix and the eigenvalues of the corresponding principal components, the loading of each related influencing factor on each principal component to obtain a principal component loading matrix; taking the absolute value of each element of the loading matrix as the loading strength of each field on each principal component and sorting in descending order of loading strength; and, according to a preset loading threshold and the sorting rule, selecting the fields whose loading strength exceeds the threshold on any principal component to form the candidate key field set.
  6. The data-mining-based cost prediction method for power transmission and transformation projects according to claim 1, wherein acquiring historical cost data for power transmission and transformation projects, performing correlation analysis and regression testing on the candidate field set against the cost data, identifying the key factors that significantly influence cost to form a cost key factor set, and constructing a project cost prediction model from the cost key factor set comprises: acquiring historical cost data for power transmission and transformation projects from the project database; performing correlation analysis between the candidate field set and the historical cost data by calculating partial correlation coefficients, to obtain the correlation between each field and project cost; evaluating the stability of the correlation results by tracking, over multiple runs, the changes in sign and magnitude of each field's partial correlation coefficient, and retaining the fields that remain stable to form a stable field set; performing regression testing on the stable field set by building a multiple regression model and calculating each field's regression coefficient on project cost and its statistical significance; selecting the fields that significantly affect cost according to the regression test results to form the cost key factor set; constructing the project cost prediction model with the cost key factor set as input features and historical project cost as the target variable; and inputting the feature data of a project under evaluation into the model to obtain its predicted cost.
  7. A data-mining-based cost prediction system for power transmission and transformation projects, for implementing the cost prediction method according to any one of claims 1 to 6, comprising: a main control module, configured to perform principal component extraction, candidate field screening and key factor identification based on the data transmitted by the information acquisition module and the computing module, to construct a project cost prediction model from the identified cost key factors, and to output the project cost prediction result; an information acquisition module, configured to acquire basic information and related environmental factor information for historical power transmission and transformation projects, to map and associate them at field level to form a project database that supports joint environment-cost analysis, and to acquire historical project cost data; a computing module, configured to fill missing values in the acquired data and to perform correlation analysis and regression testing; and a display module, configured to interact with the main control module and to display the key cost factors and prediction results for power transmission and transformation projects.
  8. The data-mining-based cost prediction system for power transmission and transformation projects according to claim 7, wherein the main control module comprises: a control unit, configured to obtain the eigenvalues and eigenvectors of the cost correlation coefficient matrix, determine the number of principal components and calculate the principal component loading matrix; a screening unit, configured to extract the candidate key field set according to the loading threshold and sorting rule, perform partial correlation analysis, cross-validation stability evaluation and regression significance testing on the candidate fields and the project cost data, and output the cost key factor set; a prediction unit, configured to construct the project cost prediction model from the cost key factor set, input the feature data of a project under evaluation into the model and output the project cost prediction result; and an information receiving unit, configured to receive the data transmitted by the information acquisition module and the computing module and pass it to the control unit and the screening unit.
  9. The data-mining-based cost prediction system for power transmission and transformation projects according to claim 7, wherein the information acquisition module comprises: a basic information acquisition unit, configured to acquire basic information on historical power transmission and transformation projects and extract project cost fields and related influencing-factor fields; an environmental factor acquisition unit, configured to acquire project-related environmental factor data and to map and associate it with the basic information at field level; and a cost data acquisition unit, configured to acquire historical project cost data and associate it with the multi-dimensional historical data to form the project database.
  10. The data-mining-based cost prediction system for power transmission and transformation projects according to claim 7, wherein the computing module comprises: a data processing unit, configured to fill missing values in the acquired data; a data analysis unit, configured to calculate the correlation coefficient matrix from the preprocessed data, extract eigenvalues and eigenvectors, calculate cumulative contribution rates to form the principal component analysis result, and supply the input data for the candidate key field set to the main control module; and a regression analysis unit, configured to perform correlation analysis and regression testing on the candidate key fields and the historical project cost data to generate the cost key factor set.
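The environment-cost association priority rule and field-level mapping of claim 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout (dictionaries keyed by a hypothetical `project_no` field), the category spellings, the thresholds (-20 °C extreme minimum temperature, 1.0 m frost depth, 0.1 hazard events per year) and the association weights 1.0 and 0.5 are all placeholder assumptions, since the claim names the condition categories but not their preset values. For brevity, the cold-climate check uses only the extreme minimum temperature and frost depth, not the annual mean temperature.

```python
from dataclasses import dataclass

# Hypothetical category sets taken from claim 2; the numeric thresholds used
# below are placeholders because the patent does not disclose the preset values.
SPECIFIC_FOUNDATIONS = {"soft_soil", "sand_gravel", "rock"}
SPECIFIC_TERRAINS = {"hilly", "mountain", "valley", "plateau"}


@dataclass
class EnvRecord:
    project_no: str        # association key shared with the project basic information base
    foundation: str        # geological condition (foundation type)
    min_temp_c: float      # extreme minimum air temperature, deg C
    frost_depth_m: float   # frozen soil depth, m
    terrain: str           # terrain type
    hazard_freq: float     # historical landslide/debris-flow/flood events per year


def priority_cost_fields(env, min_temp_limit=-20.0, frost_limit=1.0, hazard_limit=0.1):
    """Return the cost fields that the priority rules say to associate first."""
    fields = []
    cold = env.min_temp_c <= min_temp_limit or env.frost_depth_m >= frost_limit
    if env.foundation in SPECIFIC_FOUNDATIONS and cold:
        fields += ["foundation_cost", "anti_icing_equipment_cost"]
    if env.terrain in SPECIFIC_TERRAINS and env.hazard_freq >= hazard_limit:
        fields += ["earthwork_excavation_cost", "slope_protection_cost"]
    return fields


def associate(projects, env_records, priority_weight=1.0, default_weight=0.5):
    """Join on project number and emit (project_no, field, weight) association pairs."""
    env_by_no = {e.project_no: e for e in env_records}
    pairs = []
    for proj in projects:
        env = env_by_no.get(proj["project_no"])
        if env is None:
            continue  # no environmental record for this project: nothing to associate
        prioritized = set(priority_cost_fields(env))
        for name in proj:
            if name == "project_no":
                continue
            weight = priority_weight if name in prioritized else default_weight
            pairs.append((proj["project_no"], name, weight))
    return pairs
```

For a single soft-soil project in a cold region, `associate` marks `foundation_cost` as a priority association (weight 1.0) and leaves `labor_cost` at the default weight 0.5; a real system would replace the fixed weights with whatever weighting the project database uses.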
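The preprocessing and correlation-fusion steps of claim 3 can be sketched with NumPy as follows. This is an illustrative assumption rather than the disclosed procedure: median filling stands in for the field-specific interpolation the claim leaves open, the conditionally triggered normalization of environment-sensitive fields is omitted, and the fusion weight `alpha` (equal weighting by default) is a placeholder. The function names are hypothetical.

```python
import numpy as np


def fill_missing(X):
    """Fill NaNs column by column with the column median (stand-in for field-specific interpolation)."""
    X = np.array(X, dtype=float)
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X


def standardize(X):
    """Z-score each column so that fields with different units become comparable."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return (X - X.mean(axis=0)) / std


def average_ranks(v):
    """1-based ranks with ties sharing their mean rank, as Spearman's rho requires."""
    order = np.argsort(v, kind="mergesort")
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        tied = v == val
        r[tied] = r[tied].mean()
    return r


def fused_correlation(X, y, alpha=0.5):
    """Score each factor by alpha*|Pearson| + (1-alpha)*|Spearman| against cost y."""
    y = np.asarray(y, dtype=float)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        pearson = np.corrcoef(X[:, j], y)[0, 1]
        spearman = np.corrcoef(average_ranks(X[:, j]), average_ranks(y))[0, 1]
        scores[j] = alpha * abs(pearson) + (1 - alpha) * abs(spearman)
    return scores, np.argsort(-scores)  # scores and the descending sensitivity order
```

Absolute values are fused here so that strongly negative correlations also rank as cost-sensitive; whether the patent keeps the sign is not stated, so this choice is an assumption.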
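The dual screening criterion of claim 4 can be sketched as below. Several points are assumptions: the eigen-decomposition is applied to a square factor-by-factor correlation matrix (the usual PCA input), the inflection-point check on the cumulative curve is omitted in favor of the threshold alone, each component's factor weight `w_j` is taken as the average of per-factor weights weighted by the squared eigenvector entries, and the defaults `eta0=0.85`, `s0=0.1`, `alpha=0.6` and `beta=0.4` are illustrative. The sensitivity score is the linear fusion `alpha * c_j + beta * w_j`, where `c_j` is the eigenvalue contribution rate of component `j`.

```python
import numpy as np


def eigen_contributions(R):
    """Eigen-decompose a symmetric correlation matrix, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(R)
    idx = np.argsort(vals)[::-1]        # descending eigenvalue sequence
    vals, vecs = vals[idx], vecs[:, idx]
    contrib = vals / vals.sum()         # eigenvalue contribution rate c_i
    cumulative = np.cumsum(contrib)     # cumulative contribution rate of the first k
    return vals, vecs, contrib, cumulative


def select_cost_components(R, factor_weights, eta0=0.85, s0=0.1, alpha=0.6, beta=0.4):
    """Apply the dual criterion: cumulative-rate threshold, then sensitivity threshold."""
    vals, vecs, contrib, cumulative = eigen_contributions(R)
    # Candidate set: the smallest k whose cumulative contribution reaches eta0.
    k = min(int(np.searchsorted(cumulative, eta0)) + 1, len(vals))
    w = np.asarray(factor_weights, dtype=float)
    selected, scores = [], []
    for j in range(k):
        # Assumed mapping: squared eigenvector entries sum to 1, so this is a
        # weighted average of the per-factor weights for component j.
        w_j = float(np.sum(vecs[:, j] ** 2 * w))
        s_j = alpha * contrib[j] + beta * w_j
        scores.append(s_j)
        if s_j >= s0:
            selected.append(j)
    return selected, np.array(scores), cumulative
```

Using `eigh` rather than a general eigen-solver is a deliberate choice: correlation matrices are symmetric, so `eigh` returns real eigenvalues and orthonormal eigenvectors.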
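For claim 5, one common way to obtain the load of each factor on a principal component from standardized data is to scale the eigenvector by the square root of its eigenvalue. This equals the correlation between the factor and the component score. The sketch below assumes that definition, a placeholder load threshold of 0.6 and hypothetical function names. It ranks fields by their strongest absolute loading and keeps those above the threshold on any retained component.

```python
import numpy as np


def loading_matrix(R, n_components):
    """Loadings a_ij = v_ij * sqrt(lambda_j) for the leading n_components."""
    vals, vecs = np.linalg.eigh(R)
    idx = np.argsort(vals)[::-1][:n_components]   # keep the largest eigenvalues
    return vecs[:, idx] * np.sqrt(vals[idx])      # factor-by-component load matrix


def candidate_key_fields(R, names, n_components, load_threshold=0.6):
    """Keep fields whose |loading| exceeds the threshold on any retained component."""
    loadings = loading_matrix(R, n_components)
    strength = np.abs(loadings).max(axis=1)       # strongest load of each field
    order = np.argsort(-strength)                 # descending load strength
    return [names[i] for i in order if strength[i] > load_threshold], loadings
```

The squared loadings of a component sum to its eigenvalue, which gives a quick sanity check on the computed matrix.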
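The screening in claim 6 (partial correlation, stability over repeated subsamples, then a multiple regression significance test) can be sketched with NumPy alone. The following points are assumptions. Partial correlations are computed by correlating OLS residuals. Stability means the coefficient keeps the same sign and at least a placeholder magnitude `min_abs` on every leave-one-fold-out subsample. A fixed cutoff `|t| >= 2.0` stands in for a proper p-value from the t distribution (roughly a 5% two-sided test for large samples). All names are hypothetical.

```python
import numpy as np


def _residualize(target, controls):
    """Residual of target after OLS on controls (with intercept)."""
    A = np.column_stack([np.ones(len(target)), controls])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef


def partial_corr(X, y, j):
    """Partial correlation of column j with y, controlling for the other columns."""
    others = np.delete(X, j, axis=1)
    return np.corrcoef(_residualize(X[:, j], others), _residualize(y, others))[0, 1]


def stable_fields(X, y, k_folds=5, min_abs=0.3, seed=0):
    """Keep fields whose partial correlation keeps its sign and size on every subsample."""
    n, p = X.shape
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k_folds)
    stable = []
    for j in range(p):
        rs = []
        for fold in folds:
            keep = np.setdiff1d(np.arange(n), fold)  # leave this fold out
            rs.append(partial_corr(X[keep], y[keep], j))
        rs = np.array(rs)
        same_sign = np.all(np.sign(rs) == np.sign(rs[0]))
        if same_sign and np.all(np.abs(rs) >= min_abs):
            stable.append(j)
    return stable


def significant_fields(X, y, cols, t_crit=2.0):
    """Multiple OLS on the stable fields; keep those with |t| >= t_crit."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (len(y) - A.shape[1])          # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))  # coefficient standard errors
    t = coef / se
    keep = [c for c, tv in zip(cols, t[1:]) if abs(tv) >= t_crit]
    return keep, coef
```

After refitting the regression on the key fields alone, the prediction for a new project with feature vector `x_new` is `coef[0] + x_new[keys] @ coef[1:]`, where `coef` comes from that refit. The patent does not specify the model family, so plain OLS here is only one possible choice.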

Description

Power transmission and transformation project cost prediction method and system based on data mining

Technical Field

The invention relates to the technical field of project cost prediction, and in particular to a data-mining-based method and system for predicting the cost of power transmission and transformation projects.

Background

Identifying the key factors that drive the cost of power transmission and transformation projects improves the soundness of investment decisions and the level of cost control in power grid construction. As these projects keep growing in scale and construction conditions become increasingly complex, traditional experience-based estimation struggles to meet the accuracy and completeness required of cost prediction. Data-mining-based analysis can integrate multi-dimensional data such as project attributes, environmental factors and historical costs to reveal the core factors behind cost, improving the reliability and soundness of predictions and supporting refined management and investment optimization in grid construction.

However, existing cost analysis methods often rely on expert experience or single-factor modeling; they make poor use of the available data, identify key factors incompletely, and struggle to capture the combined influence of multiple factors on project cost. Existing methods also generally fail to account for the relationships among multi-dimensional information such as project attributes, construction technology, equipment types and environmental factors, which makes it difficult to extract the key influencing factors systematically. In addition, historical project data are not collected in a standardized way, so analysis results may be biased, reducing the rigor and practical value of cost prediction models. A data-mining-based cost prediction method for power transmission and transformation projects is therefore needed to solve these problems.

Disclosure of Invention

To solve these technical problems, this technical solution provides a data-mining-based method and system for predicting the cost of power transmission and transformation projects. It addresses the problems identified in the background: existing methods generally fail to account for the relationships among multi-dimensional information such as project attributes, construction technology, equipment types and environmental factors, so key influencing factors are hard to extract systematically; and non-standardized collection of historical project data may bias the analysis and reduce the rigor and practical value of cost prediction models.
To achieve the above purpose, the invention adopts the following technical solution: a data-mining-based cost prediction method for power transmission and transformation projects, comprising the following steps: acquiring basic information on historical power transmission and transformation projects and establishing a project basic information base; acquiring environmental factor information related to power transmission and transformation projects, establishing an environmental factor information base, and associating it with the project basic information base to form a project database that supports joint environment-cost analysis; extracting project cost fields and related influencing-factor fields from the project database, preprocessing them, obtaining the correlation between each influencing factor and project cost, and constructing a cost correlation coefficient matrix from the results; computing the eigenvalues and eigenvectors of the cost correlation coefficient matrix, obtaining the eigenvalue contribution rates and cumulative contribution rates to form a candidate principal component set, deriving a cost sensitivity score from the eigenvalue contribution rates of the candidate principal components, and retaining the principal components that satisfy a dual criterion, namely a preset cumulative contribution rate threshold and a cost sensitivity threshold, to form a cost principal component set; constructing a loading matrix from the cost principal component set and extracting key fields to form a candidate key field set; and acquiring historical cost data for power transmission and transformation projects, performing correlation analysis and regression testing on the candidate key field set against the historical cost data, determining key