CN-122019985-A - Semi-supervised learning-based shield tunneling parameter optimization prediction method and system


Abstract

The invention discloses a shield tunneling parameter optimization prediction method and system based on semi-supervised learning, relating to the technical field of engineering construction control. The method comprises: preprocessing data and generating features; performing geological clustering by multi-algorithm fusion; performing feature optimization and multi-model construction; performing semi-supervised enhancement within each geological cluster through co-training, noise augmentation and iterative learning; selecting an optimal model based on a comprehensive score; routing predictions by distance to the cluster centers; and outputting an uncertainty assessment. The method effectively addresses the problems of poor data quality, scarce labels, geological distribution shift and low computational efficiency in shield construction, and improves the accuracy, generalization capability and engineering applicability of tunneling parameter prediction.
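
The routing and uncertainty steps summarized in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance to the cluster centers and, for an ensemble sub-model, the standard deviation of the base learners' predictions as the uncertainty score, as stated in claim 9. The function names, the toy cluster centers, the lambda "base learners" and the use of the mean as the point prediction are hypothetical choices made for illustration only, not the patented implementation.

```python
import numpy as np

def route_to_cluster(x_geo, centers):
    """Return the index of the cluster center nearest to x_geo (Euclidean)."""
    centers = np.asarray(centers, dtype=float)
    x_geo = np.asarray(x_geo, dtype=float)
    dists = np.linalg.norm(centers - x_geo, axis=1)
    return int(np.argmin(dists)), dists

def predict_with_uncertainty(x, base_learners):
    """Combine base-learner predictions.

    Assumption: the point prediction is the plain mean of the base learners;
    the uncertainty score is their (population) standard deviation.
    """
    preds = np.array([learner(x) for learner in base_learners], dtype=float)
    return float(preds.mean()), float(preds.std())

# Hypothetical data: two geological clusters, each with a tiny "ensemble".
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
submodels = {
    0: [lambda x: 1.0, lambda x: 3.0],    # base learners that disagree
    1: [lambda x: 20.0, lambda x: 20.0],  # base learners that agree
}

x_geo = np.array([1.0, 2.0])
cluster, _ = route_to_cluster(x_geo, centers)
mean, unc = predict_with_uncertainty(x_geo, submodels[cluster])
print(cluster, mean, unc)
```

A larger spread among the base learners yields a larger score, which gives a rough signal of when a routed prediction should be treated with caution.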

Inventors

HU XIONGYU
HUANG YUBO
WANG YANZHENG
ZOU JIAZHI
LAI XIN
NING ZHICHEN
XU JUNHAO
ZHENG XIAOBO
AN ZIHAN

Assignees

Southwest Jiaotong University (西南交通大学)

Dates

Publication Date: 20260512
Application Date: 20260409

Claims (10)

1. A shield tunneling parameter optimization prediction method based on semi-supervised learning, characterized by comprising the following steps:
S1, extracting active control parameters, passive response parameters and geology-related parameters from historical construction data, performing multi-stage filling of missing values, and performing dual-index detection and cleaning of abnormal values;
S2, generating extended features from the preprocessed passive response parameters and geology-related parameter data, the extended features comprising interaction features, statistical features and combination features;
S3, constructing multi-scale views of the geology-related parameters using a plurality of normalization methods, clustering with a plurality of clustering algorithms under each view, and generating final geological cluster labels through a two-stage fusion strategy;
S4, constructing a random forest regressor to evaluate the feature scaling effect under the different normalization methods, selecting an optimal scaling mode, and performing multi-stage feature screening;
S5, extracting the corresponding sample subset for each geological cluster, and training a plurality of prediction models for each active parameter within the subset;
S6, performing co-training regression, Gaussian noise data augmentation and iterative incremental training within each geological cluster to improve the generalization capability of the models;
S7, for each combination of geological cluster and active parameter, selecting an optimal sub-model based on a comprehensive evaluation index on the validation set;
S8, calculating the distance between new input data and the center of each geological cluster, routing the new input data to the corresponding sub-model for prediction, and calculating an uncertainty score for the prediction result; and
S9, saving the trained model and related components as a structured file to support local and remote deployment.
2. The semi-supervised learning-based shield tunneling parameter optimization prediction method according to claim 1, wherein the multi-stage filling of missing values in S1 comprises: smoothly filling feature columns containing consecutive missing values by cubic spline interpolation, and filling isolated missing values that remain after interpolation with the median; and the dual-index detection and cleaning of abnormal values comprises: first calculating the modified Z-score of each feature and flagging samples exceeding a preset threshold as suspected abnormal values, then calculating the Mahalanobis distance of each sample in the multidimensional feature space to measure its deviation from the overall distribution, and removing the samples ranked in the top 5% by degree of abnormality.
3. The semi-supervised learning-based shield tunneling parameter optimization prediction method is characterized in that the interaction features in S2 are second-order interaction polynomial features, in which only interaction terms between different original features are retained; the statistical features comprise mean, standard deviation, skewness and kurtosis; and the combination features comprise sum, product and ratio features of the geological parameters, wherein a smoothing factor is introduced into the denominator of the ratio features.
4. The semi-supervised learning-based shield tunneling parameter optimization prediction method according to claim 3, wherein the plurality of normalization methods in S3 include normalization, robust scaling and quantile scaling; the plurality of clustering algorithms comprise K-means, DBSCAN and Gaussian mixture models; and the two-stage fusion strategy comprises: performing consistency voting on the clustering results that the same algorithm produces under the different scale views to obtain stable labels for that algorithm, and performing majority voting on the stable labels of the different algorithms to generate the final geological cluster labels.
5. The semi-supervised learning-based shield tunneling parameter optimization prediction method is characterized in that selecting the optimal scaling mode in S4 comprises: constructing a random forest regressor under each normalization method, calculating the coefficient of determination R² using 3-fold cross-validation, and selecting the scaling mode with the highest R²; and the multi-stage feature screening comprises variance filtering, mutual information selection and recursive feature elimination, performed in sequence.
6. The semi-supervised learning-based shield tunneling parameter optimization prediction method according to claim 5, wherein the plurality of prediction models in S5 include random forests, extremely randomized trees, gradient boosting trees, XGBoost, LightGBM, CatBoost and deep neural network regressors; and when a GPU is detected to be available, GPU-accelerated model training is enabled automatically.
7. The semi-supervised learning-based shield tunneling parameter optimization prediction method is characterized in that the co-training regression in S6 comprises: predicting unlabeled data within the same geological cluster using two types of base models with complementary error characteristics, selecting samples whose prediction difference is below a consistency threshold and whose distance from the cluster center lies within a typical range, generating pseudo-labels by weighted averaging, and adding the pseudo-labeled samples to the training set; in the Gaussian noise data augmentation, the noise standard deviation is proportional to the standard deviation of the corresponding feature; and the iterative incremental training divides the training data into a plurality of incremental batches and updates the model parameters batch by batch.
8. The semi-supervised learning-based shield tunneling parameter optimization prediction method according to claim 7, wherein the comprehensive evaluation index in S7 includes the coefficient of determination R², the root mean square error RMSE and the mean absolute error MAE; and the optimal sub-model is selected based on a comprehensive scoring formula [formula not reproduced in the source text], the model with the highest composite score being taken as the optimal sub-model.
9. The semi-supervised learning-based shield tunneling parameter optimization prediction method is characterized in that the routing process in S8 comprises: calculating the Euclidean distance between the input geological parameters and the center of each geological cluster, and selecting the sub-model corresponding to the cluster with the smallest distance for prediction; and when the sub-model is an ensemble model, the uncertainty score is the standard deviation of the prediction results of its base learners.
10. A semi-supervised learning-based shield tunneling parameter optimization prediction system, characterized by being configured to implement the method of claims 1-9, comprising:
a data preprocessing module, used for classifying and extracting the historical construction data, performing multi-stage filling of missing values, detecting and cleaning abnormal values based on the modified Z-score and the Mahalanobis distance, and generating interaction, statistical and combination features;
a geological clustering module, used for performing multi-scale normalization of the geological parameters, clustering in parallel with K-means, DBSCAN and Gaussian mixture models, and generating geological cluster labels and cluster centers through a two-stage fusion strategy;
a feature optimization module, used for evaluating the performance of the different normalization methods through a random forest regressor, selecting an optimal scaling mode, and sequentially performing variance filtering, mutual information selection and recursive feature elimination to obtain an optimized feature subset;
a model construction module, used for training a plurality of base models on the samples of each geological cluster and applying, within each geological cluster, a semi-supervised enhancement strategy based on co-training, Gaussian noise augmentation and iterative incremental learning;
a model selection module, used for evaluating each candidate model on the validation set based on the comprehensive scoring formula and selecting an optimal sub-model for each combination of geological cluster and active parameter;
a prediction and uncertainty evaluation module, used for performing geological routing according to the distance between the input data and the cluster centers, invoking the corresponding sub-model for prediction, and, for an ensemble model, calculating and outputting the standard deviation of the base learners' predictions as the uncertainty score; and
a model storage and deployment module, used for saving the trained model, the normalizer, the geological labels and the evaluation results as a structured file, and supporting local or remote deployment through a REST API and a WebSocket interface.
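
The dual-index abnormal-value cleaning described in claims 2 and 10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims do not spell out how the two indices are combined, so the sketch removes the 5% of samples with the largest Mahalanobis distance and reports the modified Z-score flags separately. The 0.6745 constant and the 3.5 threshold are the conventional choices for the modified Z-score, not values given in the source, and all function names are hypothetical.

```python
import numpy as np

def modified_z_scores(X):
    """Per-feature modified Z-score: 0.6745 * (x - median) / MAD."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    mad = np.where(mad == 0, 1e-12, mad)  # guard against constant columns
    return 0.6745 * (X - med) / mad

def mahalanobis_distances(X):
    """Distance of every sample from the sample mean under the sample covariance."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariance
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def clean_outliers(X, z_threshold=3.5, drop_fraction=0.05):
    """Flag suspects by modified Z-score, then drop the top fraction by distance."""
    X = np.asarray(X, dtype=float)
    suspected = (np.abs(modified_z_scores(X)) > z_threshold).any(axis=1)
    dist = mahalanobis_distances(X)
    n_drop = int(round(drop_fraction * len(X)))
    keep = np.ones(len(X), dtype=bool)
    if n_drop > 0:
        keep[np.argsort(dist)[-n_drop:]] = False  # largest distances removed
    return X[keep], suspected, keep

# Hypothetical data: 100 well-behaved samples with one planted outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[0] = [50.0, -50.0, 50.0]
X_clean, suspected, keep = clean_outliers(X)
print(X_clean.shape, bool(suspected[0]), bool(keep[0]))
```

The median and MAD keep the first index robust to the very outliers it is looking for, while the Mahalanobis distance accounts for correlation between features that a per-feature score cannot see.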

Description

Semi-supervised learning-based shield tunneling parameter optimization prediction method and system

Technical Field

The invention relates to the technical field of engineering construction control, in particular to a shield tunneling parameter optimization prediction method and system based on semi-supervised learning.

Background

Current shield tunneling parameter prediction methods fall mainly into two groups: methods based on mechanism models and data-driven methods. Mechanism-model methods rely on theories such as geotechnical mechanics and shield machine dynamics to establish a mathematical model and are interpretable in terms of mechanism. However, the modeling process requires a large number of geological and internal equipment parameters that are difficult to acquire accurately, adapts poorly to complex working conditions such as formation heterogeneity and groundwater variation, and rests on assumptions that are often not met in actual construction, so prediction accuracy is limited. Data-driven methods, in particular machine learning techniques that learn the mapping relation from historical data, are becoming the mainstream of research. Such methods include traditional statistical analysis and modern machine learning models. While data-driven approaches reduce reliance on accurate mechanism models, their performance is severely limited by data quality and completeness. In actual engineering, shield construction data commonly contain large numbers of missing values and abnormal noise and are multi-source and heterogeneous, which directly affects the quality and stability of the model input.

Meanwhile, traditional supervised learning models generally assume that training data and prediction data follow the same distribution. In long-distance tunneling, however, geological conditions often change sharply and in a non-stationary manner, so the model faces a serious distribution shift (domain shift) problem, and its generalization capability drops markedly when it is applied across strata or across projects. Although some studies have introduced transfer learning or domain adaptation techniques to address distribution shift, their use in shield construction is still at a preliminary exploratory stage and commonly suffers from limitations such as computational complexity, heavy dependence on labeled data and a lack of deep integration with engineering geological features. In addition, several key parameters in the construction process (such as muck properties and cutter wear) depend on manual or laboratory testing and are difficult to acquire fully in real time, so labeled data for model training is severely scarce, which restricts further improvement of supervised learning models.

From the perspective of engineering implementation, existing technical schemes have the following systematic defects. First, the data processing flow is usually isolated and fragmented: preprocessing, feature engineering, model training and deployment are disconnected, and an end-to-end integrated architecture is lacking, which increases the complexity of system maintenance and result tracing. Second, when facing large-scale, high-dimensional real-time construction data streams, the traditional CPU-based computing mode is inefficient in steps such as feature construction, cluster analysis and multi-model training, and struggles to meet the construction site's real-time requirements for low-latency prediction and rapid decision-making. Third, few methods organically integrate geological condition recognition, adaptive feature optimization, semi-supervised learning and high-performance computing into a comprehensive solution that can simultaneously cope with poor data quality, scarce labels, rapid distribution change and high computational demand.

Disclosure of Invention

In order to solve the problems in existing shield tunneling parameter prediction technology, including insufficient handling of data quality, poor model generalization capability, reduced accuracy under label scarcity, low computational efficiency and the lack of an integrated system architecture, the invention provides a shield tunneling parameter optimization prediction method and system based on semi-supervised learning, which realize efficient processing of multi-source heterogeneous construction data, geology-adaptive modeling and real-time prediction.
The application discloses a shield tunneling parameter optimization prediction method based on semi-supervised learning, which comprises the following steps: S1, extracting active control parameters, passive response parameters and geological related parameters from historical construction data, performing multi-stage co