CN-121980396-A - Random forest classification method based on quantum-classical dual-feature branch fusion


Abstract

The invention relates to the technical field of quantum machine learning, and in particular to a random forest classification method based on quantum-classical dual-feature branch fusion. It addresses the problem that classical module parameters are fixed and precision is limited. The method comprises the following steps: S1, data preprocessing and feature engineering; S2, construction of a dynamic quantum feature mapping; S3, construction of a quantum decision tree; and S4, construction of quantum-classical soft voting integration. Compared with the prior art, the method greatly reduces quantum resource consumption and improves adaptability; markedly suppresses the risk of overfitting and strengthens generalization; improves classification precision and robustness; markedly enhances training efficiency and universality; and offers strong engineering practicality and a pronounced synergistic advantage.

Inventors

JI MINGZHU
CHENG XUEYUN
YANG FAN
QIN SHUO
WU JUNDE
ZHANG HAOCHENG
ZAN YIMING

Assignees

Nantong University (南通大学)

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-08

Claims (10)

1. A random forest classification method based on quantum-classical dual-feature branch fusion, characterized by comprising the following steps: Step S1, performing data preprocessing and feature engineering; Step S2, constructing a dynamic quantum feature mapping; Step S3, constructing a quantum decision tree; and Step S4, constructing quantum-classical soft voting integration.
2. The random forest classification method based on quantum-classical dual-feature branch fusion according to claim 1, wherein step S1 specifically comprises the following steps: Step S11, loading a target data set and defining a feature matrix $X \in \mathbb{R}^{N \times d}$ and a label vector $y$ of length $N$ whose entries each take one of $K$ class labels; wherein $\mathbb{R}$ represents the real number domain, $N$ represents the total number of samples, $d$ represents the feature dimension, $K$ represents the total number of classification categories, and the $d$ feature dimensions of the feature matrix comprise the various feature indexes of the data set; Step S12, performing linear (min-max) feature normalization on $X$, as shown in formula (1): $x'_{ij} = \dfrac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}}$ (1); wherein $x_{ij}$ is the original feature value, $x_j^{\min}$ and $x_j^{\max}$ are respectively the minimum and maximum values of the $j$-th feature dimension, and after normalization $x'_{ij} \in [0, 1]$; Step S13, performing stratified sampling to divide the data set, such that the class distributions of the training set $D_{\text{train}}$ and the test set $D_{\text{test}}$ are consistent with the original data, namely, for any category $k$, formula (2) is satisfied: $\dfrac{N_{\text{train}}^{(k)}}{N_{\text{train}}} = \dfrac{N_{\text{test}}^{(k)}}{N_{\text{test}}} = \dfrac{N^{(k)}}{N}$ (2); wherein $N_{\text{train}}$ is the number of training set samples, $N_{\text{test}}$ is the number of test set samples, $k$ is the category identifier corresponding to the different classification categories, $N_{\text{train}}^{(k)}$ is the number of samples belonging to category $k$ in the training set, $N_{\text{test}}^{(k)}$ is the number of samples belonging to category $k$ in the test set, and $N^{(k)}$ is the number of samples belonging to category $k$ in the original data.
3. The random forest classification method based on quantum-classical dual-feature branch fusion according to claim 1, wherein step S2 specifically comprises: Step S21, selecting a basic framework and performing adaptive design; Step S22, configuring two groups of core parameters for each quantum decision tree; and Step S23, setting differentiated parameter allocation strategies for the feature attributes of different data sets, based on dynamic matching logic between the feature dimension and the feature association strength of the data set.
4. The random forest classification method based on quantum-classical dual-feature branch fusion according to claim 1, wherein step S3 specifically comprises: Step S31, receiving, at an input layer, the feature data preprocessed in step S1 and converted by the dynamic quantum feature mapping of step S2; Step S32, executing quantum feature mapping and quantum kernel function calculation at a quantum computing layer, and constructing a quantum kernel function based on the dynamic quantum feature mapping; Step S33, configuring quantum support vector machine parameters, wherein the regularization parameter C is randomly adjusted within the interval [1.0, 3.0] and overfitting is suppressed by controlling the penalty strength on misclassified samples; and Step S34, enabling the probability output function of the quantum support vector machine, and outputting a classification result and the prediction confidence of the corresponding category.
5. The random forest classification method based on quantum-classical dual-feature branch fusion according to claim 1, wherein step S4 specifically comprises: Step S41, sampling a subset of the samples to enhance diversity; Step S42, performing soft voting fusion to improve decision accuracy; and Step S43, adopting a quantum-classical hybrid architecture.
6. The random forest classification method based on quantum-classical dual-feature branch fusion according to claim 3, wherein the two groups of core parameters are the number of repetitions and the entanglement pattern.
7. The random forest classification method based on quantum-classical dual-feature branch fusion according to claim 6, wherein the number of repetitions is the number of times the quantum circuit is iterated.
8. The random forest classification method based on quantum-classical dual-feature branch fusion according to claim 6, wherein the entanglement pattern is the manner in which different features are associated in quantum space.
9. The random forest classification method based on quantum-classical dual-feature branch fusion according to claim 8, wherein the value of the entanglement pattern is limited to three types, namely linear, full and circular entanglement, each adapted to a different feature association scenario.
10. The random forest classification method based on quantum-classical dual-feature branch fusion according to claim 9, wherein the linear entanglement pattern is suitable for scenarios in which features are associated with their neighbours in a specific order; the full entanglement pattern is suitable for scenarios in which any pair of features may be associated; and the circular entanglement pattern is suitable for scenarios in which the first and last features are associated so as to form a closed loop.
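
By way of illustration only, three building blocks named in the claims can be sketched in plain Python: the min-max normalization of formula (1), the qubit pairs implied by the linear, full and ring-shaped (here called "circular") entanglement patterns of claims 9 and 10, and the soft voting of step S42. The function names, the list-of-rows data layout, the handling of constant feature columns and the equal weighting of the two branches in the vote are all assumptions made for this sketch; the claims do not specify them, and this is not the claimed implementation.

```python
from itertools import combinations


def min_max_normalize(X):
    """Formula (1): scale each feature column of X (a list of rows) to [0, 1]."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in X:
        new_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # Assumption: a constant column maps to 0.0 to avoid dividing by zero.
            new_row.append((value - low) / span if span else 0.0)
        normalized.append(new_row)
    return normalized


def entanglement_pairs(num_qubits, pattern):
    """Return the qubit index pairs coupled under a given entanglement pattern."""
    chain = [(i, i + 1) for i in range(num_qubits - 1)]
    if pattern == "linear":
        # Each feature is coupled only to its neighbour in order.
        return chain
    if pattern == "circular":
        # Linear chain plus a closing pair between the last and first qubit.
        # With two qubits the closing pair would repeat the chain, so skip it.
        return chain + ([(num_qubits - 1, 0)] if num_qubits > 2 else [])
    if pattern == "full":
        # Every pair of features is coupled.
        return list(combinations(range(num_qubits), 2))
    raise ValueError(f"unknown entanglement pattern: {pattern!r}")


def soft_vote(probabilities):
    """Average class-probability vectors (equal weights assumed) and pick the top class."""
    num_models = len(probabilities)
    averaged = [sum(col) / num_models for col in zip(*probabilities)]
    label = max(range(len(averaged)), key=averaged.__getitem__)
    return label, averaged
```

For four features, `entanglement_pairs(4, "linear")` yields three pairs, `"circular"` adds the closing pair `(3, 0)`, and `"full"` yields all six pairs, which is why the choice of pattern affects circuit size. In `soft_vote`, a hesitant branch such as `[0.40, 0.35, 0.25]` is outweighed by a confident branch such as `[0.05, 0.90, 0.05]`, whereas a hard vote between the two would be a tie.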

Description

Random forest classification method based on quantum-classical dual-feature branch fusion

Technical Field

The invention relates to the technical field of quantum machine learning, and in particular to a random forest classification method based on quantum-classical dual-feature branch fusion.

Background

As quantum computing (Quantum Computing, QC) penetrates the field of machine learning, the quantum random forest (Quantum Random Forest, QRF), a quantum-enhanced version of the classical random forest that exploits quantum superposition and entanglement, has been preliminarily applied to the classification of iris (4-dimensional features) and other low-dimensional data. However, the prior art is constrained by a fixed architecture design and a single fusion strategy, adapts poorly to the resource limitations of noisy intermediate-scale quantum (NISQ) devices, and has obvious shortcomings in classification precision and generalization capability. The specific shortcomings are as follows:

(1) The quantum feature mapping adapts poorly and wastes resources. The prior art adopts a quantum feature mapping with a fixed structure (for example, ZZFeatureMap with a fixed number of repetitions reps=3 and entanglement pattern entanglement='full'), and the parameters are not adjusted dynamically according to the characteristics of the data [Li J, Wang H, Zhang C. Fixed Quantum Feature Mapping-Based Quantum Random Forest for Low-Dimensional Data Classification[J]. Journal of Quantum Information Science, 2022, 12(3): 45-62]. Taking the iris data set as an example, the sepal features (highly linearly separable) and the petal features (highly non-linearly separable) need to be mapped differently, but a fixed entanglement pattern leads to redundancy in the quantum circuit: forcing 'full' entanglement onto the linear features increases the quantum circuit depth by 25%, qubit noise accumulation exceeds 18%, and the final test set accuracy is lower than 92%.

(2) The sampling strategy is single, and ensemble diversity is insufficient. In the existing method, every quantum decision tree uses the full set of training samples (for example, all 105 samples of the iris training set are input), so the predictions of the individual trees are highly similar and generalization capability is limited [Zhao Y, Liu S, Chen M. Comparative Study on Classical and Quantum Random Forest for Small-Sample Classification Tasks[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2023, 7(2): 189-201]. On the iris data set, the training set accuracy of the existing method reaches 99%, but the test set accuracy is only 91%, and the overfitting rate exceeds 8%.

(3) The fusion mechanism is inefficient, and confidence is not utilized. The traditional scheme fuses the results of the quantum and classical modules by "hard voting", assigning equal weight without regard to classification confidence [Chen K, Yang D, Hu J. Research on Quantum-Classical Fusion Mechanisms for Ensemble Classification Models[J]. Quantum Information Processing, 2021, 20(11): 345-368]. For example, the confusion rate of the quantum support vector machine (Quantum Support Vector Machine, QSVM) between the iris Versicolor and Virginica categories reaches 15% (low confidence), while that of the classical support vector machine (Classical Support Vector Machine, CSVM) is only 5% (high confidence), and equal voting between the two increases the number of misclassified samples by 30%.

(4) The classical module parameters are fixed, and precision is limited. In the prior art, the classical module mostly adopts an unoptimized basic CSVM with a fixed regularization parameter C=1.0 and kernel function parameter gamma='auto', which cannot adapt to the feature correlations of the iris data set [Wang L, Zhang X, Li P. Comparative Experiment on Parameter Optimization of Classical Support Vector Machine[J]. Journal of Computer Applications, 2020, 40(8): 2345-2352]. The classification accuracy of the CSVM alone is only 88%, it cannot output prediction probabilities, and it is difficult to make it cooperate with the quantum module.

In summary, the existing method is deficient in four respects: the adaptability of the quantum feature mapping, the diversity of sample sampling, the efficiency of quantum-classical fusion, and the precision of the classical module (basic SVM). It is therefore difficult for the existing method to adapt to the resource limitations of noisy intermediate-scale quantum (NISQ) devices and to the classification requirements of small-sample and high-dimensional data, and an optimization scheme is needed that adapts dynamically to the data, enhances ensemble diversity and fuses the branches efficiently.

Disclosure of Invention

The invention aims to provide a random forest classification method based on quantum-classical dual-feature branch fusion. The invention solves the problems of poor suitability caused by fixed feature mapping, insufficient diversity caused by single