
CN-122020337-A - Single-molecule charge transmission unsupervised identification method, device, equipment and storage medium


Abstract

The invention relates to the technical field of single-molecule electronics, and in particular to a single-molecule charge transport unsupervised identification method, device, equipment and storage medium. The method unifies the format of raw conductance data through standardization, uses dual-branch contrastive learning and feature fusion to generate global features that fit single-molecule physical characteristics, extracts effective conductance traces through a progressive strategy of one-class pre-screening followed by binary-classification fine screening, and completes unsupervised classification by combining data standardization with a mechanism for determining the optimal number of clusters. It thereby addresses the low data utilization and weak generalization that result from training only on data from a single molecule in the prior art, avoids interference of subjective error with the analysis results, classifies traces according to conductance step height, and can accommodate multi-molecule systems and large-scale single-molecule conductance measurement data, overcoming the shortcomings of the prior art in targeted feature extraction, effective-trace screening precision, degree of automation and classification granularity.

Inventors

  • QIAN SIYU
  • HE YUANQIN
  • BI HAI
  • HUANG WEIHUA
  • DI ZIXIANG
  • TU ZHENGQIAN

Assignees

  • Ji Hua Laboratory (季华实验室)

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (10)

  1. An unsupervised identification method for single-molecule charge transport, comprising: acquiring at least two groups of experimental data sets, wherein the experimental data sets comprise a blank substrate experimental data set and a molecular substrate experimental data set; standardizing the two groups of experimental data sets respectively to obtain a blank dimension feature vector and a molecular dimension feature vector; taking the blank dimension feature vector as a reference sample, sequentially performing dual-branch contrastive learning and feature fusion on the blank dimension feature vector and the molecular dimension feature vector to obtain a blank global feature and a molecular global feature; taking the blank global feature as a reference, sequentially performing one-class classifier pre-screening and binary classifier fine screening on the molecular global feature to obtain a molecular positive sample, wherein the molecular positive sample comprises a plurality of conductance traces having conductance step features; standardizing the molecular positive sample to obtain a standard molecular positive sample, and determining an optimal cluster number; and performing unsupervised clustering on the standard molecular positive sample based on the optimal cluster number to obtain classified positive samples, wherein the classified positive samples are the result of classifying the plurality of conductance traces according to conductance step height.
  2. The method of claim 1, wherein acquiring the at least two groups of experimental data sets comprises: building a single-molecule conductance measurement system based on the scanning tunneling microscope break junction (STM-BJ) technique, and setting core parameters of the system, the core parameters comprising a stretching rate, a stretching stroke and a sampling frequency; using a gold tip as the upper electrode and a blank gold substrate or a molecule-functionalized gold substrate as the lower electrode, repeating the stretching experiment a plurality of times with the system to obtain a plurality of stretching experiment results; and performing baseline correction, outlier rejection and effective-segment cropping on the plurality of stretching experiment results to construct the blank substrate experimental data set and the molecular substrate experimental data set.
  3. The method of claim 1, wherein standardizing the two groups of experimental data sets respectively to obtain a blank dimension feature vector and a molecular dimension feature vector comprises: extracting the conductance value sequence of each conductance trace in the blank substrate experimental data set and the molecular substrate experimental data set and applying a logarithmic transformation; dividing the log-transformed conductance value sequence into a preset number of equally spaced conductance intervals using a histogram statistics method, and counting how often the conductance values of each conductance trace fall in each interval, so as to construct a blank distribution vector and a molecular distribution vector of one-dimensional frequency features; and performing normalization and principal component analysis (PCA) dimensionality reduction on the blank distribution vector and the molecular distribution vector respectively to obtain the blank dimension feature vector and the molecular dimension feature vector.
  4. The method according to claim 1, wherein taking the blank dimension feature vector as a reference sample and sequentially performing dual-branch contrastive learning and feature fusion on the blank dimension feature vector and the molecular dimension feature vector to obtain a blank global feature and a molecular global feature comprises: constructing a dual-branch contrastive learning neural network with shared weights, inputting the blank dimension feature vector as the reference sample into a first branch of the contrastive learning neural network, and inputting the molecular dimension feature vector into a second branch of the contrastive learning neural network; training the contrastive learning neural network with the InfoNCE loss function to extract the blank high-level semantic features and molecular high-level semantic features output by the contrastive learning neural network; concatenating the blank high-level semantic features with the blank dimension feature vector to obtain the blank global feature; and concatenating the molecular high-level semantic features with the molecular dimension feature vector to obtain the molecular global feature.
  5. The method according to claim 1, wherein taking the blank global feature as a reference and sequentially performing one-class classifier pre-screening and binary classifier fine screening on the molecular global feature to obtain a molecular positive sample, the molecular positive sample comprising a plurality of conductance traces having conductance step features, comprises: training a one-class classifier model with the blank global features as the training set; inputting the molecular global features into the trained one-class classifier model, removing pure tunneling samples that are highly similar to the blank global features, and retaining anomalous samples that potentially contain conductance steps as the pre-screening result; constructing a binary classification training set in which the blank global features are samples labeled 0 and the anomalous samples obtained by pre-screening are samples labeled 1, and training a binary classifier model on the binary classification training set; and inputting the molecular global features into the trained binary classifier model, and separating out the molecular positive sample containing the conductance step features using a probability threshold preset through cross-validation.
  6. The method of claim 1, wherein standardizing the molecular positive sample to obtain a standard molecular positive sample and determining an optimal cluster number comprises: standardizing the molecular positive sample using the Z-score standardization method to obtain the standard molecular positive sample; based on the standard molecular positive sample, calculating the sample distortion and the sample silhouette coefficient for each of a plurality of preset cluster numbers, and plotting a distortion curve relating the distortion to the cluster number and a silhouette coefficient curve relating the silhouette coefficient to the cluster number; combining the elbow point of the distortion curve with the maximum point of the silhouette coefficient curve to determine a range of candidate optimal cluster numbers; and acquiring experimental prior knowledge, and determining the optimal cluster number from the range of candidate optimal cluster numbers based on the experimental prior knowledge.
  7. The method according to claim 1, wherein performing unsupervised clustering on the standard molecular positive sample based on the optimal cluster number to obtain classified positive samples comprises: acquiring a preselected unsupervised clustering model, and inputting the standard molecular positive sample into the unsupervised clustering model to divide the molecular positive sample into a plurality of cluster categories corresponding to the optimal cluster number; for each cluster category, extracting the conductance step height features of the corresponding conductance traces; and labeling the plurality of cluster categories based on the extracted conductance step height features, and integrating the labeled cluster categories to obtain the classified positive samples.
  8. An apparatus for unsupervised identification of single-molecule charge transport, comprising: an acquisition module for acquiring at least two groups of experimental data sets, wherein the experimental data sets comprise a blank substrate experimental data set and a molecular substrate experimental data set; a processing module for standardizing the two groups of experimental data sets respectively to obtain a blank dimension feature vector and a molecular dimension feature vector; a learning module for taking the blank dimension feature vector as a reference sample and sequentially performing dual-branch contrastive learning and feature fusion on the blank dimension feature vector and the molecular dimension feature vector to obtain a blank global feature and a molecular global feature; a screening module for taking the blank global feature as a reference and sequentially performing one-class classifier pre-screening and binary classifier fine screening on the molecular global feature to obtain a molecular positive sample, wherein the molecular positive sample comprises a plurality of conductance traces having conductance step features; a determining module for standardizing the molecular positive sample to obtain a standard molecular positive sample and determining an optimal cluster number; and a classification module for performing unsupervised clustering on the standard molecular positive sample based on the optimal cluster number to obtain classified positive samples, wherein the classified positive samples are the result of classifying the plurality of conductance traces according to conductance step height.
  9. Single-molecule charge transport unsupervised identification equipment, comprising a memory and at least one processor, wherein instructions are stored in the memory; and the at least one processor invokes the instructions in the memory to cause the equipment to perform the steps of the single-molecule charge transport unsupervised identification method according to any one of claims 1-7.
  10. A computer-readable storage medium having instructions stored thereon which, when executed by a processor, implement the steps of the single-molecule charge transport unsupervised identification method according to any one of claims 1-7.
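
For illustration only, the standardization step recited in claim 3 (logarithmic transformation, equal-width histogram counting, normalization, PCA dimensionality reduction) might be sketched in Python as follows. This is not the applicant's implementation. The function names `log_histogram_features` and `pca_reduce`, the bin count, the log-conductance range, the synthetic traces, and the choice to fit the PCA basis on the pooled blank and molecular vectors are all assumptions made for this example.

```python
import numpy as np

def log_histogram_features(traces, n_bins=100, log_range=(-6.0, 1.0), floor=1e-12):
    """Turn each conductance trace into a normalized log10-conductance histogram.

    n_bins, log_range and floor are illustrative values, not taken from the patent.
    """
    edges = np.linspace(log_range[0], log_range[1], n_bins + 1)  # equal-width bins
    feats = np.zeros((len(traces), n_bins))
    for i, g in enumerate(traces):
        # Clip to a small positive floor so the logarithm is always defined.
        log_g = np.log10(np.clip(np.asarray(g, dtype=float), floor, None))
        counts, _ = np.histogram(log_g, bins=edges)  # values outside the range are ignored
        total = counts.sum()
        if total > 0:
            feats[i] = counts / total  # frequency normalization: each row sums to 1
    return feats

def pca_reduce(X, n_components):
    """Project rows of X onto the leading principal components (computed via SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]
    return Xc @ components.T, mean, components

# Synthetic stand-ins for measured traces (hypothetical data, for demonstration only).
rng = np.random.default_rng(0)
z = np.linspace(0.0, 1.0, 500)  # normalized stretching distance

def blank_trace():
    # Pure tunneling: log-conductance decays linearly with distance, plus noise.
    return 10.0 ** (-7.0 * z + rng.normal(0.0, 0.05, z.size))

def molecular_trace():
    # Tunneling background with a conductance plateau (step) near 1e-3.
    g = blank_trace()
    g[150:300] = 10.0 ** (-3.0 + rng.normal(0.0, 0.05, 150))
    return g

blank = [blank_trace() for _ in range(20)]
mol = [molecular_trace() for _ in range(20)]

blank_hist = log_histogram_features(blank)
mol_hist = log_histogram_features(mol)

# Assumption: fit one PCA basis on the pooled vectors so both sets share coordinates.
pooled = np.vstack([blank_hist, mol_hist])
reduced, mean, components = pca_reduce(pooled, n_components=5)
blank_vec, mol_vec = reduced[:20], reduced[20:]
```

In this sketch the plateau in the synthetic molecular traces shows up as extra histogram mass around a log-conductance of -3, which is the kind of difference the later screening steps would rely on.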

Description

Single-molecule charge transmission unsupervised identification method, device, equipment and storage medium

Technical Field

The present invention relates to the field of single-molecule electronics, and in particular to a method, device, equipment and storage medium for unsupervised identification of single-molecule charge transport.

Background

Single-molecule electronics is a core research direction of nanoscience and nanotechnology. It studies electron transport properties and the related physical and chemical processes at the single-molecule scale. With the iteration of precision measurement techniques such as the mechanically controllable break junction (MCBJ) and the scanning tunneling microscope break junction (STM-BJ), accurate conductance measurement at the single-molecule scale has been realized. These techniques fix a target molecule between two electrodes through anchoring groups and capture the electrical transport signal of a single molecule in real time. However, because the formation of a single-molecule junction is highly random, accurately extracting effective conductance information from massive experimental measurement data has become a core technical bottleneck in single-molecule charge transport research.
Current single-molecule charge transport data classification techniques identify and classify events from the conductance-distance curves obtained by break junction techniques such as STM-BJ and MCBJ, and have gradually developed from traditional statistical methods to machine learning and deep clustering methods. Traditional statistical methods use the one-dimensional conductance histogram and the two-dimensional conductance-distance statistical plot as their core tools; they can only present data distribution patterns qualitatively, low-probability effective events are masked by the dominant data, quantitative analysis capability is lacking, and resistance to interference is poor. Traditional machine learning methods rely on manually selected features, apply to a limited range of scenarios, and have difficulty capturing nonlinear structure in the data. Deep clustering methods complete feature extraction and clustering by means of an autoencoder and can handle high-dimensional, large-volume data, but their feature learning lacks physical pertinence to single-molecule data, model design is highly subjective, and interpretability is poor; where tunneling data dominate and effective samples are few, clusters are very easily over-split, and clustering algorithms such as the DAK algorithm are applicable only to specific experiments.
In summary, the prior art trains models on data from a single molecule only and cannot mine the common and distinguishing features of multi-molecule experimental data, so data utilization and model generalization are weak. It also suffers from insufficiently accurate feature extraction, effective conductance traces that are easily disturbed by tunneling data, the lack of a general end-to-end automated analysis framework, and coarse classification, and therefore struggles to meet the demand for accurate analysis of complex single-molecule charge transport data. The prior art thus needs to be improved.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention aims to provide a single-molecule charge transport unsupervised identification method that addresses the low data utilization and weak generalization caused by training only on single-molecule data in the prior art, and avoids the interference of subjective error with the analysis results.
The invention provides an unsupervised identification method for single-molecule charge transport, comprising: acquiring at least two groups of experimental data sets; standardizing the two groups of experimental data sets respectively to obtain a blank dimension feature vector and a molecular dimension feature vector; taking the blank dimension feature vector as a reference sample, sequentially performing dual-branch contrastive learning and feature fusion on the blank dimension feature vector and the molecular dimension feature vector to obtain a blank global feature and a molecular global feature; taking the blank global feature as a reference, sequentially performing one-class classifier pre-screening and binary classifier fine screening on the molecular global feature to obtain a molecular positive sample, wherein the molecule positi