CN-122024911-A - Method for identifying molecular structure and constructing reaction network based on atomic coordinates
Abstract
The invention discloses a method for identifying molecular structures and constructing a reaction network based on atomic coordinates, and belongs to the technical field of materials simulation. The method for identifying molecular structures comprises: obtaining trajectory data; mapping atoms that lie outside the simulation box back into the box using the minimum image convention; dividing the box into grid cells according to a spatial cutoff radius, so that potentially bonded atoms are screened precisely and the redundant pairwise distance calculation over all atoms of the system is avoided; and quickly partitioning the atoms into clusters by depth-first search to identify the molecular structures. The method relies only on basic data such as atomic coordinates, does not depend on the post-processing module of any simulation software, and overcomes the limitation of prior art that is adapted to only a single type of simulation. The method for constructing a reaction network comprises: comparing molecular structure descriptions frame by frame; cancelling forward reactions against reverse reactions to filter out ineffective reactions and obtain net reactions; counting the occurrences of each net reaction to determine its reaction probability and thereby quantify the reaction tendency; and constructing the reaction network with molecular structures as nodes and reaction frequencies as edge weights.
Inventors
- WANG TAO
- LI WEIYI
- LIU RONG
- GUO WANXIAO
- YUAN XIYAO
- FANG YILIN
- WANG JINTAO
Assignees
- Rocket Force University of Engineering, Chinese People's Liberation Army (中国人民解放军火箭军工程大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20251121
Claims (6)
- 1. A method for identifying molecular structure based on atomic coordinates, characterized by comprising the following steps: S1, acquiring trajectory data, wherein the trajectory data comprise atomic coordinates, atom types, and simulation box data; S2, transforming all atomic coordinates using the minimum image convention to obtain wrapped coordinates, the minimum image convention being calculated as $\mathbf{r}_{ij} = (\mathbf{r}_j - \mathbf{r}_i) - \mathbf{L} \odot \mathrm{round}\left((\mathbf{r}_j - \mathbf{r}_i) / \mathbf{L}\right)$, wherein $\mathbf{r}_i$ and $\mathbf{r}_j$ are the three-dimensional position vectors of atom $i$ and atom $j$, $\mathbf{r}_{ij}$ is the minimum image vector pointing from atom $i$ to atom $j$, $\mathbf{L}$ is the length vector of the periodic box, and $\mathrm{round}$ is the rounding function, with the multiplication and division applied component by component; S3, dividing the box into grid cells according to a spatial cutoff radius based on the wrapped coordinates obtained in step S2, assigning the wrapped coordinates to the corresponding grid cells, and calculating the minimum image distance $d_{ij}$ between each atom in a grid cell and the other atoms in that cell and in the neighbouring cells, $d_{ij} = |\mathbf{r}_{ij}|$; S4, comparing the minimum image distance of two atoms with the chemical bond distance cutoff value, and judging that the two atoms are bonded when the minimum image distance is less than or equal to the corresponding chemical bond distance cutoff value, and otherwise judging that the two atoms are not bonded; and S5, traversing the bonding relations of all atoms using a depth-first search (DFS) algorithm, dividing the sets of atoms that are connected to one another through bonding relations into independent clusters, each cluster corresponding to one molecular entity, and generating the structure description of each molecular entity by analyzing the atom types, the number of atoms, and the bonding relations in its cluster, thereby obtaining the molecular structure.
- 2. A method for identifying molecular structures based on atomic coordinates according to claim 1, wherein in step S3, the space cutoff radius is equal to the maximum value of the bond distance cutoff values of all atoms.
- 3. A method for identifying molecular structure based on atomic coordinates according to claim 1, wherein in step S5, a SMILES descriptor is generated using the RDKit software package as the structure description of the molecular entity, to characterize the molecular structure.
- 4. A method for constructing a reaction network based on atomic coordinates, characterized by comprising the following steps: identifying, frame by frame, the structure description and molecular structure of each molecular entity in the trajectory data using the method for identifying molecular structure based on atomic coordinates according to any one of claims 1-3; comparing the structure descriptions of the molecular entities in each pair of consecutive frames, and, if the structure description of any molecular entity in the current frame differs from that of the corresponding molecular entity in the previous frame, judging the molecular entity in the current frame to be a product and the corresponding molecular entity in the previous frame to be a reactant, and recording the reaction relation between the reactant and the product; counting the cumulative number of occurrences of the same net reaction over all time windows as the reaction frequency of that net reaction, and taking the ratio of the cumulative number of a given net reaction to the total cumulative number of all net reactions as the reaction probability of that net reaction; and traversing all net reactions and merging identical graph nodes to obtain the reaction network.
- 5. A method for constructing a reaction network based on atomic coordinates according to claim 4, wherein the time interval between consecutive frames is set as 2000 fs / adjacent frames of the trajectory.
- 6. A method of constructing a reaction network based on atomic coordinates according to claim 4 wherein the reaction network is visualized using HTML.
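The frame comparison, forward/reverse cancellation, and probability calculation of claim 4 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several choices here are assumptions made for illustration only: each frame is represented as a mapping from a set of atom indices to a structure description, changed molecules are linked into one reaction when they share atoms, the whole trajectory is treated as a single time window, and each net reaction contributes one weighted edge per reactant/product pair. The names `detect_reactions` and `build_network` are hypothetical.

```python
from collections import Counter


def detect_reactions(prev, curr):
    """Compare two frames; each maps frozenset(atom indices) -> structure description."""
    gone = {a: s for a, s in prev.items() if curr.get(a) != s}  # reactant candidates
    new = {a: s for a, s in curr.items() if prev.get(a) != s}   # product candidates
    events, pending = [], set(gone)
    while pending:
        seed = pending.pop()
        react, prod, atoms = {seed}, set(), set(seed)
        grew = True
        while grew:  # gather every changed molecule that shares atoms with the group
            grew = False
            for a in new:
                if a not in prod and atoms & a:
                    prod.add(a)
                    atoms |= a
                    grew = True
            for a in list(pending):
                if atoms & a:
                    react.add(a)
                    pending.discard(a)
                    atoms |= a
                    grew = True
        events.append((tuple(sorted(gone[a] for a in react)),
                       tuple(sorted(new[a] for a in prod))))
    return events


def build_network(frames):
    """Return (net reaction counts, reaction probabilities, weighted edges)."""
    events = [e for prev, curr in zip(frames, frames[1:])
              for e in detect_reactions(prev, curr)]
    counts = Counter(events)
    net = {}
    for (r, p), n in counts.items():
        surplus = n - counts.get((p, r), 0)  # cancel forward against reverse
        if r != p and surplus > 0:
            net[(r, p)] = surplus
    total = sum(net.values())
    probs = {rxn: n / total for rxn, n in net.items()}
    edges = Counter()  # nodes are structure descriptions, weights are frequencies
    for (r, p), n in net.items():
        for a in set(r):
            for b in set(p):
                edges[(a, b)] += n
    return net, probs, edges


# Toy trajectory: two waters exchange a proton forward, back, and forward again.
neutral = {frozenset({0, 1, 2}): "O", frozenset({3, 4, 5}): "O"}
ionic = {frozenset({0, 1}): "[OH-]", frozenset({2, 3, 4, 5}): "[OH3+]"}
net, probs, edges = build_network([neutral, ionic, neutral, ionic])
print(net)  # {(('O', 'O'), ('[OH-]', '[OH3+]')): 1}
```

In this toy trajectory the forward reaction occurs twice and the reverse once, so only one net forward reaction survives the cancellation and it receives a probability of 1.0.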
Description
Method for identifying molecular structure and constructing reaction network based on atomic coordinates
Technical Field
The invention belongs to the technical field of materials simulation, and particularly relates to a method for identifying molecular structure and constructing a reaction network based on atomic coordinates.
Background
In frontier fields such as materials design, catalysis optimization, and biopharmaceuticals, obtaining a complete picture that spans atoms, molecules, and reaction networks is a core requirement. In recent years, multi-scale molecular simulation has made breakthrough progress. Techniques such as density functional theory (DFT), reactive molecular dynamics (ReaxFF-MD), and machine-learning molecular dynamics (ML-MD) can now routinely simulate time spans from femtoseconds to nanoseconds and systems of hundreds to millions of atoms, and can generate massive trajectory data without any experimental prior knowledge, which provides a rich data basis for research on reaction mechanisms.
However, the prior art has two key bottlenecks. First, a trajectory file records only atomic coordinates and element types at discrete times. Chemically meaningful information such as molecular structures and chemical bond changes is not output directly, so reaction-related data can be extracted only through additional post-processing. Second, traditional post-processing schemes have significant limitations. They depend heavily on the bond-order calculation module and the preset product database of the simulation software, and they are adapted only to simulation data from ReaxFF reactive force fields, so they are incompatible with trajectory files from other mainstream simulations such as DFT and ML-MD, which limits their range of application. They also lack the ability to construct complex reaction networks efficiently: either reaction events must be screened and reaction relations organized by hand, or computationally expensive calculations are required, so building a reaction network takes a long time, is inefficient, and cannot meet the need of high-throughput materials research and development for rapid reaction analysis. In summary, with the prior art it is difficult to build a reaction network from simulation data that consist only of atomic coordinates and similar basic data, and building one is slow and inefficient.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the above deficiencies of the prior art by providing a method for identifying molecular structure and constructing a reaction network based on atomic coordinates, which is well designed, efficient in processing, widely applicable, and easy to popularize and use.
To solve the above technical problem, the invention adopts the following technical solution: a method for identifying molecular structure based on atomic coordinates, comprising the following steps:
S1, acquiring trajectory data, wherein the trajectory data comprise atomic coordinates, atom types, and simulation box data;
S2, transforming all atomic coordinates using the minimum image convention to obtain wrapped coordinates, the minimum image convention being calculated as $\mathbf{r}_{ij} = (\mathbf{r}_j - \mathbf{r}_i) - \mathbf{L} \odot \mathrm{round}\left((\mathbf{r}_j - \mathbf{r}_i) / \mathbf{L}\right)$, wherein $\mathbf{r}_i$ and $\mathbf{r}_j$ are the three-dimensional position vectors of atom $i$ and atom $j$, $\mathbf{r}_{ij}$ is the minimum image vector pointing from atom $i$ to atom $j$, $\mathbf{L}$ is the length vector of the periodic box, and $\mathrm{round}$ is the rounding function, with the multiplication and division applied component by component;
S3, dividing the box into grid cells according to a spatial cutoff radius based on the wrapped coordinates obtained in step S2, assigning the wrapped coordinates to the corresponding grid cells, and calculating the minimum image distance $d_{ij}$ between each atom in a grid cell and the other atoms in that cell and in the neighbouring cells, $d_{ij} = |\mathbf{r}_{ij}|$;
S4, comparing the minimum image distance of two atoms with the chemical bond distance cutoff value, and judging that the two atoms are bonded when the minimum image distance is less than or equal to the corresponding chemical bond distance cutoff value, and otherwise judging that the two atoms are not bonded;
S5, traversing the bonding relations of all atoms using a depth-first search (DFS) algorithm, dividing the sets of atoms that are connected to one another through bonding relations into independent clusters, each cluster corresponding to one molecular entity, and generating the structure description of each molecular entity by analyzing the atom types, the number of atoms, and the bonding relations in its cluster, thereby obtaining the molecular structure.
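Steps S2 to S5 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the box is assumed to be orthorhombic with its origin at zero, the cutoff table is keyed by alphabetically sorted element pairs, and the function names (`wrap`, `min_image_vector`, `find_molecules`), the coordinates, and the cutoff values are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def wrap(coord, box):
    """S2: map a coordinate into the primary box [0, L) on each axis."""
    return tuple(c % l for c, l in zip(coord, box))


def min_image_vector(ri, rj, box):
    """Minimum image vector from atom i to atom j (orthorhombic box assumed)."""
    return tuple((b - a) - l * round((b - a) / l) for a, b, l in zip(ri, rj, box))


def find_molecules(coords, types, box, bond_cutoffs):
    """Return sorted clusters of atom indices connected by distance-based bonds."""
    r_cut = max(bond_cutoffs.values())  # spatial cutoff = largest bond cutoff
    wrapped = [wrap(c, box) for c in coords]
    n_cells = [max(1, int(l // r_cut)) for l in box]  # cell edge >= r_cut

    # S3: assign every atom to a grid cell.
    cells = defaultdict(list)
    for idx, c in enumerate(wrapped):
        key = tuple(min(int(x / l * n), n - 1) for x, l, n in zip(c, box, n_cells))
        cells[key].append(idx)

    # S3/S4: test only pairs in the same cell or in periodic neighbour cells.
    bonds = defaultdict(set)
    for key, members in cells.items():
        neighbours = {
            tuple((k + s) % n for k, s, n in zip(key, shift, n_cells))
            for shift in itertools.product((-1, 0, 1), repeat=3)
        }
        for nkey in neighbours:
            for i in members:
                for j in cells.get(nkey, ()):
                    cutoff = bond_cutoffs.get(tuple(sorted((types[i], types[j]))))
                    if j <= i or cutoff is None:
                        continue  # each pair is tested once; unknown pairs never bond
                    vec = min_image_vector(wrapped[i], wrapped[j], box)
                    if math.sqrt(sum(d * d for d in vec)) <= cutoff:
                        bonds[i].add(j)
                        bonds[j].add(i)

    # S5: iterative depth-first search over the bond graph.
    seen, molecules = set(), []
    for start in range(len(coords)):
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], []
        while stack:
            atom = stack.pop()
            cluster.append(atom)
            for nb in bonds[atom]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        molecules.append(sorted(cluster))
    return molecules


# Toy frame: one water molecule straddling the box boundary plus one H2 molecule.
box = (10.0, 10.0, 10.0)
coords = [(9.8, 5.0, 5.0), (10.6, 5.3, 5.0), (9.2, 5.6, 5.0),
          (3.0, 3.0, 3.0), (3.74, 3.0, 3.0)]
types = ["O", "H", "H", "H", "H"]
cutoffs = {("H", "H"): 0.9, ("H", "O"): 1.2, ("O", "O"): 1.6}  # illustrative values
print(find_molecules(coords, types, box, cutoffs))  # [[0, 1, 2], [3, 4]]
```

The second hydrogen of the water molecule lies outside the box and is wrapped to the opposite face, yet it is still assigned to the same cluster as the oxygen because the neighbour cells and the distance both respect periodicity. Each resulting cluster could then be converted into a structure description in a separate step.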
Further, in step S3, the space cutoff radius is equal to the maximum value of the chemical bond distance cutoff values of all atoms. Further, in step S5, a SMILES descriptor is generate