CN-121999921-A - Extraction method, extraction system and extraction equipment for key segments of metal organic framework material

Abstract

The invention provides an extraction method, an extraction system and extraction equipment for key fragments of a metal organic framework material. The extraction method comprises: obtaining a crystal structure file of a metal organic framework material; converting the crystal structure into a crystal graph; obtaining a key fragment subgraph according to an extraction rule; identifying hydrogen-deficient sites based on the key fragment subgraph, obtaining the coordination environment of each hydrogen-deficient site, and completing the hydrogen atom information to obtain a key fragment; and associating the crystal structure of the metal organic framework material with the key fragment to obtain a database, thereby completing the extraction of the key fragments of the metal organic framework material. The technical scheme provided by the invention can accurately and effectively extract the key fragments of metal organic framework materials, so that a metal organic framework structure database can be established for calculating and analyzing electronic structures and for screening metal organic framework materials with special properties.

Inventors

  • WANG ZHUOZHENG
  • XIE QIMING
  • PAN FENG
  • LUO YI
  • LI SHUNNING
  • GONG QIHAN
  • LIN HAI
  • CHEN ZHUDAN
  • ZHANG MINGZHENG
  • CHI KEBIN

Assignees

  • PetroChina Company Limited (中国石油天然气股份有限公司)
  • Peking University Shenzhen Graduate School (北京大学深圳研究生院)

Dates

Publication Date
2026-05-08
Application Date
2024-11-07

Claims (12)

  1. A method for extracting key fragments of a metal organic framework material, comprising the steps of: S1, acquiring a crystal structure file of the metal organic framework material; S2, converting the crystal structure into a crystal graph; S3, obtaining a key fragment subgraph according to the crystal graph and an extraction rule; S4, identifying a hydrogen-deficient site based on the key fragment subgraph, obtaining the coordination environment of the hydrogen-deficient site, and completing the hydrogen atom information to obtain a key fragment; and S5, associating the crystal structure of the metal organic framework material with the key fragment to obtain a database, thereby completing the extraction of the key fragments of the metal organic framework material.
  2. The extraction method according to claim 1, wherein in step S1, the crystal structure file is obtained from the CSD database using the Python ASE library.
  3. The extraction method according to claim 1 or 2, wherein the crystal structure file comprises at least the unit cell parameters, space group, element types and atomic coordinates.
  4. The extraction method according to claim 1, wherein in step S2, the crystal graph is composed of nodes representing atoms and edges representing bonds between two atoms; wherein when the distance between two atoms is no more than 1.25 times the covalent radii of the two atoms, the two atoms are considered to be bonded, and the nodes corresponding to the two atoms are connected by an edge.
  5. The extraction method according to claim 1 or 4, wherein in step S2, the crystal graph includes the attributes of the nodes and edges of the crystal graph after expansion; the attributes of the nodes comprise atom type, atomic covalent radius and atomic position coordinates; the attributes of the edges comprise whether a bond is formed between the atoms.
  6. The extraction method according to claim 1, wherein in step S3, the extraction rule comprises extracting an organic ligand or a metal node cluster containing a partial coordination environment.
  7. The extraction method according to claim 6, wherein extracting the organic ligand comprises: identifying the metal nodes in the crystal graph; after deleting the metal nodes and their adjacent edges, extracting all connected subgraphs according to the adjacency matrix; and obtaining the organic ligand structures in the crystal of the metal organic framework material after graph de-duplication; and wherein extracting the metal node cluster containing a partial coordination environment comprises: identifying the metal cluster in the crystal graph; judging whether the atoms in the first three neighbor layers around the metal atoms are located on an aromatic ring; if so, keeping the aromatic ring structure and cutting outside the aromatic ring; if not, cutting at the third-layer neighbor atoms; and obtaining the metal node cluster containing a partial coordination environment after graph de-duplication; wherein the aromatic ring comprises benzene, naphthalene, pyridine, pyrrole, imidazole and thiophene rings.
  8. The extraction method according to claim 1, wherein in step S3, the key fragment subgraph includes attributes of nodes and attributes of edges; the attributes of the nodes comprise atom type, atomic covalent radius, atomic coordinate position and whether the node is a cut-point atom; the attributes of the edges comprise whether a bond is formed between the atoms and, where a bond is formed, the bond length, bond angle and plane angle.
  9. The extraction method according to claim 1 or 6, wherein in step S4, completing the hydrogen atom information comprises determining the number of hydrogen atoms missing on a single atom and calculating the position coordinates of the hydrogen atoms based on the spatial relationship, wherein the bond length is taken as the sum of the covalent radii of the two atoms.
  10. The extraction method according to claim 1 or 9, wherein, in step S4, the hydrogen atom information is completed as follows: (1) hydrogen completion for an extracted organic ligand: the key fragment subgraph is converted into a SMILES string by RDKit and compared against the graph structures of specific functional groups to identify whether the cut site is a metal-site coordinating group or a ligand intermediate non-aromatic ring moiety; if so, the atom at the cut site is judged to have a hydrogen-deficient site; then the vector sum v1 of the vectors from the adjacent atoms to the hydrogen-deficient atom is calculated, the direction vector v2 from the hydrogen-deficient atom to the hydrogen atom is determined according to the number of hydrogen atoms to be added, and the hydrogen atoms are added along v2, wherein the included angle between v1 and v2 is 0 degrees if the number of hydrogen atoms to be added is 1, 60 degrees if the number is 2, and 71 degrees if the number is 3, and the bond length between the hydrogen-deficient atom and the hydrogen atom is set to the sum of the covalent radii of the hydrogen-deficient atom and the hydrogen atom; (2) hydrogen completion for a metal node cluster containing a partial coordination environment: whether the coordination number of the atom at the cut site is saturated is identified according to the octet rule; if not, the atom at the cut site is judged to have a hydrogen-deficient site; then the vector sum v1 of the vectors from the adjacent atoms to the hydrogen-deficient atom is calculated, the direction vector v2 from the hydrogen-deficient atom to the hydrogen atom is determined according to the number of hydrogen atoms to be added, and the hydrogen atoms are added along v2, wherein the included angle between v1 and v2 is 0 degrees if the number of hydrogen atoms to be added is 1, 60 degrees if the number is 2, and 71 degrees if the number is 3, and the bond length between the hydrogen-deficient atom and the hydrogen atom is set to the sum of the covalent radii of the hydrogen-deficient atom and the hydrogen atom; wherein the metal-site coordinating group comprises carboxyl, amino, hydroxyl and sulfonic acid groups, and the ligand intermediate non-aromatic ring moiety comprises saturated carbon-carbon bonds and/or unsaturated carbon-carbon bonds.
  11. A system for extracting key fragments of a metal organic framework material, comprising: an acquisition module for acquiring a crystal structure file of the metal organic framework material; a data conversion module for converting the crystal structure into a crystal graph according to the crystal structure file acquired by the acquisition module; an extraction module for obtaining a key fragment subgraph according to the crystal graph and the extraction rule; a completion module for identifying a hydrogen-deficient site based on the key fragment subgraph, obtaining the coordination environment of the hydrogen-deficient site, and completing the hydrogen atom information to obtain a key fragment; and an association module for associating the crystal structure of the metal organic framework material with the key fragment to obtain a database.
  12. An extraction apparatus for key fragments of a metal organic framework material, comprising: a memory for storing a computer program; and a processor for implementing the extraction method for key fragments of the metal organic framework material when executing the computer program.
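The graph construction of claim 4 and the ligand extraction of claim 7 could be sketched roughly as follows. This is a minimal illustration and not the patented implementation. It assumes the bond cutoff is 1.25 times the sum of the two covalent radii (one reading of claim 4), ignores periodic boundary conditions and graph de-duplication, and uses a small illustrative radius table and metal set. The names `COVALENT_RADII`, `METALS`, `build_crystal_graph` and `extract_ligands` are hypothetical.

```python
import math
from itertools import combinations

# Illustrative covalent radii in angstroms (assumed values, not taken from the patent).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "Zn": 1.22}
METALS = {"Zn"}        # hypothetical metal set for this toy example
BOND_TOLERANCE = 1.25  # factor stated in claim 4


def build_crystal_graph(atoms):
    """atoms: list of (element, (x, y, z)). Returns {index: set of bonded indices}.

    ASSUMPTION: two atoms are bonded when their distance is at most
    BOND_TOLERANCE times the sum of their covalent radii; no periodic images.
    """
    graph = {i: set() for i in range(len(atoms))}
    for i, j in combinations(range(len(atoms)), 2):
        (el_i, pos_i), (el_j, pos_j) = atoms[i], atoms[j]
        cutoff = BOND_TOLERANCE * (COVALENT_RADII[el_i] + COVALENT_RADII[el_j])
        if math.dist(pos_i, pos_j) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def extract_ligands(atoms, graph):
    """Drop metal nodes and their edges, then return the connected components."""
    organic = {i for i, (el, _) in enumerate(atoms) if el not in METALS}
    seen, components = set(), []
    for start in sorted(organic):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal restricted to non-metal nodes
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in graph[node] if n in organic and n not in component)
        seen |= component
        components.append(sorted(component))
    return components


# Toy structure: one Zn atom bridging two O-C fragments along the x axis.
toy_atoms = [
    ("Zn", (0.0, 0.0, 0.0)),
    ("O", (2.0, 0.0, 0.0)),
    ("C", (3.25, 0.0, 0.0)),
    ("O", (-2.0, 0.0, 0.0)),
    ("C", (-3.25, 0.0, 0.0)),
]
toy_graph = build_crystal_graph(toy_atoms)
print(extract_ligands(toy_atoms, toy_graph))  # → [[1, 2], [3, 4]]
```

A real workflow would read the structure from a CIF file and account for periodic images before building the graph. The toy example only shows how deleting the metal node splits the graph into separate ligand components.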
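The hydrogen placement geometry of claim 10 could likewise be sketched as below. The angles between v1 and v2 (0, 60 and 71 degrees) and the bond length rule (sum of covalent radii) come from the claim. The claim does not say how the hydrogen atoms are distributed around v1, so the even azimuthal spacing here is an assumption, as are the hydrogen radius value and the names `place_hydrogens`, `H_RADIUS` and `ANGLE_V1_V2_DEG`.

```python
import math

H_RADIUS = 0.31  # illustrative covalent radius of hydrogen in angstroms (assumed value)
ANGLE_V1_V2_DEG = {1: 0.0, 2: 60.0, 3: 71.0}  # angles between v1 and v2 stated in claim 10


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    """Normalize a 3-vector; fails if the neighbor vectors cancel out."""
    length = math.sqrt(sum(x * x for x in a))
    if length < 1e-12:
        raise ValueError("zero-length vector: neighbor contributions cancel out")
    return tuple(x / length for x in a)


def place_hydrogens(center, neighbors, n_h, center_radius):
    """Return n_h hydrogen positions around the hydrogen-deficient atom `center`."""
    # v1: sum of the vectors from each adjacent atom to the hydrogen-deficient atom.
    v1 = tuple(sum(center[k] - nb[k] for nb in neighbors) for k in range(3))
    axis = unit(v1)
    # Build two unit vectors perpendicular to v1 (the reference choice is arbitrary).
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(axis, helper))
    w = cross(axis, u)
    theta = math.radians(ANGLE_V1_V2_DEG[n_h])
    bond = center_radius + H_RADIUS  # bond length = sum of covalent radii
    positions = []
    for i in range(n_h):
        # ASSUMPTION: hydrogens are spaced evenly in azimuth around v1.
        phi = 2.0 * math.pi * i / n_h
        v2 = tuple(math.cos(theta) * axis[k]
                   + math.sin(theta) * (math.cos(phi) * u[k] + math.sin(phi) * w[k])
                   for k in range(3))
        positions.append(tuple(center[k] + bond * v2[k] for k in range(3)))
    return positions
```

For example, `place_hydrogens((0.0, 0.0, 0.0), [(-1.5, 0.0, 0.0)], 3, 0.76)` returns three positions, each at the summed-radii distance from the origin and at 71 degrees from the +x direction. Deciding how many hydrogen atoms a cut-site atom is missing (functional-group matching or the octet rule) is a separate step that this sketch does not cover.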

Description

Extraction method, extraction system and extraction equipment for key segments of metal organic framework material

Technical Field

The invention relates to a method, a system and equipment for extracting key fragments of a metal organic framework material, and belongs to the technical field of chemical structure analysis.

Background

Metal organic framework materials (MOFs) are a class of porous framework compounds self-assembled from metal ions or metal clusters and organic ligands through strong coordination bonds. With their diverse structures, novel topologies and excellent properties, MOFs have rapidly become a research hotspot in coordination chemistry.

Patent application 202311501785.6 discloses a workflow method for mining MOFs fingerprint features, which obtains a general scheme for extracting the crystal structure fingerprint expression and structural defects of MOFs secondary building units (SBUs) and ligands through steps such as metal clustering, an iterative directed acyclic graph and ligand cutting. The method comprises: constructing a supercell from a CIF file; identifying atomic connections; clustering the metals; iterating a 4-layer directed acyclic graph with the metal as node 0 to cut out the metal SBU; identifying and cutting out the organic ligands according to connectivity; and finally applying the MOFUN algorithm to cut the MOFs by ligand. That method extracts only a limited range of structures (independent SBUs, independent ligands and defective MOFs), and it does not determine the type of functional group at the cut site so that hydrogen can be added to the hydrogen-deficient structure to balance the charge. Its output therefore cannot be used directly for quantum chemical calculation and analysis.

BJ Bucior et al. (BJ Bucior, AS Rosen, M Haranczyk, Z Yao, RQ Snurr; Identification Schemes for Metal–Organic Frameworks To Enable Rapid Search and Cheminformatics Analysis; Crystal Growth & Design, 2019, Vol 19, Issue 11) provide a metal organic framework decomposition method that decomposes the MOF into its building blocks and underlying topological network. The algorithm comprises: (1) identifying and analyzing chemical bonds, first assigning an adjacency matrix from the crystal structure using the simple distance cutoff method of Open Babel; (2) classifying key fragments, using a metal-oxide method (metal and oxide clusters in the MOF are treated as separate inorganic fragments and the rest as organic fragments) and a node-connector method (the MOF structure is simplified into a topological network consisting of inorganic nodes and organic connectors, which are treated as separate molecular fragments), so that the MOF is decomposed into inorganic nodes and organic connectors as separate molecular fragments; (3) applying a single-node algorithm (each inorganic node is treated as a vertex in the topological network and its surrounding organic connectors are simplified) and an all-node algorithm (the detailed connectivity within the nodes is considered and converted into geometric centers (centroids) for further topological simplification) to analyze the MOF structure; and then creating an identifier of the MOF and of its topology by means of the MOFid software. The main purpose of this approach is to encode the MOF structure as a standardized identifier for fast searching and data mining. The range of structures extracted by this technique is likewise limited, and no hydrogen completion is performed to balance the charge, so the extracted fragments cannot be used on their own for quantum chemical calculation and analysis.

Prosun Halder et al. (Prosun Halder, Prerna, and Jayant K. Singh; Building Unit Extractor for Metal–Organic Frameworks; J. Chem. Inf. Model.; 2021, 61, 5827-5840) disclose a platform named mBUD for extracting building units (BUs), comprising metal nodes, organic linkers and functional groups, from metal organic frameworks (MOFs). In that method, the crystal structure data of the MOFs are first imported and preprocessed to identify and remove solvent molecules or ions that may be present in the structure; the chemical bonds and structural features of the MOFs are then identified by establishing a bonding network of the crystal; and the MOF structure is decomposed using a Grid Hash (GH) algorithm, in which the whole crystal structure is divided into a plurality of small grid units and the bonding relations in each grid are identified accordi