CN-122023590-A - Machine learning graph preparation method based on multidimensional storage structure and storage medium
Abstract
The invention discloses a machine learning graph preparation method based on a multidimensional storage structure, comprising the steps of: 1, inputting a three-dimensional seismic SEG-Y file; 2, checking file integrity; 3, identifying the SEG-Y sequence; 4, identifying the key headers in the file; 5, forming a JSON file describing the file headers; 6, converting the SEG-Y file into multidimensional data volume format blocks and a JSON file; 7, storing the multidimensional data volume format blocks; 8, removing random noise from the multidimensional data volume; 9, normalizing; 10, defining the paths corresponding to spatial nodes; 11, setting a cost function; 12, establishing a global path iteration process; 13, determining a stop-tracking condition; 14, extracting the ant colony history paths; 15, forming a multidimensional data volume; and 16, dividing the path-label and amplitude data blocks of the multidimensional data volume to obtain slice images. The method improves the efficiency of generating seismic data images and addresses the problem of large variability in results.
Inventors
- LU ZIQING
- SHEN SHIAN
- WU XUEFEI
- SONG XIAOZHONG
- JIN DAN
- GAO YAOQUAN
- DUAN JIANHUA
Assignees
- 中煤科工西安研究院(集团)有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260112
Claims (7)
- 1. A machine learning graph preparation method based on a multidimensional storage structure, characterized by comprising the following steps:
  step 1, inputting a three-dimensional seismic SEG-Y file;
  step 2, checking the integrity of the SEG-Y file using a standard tool of the SEG (Society of Exploration Geophysicists);
  step 3, identifying the SEG-Y sequence of the SEG-Y file;
  step 4, identifying the position, type and number of bytes occupied by the key headers XLine, inline, binX, binY and offset in the SEG-Y file;
  step 5, forming a JSON file describing the file headers according to the key header identification result of step 4, wherein the JSON file records the position, byte count and type of each key valid header in the SEG-Y file;
  step 6, converting the SEG-Y file into multidimensional data volume format blocks and a JSON file, wherein the main amplitude information in the SEG-Y file is converted into multidimensional data volume format blocks of a user-defined size for storage, forming seismic volume amplitude format blocks, and the index described by the SEG-Y file headers is converted into JSON file metadata;
  step 7, storing the multidimensional data volume format blocks in a storage device that supports both local storage and network storage;
  step 8, applying random denoising to the multidimensional data volume format blocks in the storage device, computing the signal-to-noise ratio of each seismic trace, and removing the seismic traces whose signal-to-noise ratio is lower than 0 dB;
  step 9, normalizing the denoised seismic volume amplitude format blocks obtained in step 8 using the main index of the JSON file obtained in step 6 and the three-dimensional amplitude feature format blocks stored in step 7;
  step 10, letting $n_i$ denote a spatial node, the corresponding path $P$ is composed of a sequence of three-dimensional spatial nodes: $P = \{n_1, n_2, \ldots, n_L\}$ (1);
  step 11, setting the cost function $C(P)$ of path $P$ (2), in which the term characterizing the gradient along path $P$ is minimized, so that gradient minimization represents path optimization;
  step 12, with $t$ the current iteration count, establishing a global path iteration process in which each iteration satisfies
  $\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \sum_{k=1}^{M} \Delta\tau_{ij}^{k}(t)$ (3)
  wherein $\tau_{ij}(t+1)$ is the pheromone concentration on the path from spatial node $n_i$ to spatial node $n_j$ in the next iteration $t+1$; $\rho$ is the guiding pheromone evaporation rate (for example, $\rho = 0.1$ means that 10% of the pheromone evaporates per iteration); $M$ is the maximum number of ants; and $\Delta\tau_{ij}^{k}(t)$ is the guiding pheromone enhancement of the $k$-th ant at iteration $t$, summed over all ants to represent the overall contribution of the ant colony. The enhancement $\Delta\tau_{ij}^{k}(t)$ is given by equation (4), wherein $Q$ is the guiding enhancement strength constant, and $P_t^{0}$ and $P_t^{*}$ denote the initial path and the best path of iteration $t$, respectively; the initial path of iteration $t+1$ is the best path of iteration $t$, i.e. $P_{t+1}^{0} = P_t^{*}$;
  step 13, determining the stop-tracking condition (5) for the $k$-th ant in step 12: the cost function $C(P)$ of step 11 is applied to decide whether the global path iteration of step 12 should terminate after $N$ iterations; if, over $N$ iterations, the minimum improvement of the per-iteration best-path cost function is smaller than $\varepsilon$, global optimization is considered to have been reached after $N$ iterations and tracking stops; otherwise, iterative tracking continues;
  step 14, after the iteration reaches the stop condition, extracting the ant colony history paths and outputting a path-labeling spatial index;
  step 15, using the path-labeling spatial index as spatial channel information of the multidimensional data of step 6, so that the normalized seismic volume amplitude format blocks of step 9 gain a new spatial index channel, and combining this spatial index channel with the normalized seismic volume amplitude format blocks to form a multidimensional data volume; and
  step 16, dividing the path-labeling and amplitude data blocks of the multidimensional data volume of step 15 according to the sample scale parameters for machine learning training and inference to obtain slice images in different directions, completing the graph preparation.
- 2. The machine learning graph preparation method based on the multidimensional storage structure according to claim 1, wherein in step 6 the format block size is defined as an integer multiple of a common size such as 32³, 64³ or 128³.
- 3. The machine learning graph preparation method based on the multidimensional storage structure according to claim 1, wherein in step 7 locally stored format blocks are read and written directly via the SATA, SAS or NVMe protocol, and network-stored format blocks are read and written via the iSCSI or FC (Fibre Channel) protocol.
- 4. The machine learning graph preparation method based on the multidimensional storage structure according to claim 1, wherein in step 7 the storage device is divided into a physical layer, a logical layer and a functional layer: the physical layer consists of hard disks, optical disks and magnetic tapes; the logical layer divides the physical layer into metadata storage and format block storage, the metadata and the format blocks being associated through index mapping; and the functional layer performs seismic data queries and read/write operations, which first operate on the metadata and then load the corresponding format blocks according to the specified index.
- 5. The machine learning graph preparation method based on the multidimensional storage structure according to claim 1, wherein the evaporation rate ρ is set to 0.1 to 0.2, the maximum number of ants M is set to 100, and the remaining parameter is set to 0.01.
- 6. An electronic device, the electronic device comprising: a memory for storing executable instructions; a processor configured to implement the method according to any one of claims 1 to 5 when executing the executable instructions or computer programs stored in the memory.
- 7. A computer readable storage medium storing executable instructions or a computer program, wherein the executable instructions when executed by a processor implement the method of any one of claims 1 to 5.
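Step 8 of claim 1 removes traces whose signal-to-noise ratio is below 0 dB but does not say how the SNR is estimated. One common choice, assumed here and not taken from the patent, treats the denoised trace as the signal and the residual (input minus denoised) as the noise, so that SNR_dB = 10·log10(P_signal / P_noise). The following minimal Python sketch applies that test; the function names `trace_snr_db` and `keep_traces` are hypothetical.

```python
import math
from typing import List, Sequence


def trace_snr_db(raw: Sequence[float], denoised: Sequence[float]) -> float:
    """SNR of one trace in dB.

    Assumption: the denoised trace is the signal estimate and the residual
    (raw - denoised) is the noise estimate.
    """
    if len(raw) == 0 or len(raw) != len(denoised):
        raise ValueError("traces must be non-empty and of equal length")
    signal_power = sum(d * d for d in denoised) / len(denoised)
    noise_power = sum((r - d) ** 2 for r, d in zip(raw, denoised)) / len(raw)
    if noise_power == 0.0:
        return math.inf  # denoising removed nothing: treat as noise-free
    if signal_power == 0.0:
        return -math.inf  # dead trace after denoising
    return 10.0 * math.log10(signal_power / noise_power)


def keep_traces(
    raw_traces: List[Sequence[float]],
    denoised_traces: List[Sequence[float]],
    threshold_db: float = 0.0,
) -> List[int]:
    """Indices of traces to keep; traces with SNR below threshold_db are dropped."""
    return [
        idx
        for idx, (raw, den) in enumerate(zip(raw_traces, denoised_traces))
        if trace_snr_db(raw, den) >= threshold_db
    ]
```

A trace whose residual carries as much power as its denoised signal sits exactly at 0 dB and is kept, since step 8 only removes traces "lower than 0 dB".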
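Equation (3) in step 12 of claim 1 has the form of the classic ant colony pheromone update: every edge loses the fraction ρ of its pheromone, then the deposits of all M ants are added. The Python sketch below shows that update together with a literal reading of the step 13 stop test. It is illustrative only: the claim does not give the form of the deposit (equation (4)) or of the cost function, so the per-ant deposits and the best-path costs are passed in as precomputed numbers, and the names `update_pheromone` and `should_stop` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an edge (i, j) is a move from spatial node i to node j.
Edge = Tuple[int, int]


def update_pheromone(
    tau: Dict[Edge, float],
    ant_deposits: List[Dict[Edge, float]],
    rho: float,
) -> Dict[Edge, float]:
    """One global update: tau_ij(t+1) = (1 - rho) * tau_ij(t) + sum_k dtau_ij^k(t).

    ant_deposits[k] maps each edge used by ant k to its deposit dtau_ij^k(t);
    how that deposit is computed (equation (4)) is left to the caller.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    # Evaporation: every edge keeps the fraction (1 - rho) of its pheromone.
    new_tau = {edge: (1.0 - rho) * level for edge, level in tau.items()}
    # Reinforcement: add the contribution of each of the M ants.
    for deposits in ant_deposits:
        for edge, delta in deposits.items():
            new_tau[edge] = new_tau.get(edge, 0.0) + delta
    return new_tau


def should_stop(best_costs: List[float], n_iters: int, eps: float) -> bool:
    """Literal reading of step 13: stop when the minimum per-iteration
    improvement of the best-path cost over the last n_iters iterations is
    smaller than eps. best_costs[t] is the best-path cost after iteration t."""
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    if len(best_costs) < n_iters + 1:
        return False  # not enough history to judge yet
    window = best_costs[-(n_iters + 1):]
    improvements = [prev - curr for prev, curr in zip(window, window[1:])]
    return min(improvements) < eps
```

For example, with ρ = 0.1 (inside the 0.1 to 0.2 range of claim 5), an edge holding 1.0 unit of pheromone that receives deposits of 0.5 and 0.25 from two ants ends the iteration with about 1.65 units.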
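Step 16 of claim 1 cuts the combined amplitude and path-label volume into training samples according to the sample scale parameters and yields slices in different directions. The sketch below shows one way to do that with plain Python lists: extract an inline, crossline or time slice, then tile it into square windows. The axis convention, the square window with a fixed stride, and the function names are assumptions made for illustration, not details given in the claim.

```python
from typing import Iterator, List, Tuple

# Assumed layout: volume[inline][crossline][sample].
Volume = List[List[List[float]]]
Section = List[List[float]]


def window_starts(length: int, size: int, stride: int) -> List[int]:
    """Start offsets of windows of `size` samples that fit fully inside `length`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if size > length:
        return []
    return list(range(0, length - size + 1, stride))


def extract_slice(volume: Volume, axis: int, index: int) -> Section:
    """2D section of the volume: axis 0 = inline, 1 = crossline, 2 = time/depth."""
    n_il, n_xl = len(volume), len(volume[0])
    if axis == 0:  # fixed inline -> [crossline][sample]
        return [list(trace) for trace in volume[index]]
    if axis == 1:  # fixed crossline -> [inline][sample]
        return [list(volume[i][index]) for i in range(n_il)]
    if axis == 2:  # fixed sample -> [inline][crossline]
        return [[volume[i][j][index] for j in range(n_xl)] for i in range(n_il)]
    raise ValueError("axis must be 0, 1 or 2")


def tile_section(section: Section, size: int, stride: int) -> Iterator[Tuple[int, int, Section]]:
    """Yield (row, col, patch) for every size x size window of the section."""
    rows, cols = len(section), len(section[0])
    for r in window_starts(rows, size, stride):
        for c in window_starts(cols, size, stride):
            yield r, c, [line[c:c + size] for line in section[r:r + size]]
```

Calling the same functions with identical arguments on the amplitude channel and on the path-label channel keeps each image patch aligned with its label patch.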
Description
Technical Field
The invention belongs to the technical field of geophysical exploration and relates to a machine learning graph preparation method based on a multidimensional storage structure, and to a storage medium.
Background
When training machine learning models on seismic data, two types of image data must be prepared in advance: original slice images of the seismic data volume, and labeled feature images containing professional geological feature information. The original images are usually obtained by slicing seismic data in the SEG-Y format proposed by the Society of Exploration Geophysicists (SEG), and the labeled images can be obtained by manually labeling slices or by automatically slicing attribute volumes. SEG-Y is the most widely used data format in seismic exploration. It was published by the SEG in 1975 and has been revised several times: revision 1 was released in 2002, and a supplementary revision was issued in China in 2019 (SY/T 5453-2019). The standard SEG-Y format is organized by seismic trace, and a file contains three components: a 3200-byte EBCDIC/ASCII textual header, a 400-byte binary header, and trace records, each consisting of a 240-byte trace header followed by the trace data (typically 4-byte floating-point or integer samples). Because SEG-Y stores traces sequentially, extracting a slice image from the seismic data requires searching the index of the entire data set, which is inefficient. In general-purpose machine learning, multidimensional storage structures such as HDF, netCDF, Zarr, TileDB and Parquet are widely used. A common feature of these structures is the separation of metadata from volume data.
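The fixed SEG-Y layout described above (3200-byte textual header, 400-byte binary header, then traces each starting with a 240-byte header) is enough to locate every trace, and steps 4 and 5 of claim 1 record the byte position, length and type of key trace-header fields in a JSON file. The Python sketch below illustrates both ideas under stated assumptions: big-endian data, no extended textual headers, and fixed-length traces. The byte positions are the SEG-Y rev 1 defaults (offset at 37-40, CDP X/Y at 181-188, inline at 189-192, crossline at 193-196); mapping the patent's binX/binY onto CDP X/Y is an assumption, real files often place these fields elsewhere (which is why step 4 identifies them), and the helper names are hypothetical.

```python
import json
import struct

TEXT_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240
# Bytes per sample for common data sample format codes (SEG-Y rev 1).
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 5: 4, 8: 1}

# JSON-style description of key trace-header fields (1-based byte positions).
# Assumption: SEG-Y rev 1 default locations; binX/binY taken to be CDP X/Y.
KEY_TRACE_HEADERS = {
    "offset": {"byte": 37, "length": 4, "type": "int32"},
    "binX": {"byte": 181, "length": 4, "type": "int32"},
    "binY": {"byte": 185, "length": 4, "type": "int32"},
    "inline": {"byte": 189, "length": 4, "type": "int32"},
    "XLine": {"byte": 193, "length": 4, "type": "int32"},
}


def read_binary_header(buf: bytes) -> dict:
    """Read three binary-header fields as big-endian 16-bit integers."""
    base = TEXT_HEADER_BYTES  # the binary header starts at 0-based offset 3200
    (interval_us,) = struct.unpack_from(">h", buf, base + 16)  # bytes 3217-3218
    (n_samples,) = struct.unpack_from(">h", buf, base + 20)    # bytes 3221-3222
    (fmt_code,) = struct.unpack_from(">h", buf, base + 24)     # bytes 3225-3226
    return {"sample_interval_us": interval_us, "samples_per_trace": n_samples, "format_code": fmt_code}


def trace_count(file_size: int, n_samples: int, fmt_code: int) -> int:
    """Number of fixed-length traces implied by the file size."""
    trace_bytes = TRACE_HEADER_BYTES + n_samples * BYTES_PER_SAMPLE[fmt_code]
    payload = file_size - TEXT_HEADER_BYTES - BINARY_HEADER_BYTES
    if payload < 0 or payload % trace_bytes != 0:
        raise ValueError("file size is inconsistent with fixed-length traces")
    return payload // trace_bytes


def read_trace_field(trace_header: bytes, name: str) -> int:
    """Read one key field from a 240-byte trace header using its JSON description."""
    spec = KEY_TRACE_HEADERS[name]
    if spec["type"] != "int32":
        raise NotImplementedError(spec["type"])
    (value,) = struct.unpack_from(">i", trace_header, spec["byte"] - 1)
    return value


# The step 5 JSON file could simply serialize the field description.
header_json = json.dumps(KEY_TRACE_HEADERS, indent=2)
```

For a 10 000-byte file with 100 IEEE-float samples per trace (format code 5), `trace_count(10000, 100, 5)` returns 10, because each trace occupies 240 + 400 bytes after the 3600 header bytes.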
Metadata containing descriptive information (for example, coordinate system, units and processing history) is usually stored in a JSON-compatible format, while the volume data, which may contain various volume attributes, is divided into equal-sized blocks (for example, 64³ or 128³) that are stored and compressed separately. In a multidimensional storage structure, metadata and volume data are linked through indexes: during machine learning training, the relevant metadata can be queried for a given target, and once the index has been resolved, only the corresponding local blocks of volume data need to be read, which is efficient. Reconstructing the SEG-Y data volume into a purpose-designed multidimensional storage format gives the seismic data the characteristics of a multidimensional data structure and improves the efficiency of slicing and segmenting images; this is one direction for optimizing machine learning on seismic data. Machine learning on seismic data requires stable seismic features as input samples, whereas samples produced by manual interpretation are slow to obtain and vary widely with the interpreter's experience. Academia and industry have therefore proposed automatic or semi-automatic interpretation schemes that use professional seismic attribute analysis, such as isochronous horizons, chaos volumes, variance volumes and ant-tracking volumes, as machine learning labels. Among these schemes, ant tracking is a practical application of the ant colony optimization algorithm in geophysics: it automatically tracks small faults and fractures by simulating how natural ant colonies mark their foraging paths with pheromones.
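The paragraph above describes splitting a volume into equal-sized blocks that are linked to JSON metadata through an index, and claim 4 has queries read the metadata first and then load only the blocks they need. A minimal Python sketch of that lookup for one inline section follows. The metadata keys, the "i.j.k" block-key format (similar in spirit to Zarr v2's dot-separated chunk keys) and the function names are assumptions for illustration; no storage back end is shown.

```python
import json
import math
from typing import List, Sequence, Tuple

# Hypothetical metadata record for a seismic volume stored as 64^3 blocks.
METADATA_JSON = json.dumps({
    "shape": [300, 250, 1000],  # inline, crossline, sample
    "chunks": [64, 64, 64],     # block edge lengths
    "dtype": "float32",
})


def chunk_grid(shape: Sequence[int], chunks: Sequence[int]) -> Tuple[int, ...]:
    """Number of blocks along each axis (edge blocks may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunk_of(index: Sequence[int], chunks: Sequence[int]) -> Tuple[int, ...]:
    """Block coordinates of the block that contains a given voxel index."""
    return tuple(i // c for i, c in zip(index, chunks))


def blocks_for_inline(metadata_json: str, inline: int) -> List[str]:
    """Read the metadata first, then list the block keys needed for one inline section."""
    meta = json.loads(metadata_json)
    shape, chunks = meta["shape"], meta["chunks"]
    if not 0 <= inline < shape[0]:
        raise IndexError("inline out of range")
    grid = chunk_grid(shape, chunks)
    ci = inline // chunks[0]
    # Every block in the layer of blocks that contains this inline must be read.
    return [f"{ci}.{cj}.{ck}" for cj in range(grid[1]) for ck in range(grid[2])]
```

For inline 130 of this hypothetical 300 x 250 x 1000 volume, `blocks_for_inline` returns 64 of the 320 block keys, so only that fraction of the volume has to be fetched, whereas a sequential SEG-Y file would have to be searched trace by trace.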
However, the method is mainly suited to identifying small-scale geological features, and differences in parameter settings can significantly affect the final result, so repeated tests are needed to determine a suitable ant colony size and tracking threshold parameters. Because the tracking method is not stable, its algorithm, parameters and workflow must be modified before it can provide a stable feature set for subsequent machine learning tasks. In summary, conventional methods based on SEG-Y storage suffer from low read/write efficiency, low sample-labeling efficiency and large variability of results. When a machine learning feature set is generated, the SEG-Y format cannot store seismic data with more than three dimensions; spatial storage indexes must be built repeatedly, occupying a large amount of storage space; the sequential SEG-Y read/write mode cannot take advantage of multithreaded CPUs (central processing units) and GPUs (graphics processing units); multithreaded network reading/writing and version control are not supported, which increases file size and transmission cost; and image segmentation efficiency is low. In the process of marking the seismic featu