
CN-121996217-A - Automatic parallel development device and method for seismic data processing


Abstract

The invention provides an automatic parallel development device and method for seismic data processing, and belongs to the field of seismic exploration. The device comprises a gather data construction module, an RDD construction module, a C++ base class construction module, a JNI interface module, a data parallel input module, a data parallel output module and a module development engineering construction plug-in. The gather data construction module is used for acquiring functions and transfer functions; the RDD construction module is used for data processing; the C++ base class construction module receives the seismic data transmitted by the RDD construction module through the gather data construction module; the JNI interface module transfers data between the Scala code and the C++ code; the data parallel input module calls the C++ base class construction module and the RDD construction module to process the seismic data and divides the seismic data for parallel processing; the data parallel output module outputs the processed calculation result data in parallel; and the module development engineering construction plug-in builds a parallel development template. The invention spares module developers from having to learn Spark and to implement the parallel reading and output of data themselves, and effectively improves module development efficiency.

Inventors

  • KANG YONGGAN
  • LI ZHENJIE
  • ZHU HAIWEI
  • CHEN HAIYANG
  • ZHAO DEMING

Assignees

  • China Petroleum & Chemical Corporation (中国石油化工股份有限公司)
  • Sinopec Geophysical Research Institute Co., Ltd. (中石化石油物探技术研究院有限公司)

Dates

Publication Date
2026-05-08
Application Date
2024-11-07

Claims (10)

  1. An automatic parallel development device for seismic data processing, the device comprising: a gather data construction module, used for acquiring a function and a transfer function; an RDD construction module, used for carrying out data processing; a C++ base class construction module, used for receiving the seismic data transmitted by the RDD construction module through the gather data construction module; a JNI interface module, used for transferring data between the Scala code and the C++ code; a data parallel input module, used for calling the C++ base class construction module and the RDD construction module to process the seismic data and divide the seismic data for parallel processing; a data parallel output module, used for outputting the processed calculation result data in parallel; and a module development engineering construction plug-in, used for constructing the parallel development templates.
  2. The automatic parallel development device for seismic data processing according to claim 1, wherein the gather data construction module is built on a resilient distributed dataset (RDD) whose minimum record unit is a gather; the gather number and the data body of each gather form a key-value pair, the gather-number key recording the unique identification number of the gather and the gather-data value recording the data body of the gather.
  3. The automatic parallel development device for seismic data processing according to claim 1 or 2, wherein the gather data construction module uses a map function as its core function.
  4. The automatic parallel development device for seismic data processing according to claim 2, wherein the seismic data in the gather consist of a plurality of data packets, each data packet consisting of a trace header and a trace data body; the trace header records acquisition information of the trace data, including coordinate positions, sampling intervals, sampling lengths and grid information, and the trace data body records the amplitude values of the sampling points.
  5. The automatic parallel development device for seismic data processing according to claim 4, wherein the trace header and the trace data body are stored independently, the trace header being stored as byte data and the trace data body as a floating-point array.
  6. The automatic parallel development device for seismic data processing according to claim 1, wherein the operations performed by the RDD construction module comprise: constructing a resilient distributed dataset (RDD), namely providing a base class and a template and creating the RDD from a seismic data source; defining a parallel computation template to implement user-defined parallel computation logic; data calculation, namely performing seismic data preprocessing, feature extraction and filtering calculations through the data processing base class; processing the calculation result data of the RDD, namely summarizing, counting and transforming the calculation result data; and outputting the RDD data, namely defining a data output base class and storing the processed calculation result data of the RDD in a storage system.
  7. The automatic parallel development device for seismic data processing according to claim 1, wherein the C++ base class construction module is configured to provide a core algorithm interface, through which the computing functions of the core algorithm are embedded directly into the C++ base class.
  8. The automatic parallel development device for seismic data processing according to claim 1, wherein the JNI interface module transfers data through the records (Record) of the resilient distributed dataset RDD, so that the header data are transferred while the seismic data in the gather are transferred in full.
  9. The automatic parallel development device for seismic data processing according to claim 1, wherein the module development engineering construction plug-in is an Eclipse-based plug-in for building module development templates that directly constructs a parallel development template, the parallel development template comprising the Spark parallel framework classes, the C++ core function integration classes and the parameter interface information configuration file of the module.
  10. A method of automatic parallel development for seismic data processing, implemented by the automatic parallel development device for seismic data processing according to any one of claims 1 to 9, the method comprising: step one, constructing the Spark parallel framework, in which a map function is applied through a resilient distributed dataset RDD to process the seismic data in parallel; step two, processing the input data, namely reading the seismic data through the data parallel input module and constructing an RDD; step three, processing the data with the core calculation method, namely obtaining the seismic data of the RDD, integrating the core calculation method into a C++ class and performing the calculation with the Spark parallel framework functions; and step four, outputting the completed data, namely obtaining, through the data parallel output module, the seismic data of the new RDD produced by the core calculation method and outputting them to the Hadoop Distributed File System (HDFS).
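The data layout of claims 2 to 5 and the flow of claim 10 can be illustrated with a small, single-process C++ sketch. It is not the patented implementation: Spark, JNI and HDFS are absent, an in-memory vector stands in for the RDD, the names `Trace`, `GatherRecord`, `partitionByGatherNo`, `mapGathers` and `collect` are hypothetical, and partitioning by the gather number modulo the partition count is an assumed scheme used only to show how keyed gathers can be divided for parallel processing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One trace (claims 4-5): header and data body stored separately.
struct Trace {
    std::vector<std::uint8_t> header;  // acquisition info (coordinates, sampling, grid) as raw bytes
    std::vector<float> samples;        // amplitude value of each sampling point
};

// One gather record (claim 2): key = unique gather number, value = the gather's traces.
using GatherRecord = std::pair<std::int64_t, std::vector<Trace>>;
// A partition is the unit a Spark executor would process; here it is just a vector.
using Partition = std::vector<GatherRecord>;

// Step two stand-in (data parallel input): divide the gathers into partitions
// by a non-negative modulo of the gather number (an assumed scheme).
std::vector<Partition> partitionByGatherNo(const std::vector<GatherRecord>& gathers,
                                           std::size_t numPartitions) {
    assert(numPartitions > 0 && "at least one partition is required");
    std::vector<Partition> parts(numPartitions);
    const auto n = static_cast<std::int64_t>(numPartitions);
    for (const GatherRecord& g : gathers) {
        std::int64_t idx = g.first % n;
        if (idx < 0) {
            idx += n;  // C++ % keeps the sign of the dividend
        }
        parts[static_cast<std::size_t>(idx)].push_back(g);
    }
    return parts;
}

// Step three stand-in: a map over every gather record that applies the core
// computation to the value and keeps the key. Spark would run partitions
// concurrently; this loop runs them one after another.
template <typename CoreFn>
std::vector<Partition> mapGathers(const std::vector<Partition>& parts, CoreFn core) {
    std::vector<Partition> out(parts.size());
    for (std::size_t p = 0; p < parts.size(); ++p) {
        for (const GatherRecord& g : parts[p]) {
            out[p].emplace_back(g.first, core(g.second));
        }
    }
    return out;
}

// Step four stand-in (data parallel output): flatten the result partitions.
// The patent writes the new RDD to HDFS; this only returns it in memory.
std::vector<GatherRecord> collect(const std::vector<Partition>& parts) {
    std::vector<GatherRecord> all;
    for (const Partition& p : parts) {
        all.insert(all.end(), p.begin(), p.end());
    }
    return all;
}
```

Because every record carries its gather number, the core computation passed to `mapGathers` never needs to know which partition it runs in, which is what allows the same per-gather function to be distributed without modification.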

Description

Automatic parallel development device and method for seismic data processing

Technical Field

The invention belongs to the field of seismic exploration, and particularly relates to an automatic parallel development device and method for seismic data processing.

Background

With the continuous development of petroleum exploration and development and of exploration and information technology, the requirements on exploration precision and accuracy keep rising. Advances in geophysical technology, in particular in seismic data acquisition, processing and interpretation, have led to exponential growth in both the volume of seismic exploration data and the amount of computation, so data processing places ever greater demands on large-scale high-performance computing. Large-scale high-performance computing not only requires large amounts of computing resources, including networks, storage and CPUs, but also challenges the design skills of algorithm developers. Some problems of parallel programs, especially performance problems, generally do not appear when the problem size is small or the computing cluster is not large, but many of them surface as the problem size grows or the number of cluster nodes increases, for example the scalability of the parallel program, fault tolerance to software and hardware failures, and resource utilization efficiency. Spark parallel technology provides an efficient solution for processing and analyzing big data and has been widely adopted.
However, because Spark is built on the Scala language while most seismic data processing algorithms are written in languages such as C++ and Fortran, applying Spark parallel technology to seismic exploration data processing is difficult to implement.

Disclosure of Invention

The invention aims to solve the above problems in the prior art by providing an automatic parallel development device for seismic data processing that enables rapid integration and automatic parallelization of seismic data processing algorithms. The invention is realized by the following technical scheme:

In a first aspect of the present invention, there is provided an automatic parallel development device for seismic data processing, the device comprising: a gather data construction module, used for acquiring a function and a transfer function; an RDD construction module, used for carrying out data processing; a C++ base class construction module, used for receiving the seismic data transmitted by the RDD construction module through the gather data construction module; a JNI interface module, used for transferring data between the Scala code and the C++ code; a data parallel input module, used for calling the C++ base class construction module and the RDD construction module to process the seismic data and divide the seismic data for parallel processing; a data parallel output module, used for outputting the processed calculation result data in parallel; and a module development engineering construction plug-in, used for constructing the parallel development templates.
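The C++ base class construction module can be pictured as a template-method base class: the framework owns the entry point, and a module developer supplies only the core algorithm. The sketch below assumes this design; `Trace`, `GatherProcessor`, `run`, `processGather` and the example `ConstantGain` module are hypothetical names, and the Scala/JNI side that would call `run()` once per gather is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical trace layout: raw header bytes plus float amplitudes.
struct Trace {
    std::vector<std::uint8_t> header;  // acquisition info, stored as bytes
    std::vector<float> samples;        // amplitude of each sampling point
};

// Hypothetical base class standing in for the C++ base class module.
// The framework side (Spark + JNI in the patent) would call run() per gather.
class GatherProcessor {
public:
    virtual ~GatherProcessor() = default;

    // Framework entry point for one gather (gatherNo = the gather's key).
    std::vector<Trace> run(std::int64_t gatherNo, const std::vector<Trace>& traces) {
        assert(!traces.empty() && "a gather must contain at least one trace");
        return processGather(gatherNo, traces);
    }

protected:
    // Core algorithm interface: the only method a module developer writes.
    virtual std::vector<Trace> processGather(std::int64_t gatherNo,
                                             const std::vector<Trace>& traces) = 0;
};

// Illustrative module: multiplies every sample by a constant gain.
class ConstantGain : public GatherProcessor {
public:
    explicit ConstantGain(float gain) : gain_(gain) {}

protected:
    std::vector<Trace> processGather(std::int64_t /*gatherNo*/,
                                     const std::vector<Trace>& traces) override {
        std::vector<Trace> out = traces;  // headers are passed through unchanged
        for (Trace& t : out) {
            for (float& s : t.samples) {
                s *= gain_;
            }
        }
        return out;
    }

private:
    float gain_;
};
```

With this split, the parallel framework code can be reused unchanged across processing modules; only the overridden `processGather()` differs from one module to the next.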
Further, the gather data construction module is built on a resilient distributed dataset (RDD) whose minimum record unit is the gather: the gather number and the data body of each gather form a key-value pair, the gather-number key recording the unique identification number of the gather and the gather-data value recording its data body. Further, the gather data construction module adopts a map function as its core function. Further, the seismic data in the gather consist of a plurality of data packets, each consisting of a trace header and a trace data body; the trace header records the acquisition information of the trace data, including coordinate positions, sampling intervals, sampling lengths and grid information, and the trace data body records the amplitude values of the sampling points. The trace header and the trace data body are stored independently, the trace header as byte data and the trace data body as a floating-point array. Further, the operations performed by the RDD construction module include: constructing the RDD, namely providing a base class and a template and creating the RDD from a seismic data source; defining a parallel computation template to implement user-defined parallel computation logic; data calculation, namely performing seismic data preprocessing, feature extraction and filtering calculations through the data processing base class; Processing calculation re