CN-122017959-A - Method for realizing three-dimensional median filter and storage device

Abstract

The invention provides a method for realizing a three-dimensional median filter and a storage device, and belongs to the field of seismic exploration data processing. The method comprises: step 1, inputting seismic data and determining the element dimension and the slice dimension; step 2, initializing a cluster environment, setting calculation parameters, and dividing the data into slices; step 3, computing the data slices on each computing node; step 4, summarizing the data slices from all computing nodes; and step 5, outputting the calculation result data set. The invention is centered on the logic of the algorithm: the implementer does not need to consider the parallel mode or the system-side concerns it brings, such as data partitioning, network communication, and node management. This reduces the cost of implementing the computing task, reduces hardware-related problems that may be caused by improper operation of a program, and reduces sensitivity to and dependence on the hardware.

Inventors

WANG MINGQIU
DUAN XINGBIAO
YANG XIANGSEN
KANG YONGGAN

Assignees

中国石油化工股份有限公司
中石化石油物探技术研究院有限公司

Dates

Publication Date: 20260512
Application Date: 20241112

Claims (10)

1. A method for realizing a three-dimensional median filter, characterized by comprising the following steps: step 1, inputting seismic data, and determining the element dimension and the slice dimension; step 2, initializing a cluster environment, setting calculation parameters, and dividing the data into slices; step 3, computing the data slices on each computing node; step 4, summarizing the data slices from all computing nodes; and step 5, outputting a calculation result data set.
2. The method for realizing the three-dimensional median filter according to claim 1, wherein the seismic data in the step 1 are three-dimensional data, and the storage mode of the three-dimensional data in the distributed data set RDD is defined to complete the construction of the distributed data set RDD.
3. The method for realizing the three-dimensional median filter according to claim 1, wherein initializing the cluster environment in step 2 comprises dividing the data into slices according to the allocatable computing resources in the cluster, and the computing resources comprise the memory of a single node, memory bandwidth, network bandwidth, and CPU performance.
4. The method of claim 1, wherein the calculation parameters set in the step 2 include radius parameters of a filtering window of the three-dimensional median filter, and the range of input data in element dimension and slice dimension.
5. The method according to claim 1, wherein step 3 performs the three-dimensional median filtering operation on each data slice in parallel by using the Spark framework, and includes: numbering the data slices as split1, split2, split3, ..., splitN; repartitioning the whole distributed data set RDD so that each partition contains only one data slice; and, for each data point, collecting the values of all points in its neighborhood, sorting them, and selecting the middle value as the new value of the data point.
6. The method of claim 5, wherein the computation is performed by a program written in a native language, the data slices being transferred from the JVM to local memory through the JNI interface.
7. The method for realizing the three-dimensional median filter according to claim 1, wherein step 4 includes: confirming that the calculation results of the data slices do not intersect in three dimensions; acquiring the position of each data slice in the original data set; and replacing all original data points in the original data set with the calculation results.
8. The method of claim 1, wherein step 4 further comprises transferring the computed data slices from local memory into the JVM, where they are managed by the Spark framework and incorporated into the RDD of the calculation result data set so as to collect the calculation results.
9. The method for realizing the three-dimensional median filter according to claim 1, wherein step 5 includes outputting the calculation result data set RDD as a separate file and converting the calculation result data set RDD into a storage format within the application system.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one program executable by a computer, which, when executed by the computer, causes the computer to perform the steps of the method for realizing the three-dimensional median filter according to any one of claims 1-9.
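
Steps 2 to 4 of the claimed method (dividing the data into slices, filtering each slice, and writing each result back to its position in the original data set) can be illustrated with a minimal single-process sketch in Python. It uses neither Spark nor JNI, and it is not the implementation described in the claims. All function names (`filter_block`, `make_slices`, `process_slice`, `merge`, `filter_in_slices`) are hypothetical. The halo of `radius` extra inlines attached to each slice, the choice of the inline axis as the slice dimension, and the clipping of the window at the volume edges are assumptions of this sketch; the claims do not state how neighborhoods that cross slice boundaries are handled.

```python
def filter_block(block, radius, lo, hi):
    """Median-filter inlines lo..hi-1 of block, reading the whole block as input."""
    n_il, n_xl, n_t = len(block), len(block[0]), len(block[0][0])
    result = []
    for i in range(lo, hi):
        plane = []
        for j in range(n_xl):
            trace = []
            for k in range(n_t):
                # Collect and sort the neighborhood (window clipped at block edges).
                window = sorted(
                    block[a][b][c]
                    for a in range(max(0, i - radius), min(n_il, i + radius + 1))
                    for b in range(max(0, j - radius), min(n_xl, j + radius + 1))
                    for c in range(max(0, k - radius), min(n_t, k + radius + 1))
                )
                trace.append(window[len(window) // 2])  # middle value
            plane.append(trace)
        result.append(plane)
    return result


def make_slices(volume, n_slices, radius):
    """Cut the volume along the inline axis; each slice carries a halo (assumption)."""
    n_il = len(volume)
    step = -(-n_il // n_slices)  # ceiling division
    slices = []
    for start in range(0, n_il, step):
        stop = min(start + step, n_il)
        halo_start = max(0, start - radius)
        halo_stop = min(n_il, stop + radius)
        slices.append({
            "start": start,            # position of the slice in the original data set
            "stop": stop,
            "halo_start": halo_start,
            "block": volume[halo_start:halo_stop],
        })
    return slices


def process_slice(s, radius):
    """Filter one slice; only the slice's own inlines are returned, not the halo."""
    lo = s["start"] - s["halo_start"]
    hi = lo + (s["stop"] - s["start"])
    return s["start"], filter_block(s["block"], radius, lo, hi)


def merge(results, n_il):
    """Write each result back to its original position, checking for overlap."""
    out = [None] * n_il
    for start, planes in results:
        for offset, plane in enumerate(planes):
            if out[start + offset] is not None:
                raise ValueError("slice results intersect")
            out[start + offset] = plane
    return out


def filter_in_slices(volume, n_slices, radius):
    slices = make_slices(volume, n_slices, radius)
    # On a cluster, each slice would be processed on a separate computing node.
    results = [process_slice(s, radius) for s in slices]
    return merge(results, len(volume))
```

Because each slice carries enough neighboring inlines to fill the window, `filter_in_slices(volume, n, r)` returns the same values as filtering the whole volume at once with `filter_block(volume, r, 0, len(volume))`, while the returned results of different slices never overlap.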

Description

Method for realizing three-dimensional median filter and storage device

Technical Field

The invention belongs to the field of seismic exploration data processing, and particularly relates to a method for realizing a three-dimensional median filter and a storage device.

Background

The three-dimensional median filter is a widely used signal processing tool, applied in many fields such as seismic data processing and image processing. It is computed point by point: for each data point, the median of the data in its three-dimensional neighborhood is calculated, and the value of the data point is replaced with that median.

In seismic data processing, a typical three-dimensional data volume is a three-dimensional post-stack data volume. The gathers produced by various migration algorithms usually contain information such as offset and azimuth; stacking them yields a post-stack data volume for a given imaging space, and taking one two-dimensional line from it yields a stacked section that reflects the distribution of underground structures. In time-domain three-dimensional post-stack data, the position of a sampling point in the whole data volume is determined by its Inline, XLine, and travel time. A single pass of three-dimensional median filtering requires the same calculation at every sampling point.

In recent years, as the situation of oil and gas resource exploitation in China has changed, seismic exploration targets have gradually shifted toward complex, concealed, deep and ultra-deep, and various unconventional oil and gas reservoirs. These targets present new challenges to current seismic data processing methods.
Meanwhile, as exploration work areas grow and spatial and temporal resolutions continue to improve, processing real work-area data on a single workstation or a small cluster is no longer practical in production. Modern seismic data processing pursues large clusters and high-performance computing methods. Programs running on large systems differ greatly in design and implementation from programs running on a single machine or a small cluster.

To implement three-dimensional median filters efficiently, current work typically proceeds from both the software side and the hardware side. On the software side, new data structures and algorithms are usually designed to improve the parallel computing capacity of the program. On the hardware side, various special-purpose or general-purpose acceleration hardware, such as GPUs, FPGAs, and DSPs, is used to accelerate the computation. These approaches achieve good acceleration on many different types of local filters, and some of the results have been integrated into commercial or industrial computing software and applied successfully in practice. However, solutions for high-performance computing environments running on large clusters are still relatively lacking.

The Spark framework has been used in a variety of commercial big data scenarios, but its application in high-performance scientific computing has not been widely developed. As a successful commercial big data computing framework, Spark provides by design many functions that traditional solutions do not, such as higher-level data structure abstractions and richer computing libraries. Exploring a three-dimensional median filter implementation based on the Spark framework is therefore of great significance for research on scientific computing modes in the big data context.
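
As an illustration of the point-by-point computation described in the Background, the following is a minimal single-machine sketch in Python, not the Spark-based method of the invention. The function name `median_filter_3d`, the nested-list layout `volume[inline][xline][sample]`, the cubic window of half-width `radius`, and the clipping of the window at the volume edges are all assumptions made for the example; the source does not specify edge handling. When a clipped window holds an even number of samples, the sketch takes the upper of the two middle values, which is likewise an assumption.

```python
def median_filter_3d(volume, radius):
    """Return a median-filtered copy of volume[inline][xline][sample]."""
    n_il, n_xl, n_t = len(volume), len(volume[0]), len(volume[0][0])
    out = [[[0.0] * n_t for _ in range(n_xl)] for _ in range(n_il)]
    for i in range(n_il):
        for j in range(n_xl):
            for k in range(n_t):
                # Gather every sample in the 3D neighborhood of (i, j, k),
                # clipping the window where it would leave the volume.
                window = [
                    volume[a][b][c]
                    for a in range(max(0, i - radius), min(n_il, i + radius + 1))
                    for b in range(max(0, j - radius), min(n_xl, j + radius + 1))
                    for c in range(max(0, k - radius), min(n_t, k + radius + 1))
                ]
                # Sort the values and take the middle one as the new value.
                window.sort()
                out[i][j][k] = window[len(window) // 2]
    return out
```

For example, applying `median_filter_3d` with `radius=1` to a 3 x 3 x 3 volume of zeros containing a single spike at the center returns a volume of zeros, since the spike is never the middle value of any window. The input volume is left unchanged.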
Disclosure of Invention

The invention aims to solve the problems in the prior art by providing a three-dimensional median filter implementation method based on the Apache Spark computing framework. Because the Hadoop architecture provides the underlying components for resource management and task planning and scheduling, a parallel processing program written for the Spark framework does not need to perform many operations at the cluster management level.

The invention is realized by the following technical scheme:

In a first aspect of the present invention, a method for implementing a three-dimensional median filter is provided, the method including: step 1, inputting seismic data, and determining the element dimension and the slice dimension; step 2, initializing a cluster environment, setting calculation parameters, and dividing the data into slices; step 3, computing the data slices on each computing node; step 4, summarizing the data slices from all computing nodes; and step 5, outputting a calculation result data set.

Further, the seismic data in the step 1 are three-dimensional data, and a storage mode of the three-dimensional data in the distributed data set RDD i