CN-122019472-A - Method for extracting and storing radio astronomical electromagnetic interference data

Abstract

The invention provides a method for extracting and storing radio astronomical electromagnetic interference data. The method comprises: obtaining original baseband data and storing it in PSRFITS format; defining a storage data structure, and extracting an original electromagnetic interference data matrix and attribute data from the original baseband data, the basic observation attributes, and the file path of each PSRFITS format file; further extracting and grouping the original electromagnetic interference data matrix to obtain denoised data matrices, separated by polarization and by interference service frequency band and carrying frequency indexes and time indexes, together with the corresponding attribute data key-value pair sets; and storing the results in HDF5 files, each HDF5 file corresponding to one combination of polarization mode POL and interference service frequency band and holding the logical group matrices, the time index vectors, and the attribute data key-value pair sets together. The method realizes the conversion from original baseband data to standardized storage, significantly reducing the data volume while balancing access speed, analysis flexibility, and long-term maintainability.

Inventors

  • MA LING
  • LIU QI
  • DU QINGQING
  • WANG NA

Assignees

  • Xinjiang Astronomical Observatory, Chinese Academy of Sciences (中国科学院新疆天文台)

Dates

Publication Date
20260512
Application Date
20251230

Claims (10)

  1. A method for extracting and storing radio astronomical electromagnetic interference data, characterized by comprising the following steps: Step S1, acquiring original baseband data and storing the original baseband data in PSRFITS format; Step S2, defining a storage data structure comprising an original electromagnetic interference data matrix and an attribute data key-value pair set, wherein the attribute data key-value pair set comprises a plurality of items of attribute data, and, for each PSRFITS format file, extracting the original electromagnetic interference data matrix and the attribute data according to the original baseband data, the basic observation attributes, and the file path of the PSRFITS format file; Step S3, further extracting and grouping the original electromagnetic interference data matrix to obtain denoised data matrices, separated by polarization and by interference service frequency band and carrying frequency indexes and time indexes, together with the corresponding attribute data key-value pair sets; and Step S4, defining each HDF5 file as corresponding to one combination of polarization mode POL and interference service frequency band, taking the denoised and filtered logical group matrices and the time index vectors under a plurality of space coordinates as a plurality of core data sets, and storing the core data sets and the corresponding attribute data key-value pair sets in the corresponding HDF5 file.
  2. The method for extracting and storing radio astronomical electromagnetic interference data according to claim 1, wherein the original baseband data are obtained by a radio telescope observing at different azimuth angle and pitch angle positions; when the data are stored in PSRFITS format, each space coordinate corresponds to one folder and each folder stores only one PSRFITS format file; each PSRFITS format file comprises the original baseband data and the basic observation attributes under one space coordinate; and a space coordinate is a combination of an azimuth angle and a pitch angle.
  3. The method for extracting and storing radio astronomical electromagnetic interference data according to claim 1, wherein the storage data structure P comprises the original electromagnetic interference data matrix and the attribute data key-value pair set, with one term of the definition representing item 42 of the attribute data; the original electromagnetic interference data matrix is four-dimensional, of size (number of sub-integrations Ns) × (number of sampling points per sub-integration Nt) × (number of polarization modes of the original baseband data) × (number of frequency channels); and extracting the original electromagnetic interference data matrix from the original baseband data means integrating the heterogeneous data dispersed in the PSRFITS format file to obtain an original electromagnetic interference data matrix of size Ns × Nt × (number of polarization modes of the original baseband data) × (number of frequency channels).
  4. The method for extracting and storing radio astronomical electromagnetic interference data according to claim 1, wherein, in the attribute data key-value pair set, each item of attribute data is obtained from the header of the PSRFITS format file or from the file path.
  5. The method for extracting and storing radio astronomical electromagnetic interference data according to claim 1, wherein step S3 specifically includes: Step S31, separating the original electromagnetic interference data matrix D along the polarization dimension to obtain a horizontally polarized electromagnetic interference data matrix D_H and a vertically polarized electromagnetic interference data matrix D_V; Step S32, adding a frequency index and a time index to D_H and D_V to obtain a horizontally polarized electromagnetic interference data matrix with frequency index and time index and a vertically polarized electromagnetic interference data matrix with frequency index and time index; Step S33, according to a plurality of predefined interference service frequency bands freq_ranges, splitting the indexed horizontally polarized and vertically polarized electromagnetic interference data matrices into logical group matrices of the different interference service frequency bands, the logical group matrices being in one-to-one correspondence with the interference service frequency bands, and acquiring the attribute data key-value pair set of each logical group matrix; and Step S34, denoising and filtering each logical group matrix, and then compressing the denoised and filtered logical group matrix and its time index vector.
  6. The method according to claim 5, wherein, in step S32, the frequency index is calculated by taking the start frequency START, the stop frequency STOP, and the per-channel bandwidth chan_bw in the basic observation attributes of the PSRFITS format file as the start point, the end point, and the step of the frequency index, respectively; and, when calculating the time index, the time index starts from 1, the time index of each row is incremented by 1, and the time index of the last row is 0.
  7. The method for extracting and storing radio astronomical electromagnetic interference data according to claim 5, wherein, in step S33, a set of interference service frequency bands is defined, the i-th interference service frequency band in the set having a corresponding frequency space bounded by the starting frequency index and the ending frequency index of that band; a frequency band selection function indicates whether the frequency index of the q-th frequency channel lies within the i-th interference service frequency band, and is true if it does; for any polarized electromagnetic interference data matrix M with frequency index and time index, the logical group matrix of the i-th interference service frequency band consists of the sub-matrix formed by the columns of M whose frequency index makes the frequency band selection function true, together with the time index vector, whose elements are the time indexes of the individual sampling points; and, at the same time, for each interference service frequency band in the set, the frequency-related attribute data are corrected to obtain the attribute data key-value pair set of the logical group matrix of that band.
  8. The method for extracting and storing radio astronomical electromagnetic interference data according to claim 1, wherein compressing the denoised and filtered logical group matrix and its time index vector means converting them to the uint8 and uint32 types, respectively.
  9. The method for extracting and storing radio astronomical electromagnetic interference data according to claim 1, wherein the file name of each HDF5 file is the name of its interference service frequency band.
  10. The method for extracting and storing radio astronomical electromagnetic interference data according to claim 1, wherein the HDF5 file includes a plurality of core data sets; each denoised and filtered logical group matrix and each time index vector is stored as its own core data set; the name of each core data set includes a space coordinate and a timestamp; and the attribute data key-value pair set is stored in the attribute items of the corresponding core data set in the HDF5 file.
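
Steps S31 to S33 and the frequency index of claim 6 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the function names, the nested-list layout `data[s][t][p][c]`, the choice of index 0 for horizontal polarization, the inclusive treatment of STOP, and the band name `mobile_935_948` are all hypothetical. Only START, STOP, chan_bw, freq_ranges, and the 935-948 MHz band come from the text.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def frequency_index(start: float, stop: float, chan_bw: float) -> List[float]:
    """Frequency of each channel, stepping from START to STOP by chan_bw.

    Assumption: STOP is inclusive and (stop - start) is a whole number of steps.
    """
    n_chan = int(round((stop - start) / chan_bw)) + 1
    return [start + q * chan_bw for q in range(n_chan)]


def split_polarizations(data: List[List[List[Matrix]]]) -> List[Matrix]:
    """Flatten a (Ns, Nt, Npol, Nchan) array into one (Ns*Nt, Nchan) matrix
    per polarization, so that each row is one sampling point in time."""
    n_pol = len(data[0][0])
    return [
        [list(sample[p]) for subint in data for sample in subint]
        for p in range(n_pol)
    ]


def band_groups(matrix: Matrix, freqs: List[float],
                freq_ranges: Dict[str, Tuple[float, float]]) -> Dict[str, Matrix]:
    """Keep, for each band, only the columns whose frequency lies in the band."""
    groups: Dict[str, Matrix] = {}
    for name, (f_lo, f_hi) in freq_ranges.items():
        # Band selection: true for channel q when f_lo <= freqs[q] <= f_hi.
        cols = [q for q, f in enumerate(freqs) if f_lo <= f <= f_hi]
        groups[name] = [[row[q] for q in cols] for row in matrix]
    return groups


# Toy input: 2 sub-integrations, 2 samples each, 2 polarizations, 4 channels.
data = [[[[1000 * s + 100 * t + 10 * p + c for c in range(4)]
          for p in range(2)] for t in range(2)] for s in range(2)]

d_h, d_v = split_polarizations(data)           # assumed order: H first, then V
freqs = frequency_index(930.0, 945.0, 5.0)     # MHz, one entry per channel
time_index = list(range(1, len(d_h) + 1))      # starts from 1, one per row
groups_h = band_groups(d_h, freqs, {"mobile_935_948": (935.0, 948.0)})
print(groups_h["mobile_935_948"])
```

Each entry of `groups_h`, paired with `time_index`, plays the role of one logical group matrix and its time index vector; real data would normally be held in array types rather than nested lists.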

Description

Method for extracting and storing radio astronomical electromagnetic interference data

Technical Field

The invention relates to the technical field of radio astronomical observation, and in particular to a method for extracting and storing radio astronomical electromagnetic interference data. The method targets multi-source heterogeneous radio astronomical electromagnetic compatibility data and realizes conversion, grouping, and noise filtering from original PSRFITS-format observation data to structured HDF5 data.

Background

In radio astronomy, radio frequency interference (RFI) refers to unwanted signals received by a radio telescope in addition to the signal of the observation target. Because the signals received by a radio telescope are extremely weak, they are highly susceptible to RFI [1] from terrestrial radio transmitting devices (e.g., base stations, radar, satellite, etc.). To ensure the scientific validity of astronomical data, strict electromagnetic compatibility (EMC) analysis must be performed on the observed data to identify, quantify, and reject these artifacts [2]. Efficient and accurate electromagnetic compatibility data processing has therefore become an indispensable link in radio astronomy.

Current radio astronomical electromagnetic compatibility data processing faces three main technical bottlenecks. First, the original FITS data contain four-dimensional baseband data (sub-integration × sampling points × polarization × channels) and coordinate parameters (AZ/EL) scattered across different levels, so the data format is markedly heterogeneous, which makes data from different devices and observation periods difficult to integrate effectively.
Second, in noise interference processing, the traditional fixed-threshold method adapts poorly to the dynamic characteristics of different frequency points; research shows that an accurate noise reference [3] can be established using sliding-window standard deviation analysis (for example, a find_clean_segment function). Third, in analysis efficiency, ungrouped full-band scanning severely restricts processing speed, whereas a predefined band grouping model (for example, the 935-948 MHz mobile communication band) can markedly improve it.

References:

[1] Wang, Liu Ji, Liu, Su Xiaoming. Automated electromagnetic environment monitoring system software development and implementation. Astronomical Research and Techniques, 2020.
[2] Liu Ji, Chen Mao, Li Ying, et al. Radio astronomical site electronic device electromagnetic radiation assessment [J]. Astronomical Research and Techniques, 2015, 12(3): 292-298.
[3] Liu Ji, Feng Dongdong, Cai Minghui, et al. A signal-to-noise separation method based on neighbor comparison [J]. Scientific Technology and Engineering, 2021, 21(09): 3656-3661.

Disclosure of Invention

The invention aims to provide a method for extracting and storing radio astronomical electromagnetic interference data that realizes the conversion from original baseband data to standardized storage, significantly reducing the data volume while balancing access speed, analysis flexibility, and long-term maintainability.
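
The sliding-window standard deviation analysis mentioned in the Background can be pictured with the following Python sketch. It is an assumption about what such an analysis might look like, not the `find_clean_segment` function named in the text, whose definition is not given here: the function names `quietest_window` and `noise_reference`, the window length, and the use of the window mean as the reference level are all hypothetical.

```python
from statistics import pstdev
from typing import Sequence, Tuple


def quietest_window(samples: Sequence[float], window: int) -> Tuple[int, float]:
    """Slide a fixed-length window over one channel's samples and return
    (start index, standard deviation) of the window with the smallest spread."""
    if window < 2 or window > len(samples):
        raise ValueError("window must be between 2 and len(samples)")
    best_start, best_std = 0, float("inf")
    for start in range(len(samples) - window + 1):
        std = pstdev(samples[start:start + window])
        if std < best_std:
            best_start, best_std = start, std
    return best_start, best_std


def noise_reference(samples: Sequence[float], window: int) -> float:
    """Mean level of the quietest window, taken as the channel's noise reference."""
    start, _ = quietest_window(samples, window)
    segment = samples[start:start + window]
    return sum(segment) / len(segment)


# Toy channel: a flat noise floor near 10 with two strong interference spikes.
channel = [10.0, 11.0, 9.0, 10.0, 90.0, 10.0, 80.0, 12.0]
start, spread = quietest_window(channel, window=4)
reference = noise_reference(channel, window=4)
print(start, round(spread, 3), reference)
```

Because the reference is derived separately for each frequency channel, it follows the local noise floor instead of relying on one fixed threshold for the whole band.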
To achieve the above object, the invention provides a method for extracting and storing radio astronomical electromagnetic interference data, comprising:

S1, acquiring original baseband data and storing the original baseband data in PSRFITS format;

S2, defining a storage data structure comprising an original electromagnetic interference data matrix and an attribute data key-value pair set, wherein the attribute data key-value pair set comprises a plurality of items of attribute data, and, for each PSRFITS format file, extracting the original electromagnetic interference data matrix and the attribute data according to the original baseband data, the basic observation attributes, and the file path of the PSRFITS format file;

S3, further extracting and grouping the original electromagnetic interference data matrix to obtain denoised data matrices, separated by polarization and by interference service frequency band and carrying frequency indexes and time indexes, together with the corresponding attribute data key-value pair sets; and

S4, defining each HDF5 file as corresponding to one combination of polarization mode POL and interference service frequency band, taking the denoised and filtered logical group matrices and the time index vectors under a plurality of space coordinates as a plurality of core data sets, and storing the core data sets and the corresponding attribute data key-value pair sets in the corresponding HDF5 file.

When the original baseband data are stored in PSRFITS format, each space coordinate corresponds to a folder, each folder only stores one PSRFITS format file, each PSRFITS format file