CN-122019405-A - Astronomical observation data key value storage optimization method and system based on persistent memory
Abstract
The invention discloses a key-value storage optimization method and system for astronomical observation data based on persistent memory. The method comprises: receiving a raw data stream from astronomical observation equipment; parsing the observation attribute information carried in the raw data stream; generating a hierarchically encoded key from the observation attribute information; packaging the key and the corresponding observation data value into a write request; accumulating write requests into batch data; allocating contiguous storage space for the batch data in a data area of the persistent memory; writing the batch data into the allocated storage space using non-temporal store instructions; updating an index structure based on the key and the storage address of the observation data value in the persistent memory; and, according to the type and conditions of a query request, locating the address of the target data in the persistent memory through the index structure and reading the data. By exploiting the nanosecond-scale access latency and byte addressability of persistent memory, the invention achieves reliable real-time storage and fast multidimensional retrieval of new-generation astronomical observation data.
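The write path summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a `bytearray` stands in for the mapped persistent-memory data area, a `dict` stands in for the index structure, and the class name `BatchStore` and its parameters are hypothetical. Python cannot issue non-temporal store instructions, so the single slice assignment only marks where such a write would occur.

```python
class BatchStore:
    """Toy model of batched writes into one contiguous data area (hypothetical)."""

    def __init__(self, capacity: int, batch_limit: int = 4):
        self.data = bytearray(capacity)   # stand-in for the persistent-memory data area
        self.capacity = capacity
        self.batch_limit = batch_limit    # write requests accumulated before a flush
        self.tail = 0                     # next free offset (simple bump allocator)
        self.index = {}                   # key -> (offset, length); stand-in for the index
        self.pending = []                 # accumulated (key, value) write requests

    def put(self, key: int, value: bytes) -> None:
        """Queue a write request; flush once the batch is full."""
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_limit:
            self.flush()

    def flush(self) -> None:
        """Allocate one contiguous region, write the batch, then update the index."""
        if not self.pending:
            return
        total = sum(len(value) for _, value in self.pending)
        if self.tail + total > self.capacity:
            raise MemoryError("data area is full")
        base = self.tail
        self.tail += total                # contiguous allocation for the whole batch
        # One sequential copy; a real system would use non-temporal stores here.
        self.data[base:base + total] = b"".join(value for _, value in self.pending)
        offset = base
        for key, value in self.pending:   # index is updated only after the data is written
            self.index[key] = (offset, len(value))
            offset += len(value)
        self.pending.clear()

    def get(self, key: int):
        """Point query: locate the address through the index and read the value."""
        location = self.index.get(key)
        if location is None:
            return None
        offset, length = location
        return bytes(self.data[offset:offset + length])
```

In this sketch a value becomes visible to `get` only after its batch has been flushed, which mirrors the ordering in the abstract: data is written first and the index is updated afterwards.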
Inventors
- LIU YINGBO
Assignees
- Yunnan University of Finance and Economics (云南财经大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-29
Claims (10)
- 1. A key-value storage optimization method for astronomical observation data based on persistent memory, characterized by comprising the following steps: receiving a raw data stream from astronomical observation equipment, parsing the observation attribute information carried in the raw data stream, generating a hierarchically encoded key based on the observation attribute information, and packaging the key and the corresponding observation data value into a write request; accumulating write requests into batch data, allocating contiguous storage space for the batch data in a data area of a persistent memory, and writing the batch data into the allocated storage space using non-temporal store instructions; updating an index structure based on the key and the storage address of the observation data value in the persistent memory; and a query processing step of responding to a query request and, according to the type and conditions of the query request, locating the address of the target data in the persistent memory through the index structure and reading the data.
- 2. The method for optimizing astronomical observation data key-value storage based on persistent memory according to claim 1, wherein the step of generating a hierarchically encoded key based on the observation attribute information comprises: concatenating the observation project identifier, the timestamp, the frequency channel identifier, the pointing coordinate code, the polarization mode identifier and the sequence number, each with its own bit width, in a preset order to form a globally unique key.
- 3. The method for optimizing astronomical observation data key-value storage based on persistent memory according to claim 1, wherein the persistent writing step further comprises a compression sub-step of: before writing, sampling the batch data to calculate a statistical feature value representing the redundancy of the data; choosing to enable or skip the compression operation based on a comparison of the statistical feature value with a preset threshold; and, if compression is enabled, selecting, according to the type of the observation data, one compression strategy from among differential coding combined with LZ4 compression, wavelet transform combined with quantization compression, and Rice coding.
- 4. The method for optimizing astronomical observation data key-value storage based on persistent memory according to claim 1, further comprising, before writing the batch data into the allocated storage space using non-temporal store instructions, a persistent logging sub-step of: writing a log record containing the key, the target storage address and the operation type into a thread-local volatile log buffer; flushing the contents of the volatile log buffer into a log area of the persistent memory using cache line write-back instructions; and executing a memory barrier instruction, so that the subsequent batch data write is performed only after the log record has become persistent.
- 5. The method for optimizing astronomical observation data key-value storage based on persistent memory according to claim 1, wherein the index construction step comprises: constructing a main index based on an adaptive radix tree to store the mapping from keys to data storage addresses; constructing a time-slice index by dividing the time axis into time slices of fixed interval, maintaining an independent index structure for each time slice, and updating only the index of the currently active time slice; and constructing a spatial hash index that takes the pointing coordinate code in the observation attribute information as its key and the list of addresses of all observation data within the corresponding coordinate range as its value.
- 6. The method for optimizing astronomical observation data key-value storage based on persistent memory according to claim 5, wherein the query processing step comprises: if the query is a point query, determining whether the key exists in the storage space and, if so, looking up the key from top to bottom through the main index and reading the data at the corresponding data storage address; if the query is a time range query, determining the set of relevant time slices from the queried time range, obtaining through the time-slice index all keys in those time slices that satisfy the time condition, and obtaining the corresponding data through the main index; if the query is a spatial range query, converting the queried celestial coordinate range into the corresponding set of pointing coordinate codes and obtaining the data address lists for that set through the spatial hash index; and, if the query is a compound condition query, evaluating the constraint range of each query condition and applying the filter conditions in order from the smallest constraint range to the largest.
- 7. The method for optimizing astronomical observation data key-value storage based on persistent memory according to claim 1, further comprising a persistent memory configuration step of: configuring the persistent memory in a mode that allows direct access by the application program, and mapping the persistent memory into the process address space; dividing the mapped address space into a metadata area, an index area and a data area; and managing space allocation and reclamation in the data area with a segmented strategy, wherein the segmented strategy comprises multiple allocation algorithms for objects of different sizes.
- 8. The method for optimizing astronomical observation data key-value storage based on persistent memory according to claim 7, wherein the persistent writing step further comprises a wear leveling control sub-step of: sorting the memory segments by write count and allocating memory segments to the current write operation in order from lowest to highest; and, if the write count of the current memory segment reaches the endurance limit of the medium, marking the current memory segment as read-only and migrating the data in it to other memory segments.
- 9. The method for optimizing astronomical observation data key-value storage based on persistent memory according to claim 1, further comprising a system maintenance step of: scanning the storage space to identify original data segments whose utilization is below a threshold, triggering a data segment compaction operation, migrating each such original data segment to a data segment in newly allocated storage space, and releasing the original data segment; for historical time-slice indexes in the read-only state, periodically merging their fragmented internal index nodes and updating the metadata pointers that point to those indexes; and periodically recording the current log commit point, the address of each index root node and the memory allocator state as a consistent snapshot, and writing consistent snapshots alternately into two independent storage areas using a double-buffer strategy.
- 10. An astronomical observation data key-value storage optimization system implementing the astronomical observation data key-value storage optimization method based on persistent memory according to any one of claims 1-9, comprising: a data receiving and key generation layer configured to receive a raw data stream from astronomical observation equipment, parse the observation attribute information carried in the data stream, generate a hierarchically encoded key based on the observation attribute information, and package the key and the corresponding observation data value into a write request; a persistence engine layer configured to accumulate write requests into batch data, allocate contiguous storage space for the batch data in a data area of a persistent memory, and write the batch data into the allocated storage space using non-temporal store instructions; an index management layer configured to update a main index structure based on the key and the storage address of the observation data value in the persistent memory; and a query service layer configured to respond to a query request and, according to the type and conditions of the query request, locate the address of the target data in the persistent memory through the main index structure or an auxiliary index structure and read the data.
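The key layout of claim 2 and the time-slice and spatial lookups of claims 5 and 6 can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the field widths in `FIELDS`, the 60-second `SLICE_SECONDS`, and the names `pack_key`, `unpack_key` and `Indexes` do not come from the claims, and plain dictionaries stand in for the adaptive radix tree and the hash index.

```python
from collections import defaultdict

# Hypothetical bit widths, most significant field first (128 bits in total).
FIELDS = [
    ("project", 16),
    ("timestamp", 48),
    ("channel", 16),
    ("pointing", 32),
    ("polarization", 4),
    ("sequence", 12),
]
SLICE_SECONDS = 60  # hypothetical fixed time-slice interval


def pack_key(**values: int) -> int:
    """Concatenate the fields in the preset order into one integer key."""
    key = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        key = (key << width) | value
    return key


def unpack_key(key: int) -> dict:
    """Recover the individual fields from a packed key."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = key & ((1 << width) - 1)
        key >>= width
    return fields


class Indexes:
    """Main index plus time-slice and spatial hash indexes (dict stand-ins)."""

    def __init__(self):
        self.main = {}                     # key -> address (stand-in for the radix tree)
        self.slices = defaultdict(list)    # time-slice id -> keys in that slice
        self.spatial = defaultdict(list)   # pointing code -> list of addresses

    def insert(self, key: int, address: int) -> None:
        fields = unpack_key(key)
        self.main[key] = address
        self.slices[fields["timestamp"] // SLICE_SECONDS].append(key)
        self.spatial[fields["pointing"]].append(address)

    def time_range(self, start: int, end: int) -> list:
        """Addresses of records with start <= timestamp <= end."""
        hits = []
        for slice_id in range(start // SLICE_SECONDS, end // SLICE_SECONDS + 1):
            for key in self.slices.get(slice_id, ()):
                if start <= unpack_key(key)["timestamp"] <= end:
                    hits.append(self.main[key])
        return hits

    def spatial_range(self, pointing_codes) -> list:
        """Addresses of records whose pointing code is in the given set."""
        return [addr for code in pointing_codes for addr in self.spatial.get(code, ())]
```

Because the fields are concatenated from most to least significant, keys that share a project identifier sort by timestamp, which is one plausible reason for a hierarchical layout in an ordered main index.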
Description
Astronomical observation data key-value storage optimization method and system based on persistent memory
Technical Field
The invention relates to the technical field of astronomical data processing and storage, and in particular to a key-value storage optimization system and method for high-speed astronomical observation data based on persistent memory.
Background
The data generation rate of modern astronomical observation facilities has grown exponentially, posing unprecedented challenges to storage system performance. Taking the Square Kilometre Array (SKA) radio telescope as an example, the project is expected to produce raw observation data on the order of several TB per second once built, with the annual data volume reaching the EB level. In high-time-resolution pulsar search mode, the data generation rate of the FAST ("Sky Eye") radio telescope can reach tens of GB/s when its 19-beam receiver operates all beams simultaneously. The adaptive optics system of the European Extremely Large Telescope (E-ELT) must process thousands of frames of high-resolution image data per second. These new-generation facilities place stringent demands on the write bandwidth, access latency and data persistence of data storage systems.
Current astronomical data storage faces the following technical bottlenecks:
First, write latency is too high. Conventional storage systems based on mechanical hard disks or solid-state drives struggle to meet the real-time storage requirements of such high-speed data streams. Mechanical hard disks are limited by head seek time, with random write latency typically on the order of milliseconds and sequential write bandwidth on the order of hundreds of MB/s.
Although solid-state drives offer markedly better random access performance, they are constrained by the erase-before-write characteristic of flash media and suffer from write amplification under sustained high-load writing. Their latency remains on the order of tens of microseconds, which cannot keep pace with data generated at nanosecond-scale intervals, leading to data buffer overflow and loss of observation data. In high-speed sampling observation modes, even brief jitter in storage latency may cause critical astronomical events to be missed. For example, a fast radio burst (FRB) lasts only a few milliseconds and demands extremely high real-time responsiveness from the storage system.
Second, the data persistence overhead is significant. Conventional storage systems write through the page cache of the operating system, so existing key-value (KV) storage systems must frequently call fsync or similar system calls to force in-memory data to be flushed to the storage medium in order to guarantee persistence. This operation involves switching between user mode and kernel mode, updating file system metadata, flushing the storage controller cache and other steps, which introduces significant software overhead and unpredictable latency and causes serious performance loss. In high-throughput write scenarios, persistence overhead may account for more than 50% of the total write time.
Third, the index structures are inefficient. The traditional B+ tree index suffers from serious write amplification under highly concurrent writes, since each leaf node update may trigger splitting and rebalancing of nodes at multiple levels.
Although the LSM tree (Log-Structured Merge-Tree) mitigates write amplification through append-only writes, its background compaction consumes a large amount of I/O bandwidth, and range queries must merge data from multiple levels. Neither index structure suits the sequential, write-intensive workload and the multidimensional range queries characteristic of astronomical data.
Fourth, metadata management is complex. Astronomical observation data carries multidimensional attribute identifiers, including observation timestamp, frequency channel, celestial pointing coordinates, polarization mode and observation project number. These attributes form a multidimensional index space over the data, yet existing storage systems lack a metadata organization scheme tailored to it and therefore cannot efficiently support compound condition queries over time ranges, spatial ranges and frequency ranges.
Fifth, the data compression strategy is one-size-fits-all. Astronomical observation data has diverse characteristics: time-domain sampled data shows strong temporal correlation, spectral data is sparse, and image data contains spatial redundancy. A general-purpose compression algorithm is difficult to optimize for the characteristics of each data type and struggles to balance compression ratio against compression speed.
Persistent memory (PMem) i