CN-116541398-B - Data storage method and device, data searching method and device and terminal
Abstract
A data storage method and device, a data searching method and device, and a terminal. The data storage method comprises: determining original data; sorting the original data according to the original indexes to obtain multiple rows of original data; traversing the sorted rows of original data and dividing them into data files one by one; for each divided data file, traversing its rows of original data and dividing them into data blocks one by one; constructing a secondary index list for each data file; and constructing a primary index list based on the secondary index lists. The invention can improve search efficiency and accuracy.
Inventors
- CHEN CHENG
- SONG XIANGPING
Assignees
- 杭州数云信息技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20230428
Claims (12)
- 1. A method of data storage, comprising: determining original data, the original data comprising a plurality of original indexes, each original index indicating a single complete data item and having one or more original values; sorting the original data according to the original indexes to obtain a plurality of rows of original data, wherein each row of original data comprises the original indexes and original values; traversing the sorted rows of original data and dividing them into data files one by one, wherein, for every two adjacent data files, the original index of the last row of original data of the previous data file is different from the original index of the first row of original data of the next data file; for each divided data file, traversing each row of original data and dividing the rows into data blocks one by one, wherein, for every two adjacent data blocks, the original index of the last row of original data of the previous data block is different from the original index of the first row of original data of the next data block; constructing a secondary index list for each data file, each secondary index list having a plurality of rows of secondary index information, wherein each row of secondary index information is used to represent one or more data blocks; and constructing a primary index list based on each secondary index list, wherein the primary index list has a plurality of rows of primary index information, and each row of primary index information is used to represent one or more data files.
- 2. The method of claim 1, wherein traversing the sorted rows of original data and dividing them into data files one by one comprises: each time a first preset row number is reached, if the original index of the next row of original data is different from the original index of the current row of original data, taking the original data of the first preset row number as a single data file; otherwise, extending backward until the original index of the next row of original data is different from the original index of the current row of original data, and taking the original data of the first preset row number together with the backward-extended original data as the single data file.
- 3. The method of claim 2, wherein traversing each row of original data and dividing the rows into data blocks one by one comprises: each time a second preset row number is reached, if the original index of the next row of original data is different from the original index of the current row of original data, taking the original data of the second preset row number as a single data block; otherwise, extending backward until the original index of the next row of original data is different from the original index of the current row of original data, and taking the original data of the second preset row number together with the backward-extended original data as the single data block; wherein the second preset row number is smaller than the first preset row number.
- 4. The method according to any one of claims 1 to 3, wherein each row of secondary index information comprises a start row number and an end row number of the original data of the corresponding data block, and a minimum original index and a maximum original index of the corresponding data block.
- 5. The method of claim 1, wherein each row of primary index information comprises a secondary index serial number, a start row number and an end row number of the secondary index information of the corresponding data file, and a minimum original index and a maximum original index of the secondary index information of the corresponding data file; and the secondary index serial numbers and the secondary index lists have a one-to-one correspondence.
- 6. A data search method based on the data storage method according to any one of claims 1 to 5, comprising: searching the primary index list using an original index to be searched, and determining a secondary index list to be searched; searching the determined secondary index list using the original index to be searched, and determining a data block to be searched; and searching the determined data block using the original index to be searched to determine the original value of the original index to be searched.
- 7. The method of claim 6, wherein each row of primary index information comprises a secondary index serial number, a start row number and an end row number of the secondary index information of the corresponding data file, and a minimum original index and a maximum original index of the secondary index information of the corresponding data file; and wherein searching the primary index list using the original index to be searched, and determining the secondary index list to be searched, comprises: determining the row number of the original index to be searched in the primary index list according to the minimum original index and the maximum original index of each data file; determining secondary index information corresponding to the original index to be searched based on the row number of the original index to be searched in the primary index list and the start row number and the end row number of each data file; and determining the secondary index list of the data file corresponding to the secondary index information based on the secondary index information.
- 8. The method of claim 7, wherein each row of secondary index information comprises a start row number and an end row number of the original data of the corresponding data block, and a minimum original index and a maximum original index of the corresponding data block; and wherein searching the determined secondary index list using the original index to be searched, and determining the data block to be searched, comprises: determining the row number of the original index to be searched in the secondary index list according to the minimum original index and the maximum original index of each data block; determining a data file to be searched based on the row number in the secondary index list; and determining the data block to be searched based on the start row number and the end row number of each data block in the determined data file.
- 9. A data storage device, comprising: an original data determination module for determining original data, the original data comprising a plurality of original indexes, each original index indicating a single complete data item and having one or more original values; a sorting module for sorting the original data according to the original indexes to obtain a plurality of rows of original data, wherein each row of original data comprises the original indexes and original values; a file dividing module for traversing the sorted rows of original data and dividing them into data files one by one, wherein, for every two adjacent data files, the original index of the last row of original data of the previous data file is different from the original index of the first row of original data of the next data file; a block dividing module for traversing, for each divided data file, each row of original data and dividing the rows into data blocks one by one, wherein, for every two adjacent data blocks, the original index of the last row of original data of the previous data block is different from the original index of the first row of original data of the next data block; a secondary list construction module for constructing a secondary index list for each data file, wherein each secondary index list has a plurality of rows of secondary index information, and each row of secondary index information is used to represent one or more data blocks; and a primary list construction module for constructing a primary index list based on each secondary index list, wherein the primary index list has a plurality of rows of primary index information, and each row of primary index information is used to represent one or more data files.
- 10. A data lookup device based on the data storage device of claim 9, comprising: a secondary list determining module for searching the primary index list using an original index to be searched, and determining a secondary index list to be searched; a data block determining module for searching the determined secondary index list using the original index to be searched, and determining the data block to be searched; and an original value determining module for searching the determined data block using the original index to be searched to determine the original value of the original index to be searched.
- 11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, performs the steps of the data storage method of any one of claims 1 to 5, or performs the steps of the data lookup method of any one of claims 6 to 8.
- 12. A terminal comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor, when executing the computer program, performs the steps of the data storage method of any of claims 1 to 5 or the steps of the data lookup method of any of claims 6 to 8.
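The three-stage search recited in claims 6 to 8 can be illustrated with a minimal Python sketch. It is an illustration only, not the patented implementation: the in-memory layout (`PRIMARY`, `SECONDARY`, `BLOCKS`), the tuple shape `(min_index, max_index, payload)`, and the helper names `find_range` and `lookup` are all hypothetical, and binary search is simply one reasonable way to locate the covering range. It relies on the claimed property that no original index straddles two data files or two data blocks.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, str, int]  # (min_index, max_index, payload); hypothetical layout

# Primary index list: one row per data file; payload = secondary index serial number.
PRIMARY: List[Entry] = [("a", "b", 0), ("c", "d", 1)]
# Secondary index lists: one row per data block; payload = block number in the file.
SECONDARY: Dict[int, List[Entry]] = {
    0: [("a", "a", 0), ("b", "b", 1)],
    1: [("c", "d", 0)],
}
# Data blocks holding the sorted (original index, original value) rows.
BLOCKS: Dict[Tuple[int, int], List[Tuple[str, str]]] = {
    (0, 0): [("a", "1"), ("a", "2")],
    (0, 1): [("b", "3"), ("b", "4"), ("b", "5")],
    (1, 0): [("c", "6"), ("d", "7")],
}


def find_range(entries: List[Entry], key: str) -> Optional[int]:
    """Return the payload of the entry whose [min, max] range covers key."""
    # Entries are sorted and non-overlapping, so search on the maxima.
    pos = bisect_left([e[1] for e in entries], key)
    if pos < len(entries) and entries[pos][0] <= key:
        return entries[pos][2]
    return None


def lookup(key: str) -> List[str]:
    """Primary list -> secondary list -> data block -> original values."""
    file_id = find_range(PRIMARY, key)
    if file_id is None:
        return []
    block_id = find_range(SECONDARY[file_id], key)
    if block_id is None:
        return []
    # Only one block is read, because an index never spans two blocks.
    return [value for index, value in BLOCKS[(file_id, block_id)] if index == key]
```

With this sample data, `lookup("b")` reads only block `(0, 1)` and collects every value stored under `"b"`, while a key that falls in a gap between ranges yields an empty list.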
Description
Data storage method and device, data searching method and device and terminal
Technical Field
The present invention relates to the field of computer technologies, and in particular to a data storage method and device, a data searching method and device, and a terminal.
Background
As users' requirements on the capacity and read-write performance of data storage systems in communication devices continue to grow, the data storage system of a current communication device often needs to store massive amounts of data. In the prior art, massive data is usually preprocessed, divided into a plurality of data blocks, and stored. When certain data is searched for, multiple data blocks need to be searched to obtain all symbols of the data so that the original data can be restored. However, this approach takes a long time, resulting in low data read/write efficiency and an increased error rate.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a data storage method and device, a data searching method and device, and a terminal, which can effectively improve search efficiency and accuracy while keeping the storage space unchanged, and which can also use a lower-cost storage medium while maintaining search efficiency and improving search accuracy.
In order to solve the above technical problem, an embodiment of the invention provides a data storage method, comprising: determining original data, wherein the original data comprises a plurality of original indexes, and each original index indicates a single complete data item and has one or more original values; sorting the original data according to the original indexes to obtain a plurality of rows of original data, wherein each row of original data comprises the original indexes and original values; traversing the sorted rows of original data and dividing them into data files one by one, wherein, for every two adjacent data files, the original index of the last row of original data of the previous data file is different from the original index of the first row of original data of the next data file; for each divided data file, traversing each row of original data and dividing the rows into data blocks one by one, wherein, for every two adjacent data blocks, the original index of the last row of original data of the previous data block is different from the original index of the first row of original data of the next data block; constructing a secondary index list for each data file, wherein each secondary index list has a plurality of rows of secondary index information, and each row of secondary index information is used to represent one or more data blocks; and constructing a primary index list based on each secondary index list, wherein the primary index list has a plurality of rows of primary index information, and each row of primary index information is used to represent one or more data files.
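As a rough illustration of the last two steps of the storage method, the following Python sketch builds one secondary index list per data file and a primary index list over them. It is a sketch under stated assumptions rather than the patented implementation: the class names `SecondaryEntry` and `PrimaryEntry`, the function `build_indexes`, and the nested-list input layout are hypothetical, row numbers are taken to be global zero-based positions in the sorted data, and the start and end row numbers in each primary entry are read as the row range covered by the file, which is only one possible reading of the text. Every file and block is assumed to be non-empty.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[str, str]  # (original index, original value); hypothetical layout


@dataclass
class SecondaryEntry:
    start_row: int   # first row number of the data block
    end_row: int     # last row number of the data block (inclusive)
    min_index: str   # smallest original index in the block
    max_index: str   # largest original index in the block


@dataclass
class PrimaryEntry:
    secondary_id: int  # serial number of the matching secondary index list
    start_row: int     # first row number covered by the data file
    end_row: int       # last row number covered by the data file (inclusive)
    min_index: str
    max_index: str


def build_indexes(
    files: List[List[List[Row]]],
) -> Tuple[List[PrimaryEntry], List[List[SecondaryEntry]]]:
    """files[f][b] holds the sorted rows of block b in data file f."""
    primary: List[PrimaryEntry] = []
    secondaries: List[List[SecondaryEntry]] = []
    row_no = 0  # running row number over the whole sorted data set
    for file_id, blocks in enumerate(files):
        secondary: List[SecondaryEntry] = []
        for block in blocks:
            # Rows are sorted, so the first and last rows give min and max.
            secondary.append(SecondaryEntry(
                row_no, row_no + len(block) - 1, block[0][0], block[-1][0]))
            row_no += len(block)
        secondaries.append(secondary)
        # One primary row summarises the whole secondary list of this file.
        primary.append(PrimaryEntry(
            file_id, secondary[0].start_row, secondary[-1].end_row,
            secondary[0].min_index, secondary[-1].max_index))
    return primary, secondaries


files = [
    [[("a", "1"), ("a", "2")], [("b", "3"), ("b", "4"), ("b", "5")]],
    [[("c", "6"), ("d", "7")]],
]
primary, secondaries = build_indexes(files)
```

Because the rows are already sorted, each entry needs only the first and last rows of its block or file, so both lists can be built in a single pass over the data.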
Optionally, traversing the sorted rows of original data and dividing them into data files one by one comprises: each time a first preset row number is reached, if the original index of the next row of original data is different from the original index of the current row of original data, taking the original data of the first preset row number as a single data file; otherwise, extending backward until the original index of the next row of original data is different from the original index of the current row of original data, and taking the original data of the first preset row number together with the backward-extended original data as the single data file.
Optionally, traversing each row of original data and dividing the rows into data blocks one by one comprises: each time a second preset row number is reached, if the original index of the next row of original data is different from the original index of the current row of original data, taking the original data of the second preset row number as a single data block; otherwise, extending backward until the original index of the next row of original data is different from the original index of the current row of original data, and taking the original data of the second preset row number together with the backward-extended original data as the single data block, wherein the second preset row number is smaller than the first preset row number.
Optionally, each row of secondary index information comprises a start row number and an end row number of the original data of the corresponding data block, and a minimum original index and a maximum original index of the corresponding data block.
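The division rule with backward extension described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `split_on_index_boundary`, the `(index, value)` row layout, and the sample preset row numbers 4 and 2 are assumptions chosen for the example. The same helper is applied twice, first with the larger preset row number to form data files and then with the smaller one to form data blocks within each file.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (original index, original value); hypothetical layout


def split_on_index_boundary(rows: List[Row], preset_rows: int) -> List[List[Row]]:
    """Cut sorted rows into chunks of at least preset_rows rows each,
    extending a chunk while the next row repeats the current original index."""
    if preset_rows < 1:
        raise ValueError("preset_rows must be positive")
    chunks: List[List[Row]] = []
    start, n = 0, len(rows)
    while start < n:
        end = min(start + preset_rows, n)  # exclusive end of the chunk
        # Backward extension: never separate rows that share an original index.
        while end < n and rows[end][0] == rows[end - 1][0]:
            end += 1
        chunks.append(rows[start:end])
        start = end
    return chunks


rows = sorted([("a", "1"), ("a", "2"), ("b", "3"), ("b", "4"),
               ("b", "5"), ("c", "6"), ("d", "7")])
# First preset row number (4) forms data files.
files = split_on_index_boundary(rows, preset_rows=4)
# Second, smaller preset row number (2) forms data blocks inside each file.
blocks = [split_on_index_boundary(f, preset_rows=2) for f in files]
```

In this example the first file grows from four to five rows so that all rows with index `"b"` stay together, and the last chunk of any pass may be shorter than the preset row number when the data runs out.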
Optionally, each row of primary index information comprises a secondary index serial number, a start row number and an end row number of the secondary index information of the corresponding data file, and a minimum original index and a maximum original index of the secondary index information of the corresponding data file, wherein the secondary index serial numbers and the secondary index lists have a one-to-one correspondence.
In order to solve the technical problems, the embodiment of the invention provides a data searching method based on the data s