
CN-115427943-B - Data storage method and device and storage medium


Abstract

A data storage method, a data storage device, and a storage medium are disclosed. The method comprises: when an index of data to be processed is created, acquiring cluster parameter information of the cluster that stores the data to be processed, the number of documents of historically written indexes, and the document size of the historically written indexes (S101); and allocating a plurality of index shards to the data to be processed according to the cluster parameter information, the number of documents, and the document size, and writing the data to be processed into the plurality of index shards, wherein the plurality of index shards are located on at least one node of the cluster (S102).
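Step S101 needs the document count and document size of previously written indexes, but the patent does not say how these are read. Assuming an Elasticsearch cluster, one plausible source is the `_cat/indices` API called with `format=json` and `bytes=b`, which reports `docs.count` and `pri.store.size` for each index. The sketch below only parses a hard-coded, invented sample of that response (the index names and numbers are made up); selecting "historical" indexes by name prefix and the `history_stats` helper are hypothetical.

```python
from __future__ import annotations

import json

# Invented sample of `GET _cat/indices?format=json&bytes=b` output for two
# previously written indexes; _cat returns every value as a string.
SAMPLE_CAT_INDICES = """[
  {"index": "logs-2020.05.31", "pri": "3", "rep": "1", "docs.count": "1200000",
   "pri.store.size": "1800000000", "store.size": "3600000000"},
  {"index": "logs-2020.06.01", "pri": "3", "rep": "1", "docs.count": "800000",
   "pri.store.size": "1200000000", "store.size": "2400000000"}
]"""


def history_stats(cat_indices_json: str, prefix: str) -> tuple[int, int]:
    """Return (document count, primary-store bytes) summed over the indexes
    whose names start with `prefix` (a hypothetical selection rule)."""
    rows = [r for r in json.loads(cat_indices_json) if r["index"].startswith(prefix)]
    # Guard against null values by treating them as 0.
    docs = sum(int(r["docs.count"] or 0) for r in rows)
    # pri.store.size covers primary shards only, so replicas do not inflate it.
    size = sum(int(r["pri.store.size"] or 0) for r in rows)
    return docs, size


docs, size = history_stats(SAMPLE_CAT_INDICES, "logs-")
print(docs, size, size / docs)  # 2000000 3000000000 1500.0
```

Summing `pri.store.size` rather than `store.size` keeps replica copies out of the estimate, since the replica count does not change how much data each primary shard has to hold.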

Inventors

  • GUO ZILIANG

Assignees

  • Shenzhen Huantai Technology Co., Ltd. (深圳市欢太科技有限公司)
  • Guangdong OPPO Mobile Telecommunications Corp., Ltd. (OPPO广东移动通信有限公司)

Dates

Publication Date
2026-04-21
Application Date
2020-06-02
Priority Date
2020-06-02

Claims (14)

  1. A data storage method, the method comprising: when an index of data to be processed is created, obtaining cluster parameter information of the cluster storing the data to be processed, the number of documents of a historically written index, and the document size of the historically written index; and allocating a plurality of index shards to the data to be processed according to the cluster parameter information, the number of documents, and the document size, and writing the data to be processed into the index shards, wherein the index shards are located on at least one node of the cluster.
  2. The method of claim 1, wherein the cluster parameter information comprises at least one of: the total number of processor cores of the cluster, the processor utilization, the memory information of the cluster, the memory utilization of the cluster, the network throughput information of the cluster, network interface card data information in the cluster, the total number of disk read/write operations in the cluster, and the average number of disk read/write operations in the cluster.
  3. The method of claim 1, wherein allocating a plurality of index shards to the data to be processed according to the cluster parameter information, the number of documents, and the document size comprises: determining the number of index shards corresponding to the data to be processed according to the cluster parameter information, the number of documents, and the document size; and allocating the index shards to the data to be processed according to the number of index shards.
  4. The method of claim 3, wherein determining the number of index shards corresponding to the data to be processed according to the cluster parameter information, the number of documents, and the document size comprises: determining a first number of index shards according to the cluster parameter information; determining a second number of index shards according to the number of documents; determining a third number of index shards according to the document size; and determining the number of index shards according to the first number, the second number, and the third number.
  5. The method of claim 3, wherein allocating a plurality of index shards to the data to be processed according to the cluster parameter information, the number of documents, and the document size further comprises: creating an index alias corresponding to the plurality of index shards; and indexing the data to be processed in the cluster by using the index alias.
  6. The method of claim 1, further comprising: adjusting the number of nodes in the cluster according to processor parameter information of the cluster, memory information of the cluster, and the total number of disk read/write operations in the cluster; and updating the cluster parameter information according to the cluster with the adjusted number of nodes.
  7. A data storage device, the device comprising: an acquisition unit, used for acquiring, when an index of data to be processed is created, cluster parameter information of the cluster storing the data to be processed, the number of documents of a historically written index, and the document size of the historically written index; and an allocation unit, used for allocating a plurality of index shards to the data to be processed according to the cluster parameter information, the number of documents, and the document size, and writing the data to be processed into the index shards, wherein the index shards are located on at least one node of the cluster.
  8. The device of claim 7, wherein the cluster parameter information comprises at least one of: the total number of processor cores of the cluster storing the data to be processed, the processor utilization, the memory information of the cluster, the memory utilization of the cluster, the network throughput information of the cluster, network interface card data information in the cluster, the total number of disk read/write operations in the cluster, and the average number of disk read/write operations in the cluster.
  9. The device of claim 7, further comprising a determination unit, used for determining the number of index shards corresponding to the data to be processed according to the cluster parameter information, the number of documents, and the document size, and allocating the index shards to the data to be processed according to the number of index shards.
  10. The device of claim 9, wherein the determination unit is used for determining a first number of index shards according to the cluster parameter information, determining a second number of index shards according to the number of documents, determining a third number of index shards according to the document size, and determining the number of index shards according to the first number, the second number, and the third number.
  11. The device of claim 9, further comprising a creation unit, used for creating an index alias corresponding to the index shards and indexing the data to be processed in the cluster by using the index alias.
  12. The device of claim 7, further comprising an adjustment unit, used for adjusting the number of nodes in the cluster according to processor parameter information of the cluster, memory information of the cluster, and the total number of disk read/write operations in the cluster, and updating the cluster parameter information according to the cluster with the adjusted number of nodes.
  13. A data storage device, the device comprising a memory, a processor, and a communication bus, the memory communicating with the processor via the communication bus and storing a data storage program executable by the processor, wherein the data storage program, when executed by the processor, performs the method of any one of claims 1 to 6.
  14. A storage medium storing a computer program applied to a data storage device, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 6.
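Claims 6 and 12 adjust the node count from processor, memory, and disk-I/O figures but give no rule for doing so. The sketch below is one hypothetical threshold rule: add a node when any resource is hot, and remove one when all resources are idle. The `ClusterParams` type, the threshold values, the `min_nodes` floor, and the proportional re-estimate in `updated_params` are all assumptions, not the patented method.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterParams:
    """Hypothetical snapshot of the parameters named in claim 6."""
    node_count: int
    cpu_cores_total: int
    cpu_utilization: float     # cluster average, 0.0 to 1.0
    memory_utilization: float  # cluster average, 0.0 to 1.0
    disk_ops_total: int        # disk read/write operations in the sampling window


# Hypothetical thresholds; the patent does not give any values.
HIGH_CPU, HIGH_MEM = 0.80, 0.85
LOW_CPU, LOW_MEM = 0.20, 0.30
MAX_DISK_OPS_PER_NODE = 50_000


def adjust_node_count(p: ClusterParams, min_nodes: int = 3) -> int:
    """Suggest a node count: add a node when any resource is hot,
    remove one when every resource is idle, otherwise keep the size."""
    disk_ops_per_node = p.disk_ops_total / p.node_count
    overloaded = (p.cpu_utilization > HIGH_CPU
                  or p.memory_utilization > HIGH_MEM
                  or disk_ops_per_node > MAX_DISK_OPS_PER_NODE)
    idle = (p.cpu_utilization < LOW_CPU
            and p.memory_utilization < LOW_MEM
            and disk_ops_per_node < MAX_DISK_OPS_PER_NODE / 4)
    if overloaded:
        return p.node_count + 1
    if idle and p.node_count > min_nodes:
        return p.node_count - 1
    return p.node_count


def updated_params(p: ClusterParams, new_node_count: int) -> ClusterParams:
    """Estimate the parameters after resizing, assuming identical nodes and an
    unchanged workload spread evenly over the new node count."""
    ratio = p.node_count / new_node_count
    return replace(
        p,
        node_count=new_node_count,
        cpu_cores_total=p.cpu_cores_total // p.node_count * new_node_count,
        cpu_utilization=min(1.0, p.cpu_utilization * ratio),
        memory_utilization=min(1.0, p.memory_utilization * ratio),
    )


hot = ClusterParams(node_count=3, cpu_cores_total=24, cpu_utilization=0.875,
                    memory_utilization=0.5, disk_ops_total=30_000)
n = adjust_node_count(hot)  # 4: CPU utilization is above HIGH_CPU
print(updated_params(hot, n))
```

In practice the refreshed parameters would come from re-sampling the resized cluster; the proportional estimate only stands in for that step.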

Description

Data storage method and device, and storage medium

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a data storage method, a data storage device, and a storage medium.

Background

Elasticsearch (ES), an open-source distributed search engine, can store and manage large-scale unstructured data and supports full-text retrieval of the stored data, which has led to its widespread use in search applications in recent years. In the prior art, sharding an index in ES requires manually obtaining the configuration parameters of the cluster, determining the number of index shards from those parameters, and then sharding the created index into that number of shards. When there is a large amount of index data and many indexes must be created, the cluster's configuration parameters have to be obtained manually again and again before the shard count of each index can be determined and its shards created, which reduces the intelligence (that is, the degree of automation) of index-shard creation.

Disclosure of Invention

To solve the above technical problem, embodiments of the present invention provide a data storage method, a data storage device, and a storage medium that improve the intelligence of the data storage device when it creates index shards.
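In Elasticsearch, the shard count that the background describes choosing by hand is the static `index.number_of_shards` setting, which can only be set when an index is created; changing it later requires the shrink or split APIs or a reindex. The sketch below only builds the JSON body for the create index API (`PUT /<index>`), which the prior-art workflow fills in manually and the disclosed method would fill in with a computed value. The replica default of 1, the optional alias (compare the index alias in claim 5), and the `create_index_body` helper are illustrative assumptions; nothing is sent to a cluster.

```python
import json
from typing import Optional


def create_index_body(number_of_shards: int, number_of_replicas: int = 1,
                      alias: Optional[str] = None) -> dict:
    """Build the JSON body for Elasticsearch's create index API (PUT /<index>)."""
    if number_of_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    body = {
        "settings": {
            "index": {
                # Static setting: it can only be set when the index is created.
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }
    if alias:
        # Register an alias together with the new index.
        body["aliases"] = {alias: {}}
    return body


# Prior-art style: the shard count 6 is a number someone picked by hand.
print(json.dumps(create_index_body(6, alias="logs"), indent=2))
```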
The technical solution of the present invention is implemented as follows.

An embodiment of the present application provides a data storage method, comprising: when an index of data to be processed is created, obtaining cluster parameter information of the cluster storing the data to be processed, the number of documents of a historically written index, and the document size of the historically written index; and allocating a plurality of index shards to the data to be processed according to the cluster parameter information, the number of documents, and the document size, and writing the data to be processed into the index shards, wherein the index shards are located on at least one node of the cluster.

In the above solution, the cluster parameter information includes at least one of: the total number of processor cores of the cluster, the processor utilization, the memory information of the cluster, the memory utilization of the cluster, the network throughput information of the cluster, network interface card data information in the cluster, the total number of disk read/write operations in the cluster, and the average number of disk read/write operations in the cluster.

In the above solution, allocating a plurality of index shards to the data to be processed according to the cluster parameter information, the number of documents, and the document size includes: determining the number of index shards corresponding to the data to be processed according to the cluster parameter information, the number of documents, and the document size; and allocating the index shards to the data to be processed according to the number of index shards.
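The shard count above is derived from three inputs, and claim 4 splits that derivation into a first, second, and third number. The patent discloses no formula, so the sketch below is purely hypothetical: the first number is the idle CPU capacity divided by an assumed per-shard core budget, the second keeps each shard under an assumed document ceiling, the third keeps each shard at or below an assumed target size, and the combination rule (the data asks for the larger of the second and third numbers, capped by the first, never below 1) is one possible choice among many. `TARGET_SHARD_BYTES = 30 GiB` sits inside the 10 to 50 GB range that Elastic's shard-sizing guidance commonly recommends; `MAX_DOCS_PER_SHARD` is an assumed value well below Lucene's hard limit of about 2.1 billion documents per shard.

```python
# Hypothetical constants; the patent does not disclose any formula or values.
CORES_PER_SHARD = 2                # CPU cores budgeted per primary shard
MAX_DOCS_PER_SHARD = 200_000_000   # document-count ceiling per shard
TARGET_SHARD_BYTES = 30 * 1024**3  # target primary-shard size (30 GiB)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def first_number(cpu_cores_total: int, cpu_utilization: float) -> int:
    """Shards the idle CPU capacity could serve (never less than 1)."""
    idle_cores = int(cpu_cores_total * (1.0 - cpu_utilization))
    return max(1, idle_cores // CORES_PER_SHARD)


def second_number(doc_count: int) -> int:
    """Shards needed to keep each shard under MAX_DOCS_PER_SHARD documents."""
    return ceil_div(doc_count, MAX_DOCS_PER_SHARD)


def third_number(total_bytes: int) -> int:
    """Shards needed to keep each shard at or below TARGET_SHARD_BYTES."""
    return ceil_div(total_bytes, TARGET_SHARD_BYTES)


def shard_count(cpu_cores_total: int, cpu_utilization: float,
                doc_count: int, total_bytes: int) -> int:
    """Assumed combination rule: the data asks for max(second, third) shards,
    the cluster caps that at the first number, and the result is at least 1."""
    wanted = max(second_number(doc_count), third_number(total_bytes))
    return max(1, min(first_number(cpu_cores_total, cpu_utilization), wanted))


print(shard_count(48, 0.25, 900_000_000, 200 * 1024**3))  # 7
```

Capping by the first number avoids over-sharding a busy cluster at the cost of larger shards; taking the maximum of all three numbers instead would favour smaller shards. Which trade-off the patented method makes is not stated in the text.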
In the above solution, determining the number of index shards corresponding to the data to be processed according to the cluster parameter information, the number of documents, and the document size includes: determining a first number of index shards according to the cluster parameter information; determining a second number of index shards according to the number of documents; determining a third number of index shards according to the document size; and determining the number of index shards according to the first number, the second number, and the third number.

In the above solution, allocating a plurality of index shards to the data to be processed according to the cluster parameter information, the number of documents, and the document size further includes: creating an index alias corresponding to the plurality of index shards; and indexing the data to be processed in the cluster by using the index alias.

In the above solution, the method further includes: adjusting the number of nodes in the cluster according to processor parameter information of the cluster, memory information of the cluster, and the total number of disk read/write operations in the cluster; and updating the cluster parameter information according to the cluster with the adjusted number of nodes.

An embodiment of the present application provides a data storage device, including: an acquisition unit, used for acquiring, when an index of data to be processed is created, cluster parameter information of the cluster storing the data to be processed, the number of documents of a historically written index, and the document size of the historically written index; and an allocation unit, used for allocating a plurality of index shards to the data to be processed according to the cluster parameter information, the number of documents, and the document size, and writing th