
CN-115576935-B - Storage cleaning method and device for Hadoop, computer equipment and storage medium

Abstract

The embodiments of the application belong to the field of big data and relate to a storage cleaning method and device for Hadoop, computer equipment, and a storage medium. The method comprises: acquiring an SQL execution log; parsing the SQL execution log to obtain historical access information of each data table in Hadoop; generating cleaning strategies based on the historical access information; cleaning each data table in Hadoop according to the cleaning strategies, so as to clean Hadoop storage; and generating cleaning feedback information based on the historical access information and sending it to a preset terminal, wherein the cleaning feedback information is used for storage optimization of Hadoop. The application also relates to blockchain technology: the SQL execution log can be stored in a blockchain. The method and device clean Hadoop storage comprehensively and reduce the waste of Hadoop storage resources.

Inventors

  • XIE PEIPEI

Assignees

  • 中国平安财产保险股份有限公司

Dates

Publication Date
20260512
Application Date
20221107

Claims (8)

  1. A storage cleaning method for Hadoop, characterized by comprising the following steps:
     acquiring an SQL execution log;
     parsing the SQL execution log to obtain historical access information of each data table in Hadoop;
     generating cleaning strategies based on the historical access information, wherein the cleaning strategies correspond to the various kinds of data tables in Hadoop;
     cleaning each data table in Hadoop according to the cleaning strategies, so as to clean Hadoop storage; and
     generating cleaning feedback information based on the historical access information, and sending the cleaning feedback information to a preset terminal, wherein the cleaning feedback information is used for storage optimization of Hadoop;
     wherein the step of cleaning each data table in Hadoop according to the cleaning strategies comprises:
     reading each cleaning strategy;
     when the cleaning strategy is a table cleaning strategy, deleting the data table associated with the cleaning strategy; and
     when the cleaning strategy is a system cleaning strategy, compressing or merging the data table associated with the system cleaning strategy;
     and wherein the step of compressing or merging the data table associated with the system cleaning strategy comprises:
     when the system cleaning strategy is a first system cleaning strategy, merging the data tables associated with the first system cleaning strategy, wherein the data tables are MR bottom data tables;
     when the system cleaning strategy is a second system cleaning strategy, compressing the data table associated with the second system cleaning strategy, wherein the data table is an offline Hive data table; and
     when the system cleaning strategy is a third system cleaning strategy, compressing the history partitions in the data table associated with the third system cleaning strategy, wherein the data table is a real-time Hive data table.
  2. The storage cleaning method for Hadoop according to claim 1, further comprising, before the step of acquiring the SQL execution log: acquiring an SQL statement submitted by a user; and executing the SQL statement and monitoring its execution to obtain the SQL execution log.
  3. The storage cleaning method for Hadoop according to claim 1, wherein the step of parsing the SQL execution log to obtain the historical access information of each data table in Hadoop comprises: parsing the SQL execution log to obtain the data tables associated with each SQL statement, together with the access time and the number of accesses for each data table; and generating the historical access information of each data table in Hadoop from the obtained data tables and their corresponding access times and access counts.
  4. The storage cleaning method for Hadoop according to claim 1, wherein the step of generating cleaning feedback information based on the historical access information and sending the cleaning feedback information to a preset terminal comprises: identifying, based on the historical access information, invalid data tables in Hadoop and their corresponding task links, so as to generate link cleaning information; generating, based on the historical access information, historical partition cleaning information for the partitioned data tables in Hadoop; generating, based on the historical access information, table conversion information for the full data tables in Hadoop; generating statement feedback information based on the SQL statements in the historical access information; and generating the cleaning feedback information from the link cleaning information, the historical partition cleaning information, the table conversion information and the statement feedback information, and sending the cleaning feedback information to the preset terminal.
  5. The storage cleaning method for Hadoop according to claim 4, further comprising: adjusting Hadoop according to the cleaning feedback information, wherein the adjustment comprises link cleaning, historical partition cleaning and table conversion.
  6. A storage cleaning device for Hadoop, comprising:
     a log acquisition module, configured to acquire an SQL execution log;
     an information generation module, configured to parse the SQL execution log to obtain historical access information of each data table in Hadoop;
     a strategy generation module, configured to generate cleaning strategies based on the historical access information, wherein the cleaning strategies correspond to the various kinds of data tables in Hadoop;
     a data table cleaning module, configured to clean each data table in Hadoop according to the cleaning strategies, so as to clean Hadoop storage; and
     a feedback generation module, configured to generate cleaning feedback information based on the historical access information and to send the cleaning feedback information to a preset terminal, wherein the cleaning feedback information is used for storage optimization of Hadoop;
     wherein the data table cleaning module comprises a strategy reading submodule, a first cleaning submodule and a second cleaning submodule;
     the strategy reading submodule is configured to read each cleaning strategy;
     the first cleaning submodule is configured to delete the data table associated with the cleaning strategy when the cleaning strategy is a table cleaning strategy;
     the second cleaning submodule is configured to compress or merge the data table associated with the system cleaning strategy when the cleaning strategy is a system cleaning strategy;
     and wherein the second cleaning submodule comprises a first cleaning unit, a second cleaning unit and a third cleaning unit;
     the first cleaning unit is configured to merge the data tables associated with a first system cleaning strategy when the system cleaning strategy is the first system cleaning strategy, wherein the data tables are MR bottom data tables;
     the second cleaning unit is configured to compress the data table associated with a second system cleaning strategy when the system cleaning strategy is the second system cleaning strategy, wherein the data table is an offline Hive data table; and
     the third cleaning unit is configured to compress the history partitions in the data table associated with a third system cleaning strategy when the system cleaning strategy is the third system cleaning strategy, wherein the data table is a real-time Hive data table.
  7. A computer device, comprising a memory storing computer-readable instructions which, when executed, implement the steps of the storage cleaning method for Hadoop according to any one of claims 1 to 5.
  8. A computer-readable storage medium, characterized in that it stores computer-readable instructions which, when executed by a processor, implement the steps of the storage cleaning method for Hadoop according to any one of claims 1 to 5.
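
The strategy dispatch recited in claim 1 can be pictured with a minimal Python sketch. It only plans actions and does not touch a cluster. The names `PolicyKind`, `CleaningPolicy`, `plan_action` and `plan_cleanup`, the action strings, and the example table names are all hypothetical and do not come from the patent; how each action would actually be carried out in Hadoop or Hive is left open.

```python
from dataclasses import dataclass
from enum import Enum


class PolicyKind(Enum):
    """Hypothetical labels for the four strategy kinds named in claim 1."""
    TABLE = "table"        # table cleaning strategy
    SYSTEM_1 = "system_1"  # first system strategy (MR bottom data tables)
    SYSTEM_2 = "system_2"  # second system strategy (offline Hive tables)
    SYSTEM_3 = "system_3"  # third system strategy (real-time Hive tables)


@dataclass(frozen=True)
class CleaningPolicy:
    kind: PolicyKind
    table: str  # name of the data table the strategy is associated with


# Action per strategy kind, following the wording of claim 1.
# The action strings themselves are illustrative placeholders.
ACTIONS = {
    PolicyKind.TABLE: "delete_table",
    PolicyKind.SYSTEM_1: "merge_table",
    PolicyKind.SYSTEM_2: "compress_table",
    PolicyKind.SYSTEM_3: "compress_history_partitions",
}


def plan_action(policy: CleaningPolicy) -> str:
    """Return the planned action for one strategy as 'action:table'."""
    return f"{ACTIONS[policy.kind]}:{policy.table}"


def plan_cleanup(policies):
    """Read each strategy in turn and plan the matching action."""
    return [plan_action(policy) for policy in policies]
```

Keeping the mapping in a single table makes the three-way split of the system cleaning strategy explicit: a strategy kind without an entry raises a `KeyError` instead of silently falling through to a destructive default.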

Description

Storage cleaning method and device for Hadoop, computer equipment and storage medium

Technical Field

The present application relates to the field of big data technologies, and in particular to a storage cleaning method and device for Hadoop, computer equipment, and a storage medium.

Background

Hadoop is a framework for running big data software systems. As big data technology has spread in recent years, institutions have used Hadoop more and more and have accumulated large amounts of historical data. This historical data not only occupies a large amount of storage but also reduces the operating efficiency of the components of the Hadoop ecosystem, so cleaning Hadoop storage is particularly important. However, current storage cleaning techniques for Hadoop mainly merge MR bottom data tables (table files generated at the bottom layer while MapReduce jobs execute). This single cleaning mode relieves the storage pressure on Hadoop only slightly and temporarily, and cannot clean Hadoop storage effectively.

Disclosure of Invention

The embodiments of the application aim to provide a storage cleaning method and device for Hadoop, computer equipment, and a storage medium, so that Hadoop storage is cleaned effectively and the waste of Hadoop storage resources is reduced.
To solve the above technical problem, the embodiments of the application provide a storage cleaning method for Hadoop, which adopts the following technical scheme:
acquiring an SQL execution log;
parsing the SQL execution log to obtain historical access information of each data table in Hadoop;
generating cleaning strategies based on the historical access information, wherein the cleaning strategies correspond to the various kinds of data tables in Hadoop;
cleaning each data table in Hadoop according to the cleaning strategies, so as to clean Hadoop storage; and
generating cleaning feedback information based on the historical access information, and sending the cleaning feedback information to a preset terminal, wherein the cleaning feedback information is used for storage optimization of Hadoop.
To solve the above technical problem, the embodiments of the application also provide a storage cleaning device for Hadoop, which adopts the following technical scheme:
a log acquisition module, configured to acquire an SQL execution log;
an information generation module, configured to parse the SQL execution log to obtain historical access information of each data table in Hadoop;
a strategy generation module, configured to generate cleaning strategies based on the historical access information, wherein the cleaning strategies correspond to the various kinds of data tables in Hadoop;
a data table cleaning module, configured to clean each data table in Hadoop according to the cleaning strategies, so as to clean Hadoop storage; and
a feedback generation module, configured to generate cleaning feedback information based on the historical access information and to send the cleaning feedback information to a preset terminal, wherein the cleaning feedback information is used for storage optimization of Hadoop.
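The log-parsing step of the scheme above (from SQL execution log to per-table access times and access counts) might look like the following Python sketch. The log format is an assumption made purely for illustration: one line per executed statement, holding an ISO timestamp, a tab, and the SQL text. The patent does not specify a format. Table names are pulled out with a deliberately simple regular expression; a real implementation would more plausibly use a SQL parser. The function name `parse_access_history` is hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Simplified pattern: an identifier following FROM, JOIN, INTO or TABLE.
# It does not handle subqueries, quoting, or forms such as
# "DROP TABLE IF EXISTS"; it only illustrates the idea.
TABLE_REF = re.compile(
    r"\b(?:from|join|into|table)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def parse_access_history(log_lines):
    """Build {table: {"count": int, "last_access": datetime}} from log lines.

    Assumed line format (hypothetical): "<ISO timestamp>\t<SQL statement>".
    Lines without a tab-separated SQL part are skipped.
    """
    history = defaultdict(lambda: {"count": 0, "last_access": None})
    for line in log_lines:
        stamp, _, sql = line.partition("\t")
        if not sql:
            continue
        accessed_at = datetime.fromisoformat(stamp.strip())
        # Count each table at most once per statement.
        for table in {name.lower() for name in TABLE_REF.findall(sql)}:
            entry = history[table]
            entry["count"] += 1
            if entry["last_access"] is None or accessed_at > entry["last_access"]:
                entry["last_access"] = accessed_at
    return dict(history)
```

The resulting dictionary holds, for each table, the two quantities the method relies on: how often the table was accessed and when it was last accessed. How those values are turned into cleaning strategies is not fixed by this sketch.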
To solve the above technical problem, the embodiments of the application further provide a computer device, which adopts the following technical scheme:
acquiring an SQL execution log;
parsing the SQL execution log to obtain historical access information of each data table in Hadoop;
generating cleaning strategies based on the historical access information, wherein the cleaning strategies correspond to the various kinds of data tables in Hadoop;
cleaning each data table in Hadoop according to the cleaning strategies, so as to clean Hadoop storage; and
generating cleaning feedback information based on the historical access information, and sending the cleaning feedback information to a preset terminal, wherein the cleaning feedback information is used for storage optimization of Hadoop.
To solve the above technical problem, the embodiments of the application further provide a computer-readable storage medium, which adopts the following technical scheme:
acquiring an SQL execution log;
parsing the SQL execution log to obtain historical access information of each data table in Hadoop;
generating cleaning strategies based on the historical access information, wherein the cleaning strategies correspond to the various kinds of data tables in Hadoop;
cleaning each data table in Hadoop according to the cleaning strategies, so as to clean Hadoop storage; and
generating cleaning feedback information based on the historical access information, and sending the cleaning feedback information to a preset t