CN-121979850-A - Compression method and device for software files, storage medium and computer equipment


Abstract

The invention discloses a method and apparatus for compressing a software file, a storage medium, and a computer device. The method comprises: in response to a compression signal for a target software file, acquiring file attribute information, compression requirement information and file use information of the target software file, and splitting the target software file into a plurality of semantic blocks; determining a compression strategy adapted to the target software file based on the file attribute information, the compression requirement information and the file use information; compressing each semantic block based on the compression strategy, and judging during the compression process whether the current semantic block meets a compression interrupt condition; if so, interrupting the compression process of the current semantic block to release thread resources; and, in response to a continuous compression signal for the current semantic block, identifying the compression interrupt position of the current semantic block and continuing to compress the remaining uncompressed file data in the current semantic block from the compression interrupt position.

Inventors

Yi Lubo
TIAN YE
HE SHIWEI

Assignees

成都鲁易科技有限公司

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-16

Claims (10)

1. A method for compressing a software file, comprising: in response to a compression signal for a target software file, acquiring file attribute information, compression requirement information and file use information of the target software file, and splitting the target software file into a plurality of semantic blocks; determining a compression strategy adapted to the target software file based on the file attribute information, the compression requirement information and the file use information; compressing each semantic block based on the compression strategy, judging during the compression process whether a current semantic block meets a compression interrupt condition, and if so, interrupting the compression process of the current semantic block to release thread resources; and in response to a continuous compression signal for the current semantic block, identifying a compression interrupt position of the current semantic block, and continuing to compress remaining uncompressed file data in the current semantic block starting from the compression interrupt position.
2. The method according to claim 1, wherein the method further comprises: determining a block fingerprint of each semantic block; splicing the compressed blocks obtained by compressing the semantic blocks into a compressed file of the target software file according to the position order of the semantic blocks in the target software file; determining the length and the offset of each compressed block in the compressed file, and packaging the block fingerprint, the length and the offset into a description tuple of each compressed block; in response to an update signal for the target software file, determining a semantic block to be updated among the semantic blocks, updating the semantic block to be updated, and determining an updated block fingerprint of the updated semantic block; compressing the updated semantic block to obtain an updated compressed block, determining a compressed block replacement position in the compressed file based on the description tuple of the semantic block to be updated, replacing the compressed block at the compressed block replacement position with the updated compressed block, and taking the compressed file after the replacement as an updated compressed file; and determining an update length and an update offset of the updated compressed block in the updated compressed file, and packaging the updated block fingerprint, the update length and the update offset into an update description tuple of the updated compressed block.
3. The method of claim 1, wherein prior to splitting the target software file into a plurality of semantic blocks, the method further comprises: dividing the target software file into a plurality of file fragments, and respectively determining a file hash value, content semantic information and structured metadata of each file fragment; for each file fragment, respectively determining a hash feature vector corresponding to the file hash value, a semantic feature vector corresponding to the content semantic information and a metadata feature vector corresponding to the structured metadata; based on the file type of the target software file, respectively determining weight coefficients corresponding to the hash feature vector, the semantic feature vector and the metadata feature vector, and based on the weight coefficients, performing weighted superposition on the hash feature vector, the semantic feature vector and the metadata feature vector to obtain a comprehensive feature vector corresponding to each file fragment; and determining repeated file fragments among the file fragments based on the comprehensive feature vectors, and performing deduplication processing on the repeated file fragments in the target software file.
4. The method of claim 1, wherein said compressing each of said semantic blocks based on said compression strategy comprises: taking any one of the semantic blocks as a target semantic block, scanning a dependent file and a configuration file of the target semantic block, and generating a file dependency list based on the dependent file; determining configuration file attribute information of the configuration file, and determining a format locking strategy for the configuration file based on the configuration file attribute information; and compressing the target semantic block based on the file dependency list and the compression strategy, and, in the process of compressing the target semantic block, performing format locking on the configuration file in the target semantic block using the format locking strategy.
5. The method of claim 1, wherein said compressing each of said semantic blocks based on said compression strategy comprises: determining the number of compression threads based on the computing resources of a compression system used to compress the semantic blocks, and splitting the compression task of the semantic blocks into a plurality of sub-compression tasks based on the number of compression threads; scheduling each compression thread to compress its corresponding sub-compression task; and, in the process of compressing the sub-compression tasks, calculating the compressed volume and the remaining compression time of the semantic blocks in real time, and displaying the compressed volume and the remaining compression time in real time as a compression progress bar.
6. The method of claim 1, wherein the identifying a compression interrupt position of the current semantic block in response to the continuous compression signal for the current semantic block comprises: determining interrupt file data compressed at the last compression time in the current semantic block, determining a file descriptor of the interrupt file data and context associated information in the current semantic block, and storing the file descriptor and the context associated information to a continuous compression task in a task queue; determining thread pool state information of a thread pool corresponding to the continuous compression task, system memory state information, and task priority information in the task queue; and judging whether the continuous compression task meets a task scheduling condition based on the thread pool state information, the system memory state information and the task priority information, and if so, identifying the compression interrupt position of the current semantic block based on the file descriptor and the context associated information.
7. The method of claim 1, wherein after continuing to compress remaining uncompressed file data in the current semantic block starting from the compression interrupt position, the method further comprises: determining software characteristic data of the target software file and compression parameters of the target software file in the compression process, wherein the software characteristic data comprises at least one of a version number, a development team, a release date and compatibility information; performing device-binding authorization processing and encryption processing on a compressed file formed by the compressed blocks obtained by compressing the semantic blocks; and generating a digital signature of the encrypted compressed file, and storing the digital signature, the software characteristic data, the compression parameters and the encrypted compressed file in association with one another.
8. A software file compression apparatus, comprising: an acquisition unit configured to, in response to a compression signal for a target software file, acquire file attribute information, compression requirement information and file use information of the target software file, and split the target software file into a plurality of semantic blocks; a determining unit configured to determine a compression strategy adapted to the target software file based on the file attribute information, the compression requirement information and the file use information; a judging unit configured to compress each semantic block based on the compression strategy, judge during the compression process whether a current semantic block meets a compression interrupt condition, and if so, interrupt the compression process of the current semantic block to release thread resources; and a compression unit configured to, in response to a continuous compression signal for the current semantic block, identify a compression interrupt position of the current semantic block, and continue to compress remaining uncompressed file data in the current semantic block starting from the compression interrupt position.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program when executed by the processor implements the steps of the method according to any one of claims 1 to 7.
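The weighted superposition recited in claim 3 can be illustrated with a minimal Python sketch. The source does not specify the vector dimensions, the weight values, or how repeated fragments are detected from the comprehensive feature vector, so the `TYPE_WEIGHTS` table, the cosine-similarity comparison and the `threshold` value below are all assumptions chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical per-file-type weights for (hash, semantic, metadata) vectors.
TYPE_WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "binary": (0.6, 0.1, 0.3),
    "text": (0.2, 0.6, 0.2),
}


def combine(hash_vec: Vector, semantic_vec: Vector, metadata_vec: Vector,
            file_type: str) -> List[float]:
    """Weighted superposition of three equal-length feature vectors."""
    w_h, w_s, w_m = TYPE_WEIGHTS[file_type]
    return [w_h * h + w_s * s + w_m * m
            for h, s, m in zip(hash_vec, semantic_vec, metadata_vec)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicates(vectors: List[Vector],
                    threshold: float = 0.999) -> List[Tuple[int, int]]:
    """Return index pairs whose combined vectors are nearly identical.

    Using cosine similarity with a threshold is an assumption of this
    sketch, not a rule stated in the source.
    """
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The pairwise comparison is quadratic in the number of fragments, which is acceptable for a sketch; a real deduplication pass over many fragments would likely need an index or bucketing step.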

Description

Compression method and device for software files, storage medium and computer equipment

Technical Field

The present invention relates to the field of software compression technologies, and in particular to a method and apparatus for compressing a software file, a storage medium, and a computer device.

Background

In software development, distribution, storage, and operation and maintenance scenarios, software compression is a key step in improving resource utilization. Whether a developer uploads a software installation package to an application store, an enterprise archives historical software versions, or a user downloads software over a network, the software files need to be compressed to reduce file size, save storage space, and lower network bandwidth and time costs. For example, mobile application developers need to compress APK installation packages of tens or even hundreds of MB before uploading them to an application market so that users can download them quickly, and enterprise IT departments need to compress and archive large numbers of old software installation packages to save server storage resources.

Currently, a single unified compression mode is generally used to compress all software. However, a unified compression mode is not necessarily suitable for all files, so the compression efficiency and compression accuracy of the software may be reduced.

Disclosure of Invention

The invention provides a method and apparatus for compressing a software file, a storage medium, and a computer device, with the main aim of improving the compression efficiency and compression accuracy of software files.
According to a first aspect of the present invention, there is provided a method of compressing a software file, comprising: in response to a compression signal for a target software file, acquiring file attribute information, compression requirement information and file use information of the target software file, and splitting the target software file into a plurality of semantic blocks; determining a compression strategy adapted to the target software file based on the file attribute information, the compression requirement information and the file use information; compressing each semantic block based on the compression strategy, judging during the compression process whether a current semantic block meets a compression interrupt condition, and if so, interrupting the compression process of the current semantic block to release thread resources; and in response to a continuous compression signal for the current semantic block, identifying a compression interrupt position of the current semantic block, and continuing to compress remaining uncompressed file data in the current semantic block starting from the compression interrupt position.
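The interrupt-and-resume behaviour of the first aspect can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not name a compression algorithm or define the interrupt condition, so the use of `zlib`, the `CHUNK_SIZE` granularity, the `BlockTask` class and the `should_interrupt` callback are all hypothetical. The sketch keeps the compressor state and the byte offset in memory, so compression continues from the recorded interrupt position rather than restarting the block.

```python
import zlib
from typing import Callable

CHUNK_SIZE = 4096  # hypothetical granularity at which interruption is checked


class BlockTask:
    """Compression state for one semantic block (hypothetical helper)."""

    def __init__(self, data: bytes, level: int = 6) -> None:
        self.data = data
        self.offset = 0          # compression interrupt position, in bytes
        self.finished = False
        self.output = bytearray()
        self._compressor = zlib.compressobj(level)

    def run(self, should_interrupt: Callable[[int], bool]) -> bool:
        """Compress until the block is done or an interrupt is requested.

        Returns True when the block is fully compressed, False when it was
        interrupted; calling run() again resumes from self.offset.
        """
        while self.offset < len(self.data):
            if should_interrupt(self.offset):
                return False  # caller may now release the worker thread
            chunk = self.data[self.offset:self.offset + CHUNK_SIZE]
            self.output += self._compressor.compress(chunk)
            self.offset += len(chunk)
        if not self.finished:
            self.output += self._compressor.flush()
            self.finished = True
        return True


# Usage: interrupt once after 8 KiB, then resume to completion.
block = b"semantic block payload " * 2000
task = BlockTask(block)
task.run(lambda offset: offset >= 8192)   # interrupted, state is kept
task.run(lambda offset: False)            # continuous compression signal
```

Keeping the live compressor object works only while the process stays alive; persisting an interrupted task across restarts would need a different design, which the sketch does not attempt.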
Optionally, the method further comprises: determining a block fingerprint of each semantic block; splicing the compressed blocks obtained by compressing the semantic blocks into a compressed file of the target software file according to the position order of the semantic blocks in the target software file; determining the length and the offset of each compressed block in the compressed file, and packaging the block fingerprint, the length and the offset into a description tuple of each compressed block; in response to an update signal for the target software file, determining a semantic block to be updated among the semantic blocks, updating the semantic block to be updated, and determining an updated block fingerprint of the updated semantic block; compressing the updated semantic block to obtain an updated compressed block, determining a compressed block replacement position in the compressed file based on the description tuple of the semantic block to be updated, replacing the compressed block at the compressed block replacement position with the updated compressed block, and taking the compressed file after the replacement as an updated compressed file; and determining an update length and an update offset of the updated compressed block in the updated compressed file, and packaging the updated block fingerprint, the update length and the update offset into an update description tuple of the updated compressed block.
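The fingerprint, length and offset bookkeeping described above can be sketched as follows. This is a minimal illustration, not the patented implementation: SHA-256 as the block fingerprint, `zlib` as the compressor, and the names `BlockDescriptor`, `pack_blocks`, `replace_block` and `read_block` are all assumptions. The source only describes rebuilding the tuple of the replaced block; shifting the offsets of later blocks when the new compressed block has a different length is an extra step added here so the index stays usable.

```python
import hashlib
import zlib
from typing import List, NamedTuple, Tuple


class BlockDescriptor(NamedTuple):
    fingerprint: str  # hash of the uncompressed semantic block
    length: int       # length of the compressed block in the file
    offset: int       # start of the compressed block in the file


def pack_blocks(blocks: List[bytes]) -> Tuple[bytes, List[BlockDescriptor]]:
    """Compress blocks in order and record one descriptor per block."""
    out = bytearray()
    index = []
    for block in blocks:
        compressed = zlib.compress(block)
        fingerprint = hashlib.sha256(block).hexdigest()
        index.append(BlockDescriptor(fingerprint, len(compressed), len(out)))
        out += compressed
    return bytes(out), index


def replace_block(archive: bytes, index: List[BlockDescriptor], i: int,
                  new_block: bytes) -> Tuple[bytes, List[BlockDescriptor]]:
    """Replace compressed block i in place and return the updated index."""
    old = index[i]
    compressed = zlib.compress(new_block)
    updated = (archive[:old.offset] + compressed
               + archive[old.offset + old.length:])
    new_index = list(index)
    new_index[i] = BlockDescriptor(
        hashlib.sha256(new_block).hexdigest(), len(compressed), old.offset)
    # Assumption: later blocks move by the size difference, so shift them.
    delta = len(compressed) - old.length
    for j in range(i + 1, len(index)):
        new_index[j] = index[j]._replace(offset=index[j].offset + delta)
    return updated, new_index


def read_block(archive: bytes, descriptor: BlockDescriptor) -> bytes:
    """Decompress a single block located through its descriptor."""
    start = descriptor.offset
    return zlib.decompress(archive[start:start + descriptor.length])
```

Because each block is an independent compressed stream located by its descriptor, an update touches only the affected block and the index, and unchanged blocks are never recompressed.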
Optionally, before splitting the target software file into a plurality of semantic blocks, the method further comprises: dividing the target software file into a plurality of file fragments, and respectively determining a file hash value, content semantic information and structured metadata of each file fragment; for each file fragment, respectively determining a hash feature vector corresponding to the file hash value, a semantic feature vector corresponding to the content semantic information and a metadata featur