CN-122018799-A - Method and system for storing, compressing and optimizing large model data in enterprise safety production

Abstract

The invention relates to the technical field of data storage and discloses a method and a system for compressing and optimizing large-model data storage in enterprise safety production. The method comprises: performing deep semantic encoding on multi-modal raw data in the enterprise safety production field to obtain high-dimensional semantic embedding vectors; performing joint low-rank tensor decomposition on the high-dimensional semantic embedding vectors to obtain core semantic factors and auxiliary factors; performing low-rank approximate reconstruction on the high-dimensional semantic embedding vectors to obtain low-dimensional dense vectors; performing time-series behavior analysis on the low-dimensional dense vectors to obtain their data access frequency, associated compliance-detection task criticality and recent update state; performing multidimensional weight analysis on the low-dimensional dense vectors to obtain cache weight scores; and assigning priorities to the low-dimensional dense vectors and storing them in the cache levels of a multi-level cache architecture. The method thereby improves the efficiency of compressing and optimizing large-model data storage in enterprise safety production.

Inventors

TAN SHI
WANG XIANG
YANG LU
XIE HAIMING
ZENG DAO

Assignees

祥开瑞(深圳)智能安全技术有限公司

Dates

Publication Date: 20260512
Application Date: 20260128

Claims (10)

1. A method for compressing and optimizing large-model data storage in enterprise safety production, characterized by comprising the following steps: S1, performing deep semantic encoding on multi-modal raw data in the enterprise safety production field to obtain a high-dimensional semantic embedding vector of the multi-modal raw data; S2, performing joint low-rank tensor decomposition on the high-dimensional semantic embedding vector to obtain a core semantic factor and an auxiliary factor of the high-dimensional semantic embedding vector; S3, based on the core semantic factor and the auxiliary factor, performing low-rank approximate reconstruction on the high-dimensional semantic embedding vector to obtain a low-dimensional dense vector of the high-dimensional semantic embedding vector; S4, performing time-series behavior analysis on the low-dimensional dense vector to obtain the data access frequency, the associated compliance-detection task criticality and the recent update state of the low-dimensional dense vector; S5, performing multidimensional weight analysis on the low-dimensional dense vector based on the data access frequency, the associated compliance-detection task criticality and the recent update state to obtain a cache weight score of the low-dimensional dense vector; and S6, based on the cache weight score, assigning a priority to the low-dimensional dense vector and storing the low-dimensional dense vector in a cache level of a multi-level cache architecture.
2. The method for compressing and optimizing large-model data storage in enterprise safety production according to claim 1, wherein performing deep semantic encoding on the multi-modal raw data in the enterprise safety production field to obtain the high-dimensional semantic embedding vector of the multi-modal raw data comprises: collecting text, images, video and structured data in the enterprise safety production field to obtain the multi-modal raw data; normalizing the heterogeneous multi-modal raw data to obtain standardized data; extracting modal features from the standardized data to obtain intermediate semantic features of the standardized data; and projecting the intermediate semantic features into a high-dimensional vector space to obtain the high-dimensional semantic embedding vector of the multi-modal raw data.
3. The method for compressing and optimizing large-model data storage in enterprise safety production according to claim 1, wherein performing joint low-rank tensor decomposition on the high-dimensional semantic embedding vector to obtain the core semantic factor and the auxiliary factor comprises: organizing the high-dimensional semantic embedding vectors by data sample, modality type and embedding dimension to obtain a three-dimensional embedding tensor; performing tensor decomposition on the three-dimensional embedding tensor to obtain a core tensor, a modal factor matrix and a feature factor matrix of the three-dimensional embedding tensor; taking the core tensor as the core semantic factor of the high-dimensional semantic embedding vector; and performing matrix coupling on the modal factor matrix and the feature factor matrix to obtain the auxiliary factor of the high-dimensional semantic embedding vector.
4. The method for compressing and optimizing large-model data storage in enterprise safety production according to claim 1, wherein performing low-rank approximate reconstruction on the high-dimensional semantic embedding vector based on the core semantic factor and the auxiliary factor to obtain the low-dimensional dense vector comprises: jointly analyzing the core semantic factor and the auxiliary factor to obtain a low-rank reconstruction mapping relation of the high-dimensional semantic embedding vector; applying an orthogonality constraint to the low-rank reconstruction mapping relation to obtain a low-dimensional latent-space projection operator of the high-dimensional semantic embedding vector; and performing dimension-reduction mapping on the high-dimensional semantic embedding vector based on the low-dimensional latent-space projection operator to obtain the low-dimensional dense vector of the high-dimensional semantic embedding vector.
5. The method for compressing and optimizing large-model data storage in enterprise safety production according to claim 4, wherein applying the orthogonality constraint to the low-rank reconstruction mapping relation to obtain the low-dimensional latent-space projection operator comprises: performing matrix reconstruction on the low-rank reconstruction mapping relation to obtain a parameterized projection matrix; performing orthogonal decomposition on the parameterized projection matrix to obtain an orthonormal basis matrix of the parameterized projection matrix; and taking the orthonormal basis matrix as the low-dimensional latent-space projection operator of the high-dimensional semantic embedding vector.
6. The method for compressing and optimizing large-model data storage in enterprise safety production according to claim 1, wherein performing time-series behavior analysis on the low-dimensional dense vector to obtain its data access frequency, associated compliance-detection task criticality and recent update state comprises: collecting historical access records of the low-dimensional dense vector; performing sliding-window statistical analysis on the historical access records to obtain the data access frequency of the low-dimensional dense vector; evaluating the correlation between the low-dimensional dense vector and the core detection tasks in a preset compliance rule base to obtain the associated compliance-detection task criticality of the low-dimensional dense vector; and comparing the last-modified timestamp of the low-dimensional dense vector with the current timestamp to obtain the recent update state of the low-dimensional dense vector.
7. The method for compressing and optimizing large-model data storage in enterprise safety production according to claim 1, wherein performing multidimensional weight analysis on the low-dimensional dense vector based on the data access frequency, the associated compliance-detection task criticality and the recent update state to obtain the cache weight score comprises: normalizing the data access frequency, the associated compliance-detection task criticality and the recent update state to obtain a frequency parameter, a criticality parameter and a state parameter of the low-dimensional dense vector; performing weighting-factor analysis on the frequency parameter, the criticality parameter and the state parameter to obtain weight coefficients of the low-dimensional dense vector; and, based on the weight coefficients, linearly weighting the frequency parameter, the criticality parameter and the state parameter to obtain the cache weight score of the low-dimensional dense vector.
8. The method for compressing and optimizing large-model data storage in enterprise safety production according to claim 7, wherein the cache weight score is calculated by the following formula: [formula not reproduced in the source text]; where the symbols of the formula denote, in order: the cache weight score of the i-th low-dimensional dense vector; the frequency parameter; the criticality parameter; the state parameter; the frequency weight coefficient; the criticality weight coefficient; the state weight coefficient; and a preset very small positive constant.
9. The method for compressing and optimizing large-model data storage in enterprise safety production according to claim 1, wherein assigning a priority to the low-dimensional dense vector based on the cache weight score and storing it in a cache level of the multi-level cache architecture comprises: sorting all low-dimensional dense vectors in descending order of cache weight score to obtain a global priority queue of the low-dimensional dense vectors; performing capacity-aware segmentation of the global priority queue according to the storage capacities and performance indexes of the different levels in the multi-level cache architecture to obtain level allocation boundaries within the global priority queue; and, based on the level allocation boundaries, allocating the low-dimensional dense vectors to storage across the cache levels.
10. A system for compressing and optimizing large-model data storage in enterprise safety production, for implementing the method of claim 1, the system comprising: a multi-modal semantic encoding module, configured to perform deep semantic encoding on multi-modal raw data in the enterprise safety production field to obtain a high-dimensional semantic embedding vector of the multi-modal raw data; a joint tensor decomposition module, configured to perform joint low-rank tensor decomposition on the high-dimensional semantic embedding vector to obtain a core semantic factor and an auxiliary factor of the high-dimensional semantic embedding vector; a low-rank approximate reconstruction module, configured to perform low-rank approximate reconstruction on the high-dimensional semantic embedding vector based on the core semantic factor and the auxiliary factor to obtain a low-dimensional dense vector of the high-dimensional semantic embedding vector; a vector time-series analysis module, configured to perform time-series behavior analysis on the low-dimensional dense vector to obtain the data access frequency, the associated compliance-detection task criticality and the recent update state of the low-dimensional dense vector; a cache weight analysis module, configured to perform multidimensional weight analysis on the low-dimensional dense vector based on the data access frequency, the associated compliance-detection task criticality and the recent update state to obtain a cache weight score of the low-dimensional dense vector; and an intelligent cache allocation module, configured to assign a priority to the low-dimensional dense vector based on the cache weight score and to store the low-dimensional dense vector in a cache level of a multi-level cache architecture.
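
The scoring and allocation steps of claims 6, 7 and 9 can be illustrated with a minimal Python sketch. It is not the patented implementation: min-max normalization, the fixed weights `(0.5, 0.3, 0.2)`, the mapping of "recent update state" to one minus the normalized age, and all function names are assumptions made for illustration. The small positive constant of claim 8 is left out because its role in the formula cannot be recovered from the text.

```python
import numpy as np

def sliding_window_frequency(access_times, now, window):
    """Count accesses whose timestamps fall in (now - window, now]."""
    t = np.asarray(access_times, dtype=float)
    return int(np.sum((t > now - window) & (t <= now)))

def min_max(values):
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def cache_weight_scores(frequency, criticality, age, weights=(0.5, 0.3, 0.2)):
    """Linearly weight the three normalized parameters (weights are assumed)."""
    f = min_max(frequency)
    c = min_max(criticality)
    s = 1.0 - min_max(age)  # assumption: a more recent update scores higher
    w_f, w_c, w_s = weights
    return w_f * f + w_c * c + w_s * s

def assign_tiers(scores, capacities):
    """Sort by descending score and cut the queue at each tier's capacity.

    Vectors that do not fit in any listed tier get index len(capacities),
    standing in here for a slower backing store.
    """
    order = np.argsort(-np.asarray(scores), kind="stable")
    tiers = np.full(len(order), len(capacities), dtype=int)
    start = 0
    for tier, capacity in enumerate(capacities):
        tiers[order[start:start + capacity]] = tier
        start += capacity
    return tiers

# Hypothetical data for four low-dimensional dense vectors.
now = 100.0
access_log = [[91, 95, 99], [50, 92], [10, 20], [96, 97, 98, 99]]
frequency = [sliding_window_frequency(t, now, window=10.0) for t in access_log]
criticality = [0.9, 0.0, 1.0, 0.4]
last_modified = [95.0, 50.0, 99.0, 20.0]
age = [now - t for t in last_modified]

scores = cache_weight_scores(frequency, criticality, age)
tiers = assign_tiers(scores, capacities=(1, 2))
print(tiers.tolist())  # [0, 2, 1, 1]
```

With these inputs the first vector, which is accessed often, highly critical and recently updated, lands in the fastest tier, while the rarely accessed second vector overflows to the backing store.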

Description

Method and system for storing, compressing and optimizing large model data in enterprise safety production

Technical Field

The invention relates to the technical field of data storage, and in particular to a method and a system for compressing and optimizing large-model data storage in enterprise safety production.

Background

Enterprise safety production involves multi-modal data such as text, images, video and structured data. The data volume is huge and the data are highly heterogeneous. Existing data storage compression techniques do not sufficiently mine the deep semantic correlations within multi-modal data, so key semantic information is easily lost during compression. At the same time, existing compression algorithms adapt poorly to high-dimensional semantic data and compress it inefficiently, which makes it difficult to meet the lightweight storage requirements of large models. In addition, existing cache allocation mechanisms do not jointly consider data access characteristics, the importance of associated tasks and update states, so cache resources are allocated unevenly and calls to frequently accessed, highly critical data suffer high response latency.

Existing low-rank-decomposition compression methods do not fully exploit the structural characteristics of multi-modal data. The representational capability of the data is weakened after dimension reduction and reconstruction, and core semantic information related to safety production cannot be accurately preserved. Moreover, time-series behavior analysis is performed along a single dimension, so cache weight assessment lacks comprehensiveness and accuracy, and dynamic optimization of data storage is difficult to achieve. How to improve the storage compression efficiency and the rationality of cache resource utilization for large-model data in enterprise safety production, while guaranteeing the semantic integrity of the data, has therefore become an urgent problem.

Disclosure of Invention

The invention provides a method and a system for compressing and optimizing large-model data storage in enterprise safety production, to solve the problems described in the background.

To achieve the above object, the invention provides a method for compressing and optimizing large-model data storage in enterprise safety production, comprising:

S1, performing deep semantic encoding on multi-modal raw data in the enterprise safety production field to obtain a high-dimensional semantic embedding vector of the multi-modal raw data;
S2, performing joint low-rank tensor decomposition on the high-dimensional semantic embedding vector to obtain a core semantic factor and an auxiliary factor of the high-dimensional semantic embedding vector;
S3, based on the core semantic factor and the auxiliary factor, performing low-rank approximate reconstruction on the high-dimensional semantic embedding vector to obtain a low-dimensional dense vector of the high-dimensional semantic embedding vector;
S4, performing time-series behavior analysis on the low-dimensional dense vector to obtain the data access frequency, the associated compliance-detection task criticality and the recent update state of the low-dimensional dense vector;
S5, performing multidimensional weight analysis on the low-dimensional dense vector based on the data access frequency, the associated compliance-detection task criticality and the recent update state to obtain a cache weight score of the low-dimensional dense vector; and
S6, based on the cache weight score, assigning a priority to the low-dimensional dense vector and storing the low-dimensional dense vector in a cache level of a multi-level cache architecture.
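
Steps S2 and S3 can be sketched in Python with NumPy. The source does not name a decomposition algorithm, so the sketch swaps in a truncated higher-order SVD, which is one common way to compute a Tucker-style core tensor and factor matrices. It also assumes that the feature factor matrix alone serves as the projection matrix and that QR factorization provides the orthogonal decomposition. All function names, tensor sizes and ranks are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize a tensor so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def leading_singular_vectors(matrix, rank):
    """Return the first `rank` left singular vectors of a matrix."""
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank]

def truncated_hosvd(tensor, modal_rank, feature_rank):
    """Decompose a (samples, modalities, dim) tensor, leaving the sample axis intact.

    Returns the core tensor, the modal factor matrix and the feature factor matrix.
    """
    modal_factor = leading_singular_vectors(unfold(tensor, 1), modal_rank)
    feature_factor = leading_singular_vectors(unfold(tensor, 2), feature_rank)
    core = np.einsum("nmd,mi,dj->nij", tensor, modal_factor, feature_factor)
    return core, modal_factor, feature_factor

def projection_operator(projection_matrix):
    """Orthonormalize the columns with a reduced QR factorization."""
    q, _ = np.linalg.qr(projection_matrix)
    return q

def project(embeddings, projector):
    """Map high-dimensional embeddings to low-dimensional dense vectors."""
    return embeddings @ projector

# Hypothetical input: 6 samples, 3 modalities, 16-dimensional embeddings.
rng = np.random.default_rng(0)
tensor = rng.standard_normal((6, 3, 16))

core, modal_factor, feature_factor = truncated_hosvd(tensor, modal_rank=2, feature_rank=4)
projector = projection_operator(feature_factor)
dense = project(tensor.reshape(-1, 16), projector)

# Low-rank approximation of the original tensor from the stored factors.
approx = np.einsum("nij,mi,dj->nmd", core, modal_factor, feature_factor)
print(core.shape, dense.shape, approx.shape)  # (6, 2, 4) (18, 4) (6, 3, 16)
```

Because the singular vectors are already orthonormal, the QR step changes the feature factor matrix by column signs at most. It is kept so the sketch mirrors the orthogonality constraint described for the projection operator, and it would matter for a projection matrix that is not orthonormal to begin with.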

In a preferred embodiment, performing deep semantic encoding on the multi-modal raw data in the enterprise safety production field to obtain the high-dimensional semantic embedding vector of the multi-modal raw data includes: collecting text, images, video and structured data in the enterprise safety production field to obtain the multi-modal raw data; normalizing the heterogeneous multi-modal raw data to obtain standardized data; extracting modal features from the standardized data to obtain intermediate semantic features of the standardized data; and projecting the intermediate semantic features into a high-dimensional vector space to obtain the high-dimensional semantic embedding vector of the multi-modal raw data.

In a preferred embodiment, performing joint low-rank tensor decomposition on the high-dimensional semantic embedding vector to obtain the core semantic factor and the auxiliary factor of the high-dimensional semantic embedding vector includes: According to the data sample, the mode type and the embedding dimension of the high-dimensional semantic embedding vector, carrying ou