KR-102961649-B1 - COMPUTATIONAL STORAGE DEVICE FOR DEEP-LEARNING RECOMMENDATION SYSTEM AND METHOD OF OPERATING THE SAME
Abstract
The computational storage device includes a non-volatile memory and a storage controller. The non-volatile memory stores a plurality of embedding tables used in a deep-learning recommendation system. The storage controller stores a plurality of applications offloaded from a host device that runs the deep-learning recommendation system, and supports execution of the deep-learning recommendation system by executing the plurality of applications and performing a plurality of operations based on the plurality of embedding tables. The storage controller includes a machine learning engine that analyzes at least one of the plurality of embedding tables and the plurality of applications to determine a management method for at least one of the plurality of embedding tables and the plurality of applications.
Inventors
- 김민호
- 이위직
- 지수영
- 진상화
Assignees
- Samsung Electronics Co., Ltd. (삼성전자주식회사)
Dates
- Publication Date: 2026-05-07
- Application Date: 2021-09-28
Claims (10)
- A computational storage device comprising: a non-volatile memory that stores a plurality of embedding tables used in a deep-learning recommendation system (DLRS); and a storage controller that controls operation of the non-volatile memory, stores a plurality of applications offloaded from a host device executing the deep-learning recommendation system, and supports execution of the deep-learning recommendation system by executing the plurality of applications and performing a plurality of operations based on the plurality of embedding tables, wherein the storage controller includes a machine learning engine that analyzes at least one of the plurality of embedding tables and the plurality of applications to determine a management method for at least one of the plurality of embedding tables and the plurality of applications, wherein the machine learning engine includes: a mode setting module that receives operation mode information; a search module that receives a plurality of pieces of internal information from the non-volatile memory; a characteristic analysis module that analyzes the plurality of pieces of internal information based on the plurality of embedding tables; a clustering module that performs clustering based on a result of the analysis and classifies at least some of the plurality of pieces of internal information into a first cluster and a second cluster; a classification module that receives a plurality of pieces of pattern information, selects a first reference value and a second reference value from the first cluster and the second cluster based on a result of the clustering and the plurality of pieces of pattern information, and classifies at least some of the plurality of pieces of internal information into a first reference group and a second reference group based on the first reference value and the second reference value; and an estimation module that performs an estimation for analyzing the plurality of pieces of pattern information using the first reference group and the second reference group, and wherein the clustering module calculates the lowest inter-cluster similarity and selects an optimal clustering value.
- The computational storage device of claim 1, wherein the machine learning engine analyzes usage of the plurality of embedding tables and, based on a result of the analysis, classifies the plurality of embedding tables into a first group of embedding tables whose usage frequency is greater than a reference frequency and a second group of embedding tables whose usage frequency is less than or equal to the reference frequency.
- The computational storage device of claim 1, further comprising a buffer memory that temporarily stores data stored in, or to be stored in, the non-volatile memory, wherein the machine learning engine analyzes usage of a plurality of vector data included in the plurality of embedding tables, detects first vector data among the plurality of vector data based on a result of the analysis, and temporarily stores and uses the first vector data in the buffer memory.
- The computational storage device of claim 1, further comprising reconfigurable hardware that stores and executes at least one of the plurality of applications, wherein the machine learning engine analyzes usage of a plurality of operators related to the plurality of applications, detects a first application among the plurality of applications based on a result of the analysis, and stores and uses the first application in the reconfigurable hardware.
- (Canceled)
- The computational storage device of claim 1, wherein the clustering module classifies the remainder of the plurality of pieces of internal information into a third cluster, the classification module selects a third reference value from the third cluster and classifies the remainder of the plurality of pieces of internal information into a third reference group based on the third reference value, and the estimation module performs the estimation using the first, second, and third reference groups.
- The computational storage device of claim 1, wherein the clustering module performs K-means clustering to classify at least some of the plurality of pieces of internal information into the first cluster and the second cluster, and determines the optimal clustering value, which represents the optimal number of clusters, using an elbow technique.
- The computational storage device of claim 7, wherein the classification module selects, as the first reference value, the piece of internal information having the largest K-means value among the pieces of internal information included in the first cluster, selects, as the second reference value, the piece of internal information having the largest K-means value among the pieces of internal information included in the second cluster, and performs support vector machine (SVM) classification on the pieces of internal information included in the first cluster and in the second cluster based on the first reference value and the second reference value, thereby classifying at least some of the plurality of pieces of internal information into the first reference group and the second reference group.
- The computational storage device of claim 1, wherein the plurality of pieces of internal information includes workload information related to characteristics of the plurality of applications, input/output pattern information related to characteristics of the host device accessing the computational storage device, and non-volatile memory information related to characteristics of the non-volatile memory.
- A method of operating a computational storage device that includes a non-volatile memory and a storage controller, the method comprising: storing, by the storage controller, a plurality of applications offloaded from a host device executing a deep-learning recommendation system (DLRS); storing, by the non-volatile memory, a plurality of embedding tables used in the deep-learning recommendation system; executing, by the storage controller, the plurality of applications and performing a plurality of operations based on the plurality of embedding tables to support execution of the deep-learning recommendation system; and analyzing, by a machine learning engine included in the storage controller, at least one of the plurality of embedding tables and the plurality of applications and determining a management method for at least one of the plurality of embedding tables and the plurality of applications, wherein the machine learning engine includes: a mode setting module that receives operation mode information; a search module that receives a plurality of pieces of internal information from the non-volatile memory; a characteristic analysis module that analyzes the plurality of pieces of internal information based on the plurality of embedding tables; a clustering module that performs clustering based on a result of the analysis and classifies at least some of the plurality of pieces of internal information into a first cluster and a second cluster; a classification module that receives a plurality of pieces of pattern information, selects a first reference value and a second reference value from the first cluster and the second cluster based on a result of the clustering and the plurality of pieces of pattern information, and classifies at least some of the plurality of pieces of internal information into a first reference group and a second reference group based on the first reference value and the second reference value; and an estimation module that performs an estimation for analyzing the plurality of pieces of pattern information using the first reference group and the second reference group, and wherein the clustering module calculates the lowest inter-cluster similarity and selects an optimal clustering value.
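Claims 2 and 3 describe splitting the embedding tables into a hot group and a cold group by usage frequency, and keeping frequently used vector data in buffer memory rather than in non-volatile memory. The following is a minimal Python sketch of that idea; the function names, the LRU eviction policy, and the sample data are illustrative assumptions, not details taken from the patent.

```python
# Illustrative sketch of claims 2-3: hot/cold table grouping plus a tiny
# LRU cache standing in for the device's buffer memory (DRAM).
from collections import OrderedDict

def group_tables(usage_counts, reference_freq):
    """usage_counts: {table_id: access count}. Returns (hot, cold) id lists."""
    hot = [t for t, n in usage_counts.items() if n > reference_freq]
    cold = [t for t, n in usage_counts.items() if n <= reference_freq]
    return hot, cold

class BufferMemory:
    """LRU cache: frequently used vectors stay in DRAM, cold ones are evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()           # vector_id -> vector data

    def get(self, vector_id, load_from_nvm):
        if vector_id in self.cache:          # hit: serve from buffer memory
            self.cache.move_to_end(vector_id)
            return self.cache[vector_id]
        vec = load_from_nvm(vector_id)       # miss: read non-volatile memory
        self.cache[vector_id] = vec
        if len(self.cache) > self.capacity:  # evict the least recently used
            self.cache.popitem(last=False)
        return vec

usage = {"table_a": 120, "table_b": 3, "table_c": 45}
hot, cold = group_tables(usage, reference_freq=40)
```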
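Claim 7 combines K-means clustering with an elbow technique that picks the optimal number of clusters. Below is a toy one-dimensional version of that pipeline, assuming the internal information reduces to scalar metrics such as access counts; the 10%-of-first-drop elbow threshold and the sample data are arbitrary illustrative choices, not the patent's criteria.

```python
# Illustrative sketch of claim 7: 1-D k-means plus an elbow heuristic
# for selecting the optimal clustering value (number of clusters).
import random

def kmeans_1d(values, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    inertia = sum(min(abs(v - c) for c in centers) ** 2 for v in values)
    return clusters, centers, inertia

def elbow_k(values, max_k=5):
    # Pick the k after which adding clusters barely reduces inertia.
    inertias = [kmeans_1d(values, k)[2] for k in range(1, max_k + 1)]
    drops = [inertias[i - 1] - inertias[i] for i in range(1, len(inertias))]
    for i in range(1, len(drops)):
        if drops[i] < 0.1 * drops[0]:   # improvement has flattened out
            return i + 1
    return max_k

metrics = [2, 3, 2, 4, 95, 97, 99, 100, 50, 52]
best_k = elbow_k(metrics)
clusters, centers, _ = kmeans_1d(metrics, best_k)
```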
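Claim 8 separates the internal information into reference groups with an SVM anchored on one reference value drawn from each cluster. In one dimension, the maximum-margin boundary of a hard-margin linear SVM between two support vectors is simply their midpoint, which permits a very small stand-in sketch; the reference values and data below are invented for illustration and the midpoint rule is only the 1-D analogue, not the patent's actual classifier.

```python
# Illustrative 1-D analogue of claim 8: split internal information into
# two reference groups using the max-margin boundary between one
# reference value taken from each cluster.
def classify(values, ref_low, ref_high):
    # For two 1-D support vectors, the max-margin hyperplane is the midpoint.
    boundary = (ref_low + ref_high) / 2.0
    group1 = [v for v in values if v <= boundary]   # first reference group
    group2 = [v for v in values if v > boundary]    # second reference group
    return boundary, group1, group2

internal_info = [2, 3, 4, 50, 52, 95, 97, 100]
boundary, g1, g2 = classify(internal_info, ref_low=4, ref_high=95)
```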
Description
The present invention relates to a semiconductor integrated circuit, and more particularly to a computational storage device that supports a deep-learning recommendation system and to a method of operating the computational storage device.

Recently, storage devices that use memory devices, such as solid-state drives (SSDs), have come into wide use. Because they have no mechanical moving parts, these storage devices offer excellent stability and durability, very fast information access, and low power consumption. As electronic circuits are applied not only to electronic systems such as laptops but also to many other kinds of systems, including automobiles, aircraft, and drones, storage devices are likewise being used in an ever wider range of applications.

In a system that includes a storage device and a host device, instructions (or programs) and data are stored in the storage device, and to perform data processing based on the instructions, the instructions and data must be transferred from the storage device to the host device. Consequently, even if the processing speed of the host device increases, the data transfer speed between the host device and the storage device becomes a bottleneck that can limit the throughput of the entire system. To address this problem, computational storage devices that contain processor logic are being researched. Meanwhile, the use of user experience (UX)-based deep-learning recommendation systems (DLRS) has recently been increasing.

FIG. 1 is a block diagram showing a computational storage device and a storage system including the same according to embodiments of the present invention. FIG. 2 is a diagram illustrating a deep-learning recommendation system executed on a host device included in a storage system according to embodiments of the present invention. FIG. 3 is a block diagram showing an example of a computational storage device according to embodiments of the present invention. FIGS. 4, 5a, 5b, and 6 are drawings for explaining the operation of the computational storage device of FIG. 3. FIG. 7 is a block diagram showing another example of a computational storage device according to embodiments of the present invention. FIG. 8a is a block diagram showing an example of a non-volatile memory included in a computational storage device according to embodiments of the present invention. FIGS. 8b, 8c, and 8d are drawings showing examples in which a machine learning engine included in a computational storage device according to embodiments of the present invention is implemented in the form of a neural network. FIG. 9 is a block diagram showing an example of a machine learning engine included in a computational storage device according to embodiments of the present invention. FIG. 10 is a diagram illustrating the operation of a characteristic analysis module included in the machine learning engine of FIG. 9. FIG. 11 is a block diagram showing an example of a clustering module, a classification module, and an estimation module included in the machine learning engine of FIG. 9. FIGS. 12a and 12b are drawings for explaining the operation of the clustering module and the classification module of FIG. 11. FIG. 13 is a flowchart illustrating a method of operating a computational storage device according to embodiments of the present invention. FIGS. 14, 15, 16, and 17 are flowcharts showing examples of the step of determining the management method of FIG. 13.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the attached drawings. Identical components in the drawings are denoted by the same reference numerals, and redundant descriptions of identical components are omitted.

FIG. 1 is a block diagram showing a computational storage device and a storage system including the same according to embodiments of the present invention. Referring to FIG. 1, the storage system (100) includes a host device (200) and a computational storage device (300). The host device (200) controls the overall operation of the storage system (100). The host device (200) may include a host processor (210) and a host memory (220). The host processor (210) can control the operation of the host device (200). For example, the host processor (210) can run an operating system (OS). For example, the operating system may include a file system for file management and a device driver for controlling peripheral devices, including the computational storage device (300), at the operating system level. For example, the host processor (210) may include any processor, such as a central processing unit (CPU). The host memory (220) can store instructions and data that are executed and processed by the host processor (210). For example, the host memory (220) may include volatile memory such as dynamic random access memory (DRAM)