US-12619559-B2 - Smart storage devices
Abstract
There is provided a smart storage device. The smart storage device comprises an accelerator which is connected to a host device through a smart interface and stores a first model distributed from the host device, and a non-volatile memory which stores training data used for training the first model, wherein the accelerator trains the first model by using the training data on the basis of a first weight and a first bias that are set in advance, provides the host device with a first output value which is output by inputting the training data into the first model, and trains the first model by using the training data on the basis of a second weight and a second bias calculated by the host device with reference to the first output value.
Inventors
- Soo-Young Ji
Assignees
- SAMSUNG ELECTRONICS CO., LTD.
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-07-02
- Priority Date: 2024-01-11
Claims (20)
- 1 . A smart storage device comprising: an accelerator connected to a host device through a smart interface, wherein the accelerator stores a first model received from the host device; and a non-volatile memory storing training data used for training the first model, wherein the accelerator is configured to: train the first model using the training data based on a first weight and a first bias, provide the host device with a first output value, wherein the first output value is based on inputting the training data into the first model, receive, from the host device, a second weight and a second bias, wherein the second weight and the second bias are calculated by the host device based on the first output value, and train the first model using the training data based on the second weight and the second bias.
- 2 . The smart storage device of claim 1 , wherein the smart interface is a Compute eXpress Link (CXL) interface, and the smart storage device is a CXL device.
- 3 . The smart storage device of claim 2 , wherein the accelerator and the smart interface are connected using a CXL.cache protocol and a CXL.mem protocol, and wherein the accelerator is configured to receive the first weight, the second weight, the first bias, and the second bias through the CXL.cache protocol, and to provide the first output value to the host device through the CXL.mem protocol.
- 4 . The smart storage device of claim 1 , wherein the accelerator is configured to: receive a second model from the host device; train the second model using the training data based on a third weight and a third bias; provide the host device with a second output value, wherein the second output value is based on inputting the training data into the second model; receive, from the host device, a fourth weight and a fourth bias, wherein the fourth weight and the fourth bias are calculated by the host device based on the second output value; and train the second model using the training data based on the fourth weight and the fourth bias.
- 5 . The smart storage device of claim 1 , wherein the first model is a generative model, and wherein the first output value includes an arithmetic operation accuracy of the first model.
- 6 . A computing system comprising: a smart interface connected to a host device; a first smart storage device including a first accelerator and a first non-volatile memory; and a second smart storage device including a second accelerator and a second non-volatile memory, wherein the first accelerator and the second accelerator are each connected to the smart interface and each store a first model received from the host device, wherein the first non-volatile memory and the second non-volatile memory respectively store first training data and second training data used for training the first model, and wherein the first accelerator and the second accelerator are configured to: train the first model using the first training data and the second training data, respectively, based on a first weight and a first bias, respectively provide the host device with a first output value, wherein the first output value is based on inputting the first training data into the first model, and a second output value, wherein the second output value is based on inputting the second training data into the first model, and train the first model using the first training data and the second training data, respectively, based on a second weight and a second bias calculated by the host device based on the first output value and the second output value and received at the first smart storage device and the second smart storage device from the host device.
- 7 . The computing system of claim 6 , wherein the smart interface is a Compute eXpress Link (CXL) interface, and the first smart storage device and the second smart storage device are CXL devices.
- 8 . The computing system of claim 7 , wherein the first accelerator and the second accelerator are connected to the smart interface using a CXL.cache protocol and a CXL.mem protocol, and wherein the first accelerator and the second accelerator are configured to: receive the first weight, the second weight, the first bias, and the second bias from the host device through the CXL.cache protocol, and provide the first output value and the second output value, respectively, to the host device through the CXL.mem protocol.
- 9 . The computing system of claim 6 , wherein the second weight is an average of a third weight calculated based on the first output value and a fourth weight calculated based on the second output value, and wherein the second bias is an average of a third bias calculated based on the first output value and a fourth bias calculated based on the second output value.
- 10 . The computing system of claim 6 , wherein the first model is a generative model, and wherein the first output value and the second output value include an arithmetic operation accuracy of the first model.
- 11 . The computing system of claim 6 , wherein the first accelerator and the second accelerator are configured to: receive a second model from the host device, train the second model using the first training data and the second training data, respectively, based on a third weight and a third bias, respectively provide the host device with a third output value, wherein the third output value is based on inputting the first training data into the second model, and a fourth output value, wherein the fourth output value is based on inputting the second training data into the second model, and train the second model using the first training data and the second training data, respectively, based on a fourth weight and a fourth bias newly calculated by the host device based on the third output value and the fourth output value and received at the first smart storage device and the second smart storage device from the host device.
- 12 . The computing system of claim 6 , wherein the computing system includes a third smart storage device including a third accelerator and a third non-volatile memory, wherein the third accelerator is connected to the smart interface and stores a second model received from the host device, wherein the third non-volatile memory stores third training data used for training the second model, and wherein the third accelerator is configured to: train the second model using the third training data based on a third weight and a third bias, provide the host device with a third output value, wherein the third output value is based on inputting the third training data into the second model, and train the second model using the third training data based on a fourth weight and a fourth bias calculated by the host device based on the third output value and received at the third smart storage device from the host device.
- 13 . A method of operation of a computing system, wherein the computing system comprises a host device, a smart interface, and a first smart storage device and a second smart storage device connected to the host device through the smart interface, the method comprising: distributing a first model from the host device to the first smart storage device and the second smart storage device; initializing a first weight and a first bias at the host device; training the first model using first training data based on the first weight and the first bias at the first smart storage device; training the first model using second training data based on the first weight and the first bias at the second smart storage device; providing the host device with a first output value, wherein the first output value is based on inputting the first training data into the first model at the first smart storage device; providing the host device with a second output value, wherein the second output value is based on inputting the second training data into the first model at the second smart storage device; calculating, at the host device, a second weight and a second bias based on the first output value and the second output value; training the first model using the first training data based on the second weight and the second bias at the first smart storage device; and training the first model using the second training data based on the second weight and the second bias at the second smart storage device.
- 14 . The method of claim 13 , wherein the first smart storage device and the second smart storage device include a first accelerator and a second accelerator, respectively, and wherein distributing the first model to the first smart storage device and the second smart storage device comprises distributing the first model to the first accelerator and the second accelerator.
- 15 . The method of claim 13 , wherein the smart interface is a Compute eXpress Link (CXL) interface, and the first smart storage device and the second smart storage device are CXL devices.
- 16 . The method of claim 15 , wherein the first smart storage device and the second smart storage device are connected to the smart interface using a CXL.cache protocol and a CXL.mem protocol, wherein training the first model using the first training data based on the first weight and the first bias comprises receiving the first weight and the first bias at the first smart storage device through the CXL.cache protocol, wherein training the first model using the second training data based on the first weight and the first bias comprises receiving the first weight and the first bias at the second smart storage device through the CXL.cache protocol, wherein training the first model using the first training data based on the second weight and the second bias comprises receiving the second weight and the second bias at the first smart storage device through the CXL.cache protocol, wherein training the first model using the second training data based on the second weight and the second bias comprises receiving the second weight and the second bias at the second smart storage device through the CXL.cache protocol, wherein providing the first output value to the host device comprises providing the first output value from the first smart storage device to the host device through the CXL.mem protocol, and wherein providing the second output value to the host device comprises providing the second output value from the second smart storage device to the host device through the CXL.mem protocol.
- 17 . The method of claim 13 , wherein calculating the second weight and the second bias comprises: calculating a third weight and a third bias based on the first output value; calculating a fourth weight and a fourth bias based on the second output value; and calculating an average of the third weight and the fourth weight as the second weight, and calculating an average of the third bias and the fourth bias as the second bias.
- 18 . The method of claim 13 , wherein the first model is a generative model, and the first output value and the second output value include an arithmetic operation accuracy of the first model.
- 19 . The method of claim 13 , further comprising: replacing the first model with a second model at the host device; initializing a third weight and a third bias at the host device; training the second model using the first training data based on the third weight and the third bias at the first smart storage device; training the second model using the second training data based on the third weight and the third bias at the second smart storage device; providing the host device with a third output value, wherein the third output value is based on inputting the first training data into the second model at the first smart storage device; providing the host device with a fourth output value, wherein the fourth output value is based on inputting the second training data into the second model at the second smart storage device; calculating, at the host device, a fourth weight and a fourth bias based on the third output value and the fourth output value; training the second model using the first training data based on the fourth weight and the fourth bias at the first smart storage device; and training the second model using the second training data based on the fourth weight and the fourth bias at the second smart storage device.
- 20 . The method of claim 13 , wherein the computing system includes a third smart storage device connected to the host device through the smart interface, the third smart storage device storing third training data, and wherein the method further comprises: distributing a second model from the host device to the third smart storage device; initializing a third weight and a third bias at the host device; training the second model using the third training data based on the third weight and the third bias at the third smart storage device; providing the host device with a third output value, wherein the third output value is based on inputting the third training data into the second model at the third smart storage device; calculating, at the host device, a fourth weight and a fourth bias based on the third output value; and training the second model using the third training data based on the fourth weight and the fourth bias at the third smart storage device.
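The training loop recited in claims 13 and 17 resembles a federated-averaging round: the host initializes parameters, each smart storage device trains on its local data, the host averages the per-device results into a second weight and bias, and the devices train again. The Python sketch below illustrates this flow under simplifying assumptions; the `SmartStorageDevice` class, the single linear neuron, and all variable names are hypothetical, since the patent does not specify a model architecture or implementation.

```python
import numpy as np

class SmartStorageDevice:
    """Stands in for an accelerator plus the training data held in its
    non-volatile memory (claims 1 and 13). Purely illustrative."""
    def __init__(self, data, targets):
        self.data = data          # local training data in the NVM
        self.targets = targets
        self.weight = None
        self.bias = None

    def load_model(self, weight, bias):
        # Host distributes/updates parameters (per claim 16, this would
        # travel over the CXL.cache protocol).
        self.weight = weight
        self.bias = bias

    def train_and_report(self, lr=0.1):
        # One gradient step of a single linear neuron on the local data,
        # then report the result back to the host (per claim 16, over
        # the CXL.mem protocol).
        pred = self.data @ self.weight + self.bias
        err = pred - self.targets
        self.weight -= lr * self.data.T @ err / len(self.data)
        self.bias -= lr * err.mean()
        return self.weight, self.bias

# Host-device side (claim 13): initialize, distribute, collect, average.
rng = np.random.default_rng(0)
dev1 = SmartStorageDevice(rng.normal(size=(32, 4)), rng.normal(size=32))
dev2 = SmartStorageDevice(rng.normal(size=(32, 4)), rng.normal(size=32))

weight, bias = np.zeros(4), 0.0   # "initializing a first weight and a first bias"
for _ in range(3):                # repeated rounds of the claim 13 loop
    results = []
    for dev in (dev1, dev2):
        dev.load_model(weight.copy(), bias)
        results.append(dev.train_and_report())
    # Claim 17: the second weight/bias are averages of the per-device values.
    weight = np.mean([w for w, _ in results], axis=0)
    bias = float(np.mean([b for _, b in results]))
```

Because only the model parameters and output values cross the interface, the training data never leaves the non-volatile memory, which is the traffic reduction the disclosure emphasizes.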
Description
CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of the filing date of Korean Patent Application No. 10-2024-0004622, filed on Jan. 11, 2024, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference in its entirety.

BACKGROUND

With advances in technologies such as artificial intelligence (AI), big data, and edge computing, there is a growing need for devices that process larger amounts of data faster. For example, high-bandwidth applications that perform complex arithmetic operations may benefit from faster data processing and more efficient memory access. However, connections between host devices and semiconductor devices that include a memory may have relatively low bandwidth and long latency, and may exhibit problems with memory sharing and/or coherency. Additionally, as generative AI models become larger, significant resources and time are required to train them. When training an artificial intelligence model, a considerable amount of training data is repeatedly moved between the storage device and the host device, increasing input/output traffic.

SUMMARY

Some aspects of the present disclosure provide methods for preventing repeated movement of data between a host device and a smart storage device, by training an artificial intelligence model in the smart storage device through cache coherency between the host device and the smart storage device.

According to some implementations of the present disclosure, there is provided a smart storage device.
The smart storage device comprises an accelerator which is connected to a host device through a smart interface and stores a first model distributed from the host device, and a non-volatile memory which stores training data used for training the first model. The accelerator trains the first model using the training data based on a first weight and a first bias that are set in advance, provides the host device with a first output value which is output by inputting the training data into the first model, and trains the first model using the training data based on a second weight and a second bias calculated by the host device with reference to the first output value.

According to some implementations of the present disclosure, there is provided a computing system. The computing system comprises a smart interface connected to a host device, a first smart storage device including a first accelerator and a first non-volatile memory, and a second smart storage device including a second accelerator and a second non-volatile memory. The first accelerator and the second accelerator are each connected to the smart interface and store a first model distributed from the host device, and the first non-volatile memory and the second non-volatile memory respectively store first training data and second training data used for training the first model. The first accelerator and the second accelerator respectively train the first model using the first training data and the second training data, based on a first weight and a first bias that are set in advance; respectively provide the host device with a first output value which is output by inputting the first training data into the first model, and a second output value which is output by inputting the second training data into the first model; and respectively train the first model using the first training data and the second training data, based on a second weight and a second bias newly calculated by the host device with reference to the first output value and the second output value.

According to some implementations of the present disclosure, there is provided a method for operating a computing system which includes a host device, a smart interface, and a first smart storage device and a second smart storage device connected to the host device through the smart interface. The method comprises distributing a first model from the host device to the first smart storage device and the second smart storage device; initializing a first weight and a first bias at the host device; training the first model using first training data based on the first weight and the first bias at the first smart storage device; training the first model using second training data based on the first weight and the first bias at the second smart storage device; providing the host device with a first output value which is output by inputting the first training data into the first model at the first smart storage device; providing the host device with a second output value which is output by inputting the second training data into the first model at the second smart storage device; calculating, at the host device, a second weight and a second bias with reference to the first output value and the second output value; training the first model using the first training data based on the second weight and the second bias at the first smart storage device; and training the first model using the second training data based on the second weight and the second bias at the second smart storage device.