CN-115702427-B - Intelligent digital camera with deep learning accelerator and random access memory

CN115702427B

Abstract

Systems, apparatuses, and methods related to deep learning accelerators and memory are described. For example, a digital camera may be configured with a housing; a lens; an image sensor positioned behind the lens to generate image data of a field of view of the digital camera; a deep learning accelerator configured to execute instructions having matrix operands; a random access memory configured to store instructions executable by the deep learning accelerator and to store matrices of an artificial neural network; a transceiver; and a controller configured to generate a description of items or events in the field of view captured in the image data, based on an output of the artificial neural network receiving the image data as input, and to communicate the description to a separate computer using the transceiver. The separate computer may selectively request a portion of the image data from the digital camera based on its processing of the description.

Inventors

  • P. Kali

Assignees

  • Micron Technology, Inc.

Dates

Publication Date
2026-05-12
Application Date
2021-06-15
Priority Date
2020-06-19

Claims (20)

  1. An integrated circuit device, comprising: an image sensor configured to generate image data of a field of view of the integrated circuit device; at least one processing unit configured to execute an instruction having a matrix operand; a random access memory configured to store first data representing weights of an artificial neural network and to store second data representing instructions executable by the at least one processing unit to perform matrix calculations of the artificial neural network using the first data representing the weights of the artificial neural network; a transceiver configured to communicate with a computer system, the computer system being separate from the integrated circuit device; and a controller coupled with the transceiver, the image sensor, and the random access memory; wherein the controller is configured to write the image data to the random access memory as an input to the artificial neural network; wherein the at least one processing unit is further configured to execute the instructions represented by the second data stored in the random access memory to generate an output of the artificial neural network based at least in part on the first data and the image data stored in the random access memory; and wherein the controller is further configured to generate third data representing a description of an item or event captured in the image data based on the output of the artificial neural network, and to control the transceiver to provide the third data representing the description to the computer system.
  2. The integrated circuit device of claim 1, wherein the output of the artificial neural network includes an identification, classification, or category of an item, person, or feature, together with a location and a size of the item, person, or feature; and the description is based on the identification, classification, or category and on the location and size.
  3. The integrated circuit device of claim 2, wherein the output of the artificial neural network includes an identification of an event associated with the item, person, or feature, and the description includes the identification of the event.
  4. The integrated circuit device of claim 3, wherein the controller is further configured to control the transceiver to provide to the computer system a representative image of the item, person, or feature, the representative image being extracted based on the output of the artificial neural network.
  5. The integrated circuit device of claim 4, wherein the controller is configured to cause the transceiver to transmit the description to the computer system together with the representative image.
  6. The integrated circuit device of claim 4, wherein the controller is configured to cause the transceiver to transmit the representative image to the computer system in response to a request from the computer system based on the description.
  7. The integrated circuit device of claim 4, wherein the controller is configured to selectively store image data for transmission to the computer system based on the output of the artificial neural network.
  8. The integrated circuit device of claim 7, further comprising: an integrated circuit die implementing a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC) of a deep learning accelerator, the deep learning accelerator comprising the at least one processing unit and a control unit configured to load the instructions from the random access memory for execution.
  9. The integrated circuit device of claim 8, wherein the control unit includes the controller.
  10. The integrated circuit device of claim 8, further comprising: an integrated circuit package configured to enclose at least the integrated circuit die of the FPGA or ASIC and one or more integrated circuit dies of the random access memory; wherein circuitry in the image sensor is connected to the one or more integrated circuit dies of the random access memory using through-silicon vias (TSVs).
  11. The integrated circuit device of claim 10, wherein the at least one processing unit includes a matrix-matrix unit configured to operate on two matrix operands of an instruction; wherein the matrix-matrix unit comprises a plurality of matrix-vector units configured to operate in parallel; wherein each of the plurality of matrix-vector units comprises a plurality of vector-vector units configured to operate in parallel; wherein each of the plurality of vector-vector units includes a plurality of multiply-accumulate units configured to operate in parallel; and wherein each of the plurality of multiply-accumulate units includes a neuromorphic memory configured to perform multiply-accumulate operations via analog circuitry.
  12. The integrated circuit device of claim 11, wherein the random access memory and the deep learning accelerator are formed on separate integrated circuit dies and connected by through-silicon vias (TSVs).
  13. The integrated circuit device of claim 12, wherein the transceiver is configured to communicate according to a communication protocol of a wireless personal area network or a wireless local area network.
  14. A method implemented in a digital camera, the method comprising: storing, in a random access memory of the digital camera, first data representing weights of an artificial neural network and second data representing instructions executable by at least one processing unit of the digital camera to perform matrix calculations of the artificial neural network using the first data representing the weights; generating, by an image sensor of the digital camera, image data capturing a field of view of the digital camera; storing the image data into the random access memory of the digital camera as input to the artificial neural network; executing, by the at least one processing unit, the instructions represented by the second data stored in the random access memory of the digital camera to compute an output from the artificial neural network based at least in part on the first data and the image data stored in the random access memory of the digital camera; generating, by the digital camera, third data representing a description of an item or event in the field of view captured in the image data based on the output of the artificial neural network; and transmitting the third data representing the description to a computer system via a transceiver of the digital camera.
  15. The method of claim 14, further comprising: determining, by the digital camera, whether to discard the image data based on a result of processing of the description in the computer system.
  16. The method of claim 14, further comprising: determining, by the digital camera, whether to transmit a portion of the image data based on a result of processing of the description in the computer system.
  17. The method of claim 16, further comprising: extracting the portion of the image data based on an identification of the item or event, wherein the output of the artificial neural network includes the identification of the item or event.
  18. The method of claim 17, wherein the identification of the item or event includes a size and a location of the item.
  19. A digital camera, comprising: a housing; a lens; an image sensor positioned behind the lens to generate image data capturing a field of view of the digital camera through the lens; a random access memory configured to store a model of an artificial neural network; a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC) having: a memory interface to access the random access memory, and at least one processing unit configured to execute instructions having matrix operands to perform computations of the artificial neural network according to the model; and a transceiver configured to communicate with a computer system using a wired or wireless connection; wherein the image sensor is configured to store the image data into the random access memory as input to the artificial neural network; wherein the FPGA or ASIC is configured to perform the computations of the artificial neural network according to the model to convert the input into an output from the artificial neural network; and wherein the digital camera is configured to generate, and communicate to the computer system, a description of an item or event in the field of view captured in the image data.
  20. The digital camera of claim 19, wherein the random access memory comprises a non-volatile memory configured to store the model of the artificial neural network; the model comprises instructions executable by the FPGA or ASIC; and the at least one processing unit comprises a matrix-matrix unit configured to operate on two matrix operands of an instruction.
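
Claim 11 recites a hierarchy of processing units: a matrix-matrix unit built from parallel matrix-vector units, each built from parallel vector-vector units, which in turn chain multiply-accumulate (MAC) operations. A minimal sketch of that decomposition follows; the function names are illustrative and do not come from the patent, and the hardware described would run the loops in parallel (per claim 11, via analog neuromorphic memory) rather than sequentially as here.

```python
def mac(acc, a, b):
    """One multiply-accumulate step (claim 11's MAC unit)."""
    return acc + a * b

def vector_vector(u, v):
    """Dot product of two vectors, as a chain of MAC operations."""
    acc = 0
    for a, b in zip(u, v):
        acc = mac(acc, a, b)
    return acc

def matrix_vector(m, v):
    """One vector-vector unit per matrix row (parallel in hardware)."""
    return [vector_vector(row, v) for row in m]

def matrix_matrix(m, n):
    """One matrix-vector unit per column of the second operand."""
    cols = list(zip(*n))                       # columns of n
    result_cols = [matrix_vector(m, c) for c in cols]
    return [list(row) for row in zip(*result_cols)]  # back to row-major
```

For example, `matrix_matrix([[1, 2], [3, 4]], [[5, 6], [7, 8]])` yields the ordinary matrix product `[[19, 22], [43, 50]]`, with every scalar contribution flowing through a MAC step.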

Description

Intelligent digital camera with deep learning accelerator and random access memory

RELATED APPLICATIONS

The present application claims priority to U.S. patent application Ser. No. 16/906,224, filed Jun. 19, 2020 and titled "INTELLIGENT DIGITAL CAMERA HAVING DEEP LEARNING ACCELERATOR AND RANDOM ACCESS MEMORY", the entire disclosure of which is hereby incorporated herein by reference.

Technical Field

At least some embodiments disclosed herein relate to digital cameras, and more particularly, but not limited to, smart digital cameras powered by integrated accelerators for Artificial Neural Networks (ANNs), such as ANNs configured through machine learning and/or deep learning.

Background

An Artificial Neural Network (ANN) uses a network of neurons to process inputs to the network and to generate outputs from the network. Each neuron in the network receives a set of inputs. Some inputs to a neuron may be outputs of other neurons in the network, and some may be inputs provided to the network as a whole. The input/output relationships among the neurons represent the connectivity of the network. Each neuron may have a bias, an activation function, and a set of synaptic weights for its inputs. The activation function may take the form of a step function, a linear function, a log-sigmoid function, or the like; different neurons in the network may have different activation functions. Each neuron generates a weighted sum of its inputs and its bias, and then produces an output that is a function of the weighted sum, computed using the activation function of the neuron. The relationship between the inputs and outputs of an ANN in general is defined by an ANN model that includes data representing the connectivity of the neurons in the network, as well as the bias, activation function, and synaptic weights of each neuron.
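
The neuron model described above (a weighted sum of the inputs plus a bias, passed through an activation function such as a log-sigmoid) can be sketched as follows; the function names are illustrative, not from the patent.

```python
import math

def log_sigmoid(x):
    """Log-sigmoid activation, one of the activation forms mentioned above."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, bias, activation=log_sigmoid):
    """Weighted sum of inputs plus bias, passed through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)
```

With zero weights and zero bias the weighted sum is 0, so `neuron_output([1.0, 1.0], [0.0, 0.0], 0.0)` returns the log-sigmoid of 0, i.e. 0.5.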
Based on a given ANN model, a computing device can compute the outputs of the network from a given set of inputs. For example, the inputs to the ANN may be generated from camera data, and the outputs from the ANN may be the identification of an item or an event. In general, an ANN may be trained using a supervised method, in which the parameters of the ANN are adjusted to minimize or reduce the error between known outputs associated with, or resulting from, respective inputs and outputs computed by applying those inputs to the ANN. Examples of supervised learning/training methods include reinforcement learning and learning with error correction. Alternatively or in combination, an ANN may be trained using an unsupervised method, in which the exact outputs resulting from a given set of inputs are not known before the training is completed. The ANN can be trained to classify items into a plurality of categories, or to cluster data points. Multiple training algorithms can be employed for a sophisticated machine learning/training paradigm.

Deep learning uses multiple layers of machine learning to progressively extract features from input data. For example, lower layers may be configured to identify edges in an image, and higher layers may be configured to identify, based on the edges detected by the lower layers, items captured in the image, such as faces, objects, events, etc. Deep learning can be implemented via Artificial Neural Networks (ANNs), such as deep neural networks, deep belief networks, recurrent neural networks, and/or convolutional neural networks. Deep learning has been applied to many application fields, such as computer vision, speech/audio recognition, natural language processing, machine translation, bioinformatics, drug design, medical image processing, games, etc.
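
The layered feature extraction described above can be sketched as a forward pass over a list of (weights, biases) layers. This is a minimal illustration only, assuming a ReLU-style activation for simplicity; the text above names step, linear, and log-sigmoid activations, and the choice here is not prescribed by the patent.

```python
def forward(layers, x):
    """Propagate an input vector through successive layers.

    Each layer is a (weight_matrix, bias_vector) pair. In a deep
    network, lower layers would extract simple features (e.g., edges)
    and higher layers would compose them into items such as faces or
    objects.
    """
    for weights, biases in layers:
        x = [max(0.0, sum(w * i for w, i in zip(row, x)) + b)  # ReLU
             for row, b in zip(weights, biases)]
    return x
```

For instance, a single identity layer `([[1, 0], [0, 1]], [0, 0])` maps the input `[2, -3]` to `[2, 0]`, the negative component being clipped by the activation.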
Drawings

Embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements.

FIG. 1 shows an integrated circuit device having a deep learning accelerator and random access memory configured according to one embodiment.
FIG. 2 shows a processing unit configured to perform matrix-matrix operations according to one embodiment.
FIG. 3 shows a processing unit configured to perform matrix-vector operations according to one embodiment.
FIG. 4 shows a processing unit configured to perform vector-vector operations according to one embodiment.
FIG. 5 shows a deep learning accelerator and random access memory configured to autonomously apply inputs to a trained artificial neural network according to one embodiment.
FIGS. 6-8 show a digital camera having a deep learning accelerator and random access memory configured according to some embodiments.
FIG. 9 shows a method implemented in a digital camera according to one embodiment.

Detailed Description

At least some embodiments disclosed herein provide