EP-3776378-B1 - ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF
Inventors
- USHAKOV, YURY
Dates
- Publication Date
- 2026-05-06
- Application Date
- 2019-06-07
Claims (13)
- An electronic apparatus, comprising: a microphone; a storage configured to store a liquid-state machine (LSM) model and a recurrent neural network (RNN) model; and a processor configured to: input feature data (310) acquired from input data for a machine learning algorithm to the LSM model for performing a data preprocessing operation, wherein the input data is speech data acquired through the microphone; process the input feature data (310) using the LSM model; input an output value output by the LSM model to the RNN model; process the output value output by the LSM model using the RNN model; and identify whether a preset object is included in the input data based on an output value output by the RNN model, wherein the preset object is a wake-up word, wherein the RNN model is trained by sample data related to the preset object to provide an output value indicating whether the preset object is included in the input data, wherein the LSM model includes a plurality of interlinked neurons (320), wherein a weight applied to a link (330) between the plurality of interlinked neurons (320) is identified based on a spike at which a neuron value is greater than or equal to a preset threshold in a preset unit time, wherein the feature data (310) is data on which a feature extraction operation has been performed on the acquired input data, and wherein the LSM model during training, using the feature data (310) on which the feature extraction operation is performed, based on a number of the spikes being greater than a target number during the preset unit time in an arbitrary neuron (320) from among the plurality of interlinked neurons, reduces a weight corresponding to a link (330) of the neuron (320), and based on a number of the spikes being less than the target number during the preset unit time in the arbitrary neuron, increases a weight corresponding to a link (330) of the neuron (320).
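The homeostatic training rule of claim 1 can be sketched in code. This is an illustrative reading of the claim only, not the patented implementation: neurons that spike more than the target number per unit time have the weights of their links reduced, and neurons that spike less have them increased. The matrix layout and the fixed step size `delta` are assumptions.

```python
import numpy as np

def update_weights(weights, spike_counts, target, delta=0.01):
    """Sketch of the claim-1 rule (illustrative, not the patented code).

    weights[i, j]  : weight of the link from neuron i to neuron j
    spike_counts[j]: spikes of neuron j in the preset unit time
    target         : target number of spikes per unit time
    """
    for j, count in enumerate(spike_counts):
        if count > target:        # too active -> weaken the neuron's links
            weights[:, j] -= delta
        elif count < target:      # too quiet -> strengthen the neuron's links
            weights[:, j] += delta
    return weights
```

Applied to a 3-neuron reservoir with target 3, a neuron that fired 5 times has its link weights lowered, one that fired twice has them raised, and one exactly on target is left unchanged.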
- The electronic apparatus as claimed in claim 1, wherein a weight of a link (330) between an arbitrary transmitting neuron (320) and a receiving neuron (320) corresponding to the transmitting neuron (320) from among the plurality of interlinked neurons is acquired based on a first number of spikes at which a neuron value of the transmitting neuron is greater than or equal to the preset threshold in the preset unit time and a second number of spikes at which a neuron value of the receiving neuron is greater than or equal to the preset threshold in the preset unit time.
- The electronic apparatus as claimed in claim 2, wherein a weight of the link (330) in a current unit time is acquired by adding a change amount to a weight of the link in a previous unit time, and wherein the change amount is acquired based on a value calculated from a target number of the spikes, the first number, and the second number.
- The electronic apparatus as claimed in claim 3, wherein the LSM model sets an initial target number of the spikes to a preset minimum value, and increases the set target number by a preset number per preset unit time.
- The electronic apparatus as claimed in claim 4, wherein the LSM model acquires an information entropy value of the transmitting neuron (320) per preset unit time based on differences of occurrence times between spikes of the transmitting neuron (320), based on an information entropy value of each of the plurality of interlinked neurons (320) being acquired, acquires a sum of the acquired entropy values, and sets a target number set in a time period where the sum reaches a maximum value as a final target number.
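The per-neuron entropy of claim 5 is computed from the differences of spike occurrence times (inter-spike intervals). A minimal sketch of that idea follows; the histogram estimator, the `bin_width` parameter, and the use of base-2 logarithms are assumptions, since the claim does not fix an estimator.

```python
import numpy as np
from collections import Counter

def isi_entropy(spike_times, bin_width=1.0):
    """Hedged sketch of the claim-5 idea: estimate an information
    entropy for one neuron from its inter-spike intervals, here via
    a histogram of binned intervals (estimator is an assumption)."""
    isis = np.diff(np.sort(spike_times))      # differences of spike times
    if len(isis) == 0:
        return 0.0
    counts = Counter(np.floor(isis / bin_width).astype(int))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()                              # empirical interval distribution
    return float(-(p * np.log2(p)).sum())
```

Per the claim, such entropies would be summed over all reservoir neurons for each candidate target number as it is incremented, and the target in force when the sum peaks is kept as the final target number.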
- The electronic apparatus as claimed in claim 3, wherein the LSM model, based on the second number being greater than the target number, sets the change amount to be a negative number, based on the second number being less than the target number, sets the change amount to be a positive number, and based on the second number being equal to the target number, sets the change amount to be 0.
- The electronic apparatus as claimed in claim 6, wherein the weight is identified based on the mathematical formula shown below: δw_ij = −α · w_ij · n_i · δn_j, where w_ij is a weight corresponding to a link (330) from an i neuron (320), which is a transmitting neuron (320), to a j neuron, which is a receiving neuron, δw_ij is a change amount of the weight, α is a preset constant, n_i = N_i / N_T (where N_i is the number of spikes of the i neuron in a preset unit time, and N_T is a target number of spikes), n_j = N_j / N_T (where N_j is the number of spikes of the j neuron in a preset unit time), and δn_j = n_j − 1.
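The claim-7 formula can be transcribed directly. The sketch below only evaluates δw_ij = −α · w_ij · n_i · δn_j from the quantities the claim defines; the example values of α and the spike counts are arbitrary.

```python
def weight_change(w_ij, N_i, N_j, N_T, alpha=0.1):
    """Claim-7 update: dw_ij = -alpha * w_ij * n_i * dn_j,
    with n_i = N_i / N_T, n_j = N_j / N_T, and dn_j = n_j - 1."""
    n_i = N_i / N_T            # transmitting neuron's spike rate vs. target
    dn_j = N_j / N_T - 1       # receiving neuron's deviation from target
    return -alpha * w_ij * n_i * dn_j
```

Note how this matches claim 6: a receiving neuron spiking above the target (N_j > N_T) gives δn_j > 0 and hence a negative change, spiking below gives a positive change, and spiking exactly at the target gives 0.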
- The electronic apparatus as claimed in claim 1, wherein the feature data (310) is at least one of Fourier transform coefficients or Mel-frequency cepstral coefficients (MFCC), and wherein the processor is configured to input at least one of the Fourier transform coefficients or the Mel-frequency cepstral coefficients (MFCC) to the LSM model.
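One of the two feature options named in claim 8, Fourier transform coefficients of a speech frame, can be sketched as below. The Hann window, the magnitude spectrum, and the coefficient count are assumptions; the patent does not fix them, and MFCCs (the other option) would add mel filtering, a logarithm, and a DCT on top of such a spectrum.

```python
import numpy as np

def fourier_features(frame, n_coeffs=64):
    """Sketch of claim-8 feature extraction (assumed details):
    magnitude Fourier coefficients of one short speech frame."""
    window = np.hanning(len(frame))            # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(frame * window))
    return spectrum[:n_coeffs]                 # keep the lowest coefficients
```

The resulting vector per frame would be the feature data (310) fed into the LSM model.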
- The electronic apparatus as claimed in claim 1, wherein the LSM model converts the feature data (310), which changes over time, into a spatio-temporal pattern based on an activity of the plurality of interlinked neurons, and outputs the converted spatio-temporal pattern.
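The conversion in claim 9, a time-varying input becoming a spatio-temporal spike pattern, can be illustrated with a toy reservoir. The leaky-integrate-and-fire dynamics, the reset-to-zero behavior, and the scalar input drive are assumptions made for the sketch; the claims do not fix a neuron model.

```python
import numpy as np

def lsm_pattern(inputs, W, threshold=1.0, leak=0.9):
    """Toy sketch of claim 9 (assumed neuron model): a reservoir of
    interlinked neurons with recurrent weights W turns a sequence of
    input values into a (neurons x time) spike pattern."""
    n = W.shape[0]
    v = np.zeros(n)                          # membrane values
    spikes = np.zeros(n)                     # spikes from previous step
    pattern = np.zeros((n, len(inputs)))
    for t, x in enumerate(inputs):
        v = leak * v + x + W @ spikes        # integrate input + recurrence
        spikes = (v >= threshold).astype(float)
        v[spikes > 0] = 0.0                  # reset neurons that spiked
        pattern[:, t] = spikes
    return pattern
```

The columns of the returned matrix over time form the spatio-temporal pattern that would be passed on to the RNN model.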
- A controlling method of an electronic apparatus storing a liquid-state machine (LSM) model and a recurrent neural network (RNN) model, the controlling method comprising: acquiring feature data (310) from input data for a machine learning algorithm, wherein the input data is speech data; inputting the acquired feature data (310) to the LSM model for performing a data preprocessing operation; processing the input feature data (310) using the LSM model; inputting an output value output by the LSM model to the RNN model; processing the output value output by the LSM model using the RNN model; and identifying whether a preset object is included in the input data based on an output value output by the RNN model, wherein the preset object is a wake-up word, wherein the RNN model is trained by sample data related to the preset object to provide an output value indicating whether the preset object is included in the input data, wherein the LSM model includes a plurality of interlinked neurons (320), wherein a weight applied to a link (330) between the plurality of interlinked neurons (320) is identified based on a spike at which a neuron value is greater than or equal to a preset threshold in a preset unit time, wherein the feature data (310) is data on which a feature extraction operation has been performed on the acquired input data, and wherein the LSM model during training, using the feature data (310) on which the feature extraction operation is performed, based on a number of the spikes being greater than a target number during the preset unit time in an arbitrary neuron (320) from among the plurality of interlinked neurons, reduces a weight corresponding to a link (330) of the neuron, and based on a number of the spikes being less than the target number during the preset unit time in the arbitrary neuron, increases a weight corresponding to a link of the neuron (320).
- The controlling method as claimed in claim 10, wherein the LSM model, based on a number of the spikes being greater than a target number during the preset unit time in an arbitrary neuron (320) from among the plurality of interlinked neurons, reduces a weight corresponding to a link (330) of the neuron, and based on a number of the spikes being less than the target number during the preset unit time in the arbitrary neuron, increases a weight corresponding to a link of the neuron (320).
- The controlling method as claimed in claim 11, wherein a weight of a link (330) between an arbitrary transmitting neuron (320) and a receiving neuron (320) corresponding to the transmitting neuron (320) from among the plurality of interlinked neurons is acquired based on a first number of spikes at which a neuron value of the transmitting neuron (320) is greater than or equal to the preset threshold in the preset unit time and a second number of spikes at which a neuron value of the receiving neuron is greater than or equal to the preset threshold in the preset unit time.
- A non-transitory computer readable medium configured to store computer instructions that, when executed by a processor of an electronic apparatus in which a liquid-state machine (LSM) model and a recurrent neural network (RNN) model are stored, cause the electronic apparatus to perform an operation, the operation comprising: acquiring feature data (310) from input data for a machine learning algorithm, wherein the input data is speech data; inputting the acquired feature data (310) to the LSM model for performing a data preprocessing operation; processing the input feature data (310) using the LSM model; inputting an output value output by the LSM model to the RNN model; processing the output value output by the LSM model using the RNN model; and identifying whether a preset object is included in the input data based on an output value output by the RNN model, wherein the preset object is a wake-up word, wherein the RNN model is trained by sample data related to the preset object to provide an output value indicating whether the preset object is included in the input data, wherein the LSM model includes a plurality of interlinked neurons (320), wherein a weight applied to a link between the plurality of interlinked neurons is identified based on a spike at which a neuron value is greater than or equal to a preset threshold in a preset unit time, wherein the feature data (310) is data on which a feature extraction operation has been performed on the acquired input data, and wherein the LSM model during training, using the feature data (310) on which the feature extraction operation is performed, based on a number of the spikes being greater than a target number during the preset unit time in an arbitrary neuron (320) from among the plurality of interlinked neurons, reduces a weight corresponding to a link (330) of the neuron (320), and based on a number of the spikes being less than the target number during the preset unit time in the arbitrary neuron, increases a weight corresponding to a link of the neuron (320).
Description
Technical Field
The disclosure relates to an artificial intelligence (AI) system that simulates functions such as recognition and determination of a human brain by utilizing a machine learning algorithm such as deep learning, an electronic apparatus for performing an application thereof, and a controlling method thereof.
Background Art
An artificial intelligence (AI) system is a computer system realizing intelligence of a human level, in which a machine learns and performs determination on its own, and whose recognition rate improves as it is used. In recent years, AI systems realizing intelligence of a human level have been used in various fields. Unlike the existing rule-based smart system, an AI system learns and performs determination on its own. Because an AI system's recognition rate improves and user preferences are understood more accurately the more it is used, the existing rule-based smart systems have increasingly been replaced with artificial intelligence systems based on deep learning.
Artificial intelligence technology includes machine learning (for example, deep learning) and element technologies utilizing machine learning. Machine learning is an algorithm technology that classifies and learns features of input data on its own. Element technology simulates functions such as recognition and determination of a human brain by utilizing a machine learning algorithm such as deep learning, and may include technical fields such as linguistic understanding, visual understanding, inference/prediction, knowledge expression, motion control, and the like. Various fields to which artificial intelligence technology is applicable are described below.
Linguistic understanding is a technology of recognizing human languages and characters, and applying and processing them, which may include natural language processing, machine translation, dialogue systems, question answering, voice recognition and synthesis, etc. Visual understanding is a technology of recognizing and processing an object as human vision does, which may include object recognition, object tracking, image search, human recognition, scene understanding, space understanding, image improvement, etc. Inference and prediction is a technique of identifying information to perform logical inference and prediction, which may include knowledge/probability-based inference, optimization prediction, preference-based planning, recommendation, etc. Knowledge expression is a technique of automatically processing human experience information into knowledge data, which may include knowledge construction (data generation/classification), knowledge management (data utilization), etc. Motion control is a technique of controlling autonomous driving of a vehicle and robot motion, which may include movement control (navigation, collision, and driving), manipulation control (behavior control), etc.
Zhang Yong et al.: "A Digital Liquid State Machine With Biologically Inspired Learning and Its Application to Speech Recognition", IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 11, and Wang Qian et al.: "D-LSM: Deep Liquid State Machine with unsupervised recurrent reservoir tuning", 2016 23rd International Conference on Pattern Recognition (ICPR), IEEE, 4 December 2016, pages 2652-2657, relate to the use of a liquid state machine in speech recognition. The above information is presented as background information only to assist with an understanding of the disclosure.
No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the disclosure.
Disclosure of Invention
Technical Problem
The performance of a machine learning algorithm may differ depending on the data input to the artificial intelligence system, and also depending on the situation in which the data is input. In general, there may be situations where noise is heard or image data is broken; such a situation may be regarded as one in which noise is present. When noise is present in the input data, the performance of the machine learning algorithm may deteriorate. Accordingly, a data preprocessing operation that preprocesses the input data is demanded. Here, the recognition rate of a machine learning algorithm may differ depending on the data preprocessing process. Accordingly, a method for improving the performance of a machine learning algorithm while minimizing data processing is demanded.
Solution to Problem
Aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. The invention is described in independent claims 1, 10 and