KR-20260062472-A - AI sign language model-based sign language recognition system

KR20260062472A

Abstract

The present invention relates to a sign language recognition system comprising a shooting device for capturing images, a display device for visually displaying data, and a control device that, upon receiving image data captured by the shooting device, detects an object in the input image data, determines through an AI sign language model (a sign language recognition model using artificial intelligence technology) whether the detected object performs a sign language movement, distinguishes between manual (hand-gesture) areas and non-manual areas of the movement, recognizes the meaning of the sign language by considering both areas, and displays the recognized sign language through the display device. According to the present invention, an AI sign language model-based sign language recognition system can take into account the non-manual areas that account for more than 55% of meaning transmission, thereby significantly improving the overall accuracy of meaning comprehension in sign language recognition.

Inventors

  • 박형석
  • 박기항

Assignees

  • 칼스케이드 주식회사

Dates

Publication Date
2026-05-07
Application Date
2024-10-29

Claims (5)

  1. A sign language recognition system comprising: a shooting device for capturing images; a display device for visually displaying data; and a control device that, upon receiving image data captured by the shooting device, detects an object in the input image data, determines through an AI sign language model, which is a sign language recognition model using artificial intelligence technology, whether the detected object performs a sign language movement, distinguishes between manual (hand-gesture) areas and non-manual areas of the sign language movement, recognizes the meaning of the sign language by considering both the manual and non-manual areas, and displays the recognized sign language through the display device.
  2. The sign language recognition system of claim 1, wherein the control device learns the manual and non-manual areas through the AI sign language model.
  3. The sign language recognition system of claim 2, wherein the control device recognizes emotions by analyzing the non-manual area through the AI sign language model.
  4. The sign language recognition system of claim 3, wherein, when analyzing the non-manual area, the control device extracts feature points from a face image and uses the extracted feature points to recognize emotions.
  5. The sign language recognition system of claim 2, wherein the AI sign language model is a modified LSTM model in which peephole connections and zoneout are added to a basic Long Short-Term Memory (LSTM) model.
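Claims 3 and 4 describe recognizing emotion in the non-manual area by extracting feature points from a face image. The patent does not disclose which feature points or decision rules are used, so the following is only an illustrative sketch: the point names, normalization, and thresholds are all invented for this example, and a real system would feed such features into the AI sign language model rather than use fixed rules.

```python
import numpy as np

def nonmanual_features(pts):
    """Compute simple geometric features from facial feature points.
    `pts` is a dict of named 2-D points; the names are assumptions for
    this sketch (real systems often use e.g. a 68-point convention)."""
    mouth_w = np.linalg.norm(pts["mouth_right"] - pts["mouth_left"])
    mouth_h = np.linalg.norm(pts["mouth_bottom"] - pts["mouth_top"])
    brow_raise = np.linalg.norm(pts["brow_center"] - pts["eye_center"])
    # Normalize by a face-scale distance so features are invariant to
    # how far the signer stands from the camera.
    face_scale = np.linalg.norm(pts["chin"] - pts["brow_center"])
    return {
        "mouth_open": mouth_h / face_scale,
        "mouth_stretch": mouth_w / face_scale,
        "brow_raise": brow_raise / face_scale,
    }

def classify_nonmanual(feat, open_thresh=0.15, brow_thresh=0.35):
    """Toy rule-based reading of non-manual markers (thresholds invented)."""
    if feat["brow_raise"] > brow_thresh:
        return "raised brows (e.g. yes/no question marker)"
    if feat["mouth_open"] > open_thresh:
        return "open mouth (e.g. emphasis/surprise)"
    return "neutral"
```

In a full pipeline, per-frame feature vectors like these would be concatenated with hand-gesture features and passed to the sequence model described in claim 5.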

Description

AI sign language model-based sign language recognition system

The present invention relates to sign language recognition technology, and more specifically to technology for recognizing and interpreting sign language by considering both manual (hand) and non-manual areas using artificial intelligence (AI) technology. Sign language is a visual language that conveys meaning through physical movements, including hand movements, instead of speech. With the advancement of AI technology, sign language recognition systems are being developed to automatically convert the sign language of the hearing impaired into speech and subtitles. Sign language is a primary means of communication for the hearing impaired, conveying meaning through various elements such as hand and arm movements as well as facial expressions and body posture.

However, existing sign language recognition technology has the following limitations. Most existing systems focus primarily on hand and arm movements. As a result, information from non-manual movements (facial expressions, gaze, mouth shape, head movements, body movements, etc.), which play a crucial role in conveying overall meaning, is omitted, limiting accurate meaning transmission. Research shows that in actual sign language conversations, more than 55% of the meaning is conveyed through the non-manual area; systems that recognize only the manual area therefore risk missing more than half of the total meaning. In sign language, facial expressions and body movements also convey the tone, emphasis, and interrogative form of a sentence, and without recognizing these non-manual elements it is difficult to grasp the precise intent of a sentence.
Currently, to extract surveillance or search scenes from video footage such as CCTV, it is common practice to recognize dangerous or unusual situations by training machine learning techniques to track objects or patterns of tracked behavior, based on object detection or recognition technologies. Furthermore, while existing Recurrent Neural Network (RNN) models, particularly Long Short-Term Memory (LSTM) models, have been widely used for time-series data processing, they have shown limitations in processing data with complex spatiotemporal patterns, such as sign language.

FIG. 1 is a conceptual diagram schematically illustrating an AI sign language model-based sign language recognition system according to one embodiment of the present invention. FIG. 2 is a face image with feature points marked according to one embodiment of the present invention. FIG. 3 illustrates a basic LSTM structure. FIG. 4 illustrates an LSTM structure with added peephole connections. FIG. 5 illustrates an LSTM structure with added zoneout.

The present invention is capable of various modifications and may have various embodiments, and specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the present invention to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutions that fall within the spirit and scope of the invention. The terms used in this application merely describe specific embodiments and are not intended to limit the invention. A singular expression includes the plural unless the context clearly indicates otherwise.
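The modified LSTM of claim 5 and FIGs. 3-5 adds peephole connections (gates that also see the cell state) and zoneout (stochastically keeping previous hidden/cell units) to a basic LSTM. The patent does not give equations or hyperparameters, so the cell below is a minimal sketch under standard formulations of those two techniques; the class name, weight initialization, and zoneout rates are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class PeepholeZoneoutLSTMCell:
    """Illustrative LSTM cell with peephole connections and zoneout
    (a sketch of the technique named in claim 5, not the patented model)."""

    def __init__(self, input_size, hidden_size, z_c=0.1, z_h=0.1, seed=0):
        rng = np.random.default_rng(seed)
        k = hidden_size
        # Stacked gate weights for input x and recurrent h: [i, f, g, o]
        self.W = rng.standard_normal((4 * k, input_size)) * 0.1
        self.U = rng.standard_normal((4 * k, k)) * 0.1
        self.b = np.zeros(4 * k)
        # Peephole weights: diagonal (elementwise) links from the cell state
        self.p_i = np.zeros(k)
        self.p_f = np.zeros(k)
        self.p_o = np.zeros(k)
        self.k, self.z_c, self.z_h = k, z_c, z_h

    def step(self, x, h_prev, c_prev, training=False, rng=None):
        k = self.k
        z = self.W @ x + self.U @ h_prev + self.b
        i = sigmoid(z[0:k] + self.p_i * c_prev)      # input gate peeks at c_{t-1}
        f = sigmoid(z[k:2*k] + self.p_f * c_prev)    # forget gate peeks at c_{t-1}
        g = np.tanh(z[2*k:3*k])                      # candidate cell state
        c_new = f * c_prev + i * g
        o = sigmoid(z[3*k:4*k] + self.p_o * c_new)   # output gate peeks at c_t
        h_new = o * np.tanh(c_new)
        if training:
            if rng is None:
                rng = np.random.default_rng()
            # Zoneout: each unit keeps its previous value with prob. z_c / z_h
            keep_c = rng.random(k) < self.z_c
            keep_h = rng.random(k) < self.z_h
            c = np.where(keep_c, c_prev, c_new)
            h = np.where(keep_h, h_prev, h_new)
        else:
            # At inference, use the expected value of the zoneout mask
            c = self.z_c * c_prev + (1 - self.z_c) * c_new
            h = self.z_h * h_prev + (1 - self.z_h) * h_new
        return h, c
```

In a sign language recognizer, `step` would be applied to each video frame's feature vector (manual plus non-manual features), with the final hidden state fed to a classifier over sign meanings.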
In this application, terms such as "comprising" or "having" are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof. Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by those skilled in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an ideal or overly formal sense unless explicitly defined in this application.

Furthermore, in the description referring to the attached drawings, identical components are assigned the same reference numeral regardless of drawing symbols, and redundant descriptions thereof are omitted. In describing the present invention, if it is determined that a detailed description of related prior art could unnecessarily obscure the essence of the present invention, such detailed description is omitted.

FIG. 1 is a conceptual diagram schematically illustrating an AI sign language model-based sign language recognition system according to one embodiment of the present invention. Referring to FIG. 1, an AI sign language model-based sign language rec