KR-20260067211-A - SIGN LANGUAGE TRANSLATION MODEL LEARNING METHOD AND SIGN LANGUAGE TRANSLATION METHOD

Abstract

A method for training a sign language translation model may include: a step in which a sign language image processing device acquires a sign language image and gloss data; a step in which the sign language image processing device reflects the gloss data onto the sign language image; a step in which the sign language image processing device constructs a gloss dictionary based on the gloss data; and a step in which the sign language image processing device trains a learning model based on the gloss dictionary and the sign language image reflecting the gloss data.
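The data preparation steps named in the abstract (reflecting gloss data onto the image and constructing a gloss dictionary) can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: `embed_gloss`, `build_gloss_dictionary`, and `overlay_gloss` are hypothetical names, the hash-based embedding stands in for whatever text encoder is actually used, and the frame is assumed to be a plain 2D grid of pixel values.

```python
import hashlib
import math


def embed_gloss(gloss: str, dim: int = 8) -> list[float]:
    """Toy stand-in for a text encoder (assumption): a deterministic
    unit vector derived from a hash of the gloss. dim must be <= 32."""
    digest = hashlib.sha256(gloss.encode("utf-8")).digest()
    raw = [b / 255.0 - 0.5 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def build_gloss_dictionary(gloss_sequences):
    """Record one embedding per distinct gloss found in the gloss data."""
    dictionary = {}
    for sequence in gloss_sequences:
        for gloss in sequence:
            if gloss not in dictionary:
                dictionary[gloss] = embed_gloss(gloss)
    return dictionary


def overlay_gloss(frame, gloss):
    """Return a copy of the frame with the gloss written into its top row.

    Character codes stand in for rendered text pixels (assumption);
    the input frame is left unchanged.
    """
    stamped = [row[:] for row in frame]
    for col, ch in enumerate(gloss[: len(stamped[0])]):
        stamped[0][col] = ord(ch) % 256
    return stamped


if __name__ == "__main__":
    gloss_data = [["HELLO", "THANKS"], ["HELLO"]]
    dictionary = build_gloss_dictionary(gloss_data)
    frame = [[0] * 4 for _ in range(2)]
    print(sorted(dictionary))
    print(overlay_gloss(frame, "HI"))
```

A real system would draw the gloss text with an actual font renderer and embed it with a trained encoder; the sketch only shows how the two artifacts used for training relate to the gloss data.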

Inventors

  • 김하영
  • 김정은
  • 박지훈
  • 전형우

Assignees

  • 연세대학교 산학협력단

Dates

Publication Date
2026-05-12
Application Date
2024-11-05

Claims (13)

  1. A sign language translation model training method comprising: a step in which a sign language image processing device acquires a sign language image and gloss data; a step in which the sign language image processing device reflects the gloss data in the sign language image; a step in which the sign language image processing device constructs a gloss dictionary based on the gloss data; and a step in which the sign language image processing device trains a learning model based on the gloss dictionary and the sign language image reflecting the gloss data, wherein the learning model is a model that predicts gloss from a sign language image.
  2. The sign language translation model training method of claim 1, wherein reflecting the gloss data in the sign language image includes displaying text corresponding to the gloss data in the sign language image.
  3. The sign language translation model training method of claim 1, wherein the gloss dictionary includes a dictionary that records the result of embedding each gloss included in the gloss data.
  4. The sign language translation model training method of claim 1, wherein the learning model includes a model that associates text pixels in the sign language image reflecting the gloss data with the sign language movements appearing in that image, and thereby extracts features that play a central role in identifying gloss in the sign language image.
  5. The sign language translation model training method of claim 4, wherein the learning model includes a model that finds, in the gloss dictionary, the gloss corresponding to the extracted features.
  6. The sign language translation model training method of claim 1, wherein the learning model includes a model that divides the sign language image reflecting the gloss data into multiple patches, identifies the relationships between the patches, and, based on the identified relationships, extracts a patch that plays a central role in identifying gloss.
  7. A sign language image processing device comprising: an arithmetic unit; and a storage device storing instructions that, when executed by the arithmetic unit, cause the sign language image processing device to perform operations, wherein the operations include: an operation of acquiring a sign language image and gloss data; an operation of reflecting the gloss data in the sign language image; an operation of constructing a gloss dictionary based on the gloss data; and an operation of training a learning model based on the gloss dictionary and the sign language image reflecting the gloss data, wherein the learning model is a model that predicts gloss from a sign language image.
  8. The sign language image processing device of claim 7, wherein reflecting the gloss data in the sign language image includes displaying text corresponding to the gloss data in the sign language image.
  9. The sign language image processing device of claim 7, wherein the gloss dictionary includes a dictionary that records the result of embedding each gloss included in the gloss data.
  10. The sign language image processing device of claim 7, wherein the learning model includes a model that associates text pixels in the sign language image reflecting the gloss data with the sign language movements appearing in that image, and thereby extracts features that play a central role in identifying gloss in the sign language image.
  11. The sign language image processing device of claim 10, wherein the learning model includes a model that finds, in the gloss dictionary, the gloss corresponding to the extracted features.
  12. The sign language image processing device of claim 7, wherein the learning model includes a model that divides the sign language image reflecting the gloss data into multiple patches, identifies the relationships between the patches, and, based on the identified relationships, extracts a patch that plays a central role in identifying gloss.
  13. A sign language translation method comprising: a step in which a sign language image processing device acquires a sign language image; a step in which the sign language image processing device predicts gloss from the sign language image using a learning model; and a step in which the sign language image processing device translates the predicted gloss, wherein the learning model is a model trained through the sign language translation model training method of claim 1.
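The patch-based feature extraction and dictionary lookup recited in the claims above can be sketched as follows. The claims do not name a specific architecture, so this sketch makes assumptions: patch relationships are modelled with scaled dot-product attention over raw pixel values, the "central" patch is the one that receives the most attention, and the dictionary lookup uses cosine similarity. All function names are hypothetical.

```python
import math


def split_into_patches(frame, size):
    """Split a 2D frame into non-overlapping size x size patches,
    each flattened into a list of pixel values."""
    patches = []
    for top in range(0, len(frame) - size + 1, size):
        for left in range(0, len(frame[0]) - size + 1, size):
            patches.append([frame[r][c]
                            for r in range(top, top + size)
                            for c in range(left, left + size)])
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def central_patch_index(patches):
    """Index of the patch that receives the most attention from all
    patches (assumed proxy for 'plays a central role')."""
    scale = math.sqrt(len(patches[0]))
    received = [0.0] * len(patches)
    for query in patches:
        weights = softmax([sum(q * k for q, k in zip(query, key)) / scale
                           for key in patches])
        for i, w in enumerate(weights):
            received[i] += w
    return max(range(len(patches)), key=received.__getitem__)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_gloss(feature, gloss_dictionary):
    """Gloss whose recorded embedding is most similar to the feature."""
    return max(gloss_dictionary,
               key=lambda g: cosine(feature, gloss_dictionary[g]))
```

In a trained model, the patch features would be learned and projected into the same space as the dictionary embeddings before the lookup; here the two steps are shown independently with hand-made vectors.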

Description

Sign Language Translation Model Learning Method and Sign Language Translation Method

The technique described below relates to a method for training a sign language translation model.

Sign language is a visual language expressed with the hands. It was developed to enable people with hearing impairments to communicate, and it conveys meaning through hand movements, body language, facial expressions, and the like.

Gloss is a method used when transcribing sign language into text. Unlike spoken language, sign language consists of visual elements such as hand movements, facial expressions, and gestures, which makes it difficult to represent accurately in writing. To express sign language in writing, it is therefore necessary to select a word corresponding to each sign, and this process can be referred to as gloss.

With the recent development of artificial intelligence models based on artificial neural networks (ANNs), technologies that aim to translate sign language using AI models are being developed.

FIG. 1 illustrates an embodiment in which a sign language image processing device (100) performs a sign language translation model learning method and a sign language translation method.
FIG. 2 illustrates an embodiment (200) of a sign language translation model learning method.
FIG. 3 illustrates an example of a sign language translation model learning method.
FIG. 4 illustrates an embodiment (300) of a sign language translation method.
FIG. 5 illustrates an example of a sign language translation method.
FIG. 6 illustrates an example of translating predicted gloss.
FIG. 7 illustrates the configuration of an embodiment of a sign language image processing device (400).

The technology described below may be modified in various ways and may have various embodiments. Specific embodiments are illustrated in the drawings of the specification. However, they serve only to explain the technology and are not intended to limit it to those specific embodiments. Accordingly, all modifications, equivalents, and substitutions that fall within the spirit and scope of the technology described below should be understood to be included in it.

In the terms used below, singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "includes" should be understood to mean that the described features, numbers, steps, actions, components, parts, or combinations thereof exist, and not to exclude the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

Before the drawings are described in detail, it should be clarified that the components in this specification are distinguished merely by the primary function each one is responsible for. That is, two or more components described below may be combined into a single component, or a single component may be divided into two or more components with more finely subdivided functions. Furthermore, each component described below may perform some or all of the functions of other components in addition to its own primary function, and some of the primary functions of a component may instead be performed exclusively by another component.

Furthermore, when a method or operating method is performed, the processes constituting the method may occur in an order different from the one specified unless a specific order is clearly indicated by the context. That is, the processes may occur in the specified order, may be performed substantially simultaneously, or may be performed in the reverse order.

FIG. 1 illustrates an embodiment in which a sign language image processing device (100) performs a sign language translation model learning method and a sign language translation method. The sign language image processing device (100) can be physically implemented in various forms. For example, it can take the form of a PC, a laptop, a smart device, a server, or a chipset dedicated to data processing. There may be one or more sign language image processing devices (100). That is, the sign language translation model learning method may be performed by a single sign language image processing device (100) or divided among multiple sign language image processing devices (100). Likewise, the sign language translation method may be performed by a single sign language image processing device (100) or divided among multiple sign language image processing devices (100).

The sign language image processing device (100) may be a device that performs a sign language translation model training method. The sign language image processing device (100) may acquire sign language images and gloss data. The