JP-2026074706-A - Programs, devices, and methods for assisting users with mouth and/or tongue exercises.
Abstract
[Problem] To provide a program to assist users with mouth and/or tongue exercises. [Solution] When the program of the present invention is executed by the processor unit of the device, it causes the processor unit to at least receive the user's voice input, recognize at least one monosyllabic word contained in the user's voice input, and display content associated with the recognized at least one monosyllabic word on the device. In one embodiment, recognizing at least one monosyllabic word may include recognizing at least one monosyllabic word using a speech recognition model suitable for recognizing multiple different monosyllabic words spoken in succession in a short period of time and/or the same monosyllabic word spoken multiple times in succession in a short period of time. [Selection Diagram] Figure 1B
Inventors
- 十河 基文
- 西願 雅也
- 水篠 公範
Assignees
- 株式会社アイキャット
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-10-21
Claims (10)
- A program for assisting a user with mouth and/or tongue exercises, wherein, when the program is executed by a processor unit of a device, the program causes the processor unit to at least: receive the user's voice input; recognize at least one monosyllabic word included in the user's voice input; and display, on the device, content associated with the recognized at least one monosyllabic word.
- The program according to claim 1, wherein the recognition of the at least one monosyllabic word includes recognizing the at least one monosyllabic word using a speech recognition model suitable for recognizing multiple different monosyllabic words spoken in succession in a short period of time and/or the same monosyllabic word spoken multiple times in succession in a short period of time.
- The program according to claim 2, wherein the training data for the speech recognition model comprises correspondences between monosyllabic words and speech data of those monosyllabic words.
- The program according to claim 1, wherein, when the program is executed by the processor unit of the device, the program further causes the processor unit to display, on the device, an indicator for adjusting a first distance between the user's face and the device.
- The program according to claim 4, wherein the indicator includes a display area for displaying the user as captured by the device's camera, a bar representing the difference between the first distance and a predetermined appropriate distance, a numerical value representing the difference between the first distance and the predetermined appropriate distance, or guide lines for guiding the position of the user's face as captured by the device's camera and displayed on the device.
- The program according to claim 5, wherein the size of the display area is adjusted such that the distance between the user's face and the device is within a desired range when the user's face within the display area is of a desired size.
- The program according to claim 6, wherein, when the program is executed by the processor unit of the device, the program further causes the processor unit to: identify the volume of the user's voice input; determine whether the volume of the user's voice input exceeds a first threshold; and increase the magnification of the user's face displayed in the display area when it is determined that the volume of the user's voice input exceeds the first threshold; and/or determine whether the volume of the user's voice input is below a second threshold; and reduce the magnification of the user's face displayed in the display area when it is determined that the volume of the user's voice input is below the second threshold.
- The program according to claim 1, wherein, when the program is executed by the processor unit of the device, the program further causes the processor unit to: identify the volume of the user's voice input; determine whether the volume of the user's voice input exceeds a first threshold; and perform a process for causing the user to lower the volume of the user's voice input when it is determined that the volume of the user's voice input exceeds the first threshold; and/or determine whether the volume of the user's voice input is below a second threshold; and perform a process for causing the user to raise the volume of the user's voice input when it is determined that the volume of the user's voice input is below the second threshold.
- A device for assisting a user with mouth and/or tongue exercises, the device comprising a processor unit, wherein the processor unit is configured to at least: receive the user's voice input; recognize at least one monosyllabic word included in the user's voice input; and display, on the device, content associated with the recognized at least one monosyllabic word.
- A method performed in a device for assisting a user with mouth and/or tongue exercises, the device comprising a processor unit, the method comprising: the processor unit receiving the user's voice input; the processor unit recognizing at least one monosyllabic word included in the user's voice input; and the processor unit displaying, on the device, content associated with the recognized at least one monosyllabic word.
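The volume-dependent behavior recited in claims 7 and 8 can be summarized as a simple control rule. The following is a minimal illustrative sketch, not the patented implementation; the RMS volume measure, the threshold values, and the magnification step size are all assumptions chosen for the example.

```python
def rms_volume(samples):
    """Identify the volume of a voice-input frame as its RMS amplitude (assumed measure)."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def adjust_magnification(volume, magnification,
                         first_threshold=0.6, second_threshold=0.2, step=0.1):
    """Per claims 7/8: increase the face magnification when the volume exceeds
    the first threshold, reduce it when the volume falls below the second
    threshold, and leave it unchanged in between. Threshold and step values
    are hypothetical."""
    if volume > first_threshold:
        return magnification + step
    if volume < second_threshold:
        return max(1.0, magnification - step)
    return magnification
```

A process for prompting the user (claim 8) would hook into the same two threshold comparisons, replacing the magnification change with an on-screen or audible cue to speak more softly or more loudly.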
Description
An application for the application of Article 30, Paragraph 2 of the Patent Act was filed. The invention was presented at the 15th Annual Meeting of the Japan Digital Dentistry Society, held on May 12, 2024, at Benex Nagasaki Brick Hall.

This invention relates to a program, device, and method for assisting a user with mouth and/or tongue exercises. The decline of oral function (so-called oral frailty) has been a subject of attention for some time (see, for example, Non-Patent Document 1).

Non-Patent Document 1: Japan Dental Association, "Oral Frailty," [online], [retrieved September 26, 2024], Internet <https://www.jda.or.jp/enlightenment/oral/about.html>

Brief description of the drawings:
- A diagram showing an example of a screen displayed on a device operated by a user.
- A diagram showing another example of a screen displayed on a device operated by a user.
- A diagram showing another example of a screen displayed on a device operated by a user.
- A diagram showing an example configuration of a device 300 for assisting a user with mouth and/or tongue exercises.
- A diagram showing an example of processing performed in the device 300.
- A diagram showing another example of processing performed in the device 300.

The following definitions are used in this specification. A "monosyllabic word" refers to a word having a single syllable consisting of a single vowel, or of a combination of a single consonant and a single vowel. Examples of monosyllabic words include, but are not limited to, "pa," "ta," "ka," and "ra." "Mouth and/or tongue exercises" refer to exercises that stimulate the muscles necessary for moving the mouth and/or the tongue; accordingly, they include both exercises performed with vocalization and those performed without vocalization.

Embodiments of the present invention will be described below with reference to the drawings.

1.
Support for User Mouth and/or Tongue Exercises

The applicant believed that improving oral frailty requires an integrated approach combining oral-frailty testing and training. Furthermore, considering the recent interest in games among the elderly and the expected increase in the number of elderly people who can use smartphones, the applicant believed that the best approach would be to attempt to improve oral frailty through a game application that can run on elderly users' own devices.

One exercise that contributes to improving oral frailty involves pronouncing a specific monosyllabic word repeatedly in quick succession. However, conventional speech recognition models have been unable to recognize that a specific monosyllabic word has been pronounced repeatedly in quick succession, even when the user pronounces it multiple times in quick succession. This is because conventional speech recognition models cannot predict that the same monosyllabic word will be received multiple times in quick succession. For example, if a user pronounces the specific monosyllabic word "pa" as "papapapapa" in quick succession, a conventional speech recognition model, after receiving the initial "papa" input, cannot predict that "pa" will be pronounced again immediately after "papa"; recognition then breaks down and the model cannot recognize what is being input. The applicant has therefore developed a new speech recognition model suitable for recognizing multiple different monosyllabic words pronounced consecutively in quick succession and/or the same monosyllabic word pronounced multiple times in quick succession.

Furthermore, the applicant found that stabilizing the volume of the user's voice input is also important for recognizing multiple monosyllabic words spoken in quick succession and the same monosyllabic word spoken multiple times in quick succession.
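The difficulty described above can be reproduced with a simple energy gate. The following sketch is illustrative only (it is not the applicant's speech recognition model): it counts syllable utterances as contiguous runs of an amplitude envelope above a volume threshold, so a rapid "pa-pa-pa-pa" is counted as four only if the volume dips between syllables, and a sustained loud input merges into a single count. The envelope values and threshold are hypothetical.

```python
def count_syllables(envelope, threshold=0.3):
    """Count contiguous runs of the amplitude envelope above the volume
    threshold. Each run is treated as one monosyllable utterance."""
    count = 0
    above = False
    for level in envelope:
        if level > threshold and not above:
            count += 1  # a new above-threshold run begins here
        above = level > threshold
    return count

# "pa pa pa pa" with the volume dipping between syllables: counted as four.
clean = [0.8, 0.1, 0.8, 0.1, 0.8, 0.1, 0.8, 0.1]
# Volume never drops below the threshold: merged into one long "paaaa".
merged = [0.8, 0.7, 0.8, 0.7, 0.8, 0.7, 0.8, 0.7]
```

This gate-based view also motivates the volume-stabilization features of the invention: when the user's volume stays continuously above the threshold, the repeated syllables become indistinguishable from one long sound.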
If the volume of the user's voice input is unstable, then even if the user speaks a monosyllabic word multiple times in quick succession (e.g., "pa-pa-pa-pa"), the device may recognize only a single utterance of the monosyllabic word (e.g., "pa"). This is because, for example, if the volume of the user's voice input continuously exceeds a predetermined threshold, a monosyllabic word spoken multiple times in quick succession (e.g., "pa-pa-pa-pa") is recognized as a single long monosyllabic sound (e.g., "paaaa"), and a single long monosyllabic sound (e.g., "paaaa") is treated as a single short monosyllabic voice input (e.g., "pa"). Therefore, stabilizing the volume of the user's voice input is necessary for the device to recognize the monosyllabic words spoken by the user more accurately.

2. Screen Displayed on a User-Operated Device

Figure 1A shows an example of a screen displayed on a user-operated device. The screen 110 shown in Figure 1A is a game screen that operates in response to the user's voice input to assist the user with mouth and/or tongue exercises. The screen 110 may be displayed on the device when a program for assisting the user with mouth and/or tongue exercises is launched on the device and a game for mouth and/or tongue