CN-122024733-A - Text conversion method for audio data, information processing device, and program
Abstract
The present disclosure provides a text conversion method for sound data, an information processing apparatus, and a program. A text conversion method for sound data executed by an information processing apparatus includes: acquiring sound data; detecting a specific expression in the sound data and feature information related to the utterance of the specific expression; converting the specific expression into the corresponding standard language based on the detected specific expression and the feature information; and outputting text information related to the sound data.
Inventors
- MORISHITA HIROFUMI
Assignees
- 丰田自动车株式会社
Dates
- Publication Date
- 20260512
- Application Date
- 20251031
- Priority Date
- 20241112
Claims (7)
- 1. A text conversion method for sound data executed by an information processing apparatus, the text conversion method comprising: acquiring sound data; detecting a specific expression in the sound data and feature information related to the utterance of the specific expression; transforming the specific expression into a corresponding standard language according to the detected specific expression and the feature information; and outputting text information related to the sound data.
- 2. The text conversion method according to claim 1, further comprising: transforming the specific expression into the corresponding standard language according to the pairing of the detected specific expression and the feature information, and a transformation rule between non-standard language and standard language.
- 3. The text conversion method according to claim 1, wherein the feature information related to the utterance is tone information.
- 4. The text conversion method according to claim 1, wherein the specific expression includes dialects and slang.
- 5. The text conversion method according to claim 1, wherein the sound data is sound in a commercial conversation related to a predetermined provider, and the text conversion method further comprises: determining regional information corresponding to a speaker based on the sound data; and presenting suggestions related to the predetermined provider according to the regional information.
- 6. An information processing apparatus, characterized by comprising one or more processors, wherein the one or more processors are configured to: acquire sound data; detect a specific expression in the sound data and feature information related to the utterance of the specific expression; transform the specific expression into a corresponding standard language according to the detected specific expression and the feature information; and output text information related to the sound data.
- 7. A program that causes one or more processors to perform functions, characterized in that the functions include: acquiring sound data; detecting a specific expression in the sound data and feature information related to the utterance of the specific expression; transforming the specific expression into a corresponding standard language according to the detected specific expression and the feature information; and outputting text information related to the sound data.
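The steps recited in claims 1 to 4 can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the `Segment` type, the `RULES` table, the tone labels, and the function names are all hypothetical, the rule entries are invented for illustration, and sound acquisition and recognition are assumed to be done by an upstream recognizer that already yields text segments with a tone label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One recognized span of the sound data (assumed output of an upstream recognizer)."""
    text: str  # recognized surface form
    tone: str  # tone information for the utterance, e.g. "flat" or "rising"


# Hypothetical transformation rules: (specific expression, tone) -> standard language.
# The entries are invented for illustration only.
RULES = {
    ("gonna", "flat"): "going to",
    ("gonna", "rising"): "going to",
    ("yeah right", "flat"): "I doubt that",
    ("yeah right", "rising"): "yes, that is correct",
}

KNOWN_EXPRESSIONS = {expression for expression, _tone in RULES}


def detect(segments):
    """Return (index, text, tone) for every segment that is a known specific expression."""
    return [
        (i, seg.text, seg.tone)
        for i, seg in enumerate(segments)
        if seg.text in KNOWN_EXPRESSIONS
    ]


def to_standard_text(segments):
    """Replace each detected expression using the rule paired with its tone, then join."""
    words = [seg.text for seg in segments]
    for i, expression, tone in detect(segments):
        # Keep the original wording when no rule covers this (expression, tone) pairing.
        words[i] = RULES.get((expression, tone), expression)
    return " ".join(words)


if __name__ == "__main__":
    utterance = [
        Segment("I am", "flat"),
        Segment("gonna", "flat"),
        Segment("order it", "flat"),
    ]
    print(to_standard_text(utterance))
```

In this sketch the rule table is keyed on the pair of expression and tone rather than on the expression alone, so the same slang expression can map to different standard-language text depending on how it was uttered.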
Description
Text conversion method for audio data, information processing device, and program
Technical Field
The present disclosure relates to a text conversion method for sound data, an information processing apparatus, and a program.
Background
Techniques for analyzing the content of a commercial conversation are known. For example, Japanese Patent Application Laid-Open No. 2019-28910 discloses a dialogue analysis system that checks whether, in a commercial conversation with a customer, a salesperson has explained the matters that should be explained and has not explained matters that should not be explained. In addition, Horiba et al. disclose a deep-learning-based sound recognition technique for the Toyama dialect in a paper presented at The 38th Annual Conference of the Japanese Society for Artificial Intelligence (2024).
Disclosure of Invention
Japanese Patent Application Laid-Open No. 2019-28910 shows a technique for analyzing the content of a commercial conversation by machine learning. However, neither that publication nor the paper by Horiba et al. describes a text conversion technique for transcribing sound, that is, for converting sound data from a commercial conversation or the like into text. In particular, there is room for improvement in transcription techniques for sound data that includes non-standard language such as dialects and accents. At the same time, an improved text conversion technique for sound data is desirable for analyzing the content of commercial conversations, providing feedback, and the like. As described above, there is room for improvement in text conversion technology for sound data in commercial conversations and the like.
The present disclosure provides a text conversion technique for sound data.
A text conversion method for sound data executed by an information processing apparatus according to a first aspect of the present disclosure includes acquiring sound data, detecting a specific expression in the sound data and feature information related to the utterance of the specific expression, converting the specific expression into a corresponding standard language based on the detected specific expression and the feature information, and outputting text information related to the sound data.
An information processing apparatus according to a second aspect of the present disclosure includes one or more processors configured to acquire sound data, detect a specific expression in the sound data and feature information related to the utterance of the specific expression, convert the specific expression into a corresponding standard language based on the detected specific expression and the feature information, and output text information related to the sound data.
A program according to a third aspect of the present disclosure causes one or more processors to perform functions including acquiring sound data, detecting a specific expression in the sound data and feature information related to the utterance of the specific expression, converting the specific expression into a corresponding standard language based on the detected specific expression and the feature information, and outputting text information related to the sound data.
According to one embodiment of the present disclosure, the text conversion technique for sound data is improved.
Drawings
The features, advantages, and technical and industrial significance of the preferred embodiments of the present invention will be described below with reference to the accompanying drawings, in which like numerals represent like parts, and in which:
Fig. 1 is a block diagram showing a schematic configuration of a system according to the present embodiment.
Fig. 2 is a flowchart showing the operation of the information processing apparatus.
Detailed Description
Hereinafter, embodiments of the present disclosure are described.
(Summary of the embodiments)
With reference to Fig. 1, an outline and a configuration of a system 1 according to the present embodiment will be described. The system 1 according to the present embodiment includes an information processing apparatus 10 and a terminal apparatus 20. The information processing apparatus 10 and the terminal apparatus 20 are communicably connected to a network 30 that includes, for example, a mobile communication network and the Internet. The information processing apparatus 10 is, for example, a server apparatus provided in a data center or the like. For example, the information processing apparatus 10 is a server belonging to a cloud computing system or another computing system. In Fig. 1, an example in which the information processing apparatus 10 included in the system 1 is shown, but the present invention is not limi