CN-122024732-A - Text conversion method of voice data
Abstract
The present disclosure provides a text conversion method for voice data executed by an information processing apparatus. The method includes: acquiring voice data; detecting a specific expression in the voice data and feature information related to the utterance of the specific expression; and, when standard language conversion processing, which converts the specific expression into the corresponding standard language based on the detected specific expression and feature information, is not performed, outputting text information related to the voice data in a style that allows the portion on which the standard language conversion processing was not performed to be distinguished.
Inventors
- MORISHITA HIROFUMI
Assignees
- 丰田自动车株式会社
Dates
- Publication Date
- 20260512
- Application Date
- 20251024
- Priority Date
- 20241112
Claims (5)
- 1. A text conversion method of voice data performed by an information processing apparatus, comprising: acquiring voice data; detecting a specific expression in the voice data and feature information related to the utterance of the specific expression; and, when standard language conversion processing for converting the specific expression into a corresponding standard language based on the detected specific expression and the feature information is not performed, outputting text information related to the voice data in a style that enables a portion on which the standard language conversion processing is not performed to be distinguished.
- 2. The text conversion method of voice data according to claim 1, wherein the standard language conversion processing is processing of converting the specific expression into the standard language based on the detected pair of the specific expression and the feature information and on a conversion rule between a non-standard language and the standard language.
- 3. The text conversion method of voice data according to claim 1, wherein the feature information related to the utterance is tone information.
- 4. The text conversion method of voice data according to claim 1, wherein the specific expression includes a dialect or a non-standard language.
- 5. The text conversion method of voice data according to claim 1, further comprising: annotating a specific expression that fails to be converted into the standard language.
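The flow recited in claims 1 to 5 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation disclosed in this publication: the `Segment` type, the `CONVERSION_RULES` table and its entries, the tone labels, and the `[[...]]` marking style are all hypothetical, and detection of the specific expression and its tone is assumed to have been done upstream.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical conversion rules keyed on the pair (specific expression, tone).
# The entries and tone labels are invented for illustration only.
CONVERSION_RULES = {
    ("ookini", "flat"): "thank you",
    ("nanbo", "rising"): "how much",
}


@dataclass
class Segment:
    text: str                   # transcribed word or expression
    tone: Optional[str] = None  # feature information about the utterance
    is_specific: bool = False   # True if detected as dialect / non-standard


def convert_segment(seg: Segment) -> Tuple[str, bool]:
    """Return (output text, whether the segment is in standard language)."""
    if not seg.is_specific:
        return seg.text, True
    standard = CONVERSION_RULES.get((seg.text, seg.tone))
    if standard is None:
        # No rule matched: conversion is not performed, so mark the portion
        # so that a reader can tell it apart from converted text.
        return f"[[{seg.text}]]", False
    return standard, True


def render_transcript(segments: List[Segment]) -> Tuple[str, List[str]]:
    """Build the output text plus annotations for unconverted expressions."""
    words, notes = [], []
    for seg in segments:
        out, converted = convert_segment(seg)
        words.append(out)
        if not converted:
            notes.append(f"unconverted: {seg.text} (tone={seg.tone})")
    return " ".join(words), notes


segments = [
    Segment("ookini", "flat", True),
    Segment("for"),
    Segment("coming"),
    Segment("maido", "flat", True),
]
text, notes = render_transcript(segments)
print(text)
print(notes)
```

Keying the rule table on the pair of expression and tone reflects claim 2: the same surface form spoken with a different tone finds no rule and is therefore left marked rather than converted.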
Description
Text conversion method of voice data
Technical Field
The present disclosure relates to a text conversion method for voice data.
Background
Conventionally, techniques for analyzing the content of business negotiations are known. For example, Japanese Patent Application Laid-Open No. 2019-28910 discloses a dialogue analysis system for checking whether a salesperson, when negotiating with a customer, has explained the items that must be explained and has refrained from stating things that must not be stated. Further, for example, a paper presented at The 38th Annual Conference of the Japanese Society for Artificial Intelligence (2024) discloses a speech recognition technique for the Toyama dialect.
Disclosure of Invention
Although Japanese Patent Application Laid-Open No. 2019-28910 discloses a technique for analyzing the content of a conversation by machine learning, neither that publication nor the above paper from The 38th Annual Conference of the Japanese Society for Artificial Intelligence (2024) mentions a technique for converting voice data in negotiations and the like into text, that is, speech transcription. In particular, there is room for improvement in techniques for transcribing voice data that includes non-standard speech such as dialects and accents. Meanwhile, improving the text conversion of voice data is desirable for analyzing the content of negotiations and the like and providing feedback on it. Therefore, there is room for improvement in techniques for converting voice data in negotiations and the like into text.
An object of the present disclosure, which has been made in view of the above circumstances, is to improve text conversion techniques for voice data.
The text conversion method of voice data according to an embodiment of the present disclosure is a text conversion method of voice data executed by an information processing apparatus, including: acquiring voice data; detecting a specific expression in the voice data and feature information related to the utterance of the specific expression; and, when standard language conversion processing for converting the specific expression into a corresponding standard language based on the detected specific expression and the feature information is not performed, outputting text information related to the voice data in a style that enables a portion on which the standard language conversion processing is not performed to be distinguished.
According to an embodiment of the present disclosure, text conversion techniques for voice data are improved.
Drawings
Features, advantages, and technical and industrial significance of exemplary embodiments of the present invention will be described below with reference to the accompanying drawings, in which like reference numerals denote like elements, and in which:
Fig. 1 is a block diagram showing a schematic configuration of a system according to the present embodiment;
Fig. 2 is a flowchart showing the operation of the information processing apparatus.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described.
With reference to Fig. 1, an outline and a configuration of a system 1 according to the present embodiment will be described. The system 1 of the present embodiment includes an information processing apparatus 10 and a terminal apparatus 20. The information processing apparatus 10 is, for example, a server apparatus installed in a data center or the like. The terminal apparatus 20 is an arbitrary device used by a user.
These devices are communicatively connected via a network 30 such as the Internet. Although Fig. 1 shows one information processing apparatus 10 and one terminal apparatus 20, the system 1 may include a plurality of each. First, an outline of the text conversion method of voice data of the present embodiment is given; details are described later. The voice data may be, for example, voice recorded in a conversation such as a negotiation. In the present embodiment, the negotiation relates to, for example, the sale of a vehicle, and the item offered in the negotiation is a vehicle, but the negotiation is not limited thereto. For example, the negotiation may be a meeting held for the purpose of concluding any of various contracts, such as the purchase and sale of real estate, an insurance contract, or the sale of financial products. In addition, in the present embodiment, the item offered in the negotiation may be a product, a service, digital content, a license, data or information, a financial product, real estate, an intangible asset, another tradable right, or the like. The information processing apparatus 10 acquires voi