DE-102025143762-A1 - METHOD FOR CONVERTING SPEECH DATA INTO TEXT


Abstract

A method for converting speech data into text, performed by an information processing unit, comprises: acquiring speech data; capturing a specific expression within the speech data and feature information relating to the pronunciation of that specific expression; and outputting text information relating to the speech data in a manner that makes it possible to distinguish any portion for which standard-language conversion processing is not feasible, the standard-language conversion processing being processing that converts the specific expression into the appropriate standard language based on the captured specific expression and feature information.

Inventors

Hirofumi Morishita

Assignees

TOYOTA JIDOSHA KABUSHIKI KAISHA

Dates

Publication Date: 20260513
Application Date: 20251027
Priority Date: 20241112

Claims (5)

1. A method for converting speech data into text, performed by an information processing unit, the method comprising: obtaining speech data; capturing a specific expression in the speech data and feature information relating to the pronunciation of the specific expression; and outputting text information relating to the speech data such that a portion for which standard-language conversion processing is not feasible can be distinguished, the standard-language conversion processing being processing that converts the specific expression into appropriate standard language based on the captured specific expression and feature information.
2. The method according to claim 1, wherein the standard-language conversion processing converts the specific expression into standard language based on a pair of the captured specific expression and feature information, and on a conversion rule between non-standard language and standard language.
3. The method according to claim 1, wherein the feature information relating to the pronunciation is sound information.
4. The method according to claim 1, wherein the specific expression includes dialect and non-standard language.
5. The method according to claim 1, further comprising outputting a comment for a specific expression for which conversion to standard language was not feasible.
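
The claimed steps can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Expression` type, the `RULES` table, the `[[...]]` marker, the feature labels, and the English sample words are all assumptions chosen for illustration. The sketch assumes that speech recognition has already produced a token sequence in which each specific expression is paired with its feature information. A rule lookup on that pair stands in for the standard-language conversion processing, and any expression without a matching rule is marked in the output and reported in a comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """A specific (non-standard) expression captured from the speech data."""
    text: str     # the expression as recognized
    feature: str  # hypothetical label for its pronunciation feature


# Hypothetical conversion rules: (expression, feature) -> standard language.
RULES = {
    ("gonna", "reduced"): "going to",
}


def convert(tokens, rules=RULES):
    """Return (text, comments) for a mixed list of str and Expression tokens.

    Plain strings are copied through. An Expression is replaced by its
    standard-language form when a rule exists for its (text, feature) pair;
    otherwise it is wrapped in [[...]] so the reader can distinguish it,
    and a comment is recorded for it.
    """
    parts, comments = [], []
    for tok in tokens:
        if isinstance(tok, str):
            parts.append(tok)
            continue
        standard = rules.get((tok.text, tok.feature))
        if standard is not None:
            parts.append(standard)
        else:
            parts.append(f"[[{tok.text}]]")
            comments.append(
                f"No standard-language conversion for '{tok.text}' "
                f"(feature: {tok.feature})."
            )
    return " ".join(parts), comments
```

For example, `convert(["we", Expression("gonna", "reduced"), "meet", Expression("yonder", "rising")])` yields the text `we going to meet [[yonder]]` together with one comment about `yonder`. Keying the rules on the pair, rather than on the expression alone, mirrors claim 2, where the same wording may map to different standard forms depending on how it is pronounced.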

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present disclosure relates to a method for converting speech data into text.

2. Description of the Related Art

Conventional technologies exist for analyzing the content of business negotiations. For example, Japanese Unexamined Patent Application Publication No. 2019-28910 (JP 2019-28910 A) discloses a dialogue analysis system that checks whether a sales representative in business negotiations with a customer communicates what should be communicated and refrains from saying what should not be said. In addition, Horimoto et al., “Toyama Dialect Recognition and Conversion to Standard Japanese via Deep Learning,” The 38th Annual Conference of the Japanese Society for Artificial Intelligence (2024), discloses, for example, a speech recognition technology for the Toyama dialect.

BRIEF SUMMARY OF THE INVENTION

Although JP 2019-28910 A discloses a technology for analyzing the content of business negotiations using machine learning, neither JP 2019-28910 A nor Horimoto et al., “Toyama Dialect Recognition and Conversion to Standard Japanese via Deep Learning,” The 38th Annual Conference of the Japanese Society for Artificial Intelligence (2024), mentions the transcription of speech data in business negotiations or similar situations, i.e., a conversion technology for transforming speech data into text. Speech transcription technology is in need of improvement, particularly for speech data containing dialects, accents, and other non-standard language. At the same time, improvements to the technology for converting speech data into text are desirable for analyzing the content of business negotiations, providing feedback, and so on. The conversion technology for transforming speech data into text in business negotiations and similar situations is therefore in need of improvement.

In view of the above circumstances, one objective of the present disclosure is to improve the technology for converting speech data into text.

A method for converting speech data into text according to an embodiment of the present disclosure is performed by an information processing unit and comprises: obtaining speech data; capturing a specific expression in the speech data and feature information relating to the pronunciation of that specific expression; and outputting text information relating to the speech data in a manner that makes it possible to distinguish a portion for which standard-language conversion processing is not feasible, the standard-language conversion processing being processing that converts the specific expression into the appropriate standard language based on the captured specific expression and feature information.

According to one embodiment of the present disclosure, an improved technology for converting speech data into text is provided.

BRIEF DESCRIPTION OF THE DRAWINGS

Features, advantages, and the technical and industrial significance of exemplary embodiments of the invention are described below with reference to the accompanying drawings, in which like symbols denote like elements, and wherein:

FIG. 1 is a block diagram showing a schematic configuration of a system according to an exemplary embodiment; and
FIG. 2 is a flowchart showing the operation of an information processing unit.

DETAILED DESCRIPTION OF EMBODIMENTS

An embodiment of the present disclosure is described below.

An outline and configuration of a system 1 according to the present embodiment are described with reference to FIG. 1. The system 1 according to the present embodiment comprises an information processing unit 10 and a terminal unit 20. The information processing unit 10 is, for example, a server unit installed in a data center or similar facility. The terminal unit 20 is any unit used by a user. These units are interconnected via a network such as the Internet. Note that although FIG. 1 shows one information processing unit 10 and one terminal unit 20, the system 1 can comprise a large number of such units.

First, an overview of the method for converting speech data into text according to the present embodiment is given; details are described later. The speech data may be, for example, speech data from business negotiations. In the present embodiment, the business negotiations are, for example, negotiations concerning vehicle sales, in which the item offered is a vehicle, but they are not limited to this. The business negotiations may, for example, be meetings aimed at concluding various types of contracts, such as the purchase or sale of real estate, the conclusion of an insurance contract, or the sale of financial products. The items offered in the business negotiations in the present embodiment may also be products, services, digital content, licenses, data/information, financial products, real estate, i