CN-115605920-B - Information processing apparatus, information processing method, and recording medium

Abstract

An information processing apparatus according to an embodiment includes a feature amount acquisition unit, a range estimation unit, a correspondence unit, a merging unit, and an output unit. The feature amount acquisition unit acquires feature amounts extracted from data composed of a plurality of values. The range estimation unit estimates, based on the acquired feature amounts, the range of the data in which an element to be given a predetermined label may exist. The correspondence unit associates each label with at least one of the plurality of feature amounts. The merging unit performs a merging process that merges, into one data range, the one or more data ranges estimated from the one or more feature amounts associated with a label. The output unit outputs the correspondence between each label and the range of the data obtained by the merging process.

Inventors

  • Tanaka Liaoping

Assignees

  • Toshiba Corporation
  • Toshiba Digital Solutions Corporation

Dates

Publication Date: 2026-05-12
Application Date: 2021-05-06
Priority Date: 2020-05-15

Claims (10)

  1. An information processing apparatus comprising: a feature amount acquisition unit that acquires a feature amount that corresponds to a dimension of data, the data being image or audio data, and that is extracted from the data composed of a plurality of values; a range estimation unit configured to estimate, based on the acquired feature amount, a range of the data in which an element to be given a predetermined label may exist; a correspondence unit that associates at least one of the labels in a label sequence having a plurality of the labels with at least one of the plurality of feature amounts; a merging unit configured to merge, into one data range, one or more data ranges estimated from one or more feature amounts associated with the label; and an output unit configured to output a correspondence between the label and the range of the data after the merging process.
  2. The information processing apparatus according to claim 1, wherein the range estimation unit is a neural network trained to predict, from the acquired feature amount, the range of the data in which the element to be given the label may exist.
  3. The information processing apparatus according to claim 1 or 2, wherein the correspondence unit estimates a likelihood of a category from the feature amount, and associates the label included in the label sequence with the feature amount for which the likelihood of the category satisfies a predetermined condition.
  4. The information processing apparatus according to claim 1 or 2, wherein the correspondence unit: estimates, based on the sequence of feature amounts of length T extracted from the data, a sequence of probability distributions over K classes that include the predetermined labels and a blank; obtains, using a label conversion unit that converts a redundant label sequence of length T including blanks into a label sequence of length L, smaller than T, excluding blanks, a maximum likelihood label sequence, which is the redundant label sequence of length T that has the highest probability of being generated from the sequence of probability distributions among those that can be converted into the label sequence; and associates each label included in the label sequence with the feature amounts based on the maximum likelihood label sequence.
  5. The information processing apparatus according to claim 1 or 2, wherein the merging unit merges, by a weighted average, the ranges of the data estimated from the one or more feature amounts associated with each label, and the weights of the weighted average are calculated such that the closer to the partial boundary of the range of the data, the greater the weight of the region of the data from which each of the feature amounts is extracted.
  6. The information processing apparatus according to claim 1 or 2, wherein the feature amount acquisition unit further includes: a feature extraction unit that extracts a plurality of feature amounts from the data composed of one or more values; and a recognition unit that recognizes the label sequence from the plurality of feature amounts.
  7. The information processing apparatus according to claim 1 or 2, further comprising: an operation receiving unit that obtains a result of a user's determination of whether the result output by the output unit is correct; and a storage control unit configured to store the result obtained by the operation receiving unit.
  8. The information processing apparatus according to claim 7, wherein the information acquired by the operation receiving unit includes correction information indicating a correction of the range of the data estimated by the range estimation unit, and the storage control unit stores the result including the correction information acquired by the operation receiving unit.
  9. A recording medium having recorded thereon a program for causing a computer to execute the steps of: a feature amount acquisition step of acquiring a feature amount that corresponds to a dimension of data, the data being image or audio data, and that is extracted from the data composed of a plurality of values; a range estimation step of estimating, based on the acquired feature amount, a range of the data in which an element to be given a predetermined label may exist; a correspondence step of associating a plurality of the labels with at least one of the plurality of feature amounts; a merging step of merging, into one data range, the data ranges estimated from one or more of the feature amounts associated with the label; and an output step of outputting the correspondence between the label and the range of the data after the merging process.
  10. An information processing method comprising: a feature amount acquisition step of acquiring a feature amount that corresponds to a dimension of data, the data being image or audio data, and that is extracted from the data composed of a plurality of values; a range estimation step of estimating, based on the acquired feature amount, a range of the data in which an element to be given a predetermined label may exist; a correspondence step of associating a plurality of the labels with at least one of the plurality of feature amounts; a merging step of merging, into one data range, the data ranges estimated from one or more of the feature amounts associated with the label; and an output step of outputting the correspondence between the label and the range of the data after the merging process.
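
The association and merging described in claims 4 and 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the "-" blank symbol, the per-frame (start, end) ranges, and the uniform weights are all hypothetical. In particular, the boundary-dependent weighting of claim 5 is not reproduced; the weights are simply passed in.

```python
from itertools import groupby

BLANK = "-"  # hypothetical symbol standing in for the blank


def align_labels_to_frames(path):
    """Return (label, frame_indices) for each non-blank run of a frame-level path."""
    aligned, t = [], 0
    for symbol, run in groupby(path):
        n = len(list(run))
        if symbol != BLANK:
            aligned.append((symbol, list(range(t, t + n))))
        t += n
    return aligned


def merge_ranges(ranges, weights):
    """Weighted average of (start, end) ranges; the weights must sum to a positive value."""
    total = sum(weights)
    start = sum(w * r[0] for r, w in zip(ranges, weights)) / total
    end = sum(w * r[1] for r, w in zip(ranges, weights)) / total
    return (start, end)


# Made-up inputs: a maximum likelihood path over T = 5 frames and the data
# range that a range estimator might have predicted from each frame's feature.
best_path = list("-aa-b")
frame_ranges = [(0, 0), (2, 10), (4, 12), (0, 0), (14, 20)]

result = []
for label, frames in align_labels_to_frames(best_path):
    ranges = [frame_ranges[t] for t in frames]
    weights = [1.0] * len(ranges)  # assumption: uniform weights
    result.append((label, merge_ranges(ranges, weights)))

print(result)  # [('a', (3.0, 11.0)), ('b', (14.0, 20.0))]
```

Each label (tag) ends up paired with a single merged range, which corresponds to what the output unit is described as producing.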

Description

Information processing apparatus, information processing method, and recording medium

Technical Field

The embodiments of the invention relate to an information processing apparatus, an information processing method, and a program.

Background

In sequence label recognition such as speech recognition and character string recognition, it is known that recognition accuracy can be improved by combining a deep neural network (DNN) with a sequence label recognition technique such as connectionist temporal classification (CTC), and recognizing the sequence labels without explicitly dividing the boundaries between the labels.

On the other hand, in sequence label recognition using a technique such as CTC, the boundaries between labels in the label sequence of the recognition result are not divided, so the range of input data corresponding to each label of the recognition result is unclear. There is therefore a demand for knowing the range corresponding to each label of the recognition result.

Conventionally, there is a method of estimating the region of each character from a character string image. However, that method is entirely different from sequence label recognition techniques such as CTC, and its results cannot be matched with those of a high-accuracy recognition technique such as CTC.

Prior art literature

Patent literature

Patent Document 1: Japanese Patent Laid-Open No. 6-251195

Disclosure of Invention

Technical problem to be solved by the invention

An object of the present invention is to provide an information processing apparatus, an information processing method, and a program capable of specifying the range of input data corresponding to each label in the label sequence of a recognition result, in recognition processing of sequence labels that does not explicitly divide the boundaries between labels.
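
To make the boundary problem described in the Background concrete, the following minimal sketch applies the usual CTC mapping (merge consecutive repeats, then drop blanks) to two different frame-level paths. The "-" blank symbol, the function name, and the example paths are assumptions made for illustration only.

```python
from itertools import groupby

BLANK = "-"  # hypothetical symbol standing in for the CTC blank


def ctc_collapse(path):
    """Merge consecutive repeats, then drop blanks (the usual CTC mapping)."""
    return [symbol for symbol, _ in groupby(path) if symbol != BLANK]


# Two different frame-level paths over nine frames (made-up data).
path_a = list("-cc-aa--t")
path_b = list("c--a-tttt")

labels_a = ctc_collapse(path_a)
labels_b = ctc_collapse(path_b)
print(labels_a, labels_b)  # ['c', 'a', 't'] ['c', 'a', 't']
```

Both paths collapse to the same three-element result, so the recognized sequence alone does not say which frames, and hence which part of the input data, belong to each element.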

Means for solving the technical problems

An information processing apparatus according to an embodiment includes a feature amount acquisition unit, a range estimation unit, a correspondence unit, a merging unit, and an output unit. The feature amount acquisition unit acquires feature amounts extracted from data composed of a plurality of values. The range estimation unit estimates, based on the acquired feature amounts, the range of the data in which an element to be given a predetermined label may exist. The correspondence unit associates each of the labels in a label sequence having a plurality of the labels with at least one of the plurality of feature amounts. The merging unit performs a merging process that merges, into one data range, the one or more data ranges estimated from the one or more feature amounts associated with the label. The output unit outputs the correspondence between the label and the range of the data after the merging process.

Drawings

Fig. 1 is a schematic diagram showing an example of a functional configuration of an information processing system according to the first embodiment.
Fig. 2 is a block diagram showing a schematic functional configuration of the information processing apparatus according to the first embodiment.
Fig. 3 is a diagram showing processing in a case where a label of the first embodiment is associated with a plurality of feature amounts.
Fig. 4 is a diagram for explaining weighting in a case where a label of the first embodiment is associated with a plurality of feature amounts.
Fig. 5 is a flowchart showing a series of operations of the information processing apparatus according to the first embodiment.
Fig. 6 is a block diagram showing a schematic functional configuration of the information processing apparatus according to the second embodiment.
Fig. 7 is a flowchart showing a series of operations of the information processing apparatus according to the second embodiment.
Fig. 8 is a schematic diagram showing an example of a functional configuration of an information processing system according to the third embodiment.
Fig. 9 is a block diagram showing a teaching system according to the fourth embodiment.
Fig. 10 is a flowchart showing a series of operations of the information processing apparatus according to the fourth embodiment.
Fig. 11 is a schematic diagram of a sequence recognition problem in the related art.
Fig. 12 is a schematic diagram of a conventional sequence label recognition technique.
Fig. 13 is a diagram for explaining a determination range of a conventional sequence label recognition technique.
Fig. 14 is a diagram illustrating a determination range of a conventional sequence label recognition technique that uses an attention method.

Detailed Description

An information processing apparatus, an information processing method, and a program according to embodiments are described below with reference to the drawings.

[ Prior Art ]

A conventional sequence label recognition technique will be described with reference to Figs. 11 to 13. Fig