JP-7855153-B1 - Image processing device and image processing method

JP-7855153-B1

Abstract

The video processing device includes a procedure group candidate acquisition unit (1) that acquires multiple procedure group candidates, each containing a series of steps that may be used to perform the task shown in a video, together with the procedure name of each step; and an action interval detection unit (2) that detects action intervals, which are sections of the video in which each of the steps for performing the task may be shown, based on temporal changes in the feature quantities of the task shown in the video. The video processing device also includes a procedure group candidate selection unit (3) that selects, from among the multiple procedure group candidates acquired by the procedure group candidate acquisition unit (1), a procedure group candidate that corresponds to the action intervals detected by the action interval detection unit (2).

Inventors

  • 曲 佳 (Jia Qu)
  • 三輪 祥太郎 (Shotaro Miwa)

Assignees

  • 三菱電機株式会社 (Mitsubishi Electric Corporation)

Dates

Publication Date: 2026-05-07
Application Date: 2025-06-04
Priority Date: 2025-02-19

Claims (11)

  1. A video processing apparatus comprising: a procedure group candidate acquisition unit that acquires multiple procedure group candidates, each containing a series of steps that may be used to perform the task shown in a video, together with the procedure name of each step; an action interval detection unit that detects action intervals, which are sections of the video in which each of the steps for performing the task may be shown, based on temporal changes in the feature quantities of the task shown in the video; and a procedure group candidate selection unit that selects, from among the multiple procedure group candidates acquired by the procedure group candidate acquisition unit, a procedure group candidate that corresponds to the action intervals detected by the action interval detection unit.
  2. The video processing apparatus according to claim 1, wherein the action interval detection unit includes: a feature extraction unit that extracts the feature quantities of the task shown in the video and outputs time-series data indicating temporal changes in those feature quantities; and an interval group candidate acquisition unit that acquires, based on the time-series data output from the feature extraction unit, multiple interval group candidates, each containing action intervals that may show the series of steps for performing the task; and wherein the procedure group candidate selection unit includes: a matching processing unit that matches each procedure group candidate acquired by the procedure group candidate acquisition unit against each interval group candidate acquired by the interval group candidate acquisition unit; and a procedure group candidate selection processing unit that, based on the matching results of the matching processing unit, selects from among the multiple interval group candidates an interval group candidate containing the action intervals in which each of the steps for performing the task is shown, and selects, from among the multiple procedure group candidates acquired by the procedure group candidate acquisition unit, the procedure group candidate corresponding to the action intervals contained in the selected interval group candidate.
  3. The video processing apparatus according to claim 1, wherein the action interval detection unit includes: a feature extraction unit that extracts the feature quantities of the task shown in the video and outputs time-series data indicating temporal changes in those feature quantities; and an interval group candidate acquisition unit that includes a trained model having learned the correspondence between task feature quantities and action intervals and that, by providing the trained model with the time-series data output from the feature extraction unit, acquires from the trained model multiple interval group candidates, each containing action intervals that may show the series of steps for performing the task; and wherein the procedure group candidate selection unit includes: a matching processing unit that matches each procedure group candidate acquired by the procedure group candidate acquisition unit against each interval group candidate acquired by the interval group candidate acquisition unit; and a procedure group candidate selection processing unit that, based on the matching results of the matching processing unit, selects from among the multiple interval group candidates an interval group candidate containing the action intervals in which each of the steps for performing the task is shown, and selects, from among the multiple procedure group candidates acquired by the procedure group candidate acquisition unit, the procedure group candidate corresponding to the action intervals contained in the selected interval group candidate.
  4. The video processing apparatus according to claim 2 or 3, wherein the matching processing unit calculates a similarity between each procedure group candidate acquired by the procedure group candidate acquisition unit and each interval group candidate acquired by the interval group candidate acquisition unit, and outputs the similarity calculation results to the procedure group candidate selection processing unit.
  5. The video processing apparatus according to claim 2 or 3, further comprising a presentation processing unit that presents the action intervals contained in the interval group candidate selected by the procedure group candidate selection processing unit, and the procedure names of the steps contained in the procedure group candidate selected by the procedure group candidate selection processing unit.
  6. The video processing apparatus according to claim 1, wherein the procedure group candidate acquisition unit provides a generative AI with a request to create a series of steps for performing the task shown in the video, and acquires the multiple procedure group candidates from the generative AI.
  7. The video processing apparatus according to claim 1, wherein the procedure group candidate acquisition unit acquires the multiple procedure group candidates based on text indicating the purpose of the task shown in the video, and on labels indicating action intervals, which are sections in which each of the series of steps for performing the task is shown, together with the procedure names of the steps corresponding to those action intervals.
  8. A video processing apparatus comprising: a feature extraction unit that extracts the feature quantities of the task shown in a video and outputs time-series data indicating temporal changes in those feature quantities; a procedure name acquisition unit that provides the time-series data output from the feature extraction unit to a trained model and acquires from the trained model action intervals, which are sections in which each of the series of steps for performing the task shown in the video is shown, together with the procedure name of the step corresponding to each action interval; and a procedure name modification unit that acquires action labels indicating action intervals, which are sections in which each of the steps for performing the task is shown, and the procedure names of the steps corresponding to those action intervals, compares the multiple action intervals indicated by the action labels with the multiple action intervals acquired by the procedure name acquisition unit to identify identical action intervals, and, when the procedure name indicated by an action label and the procedure name acquired by the procedure name acquisition unit do not match for an identical action interval, corrects the procedure name acquired by the procedure name acquisition unit to the procedure name indicated by the action label.
  9. The video processing apparatus according to claim 8, wherein the trained model is a model that, during training, is given time-series data indicating temporal changes in the feature quantities of a task shown in a video and labels indicating the procedure names of the steps corresponding to the action intervals in which each step of the series of steps for performing the task is shown, and thereby learns the correspondence between the feature quantities of the task and the procedure names of those steps; and that, during inference, when given the time-series data output from the feature extraction unit, outputs the action intervals, which are sections in which each of the series of steps for performing the task whose feature quantities are extracted by the feature extraction unit is shown, together with the procedure name of the step corresponding to each action interval.
  10. A video processing method comprising: a procedure group candidate acquisition unit acquiring multiple procedure group candidates, each containing a series of steps that may be used to perform the task shown in a video, together with the procedure name of each step; an action interval detection unit detecting action intervals, which are sections of the video in which each of the steps for performing the task may be shown, based on temporal changes in the feature quantities of the task shown in the video; and a procedure group candidate selection unit selecting, from among the multiple procedure group candidates acquired by the procedure group candidate acquisition unit, a procedure group candidate that corresponds to the action intervals detected by the action interval detection unit.
  11. A video processing method comprising: a feature extraction unit extracting the feature quantities of the task shown in a video and outputting time-series data indicating temporal changes in those feature quantities; a procedure name acquisition unit providing the time-series data output from the feature extraction unit to a trained model and acquiring from the trained model action intervals, which are sections in which each of the series of steps for performing the task shown in the video is shown, together with the procedure names of the steps corresponding to those action intervals; and a procedure name modification unit acquiring action labels indicating action intervals, which are sections in which each of the series of steps for performing the task is shown, and the procedure names of the steps corresponding to those action intervals, comparing the multiple action intervals indicated by the action labels with the multiple action intervals acquired by the procedure name acquisition unit to identify identical action intervals, and, when the procedure name indicated by an action label and the procedure name acquired by the procedure name acquisition unit do not match for an identical action interval, modifying the procedure name acquired by the procedure name acquisition unit to the procedure name indicated by the action label.
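
The matching and selection described in claims 1 through 5 can be illustrated with a minimal sketch. Everything here is a hypothetical illustration, not taken from the patent: the data structures, the names, and the similarity measure (here simply how well the number of candidate steps agrees with the number of detected intervals) are all assumptions standing in for the unspecified matching processing.

```python
from dataclasses import dataclass

# Hypothetical data structures; the patent does not specify concrete formats.
@dataclass
class ProcedureGroupCandidate:
    steps: list[str]  # ordered procedure names

@dataclass
class IntervalGroupCandidate:
    intervals: list[tuple[float, float]]  # (start, end) action intervals in seconds

def similarity(proc: ProcedureGroupCandidate, ivals: IntervalGroupCandidate) -> float:
    """Illustrative matching score (cf. claim 4): candidates agree best
    when they describe the same number of steps/intervals."""
    n_steps, n_ivals = len(proc.steps), len(ivals.intervals)
    return min(n_steps, n_ivals) / max(n_steps, n_ivals)

def select(procs, ival_groups):
    """Illustrative selection (cf. claim 2): pick the procedure group and
    interval group candidate pair with the highest matching score."""
    return max(((p, g) for p in procs for g in ival_groups),
               key=lambda pg: similarity(*pg))

procs = [ProcedureGroupCandidate(["pick part", "fasten screw", "inspect"]),
         ProcedureGroupCandidate(["pick part", "inspect"])]
groups = [IntervalGroupCandidate([(0.0, 3.1), (3.1, 8.4), (8.4, 10.0)])]
best_proc, best_group = select(procs, groups)
# The three-step candidate matches the three-interval group best.
```

A real implementation would presumably score order and content agreement as well, but any pairwise score that the procedure group candidate selection processing unit can rank fits the structure of claims 2 through 4.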

Description

This disclosure relates to a video processing device and a video processing method.

There are video processing devices that detect action intervals, which are sections of a video in which each of the steps for performing the task shown in the video is shown. As an example of such a video processing device, Non-Patent Document 1 discloses a video processing device comprising: a feature extraction unit that extracts the feature quantities of the task shown in a video and outputs time-series data indicating the temporal changes in those feature quantities; and a clustering unit that detects the action intervals in which each of a series of steps is shown, based on the temporal changes in the feature quantities indicated by the time-series data output from the feature extraction unit.

Non-Patent Document 1: Kumar, Sateesh, et al. "Unsupervised action segmentation by joint representation learning and online clustering." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

Brief description of the drawings:
Figure 1 is a configuration diagram showing the video processing device according to Embodiment 1.
Figure 2 is a hardware configuration diagram showing the hardware of the video processing device according to Embodiment 1.
Figure 3 is a hardware configuration diagram of a computer for the case where the video processing device is implemented with software or firmware.
Figure 4 is a flowchart showing the video processing method, which is the processing procedure of the video processing device.
Figure 5 is an explanatory diagram showing an example of three procedure group candidates created by a generative AI.
Figure 6 is an explanatory diagram showing an example in which multiple interval group candidates are output from a neural model such as a TAS (Temporal Action Segmentation) model.
Figure 7 is an explanatory diagram showing an example of interval group candidate selection by the procedure group candidate selection processing unit 3b.
Figure 8 is an explanatory diagram showing an example of an action interval and procedure name presented by the presentation processing unit 4.
Figure 9 is a configuration diagram showing the video processing device according to Embodiment 2.
Figure 10 is a hardware configuration diagram showing the hardware of the video processing device according to Embodiment 2.
Figure 11 is a configuration diagram showing the video processing device according to Embodiment 3.
Figure 12 is a hardware configuration diagram showing the hardware of the video processing device according to Embodiment 3.
Figure 13 is a flowchart showing the video processing method, which is the processing procedure of the video processing device.

To describe this disclosure in more detail, forms for implementing this disclosure will be described below with reference to the attached drawings.

Embodiment 1.
Figure 1 is a configuration diagram showing the video processing device according to Embodiment 1, and Figure 2 is a hardware configuration diagram showing its hardware. The video processing device shown in Figure 1 comprises a procedure group candidate acquisition unit 1, an action interval detection unit 2, a procedure group candidate selection unit 3, and a presentation processing unit 4.

The procedure group candidate acquisition unit 1 is implemented, for example, by the procedure group candidate acquisition circuit 21 shown in Figure 2. The procedure group candidate acquisition unit 1 acquires multiple procedure group candidates that differ from one another. Each procedure group candidate contains a series of steps that may be used to perform the task shown in the video, together with the procedure name of each step. The procedure group candidate acquisition unit 1 outputs information indicating the multiple procedure group candidates to the procedure group candidate selection unit 3.

The action interval detection unit 2 is implemented, for example, by the action interval detection circuit 22 shown in Figure 2.
The action interval detection unit 2 includes a feature extraction unit 2a and an interval group candidate acquisition unit 2b. The action interval detection unit 2 detects action intervals, which are sections of the video in which each of the steps for performing the task may be shown, based on temporal changes in the feature quantities of the task shown in the video. The action interval detection unit 2 outputs the action interval detection results to the procedure group candidate selection unit 3.

The feature extraction unit 2a acquires a video in which the task is shown, extracts the feature quantities of the task shown in the video, and outputs time-series data indicating the temporal changes in those feature quantities to the interval group candidate acquisition unit 2b.

The interval group candidate acquisition unit 2b acquires the time-series data from the feature extraction unit 2a. The interval group candidate acquisition unit 2b acquires multiple interval group candidates that differ from one another, based on the temporal changes in the feature
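
The role of the interval group candidate acquisition unit 2b can be sketched with a toy example. This is an assumption-laden illustration, not the patent's method: a simple change-point heuristic (a new interval starts wherever the frame-to-frame feature distance exceeds a threshold) stands in for the clustering or TAS model, and sweeping the threshold stands in for producing multiple differing interval group candidates.

```python
import numpy as np

def interval_group_candidates(features: np.ndarray, fps: float,
                              thresholds=(0.5, 1.0)):
    """Illustrative interval group candidate acquisition (cf. unit 2b).

    `features` is a (num_frames, dim) array of per-frame feature quantities,
    i.e. the time-series data from the feature extraction unit 2a. A new
    action interval starts wherever the frame-to-frame feature distance
    exceeds a threshold; each threshold yields one interval group candidate,
    so coarser thresholds give fewer, longer intervals.
    """
    diffs = np.linalg.norm(np.diff(features, axis=0), axis=1)
    candidates = []
    for th in thresholds:
        cuts = [0] + [i + 1 for i, d in enumerate(diffs) if d > th] + [len(features)]
        intervals = [(a / fps, b / fps) for a, b in zip(cuts[:-1], cuts[1:]) if b > a]
        candidates.append(intervals)
    return candidates

# Toy time series: three "actions", the first boundary being a weak feature
# change and the second a strong one.
feats = np.concatenate([np.zeros((30, 8)),
                        0.3 * np.ones((30, 8)),
                        2.0 * np.ones((30, 8))])
cands = interval_group_candidates(feats, fps=30.0)
# The fine threshold splits the video into 3 intervals, the coarse one into 2.
```

In the patent, these differing candidates are then handed to the procedure group candidate selection unit 3, which resolves the ambiguity by matching them against the procedure group candidates.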