EP-3780638-B1 - INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING DEVICE, AND PROGRAM

Inventors

YAMAGISHI, YASUAKI
KIYAMA, YUKA

Dates

Publication Date: 20260513
Application Date: 20190318

Claims (12)

1. An information processing apparatus (4), comprising: a media reproduction unit (41; 43; 44; 45; 46) configured to acquire and reproduce video data from a moving image content server (11), the video data including at least one service object for which a service that processes a service request from a user (U) through voice recognition by a voice AI assistant is available, wherein the service object comprises a person or product appearing in the reproduced video data, and acquire metadata from the moving image content server (11) or a point of interest, POI, metadata server (13), the metadata corresponding to the video data and including identification information of the video data and information of a start time and an end time of a duration during which an additional image for informing the user (U) about the service object is superimposed on the reproduced video data; and a controller (42; 43; 50; 51) configured to superimpose the additional image for informing the user (U) about the service object in a reproduced video scene based on the metadata, and in response to an instruction from the user (U), save the metadata which is used to generate the additional image in the video scene as a bookmark that is selected by the user (U) for a time-shifted reproduction of the video scene with the additional image.
2. The information processing apparatus (4) according to claim 1, wherein the controller (42; 43; 50; 51) is configured to receive a selection of the saved bookmark from the user (U), and reproduce the video scene with the additional image based on the identification information of the video scene and the information of the start time and the end time of the additional image which correspond to the selected bookmark.
3. The information processing apparatus (4) according to any one of the previous claims, wherein the metadata includes service back-end control information including a function name that indicates a function of the service identified by utterance from the user, and the controller (42; 43) is configured to present the function name of the service back-end control information to the user, the service back-end control information being included in the metadata corresponding to the bookmark selected by the user.
4. The information processing apparatus (4) according to claim 3, wherein the metadata includes information for requesting a different function for each time zone by using one function name, and the controller (42; 43) is configured to transmit the request to a server that switches the function of the service based on the information.
5. The information processing apparatus (4) according to any one of the previous claims, wherein the controller (42; 43) is configured to restrict use of the service for each service object.
6. The information processing apparatus (4) according to claim 5, wherein the restriction is a restriction by charging.
7. The information processing apparatus (4) according to claim 5, wherein the restriction is a restriction regarding whether sharing of the metadata of the additional image on a community service is possible or not.
8. The information processing apparatus (4) according to any one of the previous claims, wherein the additional image includes a visual feature unique to each service object such that the service object is uniquely determined by voice recognition in the service.
9. The information processing apparatus (4) according to any one of the previous claims, wherein the additional image is presented at a position attached to the service object.
10. The information processing apparatus according to any one of the previous claims, wherein the controller acquires an MPD file including AdaptationSet of the metadata, analyzes the MPD file, acquires each of the video data and the metadata as a Media Segment of MPEG-DASH, and presents the video data and the additional image based on the metadata in synchronization with each other.
11. An information processing method, comprising: acquiring and reproducing video data from a moving image content server (11), the video data including at least one service object, for which a service that processes a service request from a user through voice recognition by a voice AI assistant is available, wherein the service object comprises a person or product appearing in the reproduced video data; acquiring metadata from the moving image content server (11) or a point of interest, POI, metadata server (13), the metadata corresponding to the video data and including identification information of the video data and information of a start time and an end time of a duration during which an additional image for informing the user (U) about the service object is superimposed on the reproduced video data; superimposing the additional image for informing the user about the service object in a reproduced video scene based on the metadata; and in response to an instruction from the user, saving the metadata which is used to generate the additional image as a bookmark that is selected by the user for a time-shifted reproduction of the video scene with the additional image.
12. A computer program having a program code for executing the method of claim 11, when the computer program is executed on a computer.
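
As an illustration only, the bookmark mechanism recited in the apparatus and method claims can be sketched in Python. The sketch is not taken from the patent: the names PoiMetadata, Controller, overlays_at, save_bookmarks and replay are hypothetical, times are assumed to be seconds from the start of the video, and the display window is assumed to be half-open (start inclusive, end exclusive). It shows metadata carrying a video identifier plus a start time and an end time, a check for which additional images are superimposed at the current playback position, and saving that metadata as a bookmark for later time-shifted reproduction.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class PoiMetadata:
    """Hypothetical metadata record (field names are assumptions)."""
    video_id: str        # identification information of the video data
    service_object: str  # person or product the additional image refers to
    start_time: float    # seconds at which superimposition starts
    end_time: float      # seconds at which superimposition ends

    def is_active(self, position: float) -> bool:
        # Assumed half-open window: start inclusive, end exclusive.
        return self.start_time <= position < self.end_time


@dataclass
class Controller:
    """Hypothetical controller holding acquired metadata and saved bookmarks."""
    metadata: List[PoiMetadata]
    bookmarks: List[PoiMetadata] = field(default_factory=list)

    def overlays_at(self, video_id: str, position: float) -> List[PoiMetadata]:
        """Metadata whose additional image is superimposed at this position."""
        return [m for m in self.metadata
                if m.video_id == video_id and m.is_active(position)]

    def save_bookmarks(self, video_id: str, position: float) -> int:
        """On a user instruction, save the metadata behind the visible images.

        Returns the number of newly saved bookmarks.
        """
        new = [m for m in self.overlays_at(video_id, position)
               if m not in self.bookmarks]
        self.bookmarks.extend(new)
        return len(new)

    def replay(self, index: int) -> Tuple[str, float, float]:
        """Return (video_id, start, end) for time-shifted reproduction."""
        b = self.bookmarks[index]
        return (b.video_id, b.start_time, b.end_time)


controller = Controller(metadata=[
    PoiMetadata("drama-ep1", "actor-A", 10.0, 25.0),
    PoiMetadata("drama-ep1", "product-B", 40.0, 55.0),
])
controller.save_bookmarks("drama-ep1", 12.5)  # user bookmarks the scene at 12.5 s
print(controller.replay(0))  # ('drama-ep1', 10.0, 25.0)
```

In this sketch the bookmark stores the metadata record itself rather than a copy of the video, so a later reproduction can seek to the stored start time and redraw the additional image from the same data.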

Description

Technical Field

The present technology relates to an information processing apparatus, an information processing method, and a program that perform information processing for receiving and reproducing moving image content including videos, and particularly to an information processing apparatus, an information processing method, and a program that are suitable for, for example, a case where the moving image content works with a voice-based information service for a user of the information processing apparatus.

Background Art

Voice artificial intelligence (AI) assistant services have recently become widespread. These are information services in which a terminal supporting the service picks up a spoken request from a user of an information processing apparatus by using a microphone or the like, recognizes the request, analyzes data, executes a service corresponding to the request of the user, and responds to the user with the result of the execution through sound or the like (see, for example, Patent Literature 1). Alexa (registered trademark) of Amazon Echo (registered trademark) is currently known as a cloud-based voice AI assistant service.

Patent Literature 2 describes a concept for real-time updating of virtual assistant media knowledge. Virtual assistant knowledge can be updated with timely information associated with playing media (e.g., a sporting event, a television show, or the like). A data feed can be received that includes data relating events to particular times in a media stream. A user request can be received based on speech input, and the user request can be associated with an event in a media stream or show. In response to receiving the request, the media stream can be cued to commence playback at a time in the media stream associated with the event referred to in the request. In another example, a response to the user request can be generated based on the data relating to the events. The response can then be delivered to the user (e.g., spoken aloud, displayed, etc.).

Patent Literature 3 describes a bookmark-setting concept in which a bookmark is set such that a display position where a display subject is displayed on the display screen and a temporal position within the movie content where the display subject is displayed on the display screen are related to each other.

Patent Literature 4 describes a concept for navigating hypermedia using multiple coordinated input/output device sets. The concept allows a user and/or an author to control what resources are presented on which device sets, and provides for coordinating browsing activities to enable such a user interface to be employed across multiple independent systems.

Patent Literature 5 describes a reception apparatus that includes circuitry configured to receive a digital data stream. The circuitry is configured to acquire closed caption information included in the digital data stream. The circuitry is configured to acquire control information including selection information indicating a selection of a specific mode from a plurality of modes for specifying when closed caption text is to be displayed. The circuitry is further configured to output the closed caption text included in the closed caption information for display to a user, at a display time according to the specific mode, based on the selection information included in the control information.

Patent Literature 6 describes a method for utilizing interactive video with tagged objects, which includes receiving video data including both video media and an interactive video layer, and receiving a user input selecting a selectable video object from the interactive video layer during rendering of the video data.

Patent Literature 7 describes a method for controlling the rate adaptation behaviour of a Dynamic Adaptive Streaming over HTTP (DASH) client, in which a signal for changing the rate adaptation behaviour of DASH client devices is generated and sent to the DASH client devices.

Citation List

Patent Literature

Patent Literature 1: Japanese Patent Application Laid-open No. 2015-022310
Patent Literature 2: US 2015/382079 A1
Patent Literature 3: US 2011/126105 A
Patent Literature 4: US 8 161 172 B2
Patent Literature 5: WO 2016/203726 A1
Patent Literature 6: US 2012/0167145 A1
Patent Literature 7: US 2014/250479 A1

Disclosure of Invention

Technical Problem

The inventors of the present technology have examined a mechanism in which the voice AI assistant service described above is used as a means for collecting information regarding people or products appearing in a video in an environment where moving image content including the video is reproduced. For example, in a case where a user as a viewer wants to know, on the spot, various things such as the role of a person appearing in the moving image content, that person's relationship with other people appearing therein, and further the profile of the actor who plays that person, the user can receive information from the voice AI assistant service in real tim