BR-112019013609-B1 - METHOD AND APPARATUS FOR PROCESSING INFORMATION FROM STREAMING MEDIA


Abstract

An information processing method and apparatus for streaming media. The information processing method for streaming media comprises: obtaining target spatial information of a target spatial object, the target spatial object being one of two spatial objects that are associated with data of two images included in target video data, wherein the target spatial information includes same-attribute spatial information, the same-attribute spatial information comprises the information that is identical between the respective spatial information of the two spatial objects, and the spatial information of the spatial object other than the target spatial object in the two spatial objects also includes the same-attribute spatial information; and determining, according to the target spatial information, the video data to be played. Because a single set of same-attribute spatial information replaces the part that is duplicated between the respective spatial information of the two spatial objects, redundancy in the spatial information is reduced, which reduces the data volume of the spatial information.
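The redundancy reduction described in the abstract can be illustrated with a minimal Python sketch. It assumes that each spatial object's spatial information is a flat mapping of hypothetical field names (`yaw`, `pitch`, `width`, `height`) to numbers; the patent does not prescribe this representation. The hypothetical helper `split_spatial_info` stores identical fields once as same-attribute information and keeps only the differing fields per object, and `merge_spatial_info` rebuilds one object's full spatial information.

```python
from typing import Dict, Tuple

SpatialInfo = Dict[str, float]  # hypothetical flat layout: field name -> value


def split_spatial_info(a: SpatialInfo, b: SpatialInfo) -> Tuple[SpatialInfo, SpatialInfo, SpatialInfo]:
    """Factor two objects' spatial information into (same-attribute, only-a, only-b) parts."""
    shared = {k: v for k, v in a.items() if k in b and b[k] == v}
    only_a = {k: v for k, v in a.items() if k not in shared}
    only_b = {k: v for k, v in b.items() if k not in shared}
    return shared, only_a, only_b


def merge_spatial_info(shared: SpatialInfo, specific: SpatialInfo) -> SpatialInfo:
    """Rebuild one object's full spatial information from the shared and specific parts."""
    return {**shared, **specific}


if __name__ == "__main__":
    region_a = {"yaw": -45.0, "pitch": 0.0, "width": 90.0, "height": 90.0}
    region_b = {"yaw": 45.0, "pitch": 0.0, "width": 90.0, "height": 90.0}
    shared, only_a, only_b = split_spatial_info(region_a, region_b)
    print(shared)          # {'pitch': 0.0, 'width': 90.0, 'height': 90.0}
    print(only_a, only_b)  # {'yaw': -45.0} {'yaw': 45.0}
    assert merge_spatial_info(shared, only_a) == region_a
```

In this toy example, eight stored values shrink to five: three shared values plus one yaw value per object.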

Inventors

Peiyun Di
Qingpeng Xie

Assignees

HUAWEI TECHNOLOGIES CO., LTD

Dates

Publication Date: 20260317
Application Date: 20170329
Priority Date: 20161230

Claims (11)

1. A method of processing streaming media information performed by a client, characterized in that the method comprises the steps of: receiving a Media Presentation Description (MPD) from a server, wherein the MPD contains spatial information of each spatial region among a plurality of non-overlapping spatial regions into which the entire surface of a sphere, as the panoramic space of the sphere, is divided, each of the non-overlapping spatial regions corresponds to a different Dynamic Adaptive Streaming over HTTP (DASH) bitstream among multiple DASH bitstreams, and each DASH bitstream among the multiple DASH bitstreams is an image stream of the spatial region to which that DASH bitstream corresponds, and wherein, for two spatial regions among the plurality of non-overlapping spatial regions, the spatial information contained in the MPD consists of: spatial information of the same attribute, which is spatial information shared by the two spatial regions and is therefore the same for both spatial regions; different-attribute spatial information of the first spatial region of the two spatial regions; and different-attribute spatial information of the other spatial region of the two spatial regions, wherein the different-attribute spatial information of the first spatial region differs from the different-attribute spatial information of the other spatial region; receiving, from a user of the client, an instruction to obtain, from the server, a bitstream corresponding to a target field of view selected by the user, wherein the target field of view is one of multiple fields of view, each of the multiple fields of view corresponds to a different spatial region among the multiple non-overlapping spatial regions, and the target field of view corresponds to a target spatial region of the two spatial regions; determining, based on the MPD and the target field of view, target spatial information of the target spatial region; determining, based on the target spatial information, the target spatial region; requesting, from the server, target video data from the DASH bitstream, among the multiple DASH bitstreams, that corresponds to the target spatial region; and receiving the target video data from the server.
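The client-side steps of claim 1 can be sketched as follows, assuming the spatial part of the MPD has already been parsed into a hypothetical `SpatialMpd` structure (not the DASH XML schema) and that each field of view is identified by a string key. The sketch only resolves the target region, its target spatial information (shared part plus region-specific part), and the URL of the corresponding DASH bitstream; the HTTP request itself is left out, and all names and URLs are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SpatialMpd:
    """Hypothetical parsed view of the spatial part of an MPD (not the DASH XML schema)."""
    shared: Dict[str, float]               # same-attribute spatial information, stored once
    specific: Dict[str, Dict[str, float]]  # region id -> different-attribute spatial information
    bitstreams: Dict[str, str]             # region id -> URL of that region's DASH bitstream
    fov_regions: Dict[str, str]            # field-of-view id -> region id


def resolve_fov(mpd: SpatialMpd, fov: str) -> Tuple[str, Dict[str, float], str]:
    """Return (target region id, target spatial information, bitstream URL) for a selected FOV."""
    region = mpd.fov_regions[fov]                  # each FOV corresponds to one region
    info = {**mpd.shared, **mpd.specific[region]}  # shared part + region-specific part
    return region, info, mpd.bitstreams[region]    # the client would then request this URL


if __name__ == "__main__":
    mpd = SpatialMpd(
        shared={"pitch": 0.0, "width": 90.0, "height": 90.0},
        specific={"front": {"yaw": 0.0}, "right": {"yaw": 90.0}},
        bitstreams={
            "front": "https://example.com/vr/front/stream.mp4",
            "right": "https://example.com/vr/right/stream.mp4",
        },
        fov_regions={"fov_front": "front", "fov_right": "right"},
    )
    print(resolve_fov(mpd, "fov_right"))
```

Keeping the shared part separate means the MPD carries the common fields once, while the client still recovers complete spatial information for whichever region the user selects.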
2. The method according to claim 1, characterized in that the target spatial information comprises location information of a center point of the target spatial region or location information of an upper-left point of the target spatial region, and the target spatial information further comprises a width and a height of the target spatial region.
3. The method according to claim 2, characterized in that the location information of the center point or the location information of the upper-left point of the target spatial region comprises: a pitch angle and a yaw angle of the center point or the upper-left point in an angular coordinate system; or a pitch angle, a yaw angle, and a roll angle of the center point or the upper-left point in an angular coordinate system.
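The two location conventions of claims 2 and 3 can be related with a small sketch in an angular coordinate system. It assumes degrees, yaw in [-180, 180) increasing to the right, pitch increasing upward, and width and height measured as yaw and pitch extents; these conventions are assumptions, and roll is ignored. The hypothetical `upper_left` method derives the upper-left point from the center point, and `contains` tests whether a viewing direction falls inside the region, which is one possible way for a client to map a field of view to a region.

```python
from dataclasses import dataclass
from typing import Tuple


def wrap_yaw(yaw: float) -> float:
    """Map a yaw angle in degrees into [-180, 180)."""
    return (yaw + 180.0) % 360.0 - 180.0


@dataclass(frozen=True)
class AngularRegion:
    """Region given by its center point plus width and height, all in degrees (assumed)."""
    center_yaw: float
    center_pitch: float
    width: float   # extent along yaw
    height: float  # extent along pitch

    def upper_left(self) -> Tuple[float, float]:
        """Derive the upper-left point: half a width to the left, half a height up."""
        return wrap_yaw(self.center_yaw - self.width / 2), self.center_pitch + self.height / 2

    def contains(self, yaw: float, pitch: float) -> bool:
        """True if the viewing direction (yaw, pitch) lies inside the region."""
        d_yaw = wrap_yaw(yaw - self.center_yaw)  # signed offset, handles the +/-180 degree seam
        return abs(d_yaw) <= self.width / 2 and abs(pitch - self.center_pitch) <= self.height / 2


if __name__ == "__main__":
    back = AngularRegion(center_yaw=180.0, center_pitch=0.0, width=90.0, height=90.0)
    print(back.upper_left())            # (135.0, 45.0)
    print(back.contains(-170.0, 10.0))  # True: the direction lies across the seam
    print(back.contains(90.0, 0.0))     # False
```

Wrapping the yaw difference, rather than comparing raw angles, keeps regions that straddle the +/-180 degree seam working correctly.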
4. The method according to any one of claims 1 to 3, characterized in that the target spatial information is encapsulated in spatial information data or in a spatial information track, the spatial information data being a bitstream of the target video data, metadata of the target video data, or a file independent of the target video data, and the spatial information track being a track independent of the target video data.
5. The method according to claim 4, characterized in that the spatial information data or the spatial information track further comprises a spatial information type identifier, the spatial information type identifier indicates a type of the spatial information of the same attribute, and the spatial information type identifier is used to indicate which information in the target spatial information belongs to the spatial information of the same attribute.
6. The method according to claim 5, characterized in that the spatial information type identifier and the spatial information of the same attribute are encapsulated in the same box.
7. The method according to any one of claims 4 to 6, characterized in that the spatial information data or the spatial information track further comprises a coordinate system identifier used to indicate a coordinate system corresponding to the target spatial information, the coordinate system being either a pixel coordinate system or an angular coordinate system.
8. The method according to claim 7, characterized in that the coordinate system identifier and the spatial information of the same attribute are encapsulated in the same box.
9. The method according to any one of claims 4 to 8, characterized in that the spatial information data or the spatial information track further comprises a spatial rotation information identifier, and the spatial rotation information identifier is used to indicate whether the target spatial information comprises spatial rotation information of the target spatial object.
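Claims 5 to 9 place several identifiers in the same box as the same-attribute spatial information. The sketch below shows one possible byte layout, assuming an ISO BMFF-style box header (a 32-bit big-endian size followed by a four-character type). The type code `shsp`, the field order, the bitmask used as the spatial information type identifier, the identifier values, and the use of float32 values are all hypothetical choices for illustration, not a layout defined by the patent or by ISO/IEC 14496-12.

```python
import struct
from typing import Dict, Tuple

BOX_TYPE = b"shsp"                            # hypothetical four-character code, not registered
FIELDS = ("yaw", "pitch", "width", "height")  # hypothetical field order; bit i <-> FIELDS[i]
PIXEL, ANGULAR = 0, 1                         # hypothetical coordinate system identifier values


def write_shared_box(shared: Dict[str, float], coord_system: int, has_rotation: bool) -> bytes:
    """Serialize same-attribute spatial information and its identifiers into one box."""
    type_mask = 0  # spatial information type identifier: which fields are shared
    values = []
    for bit, name in enumerate(FIELDS):  # fields not listed in FIELDS are ignored
        if name in shared:
            type_mask |= 1 << bit
            values.append(shared[name])
    payload = struct.pack(">BBB", type_mask, coord_system, int(has_rotation))
    payload += struct.pack(f">{len(values)}f", *values)
    # Box header: 32-bit size (header included) followed by the 4-byte type.
    return struct.pack(">I4s", 8 + len(payload), BOX_TYPE) + payload


def read_shared_box(box: bytes) -> Tuple[Dict[str, float], int, bool]:
    """Parse a box produced by write_shared_box."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != BOX_TYPE or size != len(box):
        raise ValueError("not a well-formed 'shsp' box")
    type_mask, coord_system, rotation = struct.unpack_from(">BBB", box, 8)
    names = [name for bit, name in enumerate(FIELDS) if type_mask & (1 << bit)]
    values = struct.unpack_from(f">{len(names)}f", box, 11)
    return dict(zip(names, values)), coord_system, bool(rotation)


if __name__ == "__main__":
    box = write_shared_box({"pitch": 0.0, "width": 90.0, "height": 90.0}, ANGULAR, False)
    print(len(box), box[8])  # 23 14
    print(read_shared_box(box))
```

Because the type bitmask travels in the same box as the shared values, a reader knows which fields to take from the box and which to expect in each region's own spatial information.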
10. A streaming media information processing device, characterized in that the device comprises: an acquisition module configured to perform the following steps: receiving a Media Presentation Description (MPD) from a server, wherein the MPD contains spatial information of each spatial region among a plurality of non-overlapping spatial regions into which the entire surface of a sphere, as the panoramic space of the sphere, is divided, each of the non-overlapping spatial regions corresponds to a different Dynamic Adaptive Streaming over HTTP (DASH) bitstream among multiple DASH bitstreams, and each DASH bitstream among the multiple DASH bitstreams is an image stream of the spatial region to which that DASH bitstream corresponds, and wherein, for two spatial regions among the plurality of non-overlapping spatial regions, the spatial information contained in the MPD consists of: spatial information of the same attribute, which is spatial information shared by the two spatial regions and is therefore the same for both spatial regions; different-attribute spatial information of the first spatial region of the two spatial regions; and different-attribute spatial information of the other spatial region of the two spatial regions, wherein the different-attribute spatial information of the first spatial region differs from the different-attribute spatial information of the other spatial region; and receiving, from a client user, an instruction to obtain, from the server, a bitstream corresponding to a target field of view selected by the user, wherein the target field of view is one of multiple fields of view, each of the multiple fields of view corresponds to a different spatial region among the multiple non-overlapping spatial regions, and the target field of view corresponds to a target spatial region of the two spatial regions; and a determination module configured to determine, based on the MPD and the target field of view, target spatial information of the target spatial region, and to determine, based on the target spatial information, the target spatial region, wherein the determination module is further configured to request, from the server, target video data from the DASH bitstream, among the multiple DASH bitstreams, that corresponds to the target spatial region, and to receive the target video data from the server.
11. The device according to claim 10, characterized in that the target spatial information comprises location information of a center point of the target spatial region or location information of an upper-left point of the target spatial region, and the target spatial information further comprises a width and a height of the target spatial region.

Description

TECHNICAL FIELD

[0001] The present invention relates to the field of streaming media processing and, in particular, to an information processing method and apparatus.

BACKGROUND

I. Introduction to MPEG-DASH technology

[0002] In November 2011, the MPEG organization approved the DASH standard. The DASH standard (referred to below as the DASH technical specification for short) is a technical specification for transmitting media streams over the HTTP protocol. The DASH technical specification mainly consists of two parts: the media presentation description (MPD) and the media file format.

1. Media file format

[0003] The media file format is a type of file format. In DASH, a server prepares a plurality of bitstream versions of the same video content, and each bitstream version is referred to as a representation in the DASH standard. A representation is a set and encapsulation of one or more bitstreams in a transmission format, and a representation includes one or more segments. Different bitstream versions may have different encoding parameters, such as bitrates and resolutions. Each bitstream is divided into a plurality of small files, and each small file is referred to as a segment. A client can switch between different media representations while requesting media segment data. A segment can be encapsulated in the ISO Base Media File Format (ISO BMFF) defined in ISO/IEC 14496-12, or in the MPEG-2 Transport Stream (MPEG-2 TS) format defined in ISO/IEC 13818-1.

2. Media presentation description

[0004] In the DASH standard, the media presentation description is referred to as an MPD, and the MPD can be an XML file. Information in the file is described hierarchically: as shown in Figure 1, all information at one level is inherited by the next level. Some media metadata is described in the file.
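The representation and segment model of [0003] and the hierarchical MPD of [0004] can be illustrated with Python's standard `xml.etree.ElementTree`. The MPD below is a trimmed, non-schema-valid fragment with illustrative URLs. `$RepresentationID$` and `$Number$` are SegmentTemplate identifiers defined in ISO/IEC 23009-1, whereas picking the highest-bandwidth representation that fits the measured throughput is a common heuristic assumed here, not something the DASH standard mandates. The `SegmentTemplate` lookup shows a simplified form of inheritance: a template declared on the AdaptationSet applies to every Representation that does not declare its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Trimmed illustrative MPD: required attributes such as profiles and minBufferTime are omitted.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <BaseURL>https://example.com/video/</BaseURL>
  <Period id="p0">
    <AdaptationSet mimeType="video/mp4">
      <SegmentTemplate media="$RepresentationID$/seg-$Number$.m4s"/>
      <Representation id="low" bandwidth="1000000" width="1280" height="720"/>
      <Representation id="high" bandwidth="5000000" width="3840" height="2160"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def _bandwidth(rep: ET.Element) -> int:
    return int(rep.get("bandwidth"))


def pick_representation(adaptation_set: ET.Element, throughput_bps: int) -> ET.Element:
    """Pick the highest-bandwidth Representation that fits the throughput (assumed heuristic)."""
    reps = adaptation_set.findall("dash:Representation", NS)
    affordable = [r for r in reps if _bandwidth(r) <= throughput_bps]
    # Fall back to the cheapest Representation when none fits.
    return max(affordable, key=_bandwidth) if affordable else min(reps, key=_bandwidth)


def segment_url(mpd: ET.Element, adaptation_set: ET.Element, rep: ET.Element, number: int) -> str:
    """Build a segment URL; a Representation-level SegmentTemplate overrides the inherited one."""
    template = rep.find("dash:SegmentTemplate", NS)
    if template is None:  # inherit from the enclosing AdaptationSet
        template = adaptation_set.find("dash:SegmentTemplate", NS)
    media = template.get("media")
    media = media.replace("$RepresentationID$", rep.get("id")).replace("$Number$", str(number))
    base = mpd.findtext("dash:BaseURL", default="", namespaces=NS)
    return urljoin(base, media)


if __name__ == "__main__":
    mpd = ET.fromstring(MPD_XML)
    aset = mpd.find("dash:Period/dash:AdaptationSet", NS)
    for throughput in (3_000_000, 8_000_000):
        rep = pick_representation(aset, throughput)
        print(throughput, segment_url(mpd, aset, rep, 3))
```

With 3 Mbit/s of measured throughput, this sketch picks `low` and builds `https://example.com/video/low/seg-3.m4s` for segment 3; at 8 Mbit/s it switches to `high`.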
The metadata allows the client to learn about the media content available on the server, and the client can use this information to construct an HTTP URL for requesting a segment.

[0005] In the DASH standard, a media presentation is a set of structured data that presents media content. A media presentation description is a file that normatively describes a media presentation and is used to provide a streaming media service. One or more consecutive periods form an entire media presentation, and periods are continuous and non-overlapping. In the MPD, a representation is a set and encapsulation of descriptive information of one or more bitstreams in a transmission format, and a representation includes one or more segments. An adaptation set represents a set of mutually interchangeable encoded versions of the same media content component, and an adaptation set includes one or more representations. A subset is a combination of adaptation sets; by playing all the adaptation sets in the combination, a player obtains the corresponding media content. Segment information describes a media unit referenced by an HTTP uniform resource locator in the media presentation description, that is, a segment of media data. The media data segment can be stored in one file or stored separately. In one possible manner, the MPD stores the media data segment.

[0006] For MPEG-DASH technical concepts related to the present invention, refer to the relevant provisions of ISO/IEC 23009-1: Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats, or to the relevant provisions of an earlier version of the standard, such as ISO/IEC 23009-1:2013 or ISO/IEC 23009-1:2012.

II. Introduction to virtual reality (VR) technology

[0007] Virtual reality technology is a computer simulation system that can create a virtual world and allow it to be experienced. It uses a computer to generate a simulated environment and is a system simulation of interactive three-dimensional dynamic vision and physical behavior that fuses information from multiple sources, allowing a user to be immersed in the environment. VR mainly involves aspects such as the simulated environment, perception, natural abilities, and sensing devices. The simulated environment is a realistic, three-dimensional, dynamic, real-time image generated by a computer. Perception means that ideal VR should provide every kind of human perception: in addition to the visual perception generated by computer graphics technology, it includes auditory, tactile, force, and motion perception, and even smell, taste, and the like. This is also referred to as multi-perception. Natural abilities refer to a person's head or eye movements, gestures, or other human behaviors or actions. The computer processes data corresponding to the participant's actions, responds to the user's input in real time, and separately returns the r