EP-4736139-A1 - METHOD FOR PROCESSING VISUAL CONTENT


Abstract

The invention relates to a method for processing visual content (Cont), the method comprising a step (E1) of generating metadata (Meta) associated with the visual content (Cont), the generating step comprising a step (E11) of segmenting the visual content (Cont) into a plurality of zones (Z1-Z6), a step (E12) of determining descriptors (Desc) for the zones from the plurality of zones (Z1-Z6), the descriptors (Desc) of a zone comprising a parameter (P) for the acceptability of the zone to receive a visual element. The processing method further comprises a step of associating zone descriptors (Desc) with the visual content as metadata (Meta) for the visual content.

Inventors

Floury, Cédric
MERCIER, Violaine
LE TROCQUER, Mickael
GILABERT SENAR, Alexis

Assignees

ORANGE

Dates

Publication Date: 2026-05-06
Application Date: 2024-06-27

Claims (18)

1. A method of processing visual content (Cont), said processing method comprising: - a generation (E1) of metadata (Meta) associated with said visual content (Cont), said generation comprising: • a segmentation (E11) of the visual content (Cont) into a plurality of zones (Z1-Z6), said segmentation taking into account the semantic elements detected in said visual content; • a determination (E12) of descriptors (Desc) of zones of the plurality of zones (Z1-Z6), the descriptors (Desc) of a zone comprising a parameter (P) of acceptability of said zone to receive a visual element (Text), said acceptability parameter taking into account the presence or absence of at least one of said semantic elements detected on said zone; - an association with said visual content of the zone descriptors (Desc), as metadata (Meta) of said visual content.
2. Processing method according to claim 1, in which the segmentation (E11) comprises a decomposition into layers by depth of the visual content (Cont).
3. Processing method according to any one of claims 1 or 2, in which a zone comprises a majority of pixels of the same color and/or the same texture and/or corresponds to a visual rendering of a specific part of an individual.
4. Processing method according to any one of claims 1 to 3, in which, the visual content (Cont) being a video, the segmentation (E11) of the visual content (Cont) into a plurality of zones is carried out on the entire video.
5. Processing method according to any one of claims 1 to 4, in which, the visual content (Cont) being a video, the segmentation (E11) of the visual content into a plurality of zones is carried out per video sequence, said video being divided into video sequences.
6. Processing method according to any one of claims 1 to 5, in which the zones (Z1-Z6) obtained by segmentation have different depths.
7. Processing method according to any one of claims 1 to 6, wherein said acceptability parameter of a zone takes into account a semantic importance of at least one of said detected semantic elements present in said zone.
8. Method of associating, with visual content, a visual element (Text) to be displayed superimposed on said visual content, said association method comprising: - obtaining metadata (Meta) associated with said visual content (Cont), said metadata comprising descriptors (Desc) of the zones of a plurality of zones (Z1-Z6) segmenting said visual content (Cont), said descriptors (Desc) of a zone comprising a parameter (P) of acceptability of said zone to receive a visual element, said acceptability parameter taking into account the presence or absence of at least one semantic element on said zone; - a selection (E2) of at least one zone (Z2, Z5, Z6) of said plurality of zones (Z1-Z6) to receive the visual element (Text) according to said acceptability parameters (P) of said descriptors of the plurality of zones; - an association with said visual element of said at least one selected zone.
9. Association method according to claim 8, in which the selection (E2) takes into account a spatial occupation of the visual element (Text) and/or a color of said visual element (Text).
10. Data flow between a first electronic device (10) and a second electronic device (20), said data flow (FW) comprising: - visual content (Cont) segmented into a plurality of zones (Z1-Z6); - metadata (Meta) associated with said visual content (Cont), said metadata comprising descriptors (Desc) of zones of the plurality of zones (Z1-Z6), the descriptors (Desc) of a zone comprising a parameter (P) of acceptability of said zone to receive a visual element, said acceptability parameter taking into account the presence or absence of at least one semantic element on said zone.
11. Data flow according to claim 10, wherein said data flow comprises a first visual element (Text) and a designation of at least a first of said zones of said visual content for receiving said first visual element (Text), said first zone being designated taking into account said acceptability parameters (P) of said descriptors of the plurality of zones.
12. A method of displaying visual content (Cont), said display method comprising: - obtaining (E’1) a flow (FW), said flow comprising: • visual content (Cont) segmented into a plurality of zones (Z1-Z6); • metadata (Meta) associated with said visual content (Cont), said metadata comprising descriptors (Desc) of zones of the plurality of zones (Z1-Z6), the descriptors (Desc) of a zone comprising a parameter (P) of acceptability of said zone to receive a visual element, said acceptability parameter taking into account the presence or absence of at least one semantic element on said zone; - a display (E’5) of the visual content (Cont) on a display device of an electronic device (20), said displayed visual content (Cont) comprising a first visual element (Text), said first visual element being positioned on a zone of said plurality of zones identified taking into account the acceptability parameters of said plurality of zones.
13. A method of displaying visual content (Cont) according to claim 12, said display method comprising: - a reception of said first visual element (Text) and a designation of said first zone.
14. Display method according to any one of claims 12 or 13, in which the method comprises a graphic adaptation (E’2) of the visual content (Cont) according to characteristics of said display device of said electronic device (20), and, when said metadata comprise descriptors of at least two zones whose acceptability parameters correspond to zones (Z2, Z5, Z6) suitable for receiving said first visual element (Text), a selection (E’3) of said first zone from said at least two zones taking into account said graphic adaptation.
15. Display method according to any one of claims 12 to 14, in which prior to the display (E’5) of the visual content (Cont), said first visual element is adapted (E’4) to the identified zone.
16. Electronic device comprising at least one microprocessor suitable for carrying out a processing of visual content (Cont), said processing comprising: - a generation (E1) of metadata (Meta) associated with said visual content (Cont), said generation comprising: • a segmentation (E11) of the visual content (Cont) into a plurality of zones (Z1-Z6), said segmentation taking into account the semantic elements detected in said visual content; • a determination (E12) of descriptors (Desc) of zones of the plurality of zones (Z1-Z6), the descriptors (Desc) of a zone comprising a parameter (P) of acceptability of said zone to receive a visual element (Text), said acceptability parameter taking into account the presence or absence of at least one of said semantic elements detected on said zone; - an association with said visual content of the zone descriptors (Desc), as metadata (Meta) of said visual content.
17. Electronic device comprising at least one microprocessor adapted to: - obtain metadata (Meta) associated with visual content (Cont), said metadata comprising descriptors (Desc) of the zones of a plurality of zones (Z1-Z6) segmenting said visual content (Cont), said descriptors (Desc) of a zone comprising a parameter (P) of acceptability of said zone to receive a visual element; - select (E2), as a function of said acceptability parameters (P) of said descriptors of the plurality of zones, at least one zone (Z2, Z5, Z6) of said plurality of zones (Z1-Z6) to receive a first visual element (Text) to be displayed superimposed on said visual content; - associate said at least one selected zone with said first visual element.
18. Electronic device comprising at least one microprocessor adapted to display, on a display device of said electronic device, visual content (Cont), said display comprising: - obtaining (E’1) a flow (FW), said flow comprising: • visual content (Cont) segmented into a plurality of zones (Z1-Z6); • metadata (Meta) associated with said visual content (Cont), said metadata comprising descriptors (Desc) of zones of the plurality of zones (Z1-Z6), the descriptors (Desc) of a zone comprising a parameter (P) of acceptability of said zone to receive a visual element, said acceptability parameter taking into account the presence or absence of at least one semantic element in said zone; - a display (E’5) of the visual content (Cont) and of a first visual element (Text), associated with said visual content, on a display device of an electronic device (20), said first visual element (Text) being positioned superimposed on a zone of said plurality of zones, said zone being identified by taking into account the acceptability parameters of said plurality of zones.
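
For illustration only, the selection (E2) of claims 8 and 9 could be sketched as follows. This is a minimal sketch and not the claimed implementation. The dictionary keys (`zone_id`, `width`, `height`, `acceptability`), the scale of the acceptability parameter (P) from 0 to 1, the threshold `min_acceptability` and the tie-breaking rule are all assumptions that do not appear in the claims.

```python
def select_zone(descriptors, element_width, element_height, min_acceptability=0.5):
    """Pick a zone able to receive a visual element (hypothetical helper).

    descriptors: list of dicts, one per zone, with the assumed keys
    "zone_id", "width", "height" and "acceptability" (0.0 to 1.0).
    Returns the identifier of the selected zone, or None if no zone fits.
    """
    # Keep zones that are acceptable enough and large enough for the
    # spatial occupation of the visual element (claim 9).
    candidates = [
        d for d in descriptors
        if d["acceptability"] >= min_acceptability
        and d["width"] >= element_width
        and d["height"] >= element_height
    ]
    if not candidates:
        return None
    # Assumed ranking: highest acceptability first, then largest area.
    best = max(candidates, key=lambda d: (d["acceptability"], d["width"] * d["height"]))
    return best["zone_id"]


# Example metadata for four zones; all values are invented for the example.
zones = [
    {"zone_id": "Z1", "width": 400, "height": 300, "acceptability": 0.0},
    {"zone_id": "Z2", "width": 300, "height": 80, "acceptability": 1.0},
    {"zone_id": "Z5", "width": 120, "height": 40, "acceptability": 1.0},
    {"zone_id": "Z6", "width": 500, "height": 100, "acceptability": 0.5},
]
selected = select_zone(zones, element_width=200, element_height=50)
```

With these invented values, Z1 is rejected because of its acceptability, Z5 because it is too small for the element, and Z2 is preferred over Z6 because its acceptability is higher.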

Description

DESCRIPTION
Title of the invention: Method for processing visual content
[0001] Technical field
[0002] The present invention relates to the field of communications between electronic devices, such as servers or terminals (or between people using these electronic devices), and more particularly to the exchange of visual content between two electronic devices, said visual content comprising a visual element intended to be displayed superimposed on this visual content.
[0003] Visual content here means either purely visual content (such as an image or a video) or audiovisual or multimedia content comprising at least one visual component of the image or video type.
[0004] Methods are known for creating graphical interfaces that a user can configure freely and modularly. These graphical interfaces are essentially composed of fixed or dynamic multimedia objects, texts or actuators.
[0005] In these methods, a visual element may be placed on an image zone that does not allow it to be read easily. For example, if the visual element is positioned on a face, its placement there may make it harder for the user to read.
[0006] There is therefore a need for a processing method that improves the rendering of visual elements superimposed on an image.
[0007] Summary of the invention
[0008] One subject of the invention is a method for processing visual content comprising a generation of metadata associated with said visual content, said generation comprising a segmentation of the visual content into a plurality of zones and a determination of zone descriptors of the plurality of zones, the descriptors of a zone comprising a parameter of acceptability of said zone to receive a visual element, the method further comprising an association with said visual content of the zone descriptors, as metadata of said visual content.
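
As a purely illustrative sketch, the generation of metadata described above could be represented as follows. The segmentation itself is not implemented here: the zones are assumed to be supplied already segmented, each with the labels of the semantic elements detected in it. The class `ZoneDescriptor`, the functions `acceptability`, `generate_metadata` and `associate`, the set `STRONG_ELEMENTS` and the three-level scale (0.0, 0.5, 1.0) are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, asdict

# Assumption: labels treated as "strong" semantic elements.
STRONG_ELEMENTS = {"face"}


@dataclass
class ZoneDescriptor:
    zone_id: str             # e.g. "Z1"
    bbox: tuple              # (x, y, width, height); assumed geometry encoding
    semantic_elements: list  # labels detected in the zone
    acceptability: float     # parameter P, assumed to lie in [0.0, 1.0]


def acceptability(semantic_elements):
    """Assumed rule: 0.0 if a strong element is present, 0.5 if only
    other elements are present, 1.0 for a neutral zone."""
    if any(label in STRONG_ELEMENTS for label in semantic_elements):
        return 0.0
    if semantic_elements:
        return 0.5
    return 1.0


def generate_metadata(segmented_zones):
    """Build metadata from pre-segmented zones.

    segmented_zones: iterable of (zone_id, bbox, semantic_elements).
    """
    descriptors = [
        ZoneDescriptor(zone_id, bbox, list(labels), acceptability(labels))
        for zone_id, bbox, labels in segmented_zones
    ]
    return {"zones": [asdict(d) for d in descriptors]}


def associate(content_ref, metadata):
    """Attach the zone descriptors to the content as metadata."""
    return {"content": content_ref, "metadata": metadata}


meta = generate_metadata([
    ("Z1", (0, 0, 400, 300), ["face"]),
    ("Z2", (0, 300, 400, 100), []),
    ("Z3", (400, 0, 200, 400), ["tree"]),
])
record = associate("image.png", meta)
```

Keeping the descriptors in a plain dictionary is one way to let them travel with the content during storage or transmission; the actual container format is not specified in this passage.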
[0009] The present application relates in particular to a method for processing visual content, said processing method comprising:
- a generation of metadata associated with said visual content, said generation comprising:
• a segmentation of the visual content (Cont) into a plurality of zones, said segmentation taking into account the semantic elements detected in said visual content;
• a determination of zone descriptors of the plurality of zones, the descriptors of a zone comprising an acceptability parameter of said zone to receive a visual element, said acceptability parameter taking into account the presence or absence of at least one of said semantic elements detected on said zone;
- an association with said visual content of the zone descriptors, as metadata of said visual content.
[0010] The processing method analyzes the visual content in order to define preservation zones, in which no visual element can be superimposed on the image, and/or acceptance zones, in which visual elements can be superimposed on the image. These zones are characterized either by the presence of a strong semantic element (face, main element of a landscape, etc.) or, on the contrary, by being neutral, i.e. containing no semantic element or no semantic element identified as strong. Each zone thus forms a visual unit that identifies said zone and groups together all the properties concerning it. This visual unit can result, for example, from the shape of the zone, the color of the zone, the texture of the zone or the semantic element associated with this zone. In other words, within a given zone the pixels have similar colors (gradient of the same color, etc.), the same resolution, the same level of blur or the same texture.
[0011] Each zone is associated with descriptors defining it, both structurally and in terms of the content it carries.
These zone descriptors are integrated into the metadata associated with the visual content, so that they are preserved during storage and/or transmission of the visual content through a communication channel. This communication channel is suitable for transmitting different streams that can carry different types of data. The rendering can thus be optimized or adapted on a remote user terminal, which improves the user experience when viewing the visual content.
[0012] In an alternative embodiment, taken alone or in combination, the segmentation comprises a decomposition into layers by depth of the visual content.
[0013] In an alternative embodiment, taken alone or in combination, a zone comprises a majority of pixels of the same color and/or the same texture and/or corresponds to a visual rendering of a specific part of an individual.
[0014] In an alternative embodiment, taken alone or in combination, the visual content being a video, the segmentation of the visual content into a plurality of zones is carried out over the entire video.
[0015] In an alternative embodiment, taken alone or in combination, the visual content being a video, the segmentation of the visual conten