CN-117121050-B - Segmenting and removing objects from media items

Abstract

A media application generates training data comprising a first set of media items and a second set of media items, wherein the first set of media items corresponds to the second set of media items and includes manually segmented interfering objects. The media application trains a segmentation machine learning model based on the training data to receive media items having one or more interfering objects and to output segmentation masks for one or more segmented objects corresponding to the one or more interfering objects.

Inventors

Ollie Liba
Nigel Carnard
Nori Kanazawa
Yel pritch Kernen
CHEN HUIZHONG
CAI LONGQI

Assignees

Google LLC

Dates

Publication Date: 20260512
Application Date: 20221018
Priority Date: 20211018

Claims (20)

1. A computer-implemented method of removing an interfering object from a media item, comprising: generating training data comprising a first set of media items and a second set of media items, wherein the first set of media items comprises an interfering object and the second set of media items comprises a manual segmentation of the interfering object; identifying one or more original media items in the first set of media items that include one or more broken power lines; generating one or more corrected media items that correct the one or more broken power lines; generating one or more augmented media items by mixing portions of the one or more corrected media items with portions of the corresponding one or more original media items to increase augmentation randomness; and training a segmentation machine learning model based on the training data to receive media items having one or more interfering objects and to output segmentation masks of one or more segmented objects corresponding to the one or more interfering objects.
2. The method of claim 1, wherein generating the one or more augmented media items comprises mixing the one or more corrected media items with the corresponding one or more original media items using a checkerboard mask.
3. The method of claim 1, wherein generating the one or more corrected media items to correct the one or more broken power lines comprises modifying local contrast in the one or more original media items to generate corresponding one or more enhanced media items.
4. The method of claim 3, wherein the local contrast is modified using a gain curve that adds two bias curves together.
5. The method of claim 1, wherein generating the training data comprises augmenting one or more media items of the first set of media items by applying dilation to the segmentation mask of the one or more interfering objects.
6. The method of claim 1, wherein the one or more interfering objects are organized into categories including at least one selected from the group of power lines, utility poles, towers, and combinations thereof.
7. The method of claim 1, wherein training the segmentation machine learning model comprises: generating a high-capacity machine learning model based on the training data; and distilling the high-capacity machine learning model into a trained segmentation machine learning model by running inference on training data segmented by the high-capacity machine learning model.
8. The method of claim 1, wherein the training data further comprises a composite image in which the interfering object is added in front of an outdoor environment object.
9. A computer-implemented method of removing an interfering object from a media item, the method comprising: receiving a media item from a user; identifying one or more interfering objects in the media item; providing the media item to a trained segmentation machine learning model; outputting a segmentation mask of the one or more interfering objects in the media item using the trained segmentation machine learning model; and inpainting a portion of the media item that matches the segmentation mask to obtain an output media item, wherein the one or more interfering objects are absent from the output media item; wherein the trained segmentation machine learning model is trained by generating training data by: identifying one or more original media items in a first set of media items that include one or more broken power lines; generating one or more corrected media items that correct the one or more broken power lines; and generating one or more augmented media items for the training data by mixing portions of the one or more corrected media items with portions of the corresponding one or more original media items to increase augmentation randomness.
10. The method of claim 9, wherein the one or more interfering objects are organized into categories including at least one selected from the group of power lines, utility poles, towers, and combinations thereof.
11. The method of claim 9, further comprising providing a suggestion to the user to remove the one or more interfering objects from the media item.
12. The method of claim 9, wherein the trained segmentation machine learning model is trained using training data comprising the first set of media items and a second set of media items, wherein the first set of media items comprises interfering objects and the second set of media items comprises manual segmentations of the interfering objects.
13. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by one or more computers, cause the one or more computers to perform operations comprising: generating training data comprising a first set of media items and a second set of media items, wherein the first set of media items comprises an interfering object and the second set of media items comprises a manual segmentation of the interfering object; identifying one or more original media items in the first set of media items that include one or more broken power lines; generating one or more corrected media items that correct the one or more broken power lines; generating one or more augmented media items by mixing portions of the one or more corrected media items with portions of the corresponding one or more original media items to increase augmentation randomness; and training a segmentation machine learning model based on the training data to receive media items having one or more interfering objects and to output segmentation masks of one or more segmented objects corresponding to the one or more interfering objects.
14. The computer-readable medium of claim 13, wherein generating the one or more augmented media items comprises mixing the one or more corrected media items with the corresponding one or more original media items using a checkerboard mask.
15. The computer-readable medium of claim 13, wherein generating the one or more corrected media items to correct the one or more broken power lines comprises modifying local contrast in the one or more original media items to generate corresponding one or more enhanced media items.
16. The computer-readable medium of claim 15, wherein the local contrast is modified using a gain curve that adds two bias curves together.
17. The computer-readable medium of claim 13, wherein generating the training data comprises augmenting one or more media items of the first set of media items by applying dilation to the segmentation mask of the one or more interfering objects.
18. The computer-readable medium of claim 13, wherein the one or more interfering objects are organized into categories including at least one selected from the group of power lines, utility poles, towers, and combinations thereof.
19. The computer-readable medium of claim 13, wherein training the segmentation machine learning model comprises: generating a high-capacity machine learning model based on the training data; and distilling the high-capacity machine learning model into a trained segmentation machine learning model by running inference on training data segmented by the high-capacity machine learning model.
20. The computer-readable medium of claim 13, wherein the training data further comprises a composite image in which the interfering object is added in front of an outdoor environment object.
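
As an illustration only, the checkerboard mixing recited in claims 2 and 14 might look like the following minimal sketch. The claims do not specify a tile size, a pixel representation, or function names; `checkerboard_mask`, `blend_with_mask`, the `tile` parameter, and the use of plain nested lists are all assumptions made here so the example runs without dependencies.

```python
def checkerboard_mask(height, width, tile):
    """Return a 2D list of 0/1 values whose tiles alternate like a chessboard.

    `tile` (tile edge length in pixels) is a hypothetical parameter.
    """
    return [[((y // tile) + (x // tile)) % 2 for x in range(width)]
            for y in range(height)]


def blend_with_mask(original, corrected, mask):
    """Take a pixel from `corrected` where mask == 1, else from `original`."""
    return [[c if m else o for o, c, m in zip(o_row, c_row, m_row)]
            for o_row, c_row, m_row in zip(original, corrected, mask)]


# Toy 4x4 single-channel images: 0 marks an original pixel, 9 a corrected one.
original = [[0] * 4 for _ in range(4)]
corrected = [[9] * 4 for _ in range(4)]

mask = checkerboard_mask(4, 4, tile=2)
mixed = blend_with_mask(original, corrected, mask)

for row in mixed:
    print(row)
```

With this mask, each training image contains both corrected and uncorrected regions, which is one plausible reading of how mixing increases augmentation randomness.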

Description

Segmenting and removing objects from media items

Cross Reference to Related Applications

The present application claims priority to U.S. Provisional Patent Application No. 63/257,114, entitled "Segmenting and Removing Objects from Media Items," filed on October 18, 2021, the entire contents of which are incorporated herein by reference.

Background

The user-perceived quality of visual media items such as images (still images, images with selective motion, etc.) and videos can be improved by removing certain objects that distract from the focus of the media item. The interfering objects may be removed manually, but the task may be laborious and incomplete. Furthermore, interfering objects are difficult to remove automatically from a media item, because such removal may result in over-triggering and unrealistic results, where additional objects or portions of objects in the image (erroneously identified as interfering objects) are also removed, or in incomplete segmentation, where portions of a removed object remain visible.

The user may employ manual image or video editing techniques to remove the interfering objects, but this task can be laborious and incomplete. Furthermore, automatically removing an interfering object is difficult because it may result in false positives, where other objects or parts of objects are also removed, or in incomplete segmentation, where some part of the removed object is still visible.

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.

Disclosure of Invention

A computer-implemented method includes generating training data comprising a first set of media items and a second set of media items, wherein the first set of media items corresponds to the second set of media items and includes manually segmented interfering objects. The method further includes training a segmentation machine learning model based on the training data to receive media items having one or more interfering objects and to output segmentation masks for one or more segmented objects corresponding to the one or more interfering objects.

In some embodiments, the one or more interfering objects are power lines, and the method further comprises identifying, from the first set of media items, one or more media items that include one or more broken power lines, and augmenting the one or more media items to correct the one or more broken power lines in the training data. In some embodiments, augmenting the one or more media items to correct the one or more broken power lines includes modifying local contrast in the one or more media items to generate corresponding one or more enhanced media items and mixing a portion of the one or more media items with a portion of the corresponding one or more enhanced media items. In some embodiments, the local contrast is modified using a gain curve that adds two bias curves together. In some embodiments, generating the training data includes augmenting one or more media items by applying dilation to a segmentation mask of the one or more interfering objects. In some embodiments, the one or more interfering objects are organized into categories including at least one selected from the group of power lines, utility poles, towers, and combinations thereof.
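
The gain-curve and dilation steps described above can be sketched as follows. This section does not give the formulas, so the sketch makes several assumptions: it uses Schlick's bias and gain functions as a stand-in for "a gain curve that adds two bias curves together", it applies the curve pointwise to intensities normalized to [0, 1] rather than to a local neighborhood, and it dilates with a square structuring element. The names `bias`, `gain`, `dilate`, and the parameters `a` and `radius` are hypothetical.

```python
def bias(t, a):
    """Schlick's bias curve on [0, 1]; a = 0.5 leaves t unchanged."""
    return t / ((1.0 / a - 2.0) * (1.0 - t) + 1.0)


def gain(t, a):
    """Gain curve assembled from two mirrored bias curves joined at t = 0.5.

    With a < 0.5 values are pushed away from 0.5, which raises contrast.
    """
    if t < 0.5:
        return bias(2.0 * t, a) / 2.0
    return 1.0 - bias(2.0 - 2.0 * t, a) / 2.0


def dilate(mask, radius=1):
    """Binary dilation of a 0/1 mask with a (2*radius+1) square element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


# Contrast: dark values get darker and bright values get brighter.
dark = gain(0.25, 0.25)    # 0.125
bright = gain(0.75, 0.25)  # 0.875

# Dilation: a one-pixel-wide horizontal "power line" becomes three pixels wide.
thin_line = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]
thick_line = dilate(thin_line, radius=1)
```

Dilating a thin power-line mask in this way widens the labeled region; in practice an image library's dilation routine would likely replace the nested loops.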
In some embodiments, training the segmentation machine learning model includes generating a high-capacity machine learning model based on the training data and distilling the high-capacity machine learning model into a trained segmentation machine learning model by running inference on training data segmented by the high-capacity machine learning model. In some embodiments, the training data further includes a composite image in which the interfering object is added in front of an outdoor environment object.

In some embodiments, a computer-implemented method for removing interfering objects from a media item includes receiving a media item from a user, identifying one or more interfering objects in the media item, providing the media item to a trained segmentation machine learning model, outputting a segmentation mask of the one or more interfering objects in the media item using the trained segmentation machine learning model, and inpainting a portion of the media item that matches the segmentation mask to obtain an output media item, wherein the one or more interfering objects are absent from the output media item. In some embodiments, the one or more interfering objects are organized into categories including at least one selected from the group of power lines, utility poles, towers, and combinations thereof. In s