CN-115968544-B - Real-time virtual remote delivery in a browser
Abstract
A method includes opening a web-based video call in a browser on a first device (145), receiving a request from a second device (150) to join the web-based video call, capturing (110) video including frames (105) by the first device, segmenting (115) the frames by the first device, selecting at least one segment (120) of the segmented frames by the first device, and streaming (125) the video including the at least one segment directly from the first device to the second device as a real-time virtual remote transfer (140).
Inventors
- Jason Mayes
Assignees
- Google LLC (谷歌有限责任公司)
Dates
- Publication Date: 2026-05-08
- Application Date: 2020-08-24
Claims (20)
- 1. A method for streaming video in a web-based video conferencing environment, the method comprising: opening a web-based video call in a browser on a first device; receiving, by the first device, a request from a second device to join the web-based video call; capturing, by the first device, a first video comprising frames; segmenting, by the first device, the frames; selecting, by the first device, at least one segment of the segmented frames; streaming the first video comprising the at least one segment as a real-time virtual remote transfer directly from the first device to the second device; receiving, by the first device, streaming video directly from the second device as a second video, the second video comprising a second real-time virtual remote transfer image; generating a plane and positioning the plane within the second video, the plane having a size proportional to a display of a device rendering the web-based video call; orienting, by the first device, the second video based on an environment of the first device; projecting, by the first device, the second video onto a background representing the environment of the first device to generate a third video comprising the second real-time virtual remote transfer image; and rendering, by the first device, the third video.
- 2. The method of claim 1, wherein opening the web-based video call comprises loading a web page, the web page comprising code configured to implement a trained machine learning model, and the trained machine learning model being configured to segment the frames and select the at least one segment.
- 3. The method of claim 1, wherein the at least one segment is an image of a participant in the web-based video call.
- 4. The method of claim 1, wherein the web-based video call is implemented using a web-based communication standard.
- 5. The method of claim 1, wherein the segmenting of the frames comprises: grouping pixels in the frames into semantic regions to locate objects and boundaries; classifying pixels of the frames into two categories, 1) pixels representing a person and 2) pixels representing a background; and segmenting the pixels representing the person from the frames.
- 6. The method of claim 1, wherein: the segmenting of the frames comprises identifying each object in the frames; the selecting of the at least one segment comprises selecting an object as the at least one segment; and the object is a participant in the web-based video call.
- 7. The method of claim 1, wherein the at least one segment is an image of a participant in the web-based video call, the method further comprising: converting the image from a two-dimensional image to a three-dimensional image.
- 8. The method of claim 1, wherein the at least one segment is an image of a participant in the web-based video call, the method further comprising: applying a filter to the image.
- 9. The method of any of claims 1 to 8, wherein the web-based video call is implemented as a zero-install web application.
- 10. The method of claim 2, wherein the web page including the code is configured to implement a web-based augmented reality tool.
- 11. The method of claim 1, wherein orienting the second video comprises: determining a normal vector associated with the second video; and at least one of rotating and translating the first video based on the normal vector, and wherein projecting the first video onto the second video comprises adding the first video to the plane.
- 12. A method for streaming video in a web-based video conferencing environment, the method comprising: opening a web-based video call in a browser on a first device; transmitting a request from a second device to join the web-based video call; receiving, at the first device, streaming video directly from the second device as a first video; generating a plane having a size proportional to a display of a device rendering the web-based video call; capturing, by the first device, a second video; orienting, by the first device, the first video based on the second video; projecting, by the first device, the first video onto the second video to generate a third video; and rendering, by the first device, the third video.
- 13. The method of claim 12, wherein orienting the first video comprises: determining a normal vector associated with the second video; and at least one of rotating and translating the first video based on the normal vector.
- 14. The method of claim 12 or 13, further comprising generating a plane and positioning the plane within the second video, and wherein projecting the first video onto the second video comprises adding the first video to the plane.
- 15. The method of claim 12, further comprising: generating a plane and positioning the plane within the second video, wherein orienting the first video comprises: determining a normal vector associated with the second video, and at least one of rotating and translating the first video based on the normal vector; and wherein projecting the first video onto the second video comprises adding the first video to the plane.
- 16. The method of claim 12, wherein the first video is a video of a first participant of the web-based video call and the second video is a real-world video.
- 17. The method of claim 12, wherein the web-based video call comprises code configured to implement a trained machine learning model, and the web-based video call comprises code configured to implement a web-based augmented reality tool.
- 18. The method of claim 12, wherein the web-based video call web page and the web-based video call are implemented as a zero-install web application.
- 19. A method for streaming video in a web-based video conferencing environment, the method comprising: opening a web-based video call in a browser on a first device; receiving a request from a second device to join the web-based video call; capturing, by the first device, a first video comprising frames; segmenting, by the first device, the frames; selecting, by the first device, at least one segment of the segmented frames; streaming the first video comprising the at least one segment as a first real-time virtual remote transfer image directly from the first device to the second device; receiving, by the first device, streaming video directly from the second device as a second video, the second video comprising a second real-time virtual remote transfer image; generating a plane and positioning the plane within the second video, the plane having a size proportional to a display of a device rendering the web-based video call; capturing, by the first device, a third video; orienting, by the first device, the second video based on the third video; projecting, by the first device, the second video into the third video to generate a fourth video comprising the second real-time virtual remote transfer image; and rendering, by the first device, a web page comprising the fourth video.
- 20. The method of claim 19, wherein opening the web-based video call comprises loading a web page, the web page comprising code configured to implement a trained machine learning model, the trained machine learning model being configured to segment the frames and select the at least one segment, and the web page further comprising code configured to implement a web-based augmented reality tool.
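Several of the claims above describe projecting one video onto a plane positioned within another video. As a rough, hypothetical illustration (none of these function names come from the patent), per-pixel compositing of a segmented foreground onto a background frame can be sketched as:

```python
def composite(foreground, mask, background):
    """Overlay foreground pixels where mask is 1 onto the background.
    A toy stand-in for projecting a first video onto a plane in a
    second video; a real implementation would also handle perspective
    transformation and alpha blending."""
    return [[f if m else b
             for f, m, b in zip(frow, mrow, brow)]
            for frow, mrow, brow in zip(foreground, mask, background)]

fg = [[9, 9], [9, 9]]        # foreground (e.g., segmented participant)
mask = [[1, 0], [0, 1]]      # 1 = foreground pixel, 0 = background
bg = [[1, 2], [3, 4]]        # background (e.g., real-world video frame)
print(composite(fg, mask, bg))  # -> [[9, 2], [3, 9]]
```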
Description
Real-time virtual remote delivery in a browser

Technical Field

Embodiments relate to streaming video in a web-based video conferencing environment.

Background

Video calls can be perceived by users as being separate from each other. In other words, social interaction can feel distant because two or more participants are in different locations, with each participant viewing the other locations, or artificial backgrounds, on a viewing device (e.g., a mobile phone). Furthermore, in order to conduct a video conference with advanced features (e.g., background modification), full-featured applications need to be installed on the user device.

Disclosure of Invention

In a general aspect, a device, a system, a non-transitory computer-readable medium (having stored thereon computer-executable program code that can be executed on a computer system), and/or a method can perform a process that includes opening a web-based video call in a browser on a first device, receiving a request from a second device to join the web-based video call, capturing video including frames by the first device, segmenting the frames by the first device, selecting at least one segment of the segmented frames by the first device, and streaming the video including the at least one segment as a real-time virtual remote transfer directly from the first device to the second device.

Implementations can include one or more of the following features. For example, opening the web-based video call can include loading a web page including code configured to implement a trained machine learning model, and the trained machine learning model can be configured to segment the frames and select the at least one segment. The at least one segment can be an image of a participant in the web-based video call. The web-based video call can be implemented using a web-based communication standard.
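The capture, segment, select, and stream flow of the general aspect above can be sketched as follows. This is a toy illustration: simple intensity thresholding stands in for the trained machine learning model, and all function names are hypothetical.

```python
def segment_frame(frame):
    """Classify each pixel as person (1) or background (0).
    Thresholding stands in for a trained segmentation model here."""
    return [[1 if px > 128 else 0 for px in row] for row in frame]

def select_person_segment(frame, mask):
    """Keep the pixels labelled as person; zero out the background."""
    return [[px if keep else 0 for px, keep in zip(row, mrow)]
            for row, mrow in zip(frame, mask)]

def process_frame(frame):
    """Segment a captured frame and select the person segment,
    ready to be handed to the streaming step."""
    return select_person_segment(frame, segment_frame(frame))

# A 2x2 grayscale 'frame': bright pixels play the role of the person.
print(process_frame([[200, 50], [30, 180]]))  # -> [[200, 0], [0, 180]]
```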
The segmenting of the frames can include grouping pixels in the frames into semantic regions to locate objects and boundaries, classifying the pixels of the frames into two categories, 1) pixels representing people and 2) pixels representing backgrounds, and segmenting the pixels representing the people from the frames. The segmenting of the frames can include identifying each object in the frames, the selecting of the at least one segment can include selecting an object as the at least one segment, and the object can be a participant in the web-based video call. The at least one segment can be an image of a participant in the web-based video call, and the method can further include converting the image from a two-dimensional image to a three-dimensional image. The at least one segment can be an image of a participant in the web-based video call, and the method can further include applying a filter to the image. The web-based video call can be implemented as a zero-install web application.

In another general aspect, a device, a system, a non-transitory computer-readable medium (having stored thereon computer-executable program code that can be executed on a computer system), and/or a method can perform a process that includes opening a web-based video call web page in a browser on a first device, transmitting, by the first device, a request to join a web-based video call from a second device, receiving, at the first device, streaming video directly from the second device as a first video, capturing, by the first device, a second video, orienting, by the first device, the first video based on the second video, projecting, by the first device, the first video into the second video to generate a third video, and rendering, by the first device, a web page that includes the third video.

Implementations can include one or more of the following features.
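The grouping of pixels into semantic regions to locate objects and boundaries can be illustrated with a simple connected-components pass over a binary person/background mask. This is a hypothetical sketch; a production system would obtain the regions from a segmentation model rather than a flood fill.

```python
def label_regions(mask):
    """Group adjacent foreground pixels (4-connectivity) into regions,
    returning a grid of region labels with 0 for background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not labels[i][j]:
                next_label += 1
                stack = [(i, j)]  # flood fill from this seed pixel
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y][x] and not labels[y][x]:
                        labels[y][x] = next_label
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels

mask = [[1, 0, 1],
        [1, 0, 0],
        [0, 0, 1]]
print(label_regions(mask))  # -> [[1, 0, 2], [1, 0, 0], [0, 0, 3]]
```

Each labelled region is a candidate object; selecting the region corresponding to a participant yields the "at least one segment" of the claims.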
For example, the method can further include generating a plane, and the orienting of the first video can include determining a normal vector associated with the second video and at least one of rotating and translating the first video based on the normal vector. The method can further include generating a plane and locating the plane in the second video, wherein projecting the first video into the second video includes adding the first video to the plane. The method can further include generating a plane and positioning the plane in the second video, the orienting of the first video can include determining a normal vector associated with the second video and at least one of rotating and translating the first video based on the normal vector, and projecting the first video into the second video can include adding the first video to the plane. The first video can be a video of a first participant in the web-based video call, and the second video can be a real-world video. The plane can be a transparent two-dimensional virtual structure located in the second video. The plane can have a size proportional to a display of a device rendering the web-based video call web page. The web-based video call web page can include code configured to implement a trained machine learning model
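The orientation step above determines a normal vector and rotates the video accordingly. One common construction, not specified by the patent, aligns the plane's default normal (0, 0, 1) with a target unit normal via the Rodrigues rotation formula:

```python
def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def align_normal(n):
    """Rotation matrix turning the plane's default normal (0, 0, 1)
    onto the unit normal n, via the Rodrigues formula.
    Hypothetical sketch; assumes n is not (0, 0, -1)."""
    v = [-n[1], n[0], 0.0]  # rotation axis (unnormalized): (0, 0, 1) x n
    c = n[2]                # cosine of the rotation angle: (0, 0, 1) . n
    k = 1.0 / (1.0 + c)     # equals (1 - cos) / sin^2
    vx = [[0.0, -v[2], v[1]],
          [v[2], 0.0, -v[0]],
          [-v[1], v[0], 0.0]]  # skew-symmetric cross-product matrix [v]x
    vx2 = matmul(vx, vx)
    # R = I + [v]x + k * [v]x^2
    return [[(1.0 if i == j else 0.0) + vx[i][j] + k * vx2[i][j]
             for j in range(3)] for i in range(3)]

# Rotate the default normal (0, 0, 1) to point along the x-axis.
R = align_normal([1.0, 0.0, 0.0])
print([sum(R[i][j] * [0.0, 0.0, 1.0][j] for j in range(3)) for i in range(3)])
# -> [1.0, 0.0, 0.0]
```

Applying R to the corners of the plane orients the projected video to match the surface in the captured scene; a translation then positions it within the frame.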