CN-122027761-A - Method, device, terminal and storage medium for displaying audio and video call floating window

Abstract

The application provides a method, a device, a terminal and a storage medium for displaying an audio and video call floating window. The method comprises: displaying a call interface of a 1-to-1 call on a main interface in full screen; in response to a floating window triggering operation, creating a floating window, which includes creating a floating window instance, configuring a container view associated with the floating window instance, setting the container view as the content source of the floating window instance, packaging a 1-to-1 call view and a multi-person call view in the container view, and setting the 1-to-1 call view as the active sub-view of the container view; when a new user joins the 1-to-1 call, converting the 1-to-1 call into a multi-person call and switching the active sub-view of the container view to the multi-person call view; and identifying, according to the audio and video streams of all users, the user who is currently speaking loudest, and placing the audio and video stream view of that user on the top layer of the multi-person call view. The application can keep the floating window displayed continuously when a 1-to-1 call is converted into a multi-person call.

Inventors

LUO YUAN
ZHANG SHUNXING
LI YUAN
FANG SHUIBO
CHEN ZHILIE

Assignees

深圳市九牛一毛智能物联科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260324

Claims (10)

1. A method for displaying an audio and video call floating window, suitable for a terminal adopting an iOS operating system, the method comprising: displaying a call interface of a 1-to-1 call on a main interface in full screen; creating a floating window instance, configuring a container view associated with the floating window instance, setting the container view as a content source of the floating window instance, and packaging a 1-to-1 call view and a multi-person call view in the container view, wherein the 1-to-1 call view comprises an audio and video stream view of a local user and an audio and video stream view of a remote user, and the multi-person call view comprises audio and video stream views of a plurality of users; when a new user joins the 1-to-1 call, converting the 1-to-1 call into a multi-person call, and switching the active sub-view of the container view to the multi-person call view; and identifying, according to the audio and video streams of each user, the user who is currently speaking loudest, and placing the audio and video stream view of that user on the top layer of the multi-person call view, so as to display the video picture of that user in the floating window.
2. The method of claim 1, wherein setting the 1-to-1 call view as the active sub-view of the container view comprises: inputting the audio and video stream of the local user into the audio and video stream view of the local user and rendering it, and inputting the audio and video stream of the remote user into the audio and video stream view of the remote user and rendering it; and capturing the GPU rendering output of the audio and video stream view of the local user and the audio and video stream view of the remote user.
3. The method of claim 1, wherein switching the active sub-view of the container view to the multi-person call view comprises: switching the audio and video streams of the local user and the remote user in the 1-to-1 call to the audio and video stream views of two users in the multi-person call view and rendering them, while inputting the audio and video stream of the newly joined user into that user's audio and video stream view and rendering it; and capturing the GPU rendering output of the audio and video stream view of each user.
4. The method of claim 1, wherein identifying the user who is currently speaking loudest comprises: calculating the volume of each user in real time through a low-pass filtering algorithm according to the volume information of the audio and video streams of each user, thereby finding the user who is speaking loudest.
5. The method of claim 4, wherein the volume level of each user is calculated according to a formula [formula and symbols not reproduced in this text] whose terms represent, respectively: the smoothed volume value calculated at the current time; the smoothed volume value calculated last time; the number of iterations; the root mean square volume value calculated for the current audio; and the filter coefficient [constraint on the coefficient not reproduced in this text].
6. A device for displaying an audio and video call floating window, suitable for a terminal adopting an iOS operating system, the device comprising: an initial display module, configured to display a call interface of a 1-to-1 call on a main interface in full screen; a floating window creation module, configured to create a floating window in response to a floating window triggering operation, including creating a floating window instance, configuring a container view associated with the floating window instance, setting the container view as a content source of the floating window instance, and packaging a 1-to-1 call view and a multi-person call view in the container view, wherein the 1-to-1 call view comprises an audio and video stream view of a local user and an audio and video stream view of a remote user, and the multi-person call view comprises audio and video stream views of a plurality of users; a call migration module, configured to convert the 1-to-1 call into a multi-person call and switch the active sub-view of the container view to the multi-person call view when a new user joins the 1-to-1 call; and a dynamic layout module, configured to identify, according to the audio and video streams of each user, the user who is currently speaking loudest, and to place the audio and video stream view of that user on the top layer of the multi-person call view, so as to display the video picture of that user in the floating window.
7. The device of claim 6, wherein the call migration module is configured to: switch the audio and video streams of the local user and the remote user in the 1-to-1 call to the audio and video stream views of two users in the multi-person call view and render them, while inputting the audio and video stream of the newly joined user into that user's audio and video stream view and rendering it; and capture the GPU rendering output of the audio and video stream view of each user.
8. The device of claim 6, wherein the dynamic layout module is configured to: calculate the volume of each user in real time through a low-pass filtering algorithm according to the volume information of the audio and video streams of each user, thereby finding the user who is speaking loudest.
9. A terminal, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed, implement the method for displaying an audio and video call floating window according to any one of claims 1 to 5.
10. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed, implement the method for displaying an audio and video call floating window according to any one of claims 1 to 5.
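
Claims 4 and 5 describe smoothing each user's volume with a low-pass filter over root mean square values, but the formula itself is not reproduced in this text. The Python sketch below is an illustration only. It assumes a common first-order form, v_n = alpha * r_n + (1 - alpha) * v_(n-1) with 0 < alpha < 1, which may differ from the claimed formula. The names `SmoothedVolume`, `loudest_user`, and `bring_to_top`, the coefficient value, and the sample frames are all hypothetical.

```python
import math


def rms(samples):
    """Root mean square of one audio frame (a sequence of floats)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SmoothedVolume:
    """First-order low-pass smoothing of per-frame RMS volume.

    Assumed update rule (not taken from the patent text):
        v_n = alpha * r_n + (1 - alpha) * v_(n-1),  0 < alpha < 1
    """

    def __init__(self, alpha=0.3):
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")
        self.alpha = alpha
        self.value = 0.0  # smoothed volume from the previous update

    def update(self, samples):
        """Fold one new audio frame into the smoothed volume and return it."""
        self.value = self.alpha * rms(samples) + (1.0 - self.alpha) * self.value
        return self.value


def loudest_user(trackers):
    """Return the user id whose smoothed volume is currently highest."""
    return max(trackers, key=lambda uid: trackers[uid].value)


def bring_to_top(z_order, user_id):
    """Return a new bottom-to-top ordering with user_id moved to the top."""
    return [u for u in z_order if u != user_id] + [user_id]


# Usage: three participants, one audio frame each (made-up sample values).
trackers = {uid: SmoothedVolume(alpha=0.5) for uid in ("local", "remote", "guest")}
frames = {"local": [0.1, -0.1], "remote": [0.8, -0.8], "guest": [0.0, 0.0]}
for uid, frame in frames.items():
    trackers[uid].update(frame)

top_user = loudest_user(trackers)
z_order = bring_to_top(["local", "remote", "guest"], top_user)
```

Smoothing before comparing keeps a single loud frame, such as a cough, from immediately reordering the views; a smaller coefficient under this assumed form reacts more slowly but flickers less.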

Description

Method, device, terminal and storage medium for displaying audio and video call floating window

Technical Field

The present application relates to the field of computer technologies, and in particular to a method, an apparatus, a terminal, and a storage medium for displaying an audio and video call floating window.

Background

With the rapid development of mobile internet technology, mobile audio and video calls have become an important way for people to communicate in daily life and collaborate at work. When users use audio and video call features, the need for multitasking is becoming more pronounced, for example viewing messages or browsing files while on a call. The Picture-in-Picture (PiP) function was developed for this purpose: it lets a user operate other applications while maintaining a call, which greatly improves multitasking efficiency.

In an audio and video call scenario, the call type may change, for example from a 1-to-1 call to a multi-person call. However, current mainstream audio and video call applications cannot guarantee continuous display of the floating window when handling such call type transitions. Taking the iOS system as an example, the PiP API (Application Programming Interface) of the iOS system requires that the provided container view not be changed, whereas the view content of a 1-to-1 call and that of a multi-person call often differ greatly and are generally different view objects. Once the 1-to-1 call view in the PiP is replaced with the multi-person call view, the floating window disappears, which degrades the user experience.

Disclosure of Invention

In view of the above, the present application provides a method, an apparatus, a terminal, and a storage medium for displaying an audio and video call floating window, which can keep the floating window displayed continuously when a 1-to-1 call is converted into a multi-person call, so as to realize seamless migration of the audio and video call and ensure absolute continuity of the audio and video stream pictures.

In a first aspect, the present application provides a method for displaying an audio and video call floating window, applicable to a terminal adopting an iOS operating system, including:

displaying a call interface of a 1-to-1 call on the main interface in full screen;

in response to a floating window triggering operation, creating a floating window, namely: creating a floating window instance, configuring a container view associated with the floating window instance, setting the container view as a content source of the floating window instance, and packaging a 1-to-1 call view and a multi-person call view in the container view, wherein the 1-to-1 call view comprises an audio and video stream view of a local user and an audio and video stream view of a remote user, and the multi-person call view comprises audio and video stream views of a plurality of users;

when a new user joins the 1-to-1 call, converting the 1-to-1 call into a multi-person call, and switching the active sub-view of the container view to the multi-person call view; and

identifying, according to the audio and video streams of each user, the user who is currently speaking loudest, and placing the audio and video stream view of that user on the top layer of the multi-person call view, so as to display the video picture of that user in the floating window.

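
The container-view idea in the first aspect can be illustrated with a small platform-neutral model in Python. This is a sketch under stated assumptions, not iOS code and not the applicant's implementation: `CallView`, `ContainerView`, `FloatingWindow`, and `on_user_joined` are hypothetical names, and user ids stand in for real audio and video stream views. It shows only the key property that the floating window's content source stays the same object while the active sub-view inside it changes.

```python
class CallView:
    """Stand-in for a call view holding one stream view per user."""

    def __init__(self, name, user_ids):
        self.name = name
        self.stream_views = list(user_ids)  # bottom-to-top order
        self.hidden = True


class ContainerView:
    """Fixed container that wraps both call views."""

    def __init__(self, one_to_one, multi_person):
        self.subviews = {"one_to_one": one_to_one, "multi_person": multi_person}
        self.active = None

    def set_active(self, key):
        """Show the chosen sub-view and hide the other one."""
        for name, view in self.subviews.items():
            view.hidden = name != key
        self.active = self.subviews[key]


class FloatingWindow:
    """Floating window whose content source is assigned once at creation."""

    def __init__(self, content_source):
        self.content_source = content_source


def on_user_joined(window, new_user):
    """Convert a 1-to-1 call to a multi-person call inside the same container."""
    container = window.content_source
    one_to_one = container.subviews["one_to_one"]
    multi_person = container.subviews["multi_person"]
    # Carry the two existing users over, then add the newly joined user.
    for uid in one_to_one.stream_views + [new_user]:
        if uid not in multi_person.stream_views:
            multi_person.stream_views.append(uid)
    container.set_active("multi_person")


# Usage: start in 1-to-1 mode, then a third user joins.
container = ContainerView(
    CallView("1-to-1", ["local", "remote"]),
    CallView("multi-person", []),
)
container.set_active("one_to_one")
window = FloatingWindow(container)
source_before = window.content_source

on_user_joined(window, "guest")
```

Because only the sub-view visibility changes, the object handed to the floating window is never swapped, which is the condition the Background identifies for keeping the window on screen.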
In a specific embodiment, setting the 1-to-1 call view as the active sub-view of the container view includes: inputting the audio and video stream of the local user into the audio and video stream view of the local user and rendering it, and inputting the audio and video stream of the remote user into the audio and video stream view of the remote user and rendering it; and capturing the GPU rendering output of the audio and video stream view of the local user and the audio and video stream view of the remote user.

In a specific embodiment, switching the active sub-view of the container view to the multi-person call view includes: switching the audio and video streams of the local user and the remote user in the 1-to-1 call to the audio and video stream views of two users in the multi-person call view and rendering them, while inputting the audio and video stream of the newly joined user into that user's audio and video stream view and rendering it; and capturing the GPU rendering output of the audio and video stream view of each user.

In a specific embodiment, identifying the user who is currently speaking loudest includes: calculating the volume of each user in real time through a low-pass filtering algorithm according to the volume information of the audio and video streams of each user, thereby finding the user who is speaking loudest.

In a specific embodiment, the volume level of each user is calculated according to the following f