
US-12627769-B1 - Video intercom communication for VOIP device


Abstract

A first invite message is received from an intercom device to establish an audio call with a voice over internet protocol (VOIP) device. A second invite message is transmitted to the VOIP device in response to receiving the first invite message. The second invite message indicates the first invite message from the intercom device to establish the audio call. The audio call is elevated to a video call based on a determination that the intercom device and the VOIP device have video capability. A third invite message is transmitted to the intercom device that indicates elevation of the audio call to a video call. Video data is received from the intercom device and forwarded to the VOIP device.
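The three-invite sequence in the abstract can be sketched as follows. This is a minimal, hypothetical model: the `Invite` record, its field names (borrowed from the fields listed in claims 16 and 17), and the `call_setup` helper are illustrative assumptions, not the SIP message format used by the disclosed system. The VOIP device's video capability is also passed in directly here instead of being read from an accept message.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Invite:
    """Hypothetical invite record; field names follow those listed in claims 16 and 17."""
    label: str                              # "first", "second", or "third"
    intercom_id: str
    intercom_address: str
    voip_id: str
    voip_address: str
    call_type: str                          # "audio" or "video"
    invite_indicator: Optional[str] = None  # label of the earlier invite this one indicates
    intercom_video: Optional[bool] = None   # intercom capability (carried by the first invite)


def call_setup(first: Invite, voip_video: bool) -> List[Invite]:
    """Return the invites in the order the abstract describes them (hypothetical helper)."""
    # Second invite: sent to the VOIP device and pointing back at the first invite.
    # Claim 17 lists no capability field for it, so that field is cleared here.
    second = replace(first, label="second", invite_indicator=first.label,
                     intercom_video=None)
    messages = [first, second]
    # Elevate only when both ends report video. voip_video is an assumed stand-in
    # for the capability the claims obtain from the VOIP device's accept message.
    if first.intercom_video and voip_video:
        third = replace(second, label="third", call_type="video")
        messages.append(third)
    return messages


if __name__ == "__main__":
    first = Invite("first", "door-1", "192.0.2.10", "desk-7", "192.0.2.20",
                   "audio", intercom_video=True)
    for msg in call_setup(first, voip_video=True):
        print(msg.label, msg.call_type, msg.invite_indicator)
```

If either device lacks video, the sketch stops after the second invite and the call stays audio-only, which mirrors the condition stated in the abstract.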

Inventors

  • Karen Kuei Ren Hong
  • Kwan Seng Low
  • Hui Sun
  • Chunsong Zhu

Assignees

  • ZOOM COMMUNICATIONS, INC.

Dates

Publication Date
2026-05-12
Application Date
2024-01-30

Claims (20)

  1. A method, comprising: receiving a first session initiation protocol (SIP) invite message from an intercom device to establish an audio call with a voice over internet protocol (VOIP) device; transmitting, to the VOIP device, a second SIP invite message that indicates the first SIP invite message from the intercom device; determining whether each of the intercom device and the VOIP device have video capability, wherein determining whether each of the intercom device and the VOIP device have video capability comprises: obtaining a video intercom device capability field from the first SIP invite message and a VOIP device capability field from an accept message received from the VOIP device, and determining based on the video intercom device capability field and the VOIP device capability field, whether the intercom device and the VOIP device each have video capability; elevating the audio call to a video call based on a determination that both the intercom device and the VOIP device have video capability, wherein elevating the audio call to the video call includes bypassing a backend server and routing video data directly to a session border controller (SBC) of a cloud private branch exchange (PBX) system; transmitting a third SIP invite message that indicates elevation of the audio call to the video call; and forwarding video data received from the intercom device to the VOIP device, wherein the video data is sent via the SBC using a secure real-time protocol (SRTP).
  2. The method of claim 1, wherein the video data is based on a secure real-time protocol.
  3. The method of claim 1, further comprising: performing a real-time protocol negotiation with the VOIP device prior to forwarding the video data.
  4. The method of claim 1, wherein the first SIP invite message includes one or more capabilities of the intercom device.
  5. The method of claim 1, further comprising: receiving an accept message from the VOIP device responsive to the second SIP invite message.
  6. The method of claim 5, wherein the accept message includes one or more capabilities of the VOIP device.
  7. The method of claim 6, wherein determining whether each of the intercom device and the VOIP device have video capability is based on a first indicator in the first SIP invite message and a second indicator in the accept message.
  8. A system, comprising: a session border controller (SBC) configured to: receive a first session initiation protocol (SIP) invite message from an intercom device to establish an audio call with a voice over internet protocol (VOIP) device; transmit, to the VOIP device, a second SIP invite message that indicates the first SIP invite message from the intercom device; determine whether each of the intercom device and the VOIP device have video capability, wherein the SBC is configured to: obtain a video intercom device capability field from the first SIP invite message and a VOIP device capability field from an accept message received from the VOIP device, and determine based on the video intercom device capability field and the VOIP device capability field whether the intercom device and the VOIP device each have video capability; elevate the audio call to a video call based on a determination that both the intercom device and the VOIP device have video capability, wherein elevation of the audio call to the video call bypasses a background server and routes video data directly to an SBC of a cloud private branch exchange (PBX) system; transmit a third SIP invite message that indicates elevation of the audio call to the video call; and forward video data received from the intercom device to the VOIP device, wherein the video data is sent via the SBC using a secure real-time protocol (SRTP).
  9. The system of claim 8, wherein the video data is based on a real-time protocol.
  10. The system of claim 8, wherein the SBC is further configured to perform a protocol negotiation with the VOIP device prior to forwarding the video data.
  11. The system of claim 8, wherein the first SIP invite message includes an indicator of a video capability of the intercom device.
  12. The system of claim 8, further comprising: a server configured to receive an accept message from the VOIP device responsive to the second SIP invite message.
  13. The system of claim 12, wherein the accept message includes a video capability of the VOIP device.
  14. The system of claim 8, wherein the SBC is configured to determine whether each of the intercom device and the VOIP device have video capability based on data obtained from a look up table.
  15. A non-transitory computer-readable medium comprising instructions that when executed by one or more processors, causes the one or more processors to perform operations comprising: receiving a first session initiation protocol (SIP) invite message from an intercom device to establish an audio call with a voice over internet protocol (VOIP) device; transmitting, to the VOIP device, a second SIP invite message that indicates the first SIP invite message from the intercom device; determining whether each of the intercom device and the VOIP device have video capability, wherein determining whether each of the intercom device and the VOIP device have video capability comprises: obtaining a video intercom device capability field from the first SIP invite message and a VOIP device capability field from an accept message received from the VOIP device, and determining based on the video intercom device capability field and the VOIP device capability field, whether the intercom device and the VOIP device each have video capability; elevating the audio call to a video call based on a determination that both the intercom device and the VOIP device have video capability, wherein elevating the audio call to the video call includes bypassing a backend server and routing video data directly to a session border controller (SBC) of a cloud private branch exchange (PBX) system; transmitting a third SIP invite message that indicates elevation of the audio call to the video call; and forwarding video data received from the intercom device to the VOIP device, wherein the video data is sent via the SBC using a secure real-time protocol (SRTP).
  16. The non-transitory computer-readable medium of claim 15, wherein the first SIP invite message includes one or more of an intercom device identifier (ID) field, an intercom device address field, an intercom device capability field, a SIP invite message indicator field, a VOIP device ID field, a VOIP device address field, or a call type field.
  17. The non-transitory computer-readable medium of claim 15, wherein the second SIP invite message includes one or more of an intercom device identifier (ID) field, an intercom device address field, a SIP invite message indicator field, a VOIP device ID field, a VOIP device address field, or a call type field.
  18. The non-transitory computer-readable medium of claim 15, further comprising: receiving an accept message from the VOIP device responsive to an input.
  19. The non-transitory computer-readable medium of claim 18, wherein the input is a touch input, a gesture input, a keyboard input, a mouse input, or an input associated with picking up a receiver of the VOIP device.
  20. The non-transitory computer-readable medium of claim 18, wherein the accept message includes one or more capabilities of the VOIP device.
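Claims 1, 7, and 14 tie the elevation decision to a video capability indicator for each device, read either from the signaling (the first SIP invite message and the accept message) or from a look-up table. The sketch below assumes, purely for illustration, that these capability fields are carried as Session Description Protocol (SDP) bodies, where an `m=video` line with a nonzero port signals an active video stream (under the SDP offer/answer model, a port of 0 disables a stream). The helper names and the path labels `"sbc_srtp_video"` and `"backend_audio"` are hypothetical.

```python
from typing import Dict, Optional


def sdp_offers_video(sdp: str) -> bool:
    """True if an SDP body contains an enabled video media line (m=video, port != 0)."""
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=video "):
            # Format: m=<media> <port>[/<count>] <proto> <fmt> ...
            parts = line.split()
            if len(parts) >= 2 and parts[1].split("/")[0] != "0":
                return True
    return False


def has_video(device_id: str, sdp: Optional[str], lookup: Dict[str, bool]) -> bool:
    """Prefer capability from signaling; otherwise fall back to a look-up table (cf. claim 14)."""
    if sdp is not None:
        return sdp_offers_video(sdp)
    return lookup.get(device_id, False)


def choose_media_path(intercom_video: bool, voip_video: bool) -> str:
    """Hypothetical routing choice: video bypasses the backend server and goes via the SBC."""
    if intercom_video and voip_video:
        return "sbc_srtp_video"
    return "backend_audio"


if __name__ == "__main__":
    invite_sdp = "v=0\r\nm=audio 49170 RTP/SAVP 0\r\nm=video 51372 RTP/SAVP 96\r\n"
    accept_sdp = "v=0\r\nm=audio 49172 RTP/SAVP 0\r\nm=video 0 RTP/SAVP 96\r\n"
    print(choose_media_path(has_video("door-1", invite_sdp, {}),
                            has_video("desk-7", accept_sdp, {})))
```

In this example the VOIP device's answer disables the video stream, so the sketch keeps the call on the audio path.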

Description

FIELD

This disclosure generally relates to voice over internet protocol (VOIP) device communication and, more specifically, to VOIP device communication with a video intercom device.

BRIEF DESCRIPTION OF THE DRAWINGS

This disclosure is best understood from the following detailed description when read in conjunction with the accompanying drawings. It is emphasized that, according to common practice, the various features of the drawings are not to scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity.

FIG. 1 is a block diagram of an example of an electronic computing and communications system.
FIG. 2 is a block diagram of an example internal configuration of a computing device of an electronic computing and communications system.
FIG. 3 is a block diagram of an example of a software platform implemented by an electronic computing and communications system.
FIG. 4 is a block diagram of an example of a communications system for VOIP communications.
FIGS. 5A and 5B are collectively a swim lane diagram of an example of a communications system configured to communicate with a video intercom device.
FIGS. 6A and 6B are swim lane diagrams of a VOIP device configured to communicate with a video intercom device.
FIG. 7 is a flowchart of an example of a method for VOIP device communication with a video intercom device.
FIG. 8 is a flowchart of an example of a method for determining whether to switch from an audio call to a video call.

DETAILED DESCRIPTION

Enterprise entities rely upon several modes of communication to support their operations, including telephone, email, internal messaging, and the like. These separate modes of communication have historically been implemented by service providers whose services are not integrated with one another. The disconnect between these services, in at least some cases, requires users to manually pass information from one service to the next.
Furthermore, some services, such as telephony services, are traditionally delivered via on-premises solutions, meaning that remote workers and increasingly mobile users may be unable to rely upon them. One solution is a unified communications as a service (UCaaS) platform, which integrates several communications services over a network, such as the Internet, to deliver a complete communication experience regardless of physical location.

Enterprise customers have long relied upon on-premises private branch exchanges (PBXs) to deliver phone communications over VOIP, integrated services digital network (ISDN), and analog approaches. In recent years, cloud-based PBX approaches, or simply cloud PBXs, have been introduced to implement traditional PBX functionality virtually. Thus, rather than relying upon large on-site hardware, cloud PBX customers may use data center hardware to achieve the same call routing and other functionality as a conventional PBX.

Conventional cloud PBX systems (e.g., of UCaaS platforms) are configured for VOIP communications and cannot accommodate video media. These systems route calls through a backend server (e.g., a FreeSWITCH server) and do not support session initiation protocol (SIP) video communications between video intercom devices (e.g., video doorbells) and VOIP devices (e.g., desktop phones). Because the backend server is built for VOIP audio, it is not equipped to carry video. Typical video media has a much higher bandwidth requirement (e.g., 5-10 Mbps) than audio media (e.g., less than 100 kbps), so streaming video calls through a conventional cloud PBX system can overwhelm the system, degrading call quality and overall performance and, in some instances, crashing the system entirely.
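To put the bandwidth figures above in perspective, the following sketch computes how many concurrent streams fit on a link at each per-stream rate. The 1 Gbps link size is a hypothetical value, 100 kbps is used as the upper bound for audio, and protocol overhead and headroom are ignored, so the numbers are rough illustrations rather than capacity planning figures.

```python
def max_concurrent_streams(link_kbps: int, per_stream_kbps: int) -> int:
    """Number of whole streams that fit on a link (no overhead or headroom)."""
    if per_stream_kbps <= 0:
        raise ValueError("per-stream bitrate must be positive")
    return link_kbps // per_stream_kbps


LINK_KBPS = 1_000_000  # hypothetical 1 Gbps link, in decimal kbps

# Per-stream rates taken from the example figures in the text.
RATES_KBPS = {
    "audio (upper bound)": 100,
    "video (low)": 5_000,
    "video (high)": 10_000,
}

if __name__ == "__main__":
    for name, rate in RATES_KBPS.items():
        print(f"{name}: {max_concurrent_streams(LINK_KBPS, rate)} streams")
```

On these assumptions, a link that carries about 10,000 audio calls carries only 100 to 200 video calls, a 50- to 100-fold difference.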
Accordingly, calls between conventional VOIP devices and video intercom devices over conventional cloud PBX systems are limited to audio only. Implementations of this disclosure address problems such as these by elevating an audio call from the video intercom device to a video call and bypassing the backend server to route the video call directly to the session border controller (SBC) of the cloud PBX system. Once the call is elevated, video data can be sent from the video intercom device to the VOIP device via the SBC using the Secure Real-time Transport Protocol (SRTP).

To describe some implementations in greater detail, reference is first made to examples of hardware and software structures used to implement a system for supporting video communications between VOIP devices and video intercom devices. FIG. 1 is a block diagram of an example of an electronic computing and communications system 100, which can be or include a distributed computing system (e.g., a client-server computing system), a cloud computing system, a clustered computing system, or the like. Th