EP-4738811-A1 - METHODS AND APPARATUSES FOR VISUALIZING AUDIO DATA IN A SURVEILLANCE SYSTEM


Abstract

Aspects of the present disclosure include a method, a server, and/or a non-transitory computer readable medium for receiving a plurality of images from a plurality of cameras monitoring the site, receiving a plurality of sounds from a plurality of microphones, synchronizing the plurality of images and the plurality of sounds based on first timestamps associated with the plurality of images and second timestamps associated with the plurality of sounds, providing a graphical user interface to display a representation of the plurality of sounds and the plurality of images, and providing a control for selecting at least one portion of the plurality of sounds via the graphical user interface.
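The abstract does not specify how the timestamp-based synchronization is performed. The Python sketch below shows one possible approach. It assumes that cameras and microphones share a common clock (for example, NTP-synchronized devices) and that timestamps are in seconds. The `Frame` and `AudioChunk` types, the `synchronize` function, and the `max_skew` tolerance are hypothetical names introduced for illustration. Each audio chunk is paired with the most recent frame from one camera at or before the chunk's start time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    camera_id: str
    timestamp: float  # seconds on a clock shared with the microphones (assumed)


@dataclass
class AudioChunk:
    microphone_id: str
    timestamp: float  # chunk start time in seconds on the same shared clock (assumed)
    samples: list[float]


def synchronize(
    frames: list[Frame], chunks: list[AudioChunk], max_skew: float = 0.5
) -> list[tuple[AudioChunk, Optional[Frame]]]:
    """Pair each audio chunk with the latest frame at or before its timestamp.

    `frames` is expected to come from a single camera. A chunk stays
    unpaired (None) when no frame lies within `max_skew` seconds before it.
    """
    ordered = sorted(frames, key=lambda f: f.timestamp)
    times = [f.timestamp for f in ordered]
    pairs: list[tuple[AudioChunk, Optional[Frame]]] = []
    for chunk in sorted(chunks, key=lambda c: c.timestamp):
        # Index of the last frame whose timestamp is <= the chunk's timestamp.
        i = bisect_right(times, chunk.timestamp) - 1
        if i >= 0 and chunk.timestamp - times[i] <= max_skew:
            pairs.append((chunk, ordered[i]))
        else:
            pairs.append((chunk, None))
    return pairs
```

To cover several cameras, the function could be called once per camera's frame list. The binary search (`bisect_right`) keeps each lookup logarithmic in the number of frames, which helps with long recordings.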

Inventors

Sample, Benjamin
AGBOOLA, Akinboluwaji
MALANKAR, Shariwa Ravindra
PATNI, Darshan Ajit
CHANDRA, Sowmya

Assignees

Tyco Fire & Security GmbH

Dates

Publication Date: 20260506
Application Date: 20251030

Claims (15)

1. A server for monitoring a site, comprising: one or more memories storing instructions therein; one or more processors communicatively coupled with the one or more memories and configured to: receive a plurality of images from a plurality of cameras monitoring the site; receive a plurality of sounds from a plurality of microphones; synchronize the plurality of images and the plurality of sounds based on first timestamps associated with the plurality of images and second timestamps associated with the plurality of sounds; provide a graphical user interface to display a representation of the plurality of sounds and the plurality of images; and provide a control for selecting at least one portion of the plurality of sounds via the graphical user interface.
2. The server of claim 1, wherein the one or more processors are further configured to extract first sounds at a first frequency from the plurality of sounds; optionally, wherein providing the graphical user interface to display the representation comprises displaying audio intensities of the first sounds.
3. The server of claim 1 or 2, wherein the one or more processors are further configured to: receive, via the control, an indication for selecting the at least one portion of the plurality of sounds; identify at least one image synchronized with the at least one portion of the plurality of sounds; and provide the at least one image to the graphical user interface for display.
4. The server of any of claims 1 to 3, wherein providing the graphical user interface to display the representation comprises: displaying audio intensities of at least a portion of the sounds; and/or averaging audio intensities of at least a portion of the sounds as an average intensity; and displaying the average intensity.
5. The server of any of claims 1 to 4, wherein the one or more processors are further configured to provide one or more of at least one control or an event history.
6. A surveillance system for monitoring a site, comprising: a server configured to: receive a plurality of images from a plurality of cameras monitoring the site; receive a plurality of sounds from a plurality of microphones; synchronize the plurality of images and the plurality of sounds based on first timestamps associated with the plurality of images and second timestamps associated with the plurality of sounds; provide a graphical user interface to display a representation of the plurality of sounds and the plurality of images; and provide a control for selecting at least one portion of the plurality of sounds via the graphical user interface; the plurality of cameras; and the plurality of microphones.
7. The surveillance system of claim 6, wherein the server is further configured to extract first sounds at a first frequency from the plurality of sounds.
8. The surveillance system of claim 7, wherein providing the graphical user interface to display the representation comprises displaying audio intensities of the first sounds; optionally, wherein the server is further configured to: receive, via the control, an indication for selecting the at least one portion of the plurality of sounds; identify at least one image synchronized with the at least one portion of the plurality of sounds; and provide the at least one image to the graphical user interface for display.
9. The surveillance system of any of claims 6 to 8, wherein providing the graphical user interface to display the representation comprises: displaying audio intensities of at least a portion of the sounds; and/or averaging audio intensities of at least a portion of the sounds as an average intensity; and displaying the average intensity.
10. The surveillance system of any of claims 6 to 9, wherein the server is further configured to provide one or more of at least one control or an event history.
11. A method for monitoring a site, comprising: receiving a plurality of images from a plurality of cameras monitoring the site; receiving a plurality of sounds from a plurality of microphones; synchronizing the plurality of images and the plurality of sounds based on first timestamps associated with the plurality of images and second timestamps associated with the plurality of sounds; providing a graphical user interface to display a representation of the plurality of sounds and the plurality of images; and providing a control for selecting at least one portion of the plurality of sounds via the graphical user interface.
12. The method of claim 11, further comprising extracting first sounds at a first frequency from the plurality of sounds; optionally, wherein providing the graphical user interface to display the representation comprises displaying audio intensities of the first sounds.
13. The method of claim 11 or 12, further comprising: receiving, via the control, an indication for selecting the at least one portion of the plurality of sounds; identifying at least one image synchronized with the at least one portion of the plurality of sounds; and providing the at least one image to the graphical user interface for display.
14. The method of any of claims 11 to 13, wherein providing the graphical user interface to display the representation comprises displaying audio intensities of at least a portion of the sounds.
15. The method of any of claims 11 to 14, further comprising providing one or more of at least one control or an event history.
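Several dependent claims recite extracting first sounds at a first frequency (claims 2, 7, and 12) and averaging audio intensities (claims 4 and 9), but they do not specify how either step is done. The Python sketch below uses the Goertzel algorithm as a stand-in. Goertzel is a standard way to measure a single frequency component, but the disclosure does not name it. The sketch assumes mono samples normalized to [-1, 1] and fixed-size, non-overlapping windows. The functions `goertzel_amplitude`, `band_intensities`, and `average_intensity` are hypothetical names.

```python
import math


def goertzel_amplitude(samples: list[float], sample_rate: float, target_hz: float) -> float:
    """Estimate the amplitude of the component nearest target_hz (Goertzel algorithm).

    The scaling assumes a bin other than DC (0 Hz) or Nyquist.
    """
    n = len(samples)
    if n == 0:
        return 0.0
    k = round(n * target_hz / sample_rate)  # nearest DFT bin to the target frequency
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of DFT bin k; clamp tiny negative rounding error.
    power = max(s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2, 0.0)
    # For a sinusoid on an exact bin, |X[k]| = A * n / 2, so this recovers A.
    return 2.0 * math.sqrt(power) / n


def band_intensities(samples: list[float], sample_rate: float,
                     target_hz: float, window_size: int) -> list[float]:
    """Per-window amplitude at target_hz for one microphone's samples."""
    return [
        goertzel_amplitude(samples[i:i + window_size], sample_rate, target_hz)
        for i in range(0, len(samples), window_size)
    ]


def average_intensity(per_microphone: list[list[float]]) -> list[float]:
    """Average same-index windows across microphones (truncates to the shortest list)."""
    return [sum(vals) / len(vals) for vals in zip(*per_microphone)]
```

Goertzel costs O(n) per window and per frequency. This is cheaper than a full FFT when only one or a few bands (for example, a band typical of human screams) need to be tracked.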

Description

BACKGROUND

Surveillance cameras are frequently used to monitor a site such as an event venue, a commercial building, an industrial site, and/or a residential house. However, it may be costly and/or impractical for surveillance cameras to capture images of the entire site. Audio data collected from microphones may provide additional information to personnel monitoring a site. However, it is not clear how to effectively use audio data to supplement and/or enhance a surveillance system. Therefore, improvements are desired.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described below, in particular in the DETAILED DESCRIPTION. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Embodiments according to the invention are disclosed in particular in the appended claims. Aspects of the present disclosure include a method, a server, and/or a non-transitory computer readable medium for receiving a plurality of images from a plurality of cameras monitoring the site, receiving a plurality of sounds from a plurality of microphones, synchronizing the plurality of images and the plurality of sounds based on first timestamps associated with the plurality of images and second timestamps associated with the plurality of sounds, providing a graphical user interface to display a representation of the plurality of sounds and the plurality of images, and providing a control for selecting at least one portion of the plurality of sounds via the graphical user interface.

BRIEF DESCRIPTION OF THE DRAWINGS

The features believed to be characteristic of aspects of the disclosure are set forth in the appended claims. In the description that follows, like parts are marked throughout the specification and drawings with the same numerals, respectively.
The drawing figures are not necessarily drawn to scale, and certain figures may be shown in exaggerated or generalized form in the interest of clarity and conciseness. The disclosure itself, however, as well as a preferred mode of use and further objects and advantages thereof, will be best understood by reference to the following detailed description of illustrative aspects of the disclosure when read in conjunction with the accompanying drawings, wherein:

FIG. 1 illustrates an example of an environment for monitoring a site according to aspects of the present disclosure.
FIG. 2 illustrates a first graphical user interface in accordance with aspects of the present disclosure.
FIG. 3 illustrates a second graphical user interface in accordance with aspects of the present disclosure.
FIG. 4 illustrates an example of a computer system in accordance with aspects of the present disclosure.
FIG. 5 illustrates a method of monitoring a site according to aspects of the present disclosure.

DETAILED DESCRIPTION

Aspects of the present disclosure include augmenting surveillance images with synchronized audio data. Specifically, the audio data may be displayed to show the corresponding sound intensity as a function of time. As such, security personnel reviewing the surveillance images may be able to quickly locate a time associated with an elevated sound intensity, along with the corresponding surveillance images or videos. In particular aspects of the present disclosure, cameras may have microphones. As such, audio data may be examined when reviewing historical events captured by a given security camera. Sounds may occur off-frame of the camera, and a user (e.g., security personnel) may remain unaware of them until he or she actually plays back the footage from that point in time. The user may not even know whether any audio was recorded until the video is playing. Consequently, aspects of the current disclosure may provide additional insights into surveillance images.
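The description states that audio may be displayed as sound intensity over time, so that personnel can find moments of elevated intensity. It does not, however, define the intensity measure or how "elevated" is decided. The Python sketch below assumes the intensity is the RMS level in decibels relative to full scale (dBFS), computed over fixed, non-overlapping windows of samples normalized to [-1, 1]. It also assumes a user-chosen threshold. The names `rms_db`, `intensity_timeline`, `elevated_times`, `window_s`, and `floor_db` are hypothetical.

```python
import math


def rms_db(samples: list[float], floor_db: float = -120.0) -> float:
    """RMS level of a window in dBFS, clamped to floor_db for silence."""
    if not samples:
        return floor_db
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        return floor_db
    return max(20.0 * math.log10(rms), floor_db)


def intensity_timeline(samples: list[float], sample_rate: float,
                       window_s: float = 0.5, start_time: float = 0.0
                       ) -> list[tuple[float, float]]:
    """Return (timestamp, dBFS) points, one per non-overlapping window."""
    size = max(1, int(window_s * sample_rate))
    points = []
    for i in range(0, len(samples), size):
        window = samples[i:i + size]
        points.append((start_time + i / sample_rate, rms_db(window)))
    return points


def elevated_times(points: list[tuple[float, float]], threshold_db: float) -> list[float]:
    """Timestamps of windows whose level meets or exceeds threshold_db."""
    return [t for t, level in points if level >= threshold_db]
```

The `(timestamp, level)` points map directly onto a graph drawn along the video timeline. The timestamps returned by `elevated_times` could then be used to seek the synchronized footage to the matching moments.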
Security cameras often have microphones to capture audio. These audio streams are sent to the Network Video Recorder (NVR) independently of the video stream. Additionally, audio streams from different devices can also be consumed from the NVR and associated with the security footage of a camera. This audio data can be represented over time using various data points. One aspect of the present disclosure includes measuring the frequency and amplitude of the audio stream and showing them as a graph over time. This graph may be displayed on a timeline in parallel with the video, giving the end user an additional data point to consider when viewing their timeline of events. For example, a camera may face the entrance to a building while, at the same time, an attack is happening in the alley around the corner and people are screaming. Because the alley is outside the camera's field of view, images alone may be insufficient to alert a user about the attack. In contrast, the proposed solution may allow the user to see the data that shows a person is sc