US-12620233-B2 - Scalable intelligent video surveillance for the artificial intelligence of things
Abstract
An artificial intelligence (AI) software product for providing scalable intelligent video surveillance for the artificial intelligence of things. The AI software product performs operations including receiving, at an edge node, a sequence of images of a target environment, where the AI software product resides in the edge node, and where the edge node is local to the target environment. The AI software product further observes one or more target objects in the sequence of images of the target environment in real time. The AI software product further detects at the edge node one or more anomalies in the target environment based on the observing of the one or more target objects in real time, where the one or more anomalies are specific to the target environment.
Inventors
- Hamed TABKHIVAYGHAN
Assignees
- THE UNIVERSITY OF NORTH CAROLINA AT CHARLOTTE
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-05-02
Claims (20)
- 1 . A system comprising: an edge node configured to receive a sequence of images of a target environment from one or more perception sensors, wherein the edge node is local to the target environment and the one or more perception sensors; a local node associated with each of the one or more perception sensors of the edge node, wherein each local node comprises an artificial intelligence (AI) pipeline configured to process the sequence of images received from one of the one or more perception sensors to generate a sequence of feature extracted images; a global node associated with the edge node, wherein the global node is configured to process the sequence of feature extracted images received from the AI pipeline of each of the local nodes and configured to send processed anomaly information to a cloud server; one or more processors; and logic encoded in one or more non-transitory computer-readable storage media for execution by the one or more processors and when executed operable to cause the one or more processors to perform operations comprising: receiving the sequence of images of the target environment from the one or more perception sensors; observing one or more target objects in the sequence of images of the target environment in real time; generating, utilizing the AI pipeline of the local node, the sequence of feature extracted images including pose information images and movement information images, wherein the sequence of feature extracted images does not indicate facial or gait information; tracking the one or more target objects utilizing the global node and based on sequences of feature extracted images received from AI pipelines of multiple local nodes; and detecting at the edge node and based on the sequence of feature extracted images one or more anomalies in the target environment based on the observing of the one or more target objects in real time, wherein the one or more anomalies are specific to the target environment.
- 2 . The system of claim 1 , wherein the edge node comprises AI technologies that are trained globally for environments in general and trained locally for the target environment.
- 3 . The system of claim 1 , wherein the logic when executed by one or more processors is operable to cause the one or more processors to perform operations comprising: segmenting the sequences of images of the target environment, wherein the sequences of images comprise the sequence of images from each of multiple perception sensors of the one or more perception sensors; analyzing the one or more target objects in the sequences of images from the multiple perception sensors, wherein the sequences of images provide multiple perspectives of the one or more target objects in the target environment; and detecting at the edge node the one or more anomalies in the target environment based on the analyzing of the one or more target objects.
- 4 . The system of claim 1 , wherein the logic when executed by one or more processors is operable to cause the one or more processors to perform operations comprising: detecting the one or more target objects in the sequences of images, wherein the sequences of images comprise a sequence of images from each of multiple perception sensors of the one or more perception sensors; computing one or more of pose information, movement information, and gait information associated with each target object of the one or more target objects; and detecting at the edge node the one or more anomalies in the target environment based on the computing of at least one of the pose information or the movement information.
- 5 . The system of claim 1 , wherein the target environment is a public environment.
- 6 . The system of claim 1 , wherein the logic when executed by one or more processors is operable to cause the one or more processors to perform operations comprising detecting the one or more anomalies in the target environment without collecting personal identification information.
- 7 . The system of claim 1 , wherein the system comprises legacy components for monitoring the target environment.
- 8 . An artificial intelligence (AI) software product with program instructions, which when executed by one or more processors are operable to cause the one or more processors to perform operations comprising: receiving, at an edge node, a sequence of images of a target environment from one or more perception sensors, wherein the AI software product resides in the edge node, wherein the edge node is local to the target environment and the one or more perception sensors, wherein the AI software product comprises a local node module associated with each of the one or more perception sensors and a global node module, wherein each local node module comprises an AI pipeline configured to process the sequence of images received from one of the one or more perception sensors to generate a sequence of feature extracted images, and wherein the global node module is configured to process the sequence of feature extracted images received from the AI pipeline of each of the local node modules and configured to send processed anomaly information to a cloud server; observing one or more target objects in the sequence of images of the target environment in real time; generating, utilizing the AI pipeline of the local node module, the sequence of feature extracted images including pose information images and movement information images, wherein the sequence of feature extracted images does not indicate facial or gait information; tracking the one or more target objects utilizing the global node module and based on sequences of feature extracted images received from AI pipelines of multiple local node modules; and detecting at the edge node and based on the sequence of feature extracted images one or more anomalies in the target environment based on the observing of the one or more target objects in real time, wherein the one or more anomalies are specific to the target environment.
- 9 . The software product of claim 8 , wherein the edge node comprises AI technologies that are trained globally for environments in general and trained locally for the target environment.
- 10 . The software product of claim 8 , wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising: segmenting the sequences of images of the target environment, wherein the sequences of images comprise the sequence of images from each of multiple perception sensors of the one or more perception sensors; analyzing the one or more target objects in the sequences of images from the multiple perception sensors, wherein the sequences of images provide multiple perspectives of the one or more target objects in the target environment; and detecting at the edge node the one or more anomalies in the target environment based on the analyzing of the one or more target objects.
- 11 . The software product of claim 8 , wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising: detecting the one or more target objects in the sequences of images, wherein the sequences of images comprise a sequence of images from each of multiple perception sensors of the one or more perception sensors; computing one or more of pose information, movement information, and gait information associated with each target object of the one or more target objects; and detecting at the edge node the one or more anomalies in the target environment based on the computing of at least one of the pose information or the movement information.
- 12 . The software product of claim 8 , wherein the target environment is a public environment.
- 13 . The software product of claim 8 , wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising detecting the one or more anomalies in the target environment without collecting personal identification information.
- 14 . The software product of claim 8 , wherein the edge node comprises legacy components for monitoring the target environment.
- 15 . A computer-implemented method for detecting anomalies in a target environment, the method comprising: receiving, at an edge node, a sequence of images of the target environment from one or more perception sensors, wherein the edge node is local to the target environment and the one or more perception sensors, wherein the edge node comprises a local node associated with each of the one or more perception sensors and a global node, wherein each local node comprises an artificial intelligence (AI) pipeline configured to process the sequence of images received from one of the one or more perception sensors to generate a sequence of feature extracted images, and wherein the global node is configured to process the sequence of feature extracted images received from the AI pipeline of each of the local nodes and configured to send processed anomaly information to a cloud server; observing one or more target objects in the sequence of images of the target environment in real time; generating, utilizing the AI pipeline of the local node, the sequence of feature extracted images including pose information images and movement information images, wherein the sequence of feature extracted images does not indicate facial or gait information; tracking the one or more target objects utilizing the global node and based on sequences of feature extracted images received from AI pipelines of multiple local nodes; and detecting at the edge node and based on the sequence of feature extracted images one or more anomalies in the target environment based on the observing of the one or more target objects in real time, wherein the one or more anomalies are specific to the target environment.
- 16 . The method of claim 15 , wherein the edge node comprises AI technologies that are trained globally for environments in general and trained locally for the target environment.
- 17 . The method of claim 15 , further comprising: segmenting the sequences of images of the target environment, wherein the sequences of images comprise the sequence of images from each of multiple perception sensors of the one or more perception sensors; analyzing the one or more target objects in the sequences of images from the multiple perception sensors, wherein the sequences of images provide multiple perspectives of the one or more target objects in the target environment; and detecting at the edge node the one or more anomalies in the target environment based on the analyzing of the one or more target objects.
- 18 . The method of claim 15 , further comprising: detecting the one or more target objects in the sequences of images, wherein the sequences of images comprise a sequence of images from each of multiple perception sensors of the one or more perception sensors; computing one or more of pose information, movement information, and gait information associated with each target object of the one or more target objects; and detecting at the edge node the one or more anomalies in the target environment based on the computing of at least one of the pose information or the movement information.
- 19 . The method of claim 15 , wherein the target environment is a public environment.
- 20 . The method of claim 15 , further comprising detecting the one or more anomalies in the target environment without collecting personal identification information.
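The three-tier arrangement recited in the claims above (a per-sensor local node feeding feature-extracted data to a global node, all hosted on the edge node) can be illustrated with a minimal Python sketch. The patent discloses no source code; every class name below, the use of a simple (x, y) detection as a stand-in for a feature-extracted image, and the fixed movement threshold are hypothetical simplifications for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Feature-extracted frame: pose and movement only (no facial or gait data)."""
    sensor_id: int
    pose: tuple      # stand-in for pose information (here, an (x, y) position)
    movement: float  # displacement since the previous frame from the same sensor


class LocalNode:
    """Per-sensor AI pipeline: turns raw frames into feature-extracted data."""

    def __init__(self, sensor_id):
        self.sensor_id = sensor_id
        self._prev = None

    def process(self, frame):
        # A real pipeline would run detection/pose models; here a frame is
        # reduced to an (x, y) position for brevity.
        if self._prev is None:
            movement = 0.0
        else:
            movement = abs(frame[0] - self._prev[0]) + abs(frame[1] - self._prev[1])
        self._prev = frame
        return Features(self.sensor_id, frame, movement)


class GlobalNode:
    """Aggregates features from all local nodes and flags anomalies."""

    def __init__(self, movement_threshold=5.0):
        self.movement_threshold = movement_threshold
        self.anomalies = []  # processed anomaly info; would be sent to a cloud server

    def ingest(self, feats):
        if feats.movement > self.movement_threshold:
            self.anomalies.append(feats)
        return feats


class EdgeNode:
    """Edge node local to the target environment, hosting local and global nodes."""

    def __init__(self, n_sensors):
        self.local_nodes = [LocalNode(i) for i in range(n_sensors)]
        self.global_node = GlobalNode()

    def step(self, frames):
        # One frame per perception sensor, processed per time step.
        for node, frame in zip(self.local_nodes, frames):
            self.global_node.ingest(node.process(frame))
```

For example, two sensors each supplying two frames: an object that jumps 40 units between frames on sensor 1 exceeds the threshold and is recorded as an anomaly, while the slow-moving object on sensor 0 is not.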
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure claims the benefit of priority of co-pending U.S. provisional patent application No. 63/463,586, filed May 3, 2023, the entire contents of which are hereby incorporated by reference. The present disclosure is further a continuation-in-part (CIP) of, and claims the benefit of priority of, co-pending U.S. patent application Ser. No. 17/716,671, filed Apr. 8, 2022, which in turn claims the benefit of priority of U.S. provisional patent application No. 63/327,840, filed on Apr. 6, 2022, and is a CIP of, and claims the benefit of priority of, U.S. patent application Ser. No. 17/031,318, filed Sep. 24, 2020, which in turn claims the benefit of priority of U.S. provisional patent application Nos. 62/908,778, filed on Oct. 1, 2019, and 63/082,040, filed on Sep. 23, 2020, the entire contents of all of which are hereby incorporated by reference.
STATEMENT OF GOVERNMENT INTEREST
This invention was made with government support under award Nos. 1831795 and 1848727 awarded by the National Science Foundation. The government has certain rights in the invention.
INTRODUCTION
There is a growing need for effective and efficient surveillance technologies that can be deployed to protect cities, people, and infrastructure. For example, in Itaewon, South Korea, a holiday celebration left over 150 dead due to severe overcrowding, with many blaming the tragedy on careless government oversight. In Moore County, North Carolina, directed attacks against two power substations left over 45,000 residents without power for days while technicians rushed to restore service and authorities struggled to find the source of the attacks. With enough forewarning through smart video surveillance, such incidents could have been prevented. The present introduction is provided as background context only and is not intended to be limiting in any manner.
It will be readily apparent to those of ordinary skill in the art that the concepts and principles of the present disclosure may be implemented equally in other applications and contexts.
SUMMARY
In one illustrative embodiment, the present disclosure provides a system including: an edge node configured to receive a sequence of images (e.g., video footage) of a target environment from one or more perception sensors, where the edge node is local to the target environment and the one or more perception sensors; a local node associated with the edge node, where the local node comprises an artificial intelligence (AI) pipeline configured to process the video footage received from the one or more perception sensors; and a global node associated with the edge node, where the global node is configured to process the video footage received from the AI pipeline and configured to send processed anomaly information to a cloud server. The system further includes one or more processors, and logic encoded in one or more non-transitory computer-readable storage media for execution by the one or more processors. When executed, the logic is operable to cause the one or more processors to perform operations comprising: receiving the video footage of the target environment from the one or more perception sensors; observing one or more target objects in the video footage of the target environment in real time; and detecting at the edge node one or more anomalies in the target environment based on the observing of the one or more target objects in real time, where the one or more anomalies are specific to the target environment. Optionally, in some embodiments, the edge node comprises AI technologies that are trained globally for environments in general and trained locally for the target environment.
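The notion of AI technologies trained globally for environments in general and trained locally for the target environment can be illustrated with a toy sketch: a detector initialized with a global prior over some movement statistic, then adapted online to the target environment's own distribution. The class name, the blended mean update, the exponential spread estimate, and the z-score threshold are all assumptions for illustration, not the method disclosed in the patent.

```python
class LocalizedAnomalyDetector:
    """Toy detector: starts from a global prior (mean/std learned across many
    environments) and adapts its statistics online to the target environment."""

    def __init__(self, global_mean=1.0, global_std=0.5, z_limit=3.0):
        self.mean = global_mean
        self.std = global_std
        self.z_limit = z_limit
        self.n = 10  # pseudo-count: weight given to the global prior

    def _update(self, x):
        # Blend the new observation into the local mean (prior-weighted),
        # and track spread with an exponential moving average of deviations.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.std = max(0.9 * self.std + 0.1 * abs(x - self.mean), 1e-6)

    def is_anomaly(self, x):
        """Score against current statistics, then adapt them to the observation."""
        z = abs(x - self.mean) / self.std
        self._update(x)
        return z > self.z_limit
```

A stream of ordinary observations near the global prior is absorbed without alarms, while a far outlier is flagged; over time the same detector would tolerate whatever range of values is normal for its particular environment.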
In some embodiments, the logic when executed by one or more processors is operable to cause the one or more processors to perform operations comprising: segmenting the video footage of the target environment, where the video footage comprises video footage from multiple perception sensors of the one or more perception sensors; analyzing the one or more target objects in the video footage from the multiple perception sensors, where the video footage provides multiple perspectives of the one or more target objects in the target environment; and detecting at the edge node the one or more anomalies in the target environment based on the analyzing of the one or more target objects.
In some embodiments, the logic when executed by one or more processors is operable to cause the one or more processors to perform operations comprising: detecting the one or more target objects in the video footage, where the video footage comprises video footage from multiple perception sensors of the one or more perception sensors; computing one or more of pose information, movement information, and gait information associated with each target object of the one or more target objects; and detecting at the edge node the one or more anomalies in the target environment based on the computing of the pose information, the movement information, or the gait information.
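As a concrete, hypothetical illustration of computing pose and movement information without facial or gait features, one could reduce each detected person to the centroid of their 2-D body keypoints and measure the frame-to-frame displacement of that centroid. The function names and the centroid representation are assumptions; a real pipeline would obtain keypoints from a pose-estimation model.

```python
import math


def pose_centroid(keypoints):
    """Centroid of 2-D body keypoints -- a minimal stand-in for pose information.

    keypoints: list of (x, y) tuples for one detected person in one frame.
    """
    xs, ys = zip(*keypoints)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def movement(prev_keypoints, curr_keypoints):
    """Euclidean displacement of the pose centroid between consecutive frames."""
    px, py = pose_centroid(prev_keypoints)
    cx, cy = pose_centroid(curr_keypoints)
    return math.hypot(cx - px, cy - py)
```

For instance, keypoints at (0, 0) and (2, 0) give a centroid of (1, 0); if the next frame's centroid is (4, 4), the movement is 5.0 units. Note that neither quantity encodes facial appearance or gait, consistent with the privacy-preserving feature extraction described above.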