US-12618676-B2 - Aerial vehicle tracking using dynamic aleatoric uncertainty
Abstract
Techniques for aerial vehicle tracking using dynamic aleatoric uncertainty covariance estimation are presented. The techniques include: obtaining an image depicting at least one aerial vehicle of interest; passing the image to a first machine learning subsystem, which provides at least one feature vector; inputting the at least one feature vector to a second machine learning subsystem, where the second machine learning subsystem is trained to provide detected aerial vehicle identification data sets (including respective aerial vehicle coordinates, respective aerial vehicle bounding box dimensions, and respective dynamic aleatoric uncertainty covariance values) corresponding to input feature vectors; providing at least one detected aerial vehicle identification data set to a recursive Bayesian estimator subsystem, from which at least one filtered set of aerial vehicle coordinates, representing a real-time location of a respective aerial vehicle of interest, is obtained; and outputting the at least one filtered set of aerial vehicle coordinates.
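The final stage of the pipeline above feeds each detection into a recursive Bayesian estimator. A minimal sketch of the idea, assuming a linear Kalman measurement update in which the measurement-noise covariance R is supplied per detection by the network's predicted aleatoric covariance rather than fixed by hand (the function name, identity measurement model, and NumPy realization are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def kalman_update(x, P, z, R, H=None):
    """One Kalman measurement update. R is the per-detection aleatoric
    covariance predicted by the network, so noisier detections are
    automatically trusted less."""
    H = np.eye(len(z)) if H is None else H
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x + K @ (z - H @ x)       # pull state toward the measurement
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

With equal state and measurement uncertainty the update lands halfway between prediction and detection; inflating R (a detection the network flags as uncertain) moves the state much less.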
Inventors
- Alyssa Bekai ROSE
- Trevor M. HOUCHENS
- Lena M. DOWNES
- Charles L. BURKS
- Sildomar T. MONTEIRO
Assignees
- THE BOEING COMPANY
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2024-09-04
Claims (20)
- 1 . A method of aerial vehicle tracking using dynamic aleatoric uncertainty covariance estimation, the method comprising: obtaining an image depicting at least one aerial vehicle of interest; passing the image to a first machine learning subsystem, from which at least one feature vector is obtained, wherein the first machine learning subsystem is trained with a first training corpus comprising a set of labeled images, wherein a respective labeled image comprises a respective image and a respective label; inputting the at least one feature vector to a second machine learning subsystem, wherein the second machine learning subsystem is trained with a second training corpus comprising a set of labeled feature vectors, wherein a respective labeled feature vector comprises a respective feature vector, respective centroid coordinates, and respective bounding box dimensions, wherein the second machine learning subsystem is trained based on a loss to provide detected aerial vehicle identification data sets corresponding to input feature vectors, wherein the loss comprises (1) a regression loss for aerial vehicle coordinates and aerial vehicle bounding box dimensions, and (2) a Gaussian loss for aleatoric uncertainty, and wherein a respective detected aerial vehicle identification data set comprises: respective aerial vehicle coordinates, respective aerial vehicle bounding box dimensions, and respective dynamic aleatoric uncertainty covariance values; providing at least one detected aerial vehicle identification data set, output from the second machine learning subsystem in response to the inputting, to a recursive Bayesian estimator subsystem, from which at least one filtered set of aerial vehicle coordinates is obtained, wherein the recursive Bayesian estimator subsystem assigns a respective detected aerial vehicle identification data set to a respective Bayesian estimator, from which a respective filtered set of aerial vehicle coordinates is obtained, wherein a 
respective filtered set of aerial vehicle coordinates represents a real-time location of a respective aerial vehicle of interest; and outputting the at least one filtered set of aerial vehicle coordinates.
- 2 . The method of claim 1 , wherein respective dynamic aleatoric uncertainty covariance values are obtained from a single pass of a respective feature vector through the second machine learning subsystem.
- 3 . The method of claim 1 , wherein respective dynamic aleatoric uncertainty covariance values comprise at least one of: a respective covariance matrix for respective aerial vehicle coordinates, or respective aerial vehicle bounding box dimensions.
- 4 . The method of claim 1 , wherein the second machine learning subsystem comprises a multivariate deep evidential regression network.
- 5 . The method of claim 1 , wherein the Gaussian loss comprises a trace of a log-Cholesky decomposition of a matrix representing dynamic aleatoric uncertainty covariance values.
- 6 . The method of claim 1 , wherein a respective detected aerial vehicle identification data set further comprises an aerial vehicle classification label.
- 7 . The method of claim 1 , wherein a respective Bayesian estimator comprises a respective Kalman filter.
- 8 . The method of claim 7 , wherein the recursive Bayesian estimator subsystem is configured to provide a respective Kalman filter for each of a plurality of aerial vehicles of interest detected in the image.
- 9 . The method of claim 1 , wherein the first machine learning subsystem comprises a convolutional neural network and a multiscale feature decoder.
- 10 . The method of claim 1 , wherein respective filtered aerial vehicle bounding box dimensions are obtained from the respective Bayesian estimator, and wherein the outputting further comprises outputting at least one set of filtered aerial vehicle bounding box dimensions.
- 11 . A system for aerial vehicle tracking using dynamic aleatoric uncertainty covariance estimation, the system comprising: a non-transitory computer readable medium comprising instructions; and at least one electronic processor that executes the instructions to perform operations comprising: obtaining an image depicting at least one aerial vehicle of interest; passing the image to a first machine learning subsystem, from which at least one feature vector is obtained, wherein the first machine learning subsystem is trained with a first training corpus comprising a set of labeled images, wherein a respective labeled image comprises a respective image and a respective label; inputting the at least one feature vector to a second machine learning subsystem, wherein the second machine learning subsystem is trained with a second training corpus comprising a set of labeled feature vectors, wherein a respective labeled feature vector comprises a respective feature vector, respective centroid coordinates, and respective bounding box dimensions, wherein the second machine learning subsystem is trained based on a loss to provide detected aerial vehicle identification data sets corresponding to input feature vectors, wherein the loss comprises (1) a regression loss for aerial vehicle coordinates and aerial vehicle bounding box dimensions, and (2) a Gaussian loss for aleatoric uncertainty, and wherein a respective detected aerial vehicle identification data set comprises: respective aerial vehicle coordinates, respective aerial vehicle bounding box dimensions, and respective dynamic aleatoric uncertainty covariance values; providing at least one detected aerial vehicle identification data set, output from the second machine learning subsystem in response to the inputting, to a recursive Bayesian estimator subsystem, from which at least one filtered set of aerial vehicle coordinates is obtained, wherein the recursive Bayesian estimator subsystem assigns a respective detected 
aerial vehicle identification data set to a respective Bayesian estimator, from which a respective filtered set of aerial vehicle coordinates is obtained, wherein a respective filtered set of aerial vehicle coordinates represents a real-time location of a respective aerial vehicle of interest; and outputting the at least one filtered set of aerial vehicle coordinates.
- 12 . The system of claim 11 , wherein respective dynamic aleatoric uncertainty covariance values are obtained from a single pass of a respective feature vector through the second machine learning subsystem.
- 13 . The system of claim 11 , wherein respective dynamic aleatoric uncertainty covariance values comprise at least one of: a respective covariance matrix for respective aerial vehicle coordinates, or respective aerial vehicle bounding box dimensions.
- 14 . The system of claim 11 , wherein the second machine learning subsystem comprises a multivariate deep evidential regression network.
- 15 . The system of claim 11 , wherein the Gaussian loss comprises a trace of a log-Cholesky decomposition of a matrix representing dynamic aleatoric uncertainty covariance values.
- 16 . The system of claim 11 , wherein a respective detected aerial vehicle identification data set further comprises an aerial vehicle classification label.
- 17 . The system of claim 11 , wherein a respective Bayesian estimator comprises a respective Kalman filter.
- 18 . The system of claim 17 , wherein the recursive Bayesian estimator subsystem is configured to provide a respective Kalman filter for each of a plurality of aerial vehicles of interest detected in the image.
- 19 . The system of claim 11 , wherein the first machine learning subsystem comprises a convolutional neural network and a multiscale feature decoder.
- 20 . The system of claim 11 , wherein respective filtered aerial vehicle bounding box dimensions are obtained from the respective Bayesian estimator, and wherein the outputting further comprises outputting at least one set of filtered aerial vehicle bounding box dimensions.
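Claims 5 and 15 recite a Gaussian loss involving the trace of a log-Cholesky decomposition of the predicted covariance matrix. One common way to realize such a loss (a hedged sketch of the general technique, not the patent's actual implementation) is to have the network emit the entries of a lower-triangular factor L with the diagonal stored in log space: the covariance Sigma = L L^T is then positive-definite by construction, and log det(Sigma) reduces to twice the sum of the raw log-diagonal outputs, i.e. twice the trace of the log-Cholesky factor.

```python
import numpy as np

def build_cholesky(raw_diag, raw_offdiag, d):
    """Assemble lower-triangular factor L; exp() on the diagonal keeps
    Sigma = L @ L.T positive-definite."""
    L = np.zeros((d, d))
    L[np.diag_indices(d)] = np.exp(raw_diag)
    L[np.tril_indices(d, k=-1)] = raw_offdiag
    return L

def gaussian_nll(y, mu, raw_diag, raw_offdiag):
    """Multivariate Gaussian negative log-likelihood of target y under
    predicted mean mu and log-Cholesky-parameterized covariance."""
    d = y.shape[0]
    L = build_cholesky(raw_diag, raw_offdiag, d)
    cov = L @ L.T
    resid = y - mu
    # log det(cov) = 2 * sum(raw_diag): the "trace of the log-Cholesky
    # decomposition" term of the loss
    logdet = 2.0 * np.sum(raw_diag)
    maha = resid @ np.linalg.solve(cov, resid)
    return 0.5 * (logdet + maha + d * np.log(2.0 * np.pi))
```

Because the loss penalizes both the Mahalanobis residual and the log-determinant, the network is rewarded for predicting a covariance that matches the actual noise level instead of inflating it arbitrarily.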
Description
GOVERNMENT SUPPORT
This invention was made with government support under HR00112290107 awarded by the Defense Advanced Research Projects Agency. The government has certain rights in the invention.
FIELD
This disclosure relates generally to tracking objects, such as terrestrial or aerial vehicles.
BACKGROUND
Object detection is a common sub-problem for autonomous systems that rely on computer vision. State-of-the-art object detectors often employ a single-stage architecture that uses a single pass of input data to produce bounding boxes for objects within an input image. Due to their computational efficiency, single-pass detectors are widely used for perceiving the environment in real-time applications, including in autonomous vehicle perception systems. However, while existing single-pass detectors are computationally efficient, they do not provide a measure of their uncertainty. For example, these techniques often do not provide aleatoric uncertainty estimates of noise originating from input sensors.
Deep neural network (DNN)-based object detectors are sometimes used in perception systems. However, DNNs generally lack the ability to provide explanations of their inner workings or reliable quantitative measures of uncertainty. This limits their ability to be combined with other modules within larger decision-making systems. Some attempts to include uncertainty estimation with DNNs, such as Bayesian neural networks, ensemble methods, and Monte Carlo dropout methods, involve some form of sampling, which is computationally expensive. Here, sampling refers to multiple passes of an input datum through a DNN. A first type of sampling involves repeatedly perturbing and passing an input datum through a DNN. A second type of sampling involves passing an input datum through many slightly different DNNs. Both types of sampling consume excessive power, with the first type also consuming excessive time and the second type also requiring an excessive processing footprint.
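The sampling cost described above can be made concrete with a toy Monte Carlo dropout predictor (pure Python; the linear "model" and all names are purely illustrative). The uncertainty estimate comes only from repeating the stochastic forward pass n_samples times, which is exactly the repeated-pass expense that a single-pass covariance head avoids:

```python
import random
import statistics

def mc_dropout_predict(x, weights, n_samples=100, p_drop=0.5, seed=0):
    """Monte Carlo dropout on a toy linear model: run n_samples stochastic
    forward passes and report the empirical mean and variance of the outputs.
    The cost scales linearly with n_samples."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        # each pass randomly drops weights; survivors are rescaled by
        # 1 / (1 - p_drop) so the expected output is unchanged
        y = sum(w * xi / (1.0 - p_drop)
                for w, xi in zip(weights, x)
                if rng.random() >= p_drop)
        outputs.append(y)
    return statistics.fmean(outputs), statistics.pvariance(outputs)
```

A single deterministic pass would cost 1/n_samples of this work but, by itself, would yield no variance estimate, which is the trade-off the sampling-free approaches in this disclosure aim to eliminate.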
Models deployed in the real world, where uncertainty estimation is most crucial, often require algorithms to run in real time on small-footprint, low-power hardware. These constraints make sampling-heavy approaches impractical. Other attempts to integrate uncertainty estimation with DNNs, such as Gaussian neural networks, are inherently inaccurate and too slow for real-time tracking of high-speed objects. Yet other attempts to integrate uncertainty estimates with DNNs, such as loss attenuation, redundancy, and Gaussian Density Models, provide only the overall variance of the model outputs and therefore cannot capture the full aleatoric uncertainty covariance matrices of output location and bounding box dimensions in object tracking applications, for example.
Attempts to provide a single-stage architecture with uncertainty estimation for object detection also fall short. For example, CertainNet (Gasperini et al., "CertainNet: Sampling-free uncertainty estimation for object detection," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 698-705) extends the CenterNet object detector by estimating uncertainty in a single pass with the deterministic uncertainty quantification (DUQ) method. However, the DUQ method is computationally expensive and does not directly model regression uncertainties. EvCenterNet (Nallapareddy et al., "EvCenterNet: Uncertainty estimation for object detection using evidential learning," in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)), also an extension of CenterNet, uses evidential learning to estimate classification and regression uncertainties. However, EvCenterNet follows a univariate normal inverse-gamma (NIG) distribution and hence treats the bounding box width and height as having independent variances. Statistics-based estimators like the linear Kalman filter require the full covariance of dependent variables, which univariate approaches to uncertainty estimation cannot provide.
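The shortfall of univariate variance estimates can be illustrated numerically: for correlated width/height noise, the Mahalanobis distance (the quantity a linear Kalman filter's gating and gain depend on) computed from the full covariance differs substantially from the one computed from its diagonal approximation. A small NumPy illustration with assumed numbers, not data from this disclosure:

```python
import numpy as np

# assumed measurement-noise covariance with strongly correlated
# width/height errors
full = np.array([[1.0, 0.8],
                 [0.8, 1.0]])
diag = np.diag(np.diag(full))    # univariate approximation: drop correlation
resid = np.array([1.0, -1.0])    # residual that runs against the correlation

d_full = resid @ np.linalg.solve(full, resid)  # squared Mahalanobis, full model
d_diag = resid @ np.linalg.solve(diag, resid)  # squared Mahalanobis, diagonal model
# d_full (10.0) far exceeds d_diag (2.0): the diagonal model badly
# understates how unlikely this residual is, corrupting gating and gain
```

This is why estimators downstream of the detector benefit from the full covariance matrix rather than per-dimension variances alone.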
In general, prior art techniques that utilize static (fixed) univariate uncertainty estimates with statistics-based estimators like Kalman filters produce inaccurate results, particularly in the presence of variable or heteroskedastic noise levels.
SUMMARY
According to various embodiments, a method of aerial vehicle tracking using dynamic aleatoric uncertainty covariance estimation is provided. The method includes: obtaining an image depicting at least one aerial vehicle of interest; passing the image to a first machine learning subsystem, from which at least one feature vector is obtained, wherein the first machine learning subsystem is trained with a first training corpus comprising a set of labeled images, wherein a respective labeled image comprises a respective image and a respective label; inputting the at least one feature vector to a second machine learning subsystem, wherein the second machine learning subsystem is trained with a second training corpus comprising a set of labeled feature v