US-12620195-B2 - Robust feature extraction from occluded image frames for vehicle applications
Abstract
This disclosure provides systems, methods, and devices for vehicle driving assistance systems that support image processing. In a first aspect, a method of image processing includes receiving a plurality of image frames by a computing device and using machine learning models to identify corrupted or occluded image frames. A first machine learning model may identify corrupted image frames, while a second machine learning model may identify partially occluded image frames. The method may further include generating updated versions of image frames captured by vehicle cameras, for example based on feature vectors from the first and second machine learning models. The feature vectors may be fused and provided to a third machine learning model to generate updated versions of occluded image frames. The method may further include determining vehicle control instructions based on the updated versions. Other aspects and features are also claimed and described.
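The data flow described in the abstract (two classifier models gating corrupted and occluded frames, feature fusion, and a third model producing updated frames) can be sketched as below. This is a minimal, illustrative Python sketch, not the patented implementation: the stub scoring functions, the pooled-feature encoder, and the placeholder reconstruction are hypothetical stand-ins for the learned models.

```python
import numpy as np

rng = np.random.default_rng(0)

def corruption_score(frame):
    # Stand-in for the first model: probability that a frame is corrupted.
    # Stub heuristic: a near-constant frame (e.g., a stuck sensor) is corrupted.
    return 1.0 if frame.std() < 1e-3 else 0.0

def occlusion_score(frame):
    # Stand-in for the second model: probability of partial occlusion.
    # Stub heuristic: fraction of the frame that is nearly black.
    return float((frame < 0.05).mean())

def encode(frame):
    # Stand-in feature extractor: 4x4 average-pooled descriptor.
    h, w = frame.shape
    return frame.reshape(4, h // 4, 4, w // 4).mean(axis=(1, 3)).ravel()

def reconstruct(fused, shape):
    # Stand-in for the third model: maps fused features to an updated frame.
    # Stub: broadcast the mean feature as a placeholder reconstruction.
    return np.full(shape, fused.mean())

frames = [rng.random((16, 16)) for _ in range(4)]
frames[1][:] = 0.5                 # simulate a corrupted (stuck) camera
frames[2][:, :8] = 0.0             # simulate partial occlusion (mud on lens)

# Gate frames with the two classifiers.
corrupted = [i for i, f in enumerate(frames) if corruption_score(f) > 0.5]
occluded = [i for i, f in enumerate(frames)
            if i not in corrupted and occlusion_score(f) > 0.25]
clean = [i for i in range(len(frames))
         if i not in corrupted and i not in occluded]

# Fuse feature vectors from occluded and clean frames; reconstruct the rest.
fused = np.concatenate([encode(frames[i]) for i in occluded + clean])
updated = {i: reconstruct(fused, frames[i].shape) for i in corrupted + occluded}
```

In this toy run, frame 1 is flagged as corrupted, frame 2 as partially occluded, and both receive reconstructed replacements derived from the fused features of the remaining frames.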
Inventors
- Varun Ravi Kumar
- Debasmit Das
- Senthil Kumar Yogamani
Assignees
- QUALCOMM INCORPORATED
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-05-22
Claims (20)
- 1 . A method for image processing for use in a vehicle assistance system, comprising: receiving a plurality of image frames; determining, from among the plurality of image frames, a first set of image frames that are corrupted; identifying, from among the plurality of image frames, a second set of image frames that are partially occluded; determining fused feature vectors based on the second set of image frames and a third set of image frames, wherein the third set of image frames excludes the first set of image frames and the second set of image frames; determining, based on the fused feature vectors, updated versions of at least a subset of the first set of image frames and the second set of image frames; determining a top view segmentation map based on the updated versions, wherein the top view segmentation map is determined to maintain consistency between non-corrupted image frames and the updated versions, wherein the top view segmentation map is determined using a top view model, and the method further comprises training the top view model to maintain consistency between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, wherein training the top view model includes determining a consistency measure between feature vectors for the non-corrupted image frames and feature vectors for the updated versions; and determining vehicle control instructions based on the top view segmentation map, wherein the consistency measure comprises at least one of: an L1 loss measure between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, an L2 loss measure between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, a KL divergence between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, or a combination thereof.
- 2 . The method of claim 1 , wherein the first set of image frames are determined using a first machine learning model.
- 3 . The method of claim 2 , wherein the first machine learning model determines the first set of image frames based on predicted probabilities that the first set of image frames are corrupted.
- 4 . The method of claim 3 , wherein the first machine learning model includes a multi-layer perceptron (MLP) layer configured to determine the predicted probabilities.
- 5 . The method of claim 2 , wherein the first machine learning model is further configured to apply a separable convolution operation to each of the first set of image frames.
- 6 . The method of claim 2 , further comprising, prior to determining the first set of image frames, training the first machine learning model based on a one-hot encoding of known corrupted image frames within a training dataset.
- 7 . The method of claim 1 , wherein the second set of image frames are determined using a second machine learning model.
- 8 . The method of claim 7 , wherein the second machine learning model performs a relative convolution operation that applies a bias value to pixel values.
- 9 . The method of claim 8 , wherein the bias value is determined based on pixel values within a convolution window of the relative convolution operation.
- 10 . The method of claim 7 , further comprising, prior to determining the second set of image frames, training the second machine learning model based on a training dataset containing training image frames that are known to be partially occluded.
- 11 . The method of claim 1 , wherein the updated versions are determined by a third machine learning model that receives the fused feature vectors.
- 12 . An apparatus, comprising: a memory storing processor-readable code; and at least one processor coupled to the memory, the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations including: receiving a plurality of image frames; determining, from among the plurality of image frames, a first set of image frames that are corrupted; identifying, from among the plurality of image frames, a second set of image frames that are partially occluded; determining fused feature vectors based on the second set of image frames and a third set of image frames, wherein the third set of image frames excludes the first set of image frames and the second set of image frames; determining, based on the fused feature vectors, updated versions of at least a subset of the first set of image frames and the second set of image frames; determining a top view segmentation map based on the updated versions, wherein the top view segmentation map is determined to maintain consistency between non-corrupted image frames and the updated versions, wherein the top view segmentation map is determined using a top view model, and the operations further comprise training the top view model to maintain consistency between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, wherein training the top view model includes determining a consistency measure between feature vectors for the non-corrupted image frames and feature vectors for the updated versions; and determining vehicle control instructions based on the top view segmentation map, wherein the consistency measure comprises at least one of: an L1 loss measure between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, an L2 loss measure between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, a KL divergence between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, or a combination thereof.
- 13 . The apparatus of claim 12 , wherein the first set of image frames are determined using a first machine learning model.
- 14 . The apparatus of claim 13 , wherein the first machine learning model determines the first set of image frames based on predicted probabilities that the first set of image frames are corrupted.
- 15 . The apparatus of claim 14 , wherein the first machine learning model includes a multi-layer perceptron (MLP) layer configured to determine the predicted probabilities.
- 16 . The apparatus of claim 13 , wherein the operations further comprise, prior to determining the first set of image frames, training the first machine learning model based on a one-hot encoding of known corrupted image frames within a training dataset.
- 17 . The apparatus of claim 12 , wherein the second set of image frames are determined using a second machine learning model.
- 18 . The apparatus of claim 17 , wherein the second machine learning model performs a relative convolution operation that applies a bias value to pixel values.
- 19 . The apparatus of claim 17 , wherein the operations further comprise, prior to determining the second set of image frames, training the second machine learning model based on a training dataset containing training image frames that are known to be partially occluded.
- 20 . A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations comprising: receiving a plurality of image frames; determining, from among the plurality of image frames, a first set of image frames that are corrupted; identifying, from among the plurality of image frames, a second set of image frames that are partially occluded; determining fused feature vectors based on the second set of image frames and a third set of image frames, wherein the third set of image frames excludes the first set of image frames and the second set of image frames; determining, based on the fused feature vectors, updated versions of at least a subset of the first set of image frames and the second set of image frames; determining a top view segmentation map based on the updated versions, wherein the top view segmentation map is determined to maintain consistency between non-corrupted image frames and the updated versions, wherein the top view segmentation map is determined using a top view model, and wherein the operations further comprise training the top view model to maintain consistency between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, wherein training the top view model includes determining a consistency measure between feature vectors for the non-corrupted image frames and feature vectors for the updated versions; and determining vehicle control instructions based on the top view segmentation map, wherein the consistency measure comprises at least one of: an L1 loss measure between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, an L2 loss measure between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, a KL divergence between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, or a combination thereof.
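The consistency measure recited in claims 1, 12, and 20 (an L1 loss, an L2 loss, or a KL divergence between feature vectors for the non-corrupted image frames and feature vectors for the updated versions, or a combination thereof) can be sketched in a few lines of numpy. This is an illustrative sketch only: the claims do not specify how the measures are computed or combined, so the normalization of feature vectors into distributions for the KL term, and the example vectors themselves, are assumptions.

```python
import numpy as np

def l1_consistency(a, b):
    # Mean absolute difference between feature vectors for the
    # non-corrupted frames (a) and the updated versions (b).
    return float(np.abs(a - b).mean())

def l2_consistency(a, b):
    # Mean squared difference between the two feature vectors.
    return float(((a - b) ** 2).mean())

def kl_consistency(a, b, eps=1e-9):
    # KL divergence; assumes non-negative features, normalized into
    # distributions first (an assumption, not stated in the claims).
    p = a / a.sum()
    q = b / b.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical example feature vectors.
feats_clean = np.array([0.2, 0.5, 0.3])
feats_updated = np.array([0.25, 0.45, 0.3])

# "A combination thereof": e.g., the sum of two of the measures.
loss = (l1_consistency(feats_clean, feats_updated)
        + kl_consistency(feats_clean, feats_updated))
```

During training of the top view model, such a loss would be minimized so that feature vectors for reconstructed frames stay consistent with those of the non-corrupted frames.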
Description
TECHNICAL FIELD
Aspects of the present disclosure relate generally to driver-operated or driver-assisted vehicles, and more particularly, to methods and systems suitable for supplying driving assistance or for autonomous driving.
INTRODUCTION
Vehicles take many shapes and sizes, are propelled by a variety of propulsion techniques, and carry cargo including humans, animals, or objects. These machines have enabled the movement of cargo across long distances, movement of cargo at high speed, and movement of cargo that is larger than could be moved by human exertion. Vehicles originally were driven by humans to control speed and direction of the cargo to arrive at a destination. Human operation of vehicles has led to many unfortunate incidents resulting from the collision of vehicle with vehicle, vehicle with object, vehicle with human, or vehicle with animal. As research into vehicle automation has progressed, a variety of driving assistance systems have been produced and introduced. These include navigation directions by GPS, adaptive cruise control, lane change assistance, collision avoidance systems, night vision, parking assistance, and blind spot detection.
BRIEF SUMMARY OF SOME EXAMPLES
The following summarizes some aspects of the present disclosure to provide a basic understanding of the discussed technology. This summary is not an extensive overview of all contemplated features of the disclosure and is intended neither to identify key or critical elements of all aspects of the disclosure nor to delineate the scope of any or all aspects of the disclosure. Its sole purpose is to present some concepts of one or more aspects of the disclosure in summary form as a prelude to the more detailed description that is presented later.
Human operators of vehicles can be distracted, which is one factor in many vehicle crashes. Driver distractions can include changing the radio, observing an event outside the vehicle, and using an electronic device, among others.
Sometimes circumstances create situations that even attentive drivers are unable to identify in time to prevent vehicular collisions. Aspects of this disclosure provide improved systems for assisting drivers in vehicles with enhanced situational awareness when driving on a road.
In one aspect, a method for image processing for use in a vehicle assistance system is provided that includes receiving a plurality of image frames. The method also includes determining, from among the plurality of image frames, a first set of image frames that are corrupted. The method also includes identifying, from among the plurality of image frames, a second set of image frames that are partially occluded. The method also includes determining updated versions of at least a subset of the first set of image frames and the second set of image frames. The method also includes determining vehicle control instructions based on the updated versions.
An additional aspect includes an apparatus that includes a memory storing processor-readable code and at least one processor coupled to the memory. The at least one processor may be configured to execute the processor-readable code to cause the at least one processor to perform operations including: receiving a plurality of image frames; determining, from among the plurality of image frames, a first set of image frames that are corrupted; identifying, from among the plurality of image frames, a second set of image frames that are partially occluded; determining updated versions of at least a subset of the first set of image frames and the second set of image frames; and determining vehicle control instructions based on the updated versions.
Another aspect includes a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations including receiving a plurality of image frames. The operations also include determining, from among the plurality of image frames, a first set of image frames that are corrupted. The operations also include identifying, from among the plurality of image frames, a second set of image frames that are partially occluded. The operations also include determining updated versions of at least a subset of the first set of image frames and the second set of image frames. The operations also include determining vehicle control instructions based on the updated versions.
A further aspect includes a vehicle that includes a memory storing processor-readable code and at least one processor coupled to the memory. The at least one processor is configured to execute the processor-readable code to cause the at least one processor to perform operations including: receiving a plurality of image frames; determining, from among the plurality of image frames, a first set of image frames that are corrupted; identifying, from among the plurality of image frames, a second set of image frames that are partially occluded; determining updated versions of at least a subset of