US-12623684-B1 - Crowdsourcing vehicle data


Abstract

Systems and techniques are provided for crowdsourcing image data depicting autonomous vehicles (AV) to determine user feedback about the AVs based on the image data. An example method can include obtaining a frame captured by a client device and shared by a user of the client device, wherein the frame depicts an AV in a scene where the frame was captured; identifying the AV depicted in the frame based at least in part on a content of the frame; determining, based at least in part on the frame, an issue experienced by the AV in the scene; and determining an action to take in response to at least one of the frame shared by the user and the issue experienced by the AV.
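Read as engineering steps, the abstract describes a four-stage pipeline: identify the depicted AV, infer the issue it is experiencing, choose a response, and act on it. The following is a minimal sketch of that flow in Python; every name and data shape here (SharedFrame, identify_av, and so on) is a hypothetical illustration, not something specified by the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedFrame:
    """A user-shared image, or single frame of a video, depicting an AV."""
    image: bytes
    capture_time: float               # Unix timestamp from frame metadata
    client_latlon: tuple              # (lat, lon) of the client device


def identify_av(frame: SharedFrame) -> Optional[str]:
    """Stub: resolve the depicted AV to a unique fleet identifier."""
    return "av-042"                   # placeholder result


def determine_issue(frame: SharedFrame, av_id: str) -> Optional[str]:
    """Stub: classify the situation shown in the frame."""
    return "stopped_in_intersection"  # placeholder result


def determine_action(issue: Optional[str]) -> str:
    """Stub: map the issue to a response, e.g., a remote command to the AV."""
    return "send_remote_command" if issue else "request_user_feedback"


def handle_shared_frame(frame: SharedFrame) -> str:
    av_id = identify_av(frame)
    if av_id is None:
        return "request_user_feedback"   # cannot act without an identity
    return determine_action(determine_issue(frame, av_id))


print(handle_shared_frame(SharedFrame(b"", 1_695_800_000.0, (37.77, -122.42))))
```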

Inventors

  • Mohamed Mostafa Elshenawy
  • Max Eunice
  • Jennifer Devar McKnew
  • Matthew Douglas Helfgott
  • Alexander Willem Gerrese
  • Nancy Chen

Assignees

  • GM CRUISE HOLDINGS LLC

Dates

Publication Date: 2026-05-12
Application Date: 2023-09-27

Claims (18)

  1. A system comprising: a non-transitory memory storing instructions; and one or more processors coupled to the memory, wherein the instructions, when executed by the one or more processors, cause the one or more processors to:
     obtain a frame captured by a client device and shared by a user of the client device with the system, wherein the frame depicts an autonomous vehicle (AV) in a scene where the frame was captured, wherein the frame comprises an image or a frame of a video;
     determine an identity of the AV depicted in the frame based at least in part on the frame, wherein the identity uniquely identifies the AV, and wherein determining the identity of the AV depicted in the frame comprises:
        determining a location of the client device at a time when the client device captured or shared the frame;
        determining, based on the location of the client device at the time when the client device captured or shared the frame and a set of tracked locations of a set of AVs, one or more AVs from the set of AVs that were present in the scene at the time when the client device captured or shared the frame;
        determining, based on sensor data from each AV from the one or more AVs, a respective orientation of each AV at the time when the client device captured or shared the frame;
        determining a depicted orientation of the AV in the frame;
        determining an orientation of the client device at the time when the client device captured or shared the frame; and
        determining which AV from the one or more AVs is depicted in the frame based on at least one of the respective orientation of each AV at the time when the client device captured or shared the frame, the depicted orientation of the AV in the frame, the orientation of the client device at the time when the client device captured or shared the frame, and at least one of which AV from the set of AVs is depicted in a foreground of the frame, is most centered within the frame, or is most contained within the frame;
     determine, based at least in part on the frame, an issue experienced by the AV in the scene;
     determine an action for the AV to take in response to the issue experienced by the AV to resolve the issue; and
     based on the action to take and the identity of the AV, send a remote command to the AV, wherein the remote command causes the AV to perform the action to resolve the issue.
  2. The system of claim 1, wherein determining the identity of the AV depicted in the frame further comprises: extracting one or more features in the frame, the one or more features corresponding to the AV depicted in the frame; determining one or more AV characteristics based on the one or more features extracted; and identifying the AV depicted in the frame based on the one or more AV characteristics.
  3. The system of claim 1, wherein determining the identity of the AV depicted in the frame further comprises: determining the location of the client device based on at least one of location information included in metadata of the frame and location information provided by a client application used to share the frame.
  4. The system of claim 1, wherein the instructions, when executed by the one or more processors, cause the one or more processors to also: determine, based on the frame shared by the user, that the frame represents a notification from the user about a situation experienced by the AV in the scene; and determine, based at least in part on the frame and sensor data from the AV, that the situation comprises the issue experienced by the AV in the scene, wherein the issue comprises at least one of an AV error, an AV failure, an AV condition, an AV state, and an AV behavior.
  5. The system of claim 1, wherein the instructions, when executed by the one or more processors, cause the one or more processors to also: in response to a determination that the frame does not provide sufficient information to determine with a threshold confidence a situation being reported by the user through sharing of the frame, obtain, from the AV, sensor data collected by the AV during a period of time when the AV was present in the scene, the period of time comprising at least one of a time when the client device captured or shared the frame, an amount of time before the time when the client device captured or shared the frame, and an amount of time after the time when the client device captured or shared the frame; determine, based on the frame and the sensor data obtained from the AV, an activity or condition of the AV during the period of time when the AV was present in the scene; and determine the issue experienced by the AV in the scene based on the activity or condition of the AV during the period of time when the AV was present in the scene.
  6. The system of claim 1, wherein: determining the action to take in response to at least one of the frame shared by the user and the issue experienced by the AV comprises determining to request feedback from the user about a situation of the AV associated with the frame; and wherein the instructions, when executed by the one or more processors, cause the one or more processors to also: send, to the client device, user interface content for presentation at the client device, the user interface content requesting information from the user about the AV; and receive, from the client device, one or more user inputs provided via a user interface comprising the user interface content.
  7. The system of claim 6, wherein the user interface content comprises at least one of a map depicting the AV in the scene, one or more frames collected by one or more AVs that were present in the scene when the frame was captured or shared, and one or more interface input elements.
  8. The system of claim 7, wherein the one or more user inputs define a path in the map suggested by the user for the AV to take instead of an actual path taken by the AV when navigating the scene.
  9. The system of claim 1, wherein determining the action for the AV to take in response to the issue experienced by the AV comprises: determining that the frame is missing context information relating to the issue experienced by the AV; determining, from a set of frames collected by the set of AVs that were present in the scene when the frame was captured or shared, one or more frames in the set of frames determined to provide the missing context information relating to the issue experienced by the AV; and providing the one or more frames in a reply to a post from the user sharing the frame depicting the AV.
  10. A method comprising:
     obtaining a frame captured by a client device and shared by a user of the client device, wherein the frame depicts an autonomous vehicle (AV) in a scene where the frame was captured, wherein the frame comprises an image or a frame of a video;
     determining an identity of the AV depicted in the frame based at least in part on the frame, wherein the identity uniquely identifies the AV, and wherein determining the identity of the AV depicted in the frame comprises:
        determining a location of the client device at a time when the client device captured or shared the frame;
        determining, based on the location of the client device at the time when the client device captured or shared the frame and a set of tracked locations of a set of AVs, one or more AVs from the set of AVs that were present in the scene at the time when the client device captured or shared the frame;
        determining, based on sensor data from each AV from the one or more AVs, a respective orientation of each AV at the time when the client device captured or shared the frame;
        determining a depicted orientation of the AV in the frame;
        determining an orientation of the client device at the time when the client device captured or shared the frame; and
        determining which AV from the one or more AVs is depicted in the frame based on at least one of the respective orientation of each AV at the time when the client device captured or shared the frame, the depicted orientation of the AV in the frame, the orientation of the client device at the time when the client device captured or shared the frame, and at least one of which AV from the set of AVs is depicted in a foreground of the frame, is most centered within the frame, or is most contained within the frame;
     determining, based at least in part on the frame, an issue experienced by the AV in the scene;
     determining an action for the AV to take in response to the issue experienced by the AV to resolve the issue; and
     based on the action to take and the identity of the AV, sending a remote command to the AV, wherein the remote command causes the AV to perform the action to resolve the issue.
  11. The method of claim 10, wherein determining the identity of the AV depicted in the frame further comprises: extracting one or more features in the frame, the one or more features corresponding to the AV depicted in the frame; determining one or more AV characteristics based on the one or more features extracted; and identifying the AV depicted in the frame based on the one or more AV characteristics.
  12. The method of claim 10, wherein determining the identity of the AV depicted in the frame further comprises: determining the location of the client device based on at least one of location information included in metadata of the frame and location information provided by a client application used to share the frame.
  13. The method of claim 10, further comprising: determining, based on the frame shared by the user, that the frame represents a notification from the user about a situation experienced by the AV in the scene; and determining, based at least in part on the frame and sensor data from the AV, that the situation comprises the issue experienced by the AV in the scene, wherein the issue comprises at least one of an AV error, an AV failure, an AV condition, an AV state, and an AV behavior.
  14. The method of claim 10, further comprising: in response to a determination that the frame does not provide sufficient information to determine with a threshold confidence a situation being reported by the user through sharing of the frame, obtaining, from the AV, sensor data collected by the AV during a period of time when the AV was present in the scene, the period of time comprising at least one of a time when the client device captured or shared the frame, an amount of time before the time when the client device captured or shared the frame, and an amount of time after the time when the client device captured or shared the frame; determining, based on the frame and the sensor data obtained from the AV, an activity or condition of the AV during the period of time when the AV was present in the scene; and determining the issue experienced by the AV in the scene based on the activity or condition of the AV during the period of time when the AV was present in the scene.
  15. The method of claim 10, wherein: determining the action to take in response to at least one of the frame shared by the user and the issue experienced by the AV comprises determining to request feedback from the user about a situation of the AV associated with the frame; and the method further comprises: sending, to the client device, user interface content for presentation at the client device, the user interface content requesting information from the user about the AV; and receiving, from the client device, one or more user inputs provided via a user interface comprising the user interface content.
  16. The method of claim 15, wherein the user interface content comprises a map depicting the AV in the scene, and wherein the one or more user inputs define a path in the map suggested by the user for the AV to take instead of an actual path taken by the AV when navigating the scene.
  17. The method of claim 10, wherein determining the action to take in response to at least one of the frame shared by the user and the issue experienced by the AV comprises: determining that the frame is missing context information relating to the issue experienced by the AV; determining, from a set of frames collected by one or more AVs that were present in the scene when the frame was captured or shared, one or more frames in the set of frames determined to provide the missing context information relating to the issue experienced by the AV; and providing the one or more frames in a reply to a post from the user sharing the frame depicting the AV.
  18. A non-transitory computer-readable medium having stored thereon instructions which, when executed by one or more processors, cause the one or more processors to:
     obtain a frame captured by a client device and shared by a user of the client device, wherein the frame depicts an autonomous vehicle (AV) in a scene where the frame was captured, wherein the frame comprises an image or a frame of a video;
     determine an identity of the AV depicted in the frame based at least in part on the frame, wherein the identity uniquely identifies the AV, and wherein determining the identity of the AV depicted in the frame comprises:
        determining a location of the client device at a time when the client device captured or shared the frame;
        determining, based on the location of the client device at the time when the client device captured or shared the frame and a set of tracked locations of a set of AVs, one or more AVs from the set of AVs that were present in the scene at the time when the client device captured or shared the frame;
        determining, based on sensor data from each AV from the one or more AVs, a respective orientation of each AV at the time when the client device captured or shared the frame;
        determining a depicted orientation of the AV in the frame;
        determining an orientation of the client device at the time when the client device captured or shared the frame; and
        determining which AV from the one or more AVs is depicted in the frame based on at least one of the respective orientation of each AV at the time when the client device captured or shared the frame, the depicted orientation of the AV in the frame, the orientation of the client device at the time when the client device captured or shared the frame, and at least one of which AV from the set of AVs is depicted in a foreground of the frame, is most centered within the frame, or is most contained within the frame;
     determine, based at least in part on the frame, an issue experienced by the AV in the scene;
     determine an action for the AV to take in response to the issue experienced by the AV to resolve the issue; and
     based on the action to take and the identity of the AV, send a remote command to the AV, wherein the remote command causes the AV to perform the action to resolve the issue.
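As an illustration of the AV-identification procedure recited in claims 1, 10, and 18, the sketch below shortlists fleet AVs whose tracked locations put them in the scene at capture time, then scores each candidate by how well its logged heading, viewed from the client device's orientation, matches the orientation depicted in the frame. The search radius, the crude relative-heading model, and all names are assumptions for illustration; the claims' alternative cues (foreground, centering, containment) are omitted here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackedAV:
    av_id: str
    latlon: tuple          # (lat, lon) logged at the frame's capture time
    heading_deg: float     # heading logged at the frame's capture time


def distance_m(a: tuple, b: tuple) -> float:
    """Equirectangular approximation; adequate at street scale."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dy = (a[0] - b[0]) * 111_320.0
    dx = (a[1] - b[1]) * 111_320.0 * math.cos(mean_lat)
    return math.hypot(dx, dy)


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def identify_av(client_latlon: tuple,
                client_heading_deg: float,    # orientation of the client device
                depicted_heading_deg: float,  # AV orientation estimated from the frame
                fleet: List[TrackedAV],
                radius_m: float = 75.0) -> Optional[str]:
    # Step 1: shortlist AVs whose tracked location puts them in the scene.
    nearby = [av for av in fleet
              if distance_m(client_latlon, av.latlon) <= radius_m]
    if not nearby:
        return None
    # Step 2: the camera sees each AV's heading relative to the viewing
    # direction, so reconstruct the expected depicted orientation for each
    # candidate and pick the closest match (a deliberately crude model).
    def score(av: TrackedAV) -> float:
        expected = (av.heading_deg - client_heading_deg) % 360.0
        return angular_diff(expected, depicted_heading_deg)
    return min(nearby, key=score).av_id


fleet = [TrackedAV("av-007", (37.7750, -122.4194), 90.0),
         TrackedAV("av-042", (37.7751, -122.4195), 270.0)]
print(identify_av((37.7750, -122.4193), 180.0, 90.0, fleet))  # -> "av-042"
```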
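Claims 5 and 14 describe a fallback for when the frame alone cannot establish the reported situation with enough confidence: pull the AV's own sensor logs for a window around the capture time and reclassify on the combined evidence. A minimal sketch follows; the threshold, window sizes, and the classifier and fetch functions are illustrative stubs, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueEstimate:
    label: Optional[str]
    confidence: float        # in [0, 1]


CONFIDENCE_THRESHOLD = 0.8   # illustrative threshold
WINDOW_BEFORE_S = 30.0       # seconds of AV logs before the capture time
WINDOW_AFTER_S = 30.0        # seconds of AV logs after the capture time


def classify_from_frame(image: bytes) -> IssueEstimate:
    """Stub: frame-only classifier."""
    return IssueEstimate("stopped_in_lane", 0.55)


def fetch_av_sensor_data(av_id: str, start: float, end: float) -> dict:
    """Stub: request sensor data the AV collected during [start, end]."""
    return {"speed_mps": [0.0] * 60, "planner_state": "waiting_on_obstacle"}


def classify_with_sensor_data(image: bytes, sensor_data: dict) -> IssueEstimate:
    """Stub: joint classifier over the frame and the AV's own sensor data."""
    return IssueEstimate("blocked_by_obstacle", 0.93)


def determine_issue(av_id: str, image: bytes, capture_time: float) -> IssueEstimate:
    estimate = classify_from_frame(image)
    if estimate.confidence >= CONFIDENCE_THRESHOLD:
        return estimate
    # The frame alone is inconclusive: pull the AV's logs for a window
    # around the capture time and reclassify on the combined evidence.
    sensor_data = fetch_av_sensor_data(av_id,
                                       capture_time - WINDOW_BEFORE_S,
                                       capture_time + WINDOW_AFTER_S)
    return classify_with_sensor_data(image, sensor_data)
```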
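Claims 6 through 8 and 15 through 16 describe a feedback loop in which the system sends map-based UI content to the client device and the user's inputs come back, including a path drawn on the map that the user suggests the AV should have taken. The payload shapes below are one hypothetical way to model that exchange; none of these fields or prompts appear in the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LatLon = Tuple[float, float]


@dataclass
class FeedbackRequest:
    """UI content pushed to the client device: a map of the scene plus prompts."""
    av_id: str
    map_center: LatLon
    prompts: List[str] = field(default_factory=lambda: [
        "What did the AV appear to be doing?",
        "Draw the path you think the AV should have taken.",
    ])


@dataclass
class FeedbackResponse:
    """User inputs returned from the UI, including a user-drawn path."""
    av_id: str
    answers: List[str]
    suggested_path: List[LatLon]   # polyline drawn on the map by the user


def path_differs(actual: List[LatLon], suggested: List[LatLon]) -> bool:
    """Stub: flag feedback where the user's path diverges from the AV's."""
    return suggested != actual
```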
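Finally, claims 9 and 17 cover replying to the user's post with frames from other AVs that were present in the scene, to supply context the user's frame is missing. A hypothetical sketch of that selection step is below; the relevance score, time window, and reply mechanism are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FleetFrame:
    """A frame collected by another AV that was present in the scene."""
    source_av_id: str
    image: bytes
    capture_time: float
    relevance: float     # stub score: how much missing context it provides


def select_context_frames(candidates: List[FleetFrame],
                          capture_time: float,
                          window_s: float = 60.0,
                          top_k: int = 3) -> List[FleetFrame]:
    """Pick the most relevant fleet frames near the user's capture time."""
    in_window = [f for f in candidates
                 if abs(f.capture_time - capture_time) <= window_s]
    return sorted(in_window, key=lambda f: f.relevance, reverse=True)[:top_k]


def reply_with_context(post_id: str, frames: List[FleetFrame]) -> None:
    """Stub: attach the selected frames to a reply on the user's post."""
    print(f"replying to {post_id} with {len(frames)} context frame(s)")
```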

Description

BACKGROUND

1. Technical Field

The present disclosure generally relates to crowdsourcing vehicle data and, more specifically, to using crowdsourced image data depicting a vehicle in a scene to identify and locate the vehicle in the scene and to obtain feedback and/or commands relating to the vehicle.

2. Introduction

An autonomous vehicle is a motorized vehicle that can navigate without a human driver. An exemplary autonomous vehicle can include various sensors, such as a camera sensor, a light detection and ranging (LIDAR) sensor, and a radio detection and ranging (RADAR) sensor, amongst others. The sensors collect data and measurements that the autonomous vehicle can use for operations such as navigation. The sensors can provide the data and measurements to an internal computing system of the autonomous vehicle, which can use the data and measurements to control a mechanical system of the autonomous vehicle, such as a vehicle propulsion system, a braking system, or a steering system. Typically, the sensors are mounted at specific locations on the autonomous vehicle.

BRIEF DESCRIPTION OF THE DRAWINGS

The various advantages and features of the present technology will become apparent by reference to specific implementations illustrated in the appended drawings. A person of ordinary skill in the art will understand that these drawings only show some examples of the present technology and do not limit the scope of the present technology to these examples. Furthermore, the skilled artisan will appreciate the principles of the present technology as described and explained with additional specificity and detail through the use of the accompanying drawings, in which:

FIG. 1 is a diagram illustrating an example system environment that can be used to facilitate autonomous vehicle navigation and routing operations, according to some examples of the present disclosure;

FIG. 2 is a flowchart illustrating an example process for crowdsourcing user feedback about an autonomous vehicle, according to some examples of the present disclosure;

FIG. 3 is a diagram illustrating an example of a user capturing a frame depicting an autonomous vehicle in a scene and reporting the autonomous vehicle using the captured frame, according to some examples of the present disclosure;

FIG. 4 is a diagram illustrating an example video that provides context relating to a situation in a scene involving an autonomous vehicle, according to some examples of the present disclosure;

FIG. 5 is a diagram illustrating an example user interface that a user can use to provide feedback about a situation reported by the user regarding an autonomous vehicle, according to some examples of the present disclosure;

FIG. 6 is a flowchart illustrating an example process for crowdsourcing user feedback about an autonomous vehicle, according to some examples of the present disclosure; and

FIG. 7 is a diagram illustrating an example system architecture for implementing certain aspects described herein.

DETAILED DESCRIPTION

The detailed description set forth below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which the subject technology can be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a more thorough understanding of the subject technology.
However, it will be clear and apparent that the subject technology is not limited to the specific details set forth herein and may be practiced without these details. In some instances, structures and components are shown in block diagram form in order to avoid obscuring the concepts of the subject technology.

One aspect of the present technology is the gathering and use of data available from various sources to improve quality and experience. The present disclosure contemplates that in some instances, this gathered data may include personal information. The present disclosure contemplates that the entities involved with such personal information respect and value privacy policies and practices.

As previously explained, autonomous vehicles (AVs) can include various sensors, such as a camera sensor, a light detection and ranging (LIDAR) sensor, a radio detection and ranging (RADAR) sensor, a time-of-flight (TOF) sensor, an inertial measurement unit (IMU), an acoustic sensor (e.g., sound navigation and ranging (SONAR), a microphone, etc.), and/or a global navigation satellite system (GNSS) and/or global positioning system (GPS) receiver, amongst others. The AVs can use the various sensors to collect data and measurements that the AVs can use for AV operations such as perception (e.g., object detection, event detection, tracking, localization, sensor fusion, point cloud processing, image processing, etc.), planning (e.g., route planning, trajectory