EP-4739980-A1 - LOCALIZATION OF USER(S) IN ENVIRONMENT(S)
Abstract
Implementations described herein relate to various techniques for localization of user(s) in environment(s). In particular, processor(s) can utilize a multi-scan technique, a semantic segmentation technique, or a combination of these techniques. In utilizing the multi-scan technique, the processor(s) can initially determine, from among a superset of candidate maps, a subset of candidate maps that are predicted to correspond to an environment of a user. Further, the processor(s) can obtain vision data that captures the environment, process the vision data to determine a narrower subset of candidate maps, and determine, from the narrower subset of candidate maps, a given map corresponding to the environment. In utilizing the semantic segmentation technique, the processor(s) can additionally, or alternatively, determine semantic properties of the environment. The semantic properties can be utilized to initially constrain the subset of candidate maps and/or utilized in subsequently determining the given map.
Inventors
- Mirzaei, Fatemeh
- Karels, Jr., Nicholas J.
- Meredith, Charlie
Assignees
- GoodMaps Inc.
Dates
- Publication Date
- 2026-05-13
- Application Date
- 2024-07-24
Claims (20)
- 1. A method implemented by one or more processors, the method comprising: determining a subset of candidate maps, from among a superset of candidate maps, that are predicted to correspond to an environment of a user of a client device; obtaining vision data that captures the environment of the user, the vision data being generated by one or more vision components of the client device of the user; determining, based on processing the vision data that captures the environment of the user, and from among the subset of candidate maps, a given map corresponding to the environment of the user, wherein determining the given map corresponding to the environment of the user based on processing the vision data that captures the environment of the user, and from among the subset of candidate maps, comprises: processing, using a machine learning (ML) model, the vision data to generate output; determining, based on at least a portion of the output, and from among the subset of candidate maps, a narrower subset of candidate maps; and determining, based on at least an additional portion of the output, and from among the narrower subset of candidate maps, the given map; and causing the given map corresponding to the environment of the user to be utilized as an operational map.
- 2. The method of claim 1, wherein determining the given map based on the additional portion of the output, and from among the narrower subset of candidate maps, is in response to determining that the narrower subset of candidate maps includes multiple candidate maps.
- 3. The method of claim 2, further comprising: in response to determining that the narrower subset of candidate maps includes a single candidate map: refraining from determining, based on the additional portion of the output, and from among the narrower subset of candidate maps, the given map; and determining the single candidate map is the given map corresponding to the environment of the user.
- 4. The method of any preceding claim, wherein the ML model is a feature extraction model, and wherein the output generated based on processing the vision data using the ML model comprises: global features of the environment of the user, local features of the environment of the user, and keypoint detection scores associated with the global features of the environment of the user and/or the local features of the environment of the user.
- 5. The method of claim 4, wherein determining the narrower subset of candidate maps based on at least the portion of the output, and from among the subset of candidate maps, comprises: processing, using a rough feature matching algorithm or model, the global features of the environment of the user and/or the keypoint detection scores associated with the global features of the environment of the user to determine, from among the subset of candidate maps, the narrower subset of candidate maps.
- 6. The method of claim 4 or claim 5, wherein determining the given map based on at least the additional portion of the output, and from among the narrower subset of candidate maps, comprises: processing, using a fine feature matching algorithm or model, the local features of the environment of the user and/or the keypoint detection scores associated with the local features of the environment of the user to determine, from among the narrower subset of candidate maps, the given map.
- 7. The method of any preceding claim, wherein determining the subset of candidate maps that are predicted to correspond to the environment of the user of the client device, and from among the superset of candidate maps, comprises: obtaining location data associated with the client device of the user, the location data being generated by one or more location sensors of the client device of the user; and determining, based on the location data associated with the client device of the user, the subset of candidate maps that are predicted to correspond to the environment of the user of the client device, and from among the superset of candidate maps.
- 8. The method of any one of claims 1 to 6, wherein determining the subset of candidate maps that are predicted to correspond to the environment of the user of the client device, and from among the superset of candidate maps, comprises: obtaining, via a software application that is accessible by the client device of the user, user input, the user input being generated by one or more user input interfaces of the client device of the user; and determining, based on the user input obtained via the software application, the subset of candidate maps that are predicted to correspond to the environment of the user of the client device, and from among the superset of candidate maps.
- 9. The method of any preceding claim, wherein causing the given map corresponding to the environment of the user to be utilized as the operational map comprises: causing the operational map to be utilized by a software application, that is accessible by the client device of the user, to provide the user with navigational directions from a current location of the user in the environment and to a given point of interest in the environment that is specified by the user via the software application.
- 10. The method of any preceding claim, wherein causing the given map corresponding to the environment of the user to be utilized as the operational map comprises: causing the operational map to be utilized by a software application, that is accessible by the client device of the user, to provide the user with information related to a plurality of points of interest in the environment.
- 11. The method of any preceding claim, further comprising: prior to determining the subset of candidate maps that are predicted to correspond to the environment of the user of the client device, and from among the superset of candidate maps: generating each map that is included in the superset of candidate maps, the superset of candidate maps including the given map corresponding to the environment of the user and a plurality of additional maps of other environments.
- 12. The method of claim 11, wherein generating the given map comprises: obtaining mapping vision data that captures the environment of the user, the mapping vision data being generated by one or more additional vision components, and the one or more additional vision components being associated with: an additional user that is manually traversing through the environment of the user, or a robot that is autonomously or semi- autonomously traversing through the environment of the user; generating, based on the mapping vision data, the given map corresponding to the environment of the user; and augmenting the given map corresponding to the environment of the user with information related to points of interest included in the environment of the user.
- 13. The method of claim 12, wherein augmenting the given map corresponding to the environment of the user with information related to the points of interest included in the environment of the user comprises obtaining user input that: labels the points of interest in the environment of the user; provides various levels of information about the points of interest in the environment of the user; assigns semantic properties to the points of interest in the environment; magnifies one or more obstacles in the environment of the user; or draws shapes around one or more of the obstacles in the environment of the user.
- 14. The method of any preceding claim, wherein the environment of the user corresponds to an indoor environment of a building, and wherein each of the subset of candidate maps is associated with: the building, a corresponding floor of the building, a portion of the corresponding floor of the building, or an outdoor environment that surrounds the building.
- 15. The method of claim 14, further comprising: prior to obtaining the vision data that captures the environment of the user: obtaining, via a software application that is accessible by the client device, a prewarm request that includes a building identifier for the building.
- 16. The method of claim 15, further comprising: subsequent to obtaining the vision data that captures the environment of the user: obtaining, via the software application that is accessible by the client device, a batch request that includes the building identifier for the building and the vision data.
- 17. The method of any preceding claim, further comprising: obtaining additional vision data that captures a subsequent environment of the user, the additional vision data being generated by one or more of the vision components of the client device of the user; determining, based on processing the additional vision data that captures the subsequent environment of the user, and from among the subset of candidate maps, whether to continue utilizing the given map as the operational map for the subsequent environment of the user or to utilize an additional given map as the operational map for the subsequent environment of the user; and causing, based on the determination, the given map or the additional given map to be utilized as the operational map for the subsequent environment of the user.
- 18. The method of claim 17, wherein obtaining the additional vision data that captures the subsequent environment of the user is in response to detecting an occurrence of an error with respect to the causing the given map to be utilized as the operational map for the environment of the user, and wherein the occurrence of the error with respect to the causing the given map to be utilized as the operational map for the environment of the user comprises one or more of: points of interest in the environment no longer being identified via a software application that is accessible by the client device, navigational directions through the environment no longer being able to be provided via the software application that is accessible by the client device, and/or intervening vision data, that is captured by one or more of the vision components of the client device subsequent to the vision data being captured and prior to the additional vision data being captured, being processed and indicating that the user is no longer located in the environment corresponding to the given map.
- 19. The method of claim 17, wherein obtaining the additional vision data that captures the subsequent environment of the user is in response to detecting an occurrence of sensor data, that is generated by one or more sensors of the client device, indicating that the user is no longer located in the environment corresponding to the given map, and wherein the one or more sensors of the client device comprises one or more of: one or more location sensors of the client device, one or more gyroscopes of the client device, one or more accelerometers of the client device, one or more motion sensors of the client device, one or more inertial measurement units of the client device, or one or more altimeters of the client device.
- 20. A method implemented by one or more processors, the method comprising: obtaining vision data that captures an environment of a user of a client device, the vision data being generated by one or more vision components of the client device of the user; processing, using one or more machine learning (ML) models, the vision data to determine one or more semantic properties of the environment of the user; determining, based on one or more of the semantic properties of the environment of the user, a subset of candidate maps, from a superset of candidate maps, that are predicted to correspond to the environment of the user; determining, based on processing the vision data that captures the environment of the user or based on processing additional vision data that captures the environment of the user, and from among the subset of candidate maps, a given map corresponding to the environment of the user; and causing the given map corresponding to the environment of the user to be utilized as an operational map.
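The two-stage, coarse-to-fine selection of a given map recited in claims 1 through 6 can be illustrated with a minimal sketch. Note that the claims do not prescribe any particular feature representation or matching algorithm; the cosine similarity, the tuple-based features, and the `rough_k` cutoff below are hypothetical stand-ins for the ML model output and the rough/fine feature matching recited in the claims.

```python
from dataclasses import dataclass

@dataclass
class CandidateMap:
    name: str
    global_feature: tuple  # coarse, map-level descriptor (stand-in)
    local_features: list   # fine, keypoint-level descriptors (stand-in)

def similarity(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def determine_given_map(subset, query_global, query_locals, rough_k=2):
    # Stage 1 (rough matching, claims 4-5): rank the subset by
    # global-feature similarity; the top rough_k candidates form
    # the narrower subset.
    ranked = sorted(subset,
                    key=lambda m: similarity(m.global_feature, query_global),
                    reverse=True)
    narrower = ranked[:rough_k]
    # If a single candidate remains, it is the given map and fine
    # matching is skipped (claims 2-3).
    if len(narrower) == 1:
        return narrower[0]
    # Stage 2 (fine matching, claim 6): score each remaining candidate
    # by how well its local features match the query's local features.
    def fine_score(m):
        return sum(max(similarity(lf, q) for lf in m.local_features)
                   for q in query_locals)
    return max(narrower, key=fine_score)
```

The design point of the two stages is that cheap global-feature comparisons prune most candidates before the comparatively expensive per-keypoint matching runs, which addresses the computational-intensity drawback noted in the background.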
Description
LOCALIZATION OF USER(S) IN ENVIRONMENT(S)
Background
[0001] Various localization techniques have been proposed that can aid humans (referred to herein as "users") in determining an environment in which they are located, identifying points of interest in the environment in which they are located, navigating through the environment, etc. For example, many users have client devices, such as smartphones, that are equipped with GPS sensor(s) and/or other location sensor(s). These users can interact with various software applications (e.g., via their client devices) that leverage sensor data generated by the GPS sensor(s) and/or other location sensor(s) of the smartphones. For instance, a user can interact with a navigational software application that leverages this sensor data to determine the user's current location, identify businesses or other points of interest that are near the user's current location, provide directions to a desired business and/or other point of interest specified by the user, and so on. However, current localization techniques suffer from one or more drawbacks.
[0002] As one example, most localization techniques are limited to determining a user's location with respect to a global frame of reference. For instance, most localization techniques determine a user's location with respect to longitude and latitude or with respect to particular location identifiers (e.g., street addresses, plus codes, etc.). However, the user's location with respect to a global frame of reference is insufficient in aiding the user in many instances, such as when the user's location is within a multi-story building where the user's longitude and latitude can be the same for each story of the multi-story building. As another example, some localization techniques determine a user's location with respect to a local frame of reference. For instance, some localization techniques utilize vision-based machine learning (ML) techniques to analyze vision data capturing an environment of the user, compare features captured in the vision data to features of maps that were previously generated for a plurality of different environments, and determine the environment of the user based on the comparing. However, these vision-based ML techniques are computationally intensive. Accordingly, there is a need in the art for localization techniques that mitigate and/or obviate these drawbacks.
Summary
[0003] Implementations described herein relate to various techniques for localization of user(s) in environment(s). In particular, processor(s) can utilize a multi-scan technique, a semantic segmentation technique, or a combination of these techniques as described herein. In some implementations, the processor(s) may only utilize the multi-scan technique in localization of user(s) in environment(s). In utilizing only the multi-scan technique in localization of user(s) in environment(s), the processor(s) can determine a subset of candidate maps, from among a superset of candidate maps, that are predicted to correspond to an environment of a user of a client device; obtain vision data that captures the environment of the user and that is generated by vision component(s) of the client device of the user; determine, based on processing the vision data that captures the environment of the user, and from among the subset of candidate maps, a given map corresponding to the environment of the user; and cause the given map corresponding to the environment of the user to be utilized as an operational map.
[0004] In some versions of those implementations, and in determining the subset of candidate maps that are predicted to correspond to the environment of the user, and from among the superset of candidate maps, the processor(s) may obtain location data generated by location sensor(s) of the client device.
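The location-based constraining of the superset of candidate maps can be sketched as follows. This is a minimal, non-limiting illustration: the dictionary-shaped map records, the single anchor coordinate per map, and the 150 m radius are hypothetical assumptions not taken from the disclosure.

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance, in meters, between two
    # latitude/longitude points.
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

def constrain_superset(superset, device_lat, device_lon, radius_m=150.0):
    # Keep only candidate maps whose anchor point lies within radius_m
    # of the device's reported location, e.g. the particular building,
    # surrounding buildings, and nearby outdoor environments.
    return [m for m in superset
            if distance_m(m["lat"], m["lon"],
                          device_lat, device_lon) <= radius_m]
```

For example, a map anchored roughly 100 m from the device would survive the filter while a map several kilometers away would not, so the downstream vision-based matching only ever considers maps relevant at the time the location data is obtained.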
Further, the processor(s) may determine the subset of candidate maps based on the location data. For example, the superset of candidate maps may include all previously generated maps that are associated with a plurality of different buildings. However, the location data may indicate that the user has entered a particular building. In this example, the subset of candidate maps can include all of the maps that are associated with the particular building, surrounding buildings, and outdoor environment(s) around the particular building. In this manner, the processor(s) can leverage the location data to initially constrain the superset of candidate maps to only those that are relevant to the user at the time the location data is obtained. In some further versions of those implementations, the processor(s) may only obtain the location data generated by the location sensor(s) of the client device in response to a software application, that is accessible by the client device, being launched. In these implementations, the software application may be leveraged in localization of the user(s) in the environment(s).
[0005] In additional or altern