US-12618771-B2 - Generating high-resolution concentration maps for atmospheric gases using geography-informed machine learning
Abstract
Generating one or more high-resolution atmospheric gas concentration maps using geography-informed machine learning includes obtaining a remote sensing dataset constrained by at least one temporal window and at least one spatial window defining a first geographic area. The remote sensing dataset includes at least a first set of atmospheric gas concentration data for a plurality of atmospheric gases. A training dataset is generated based on the remote sensing dataset. A machine learning model is trained with the training dataset to predict a plurality of atmospheric gas concentration values for at least one atmospheric gas of the plurality of atmospheric gases in a given geographic area and with a spatial resolution that is greater than a spatial resolution of atmospheric gas concentration data provided as an input to the machine learning module.
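As an illustration of how a training dataset might be assembled from rasters of differing resolution, the following is a minimal sketch and not the patented implementation. It assumes the rasters are already aligned to a common origin and are held as plain nested lists. The names `cell_value` and `build_training_rows`, the grid sizes, and all numeric values are hypothetical.

```python
import random


def cell_value(raster, x, y, cell_size):
    """Return the value of the raster cell that contains point (x, y).

    The raster is a list of rows with its origin (0, 0) at the top-left
    corner; every cell is a square with side `cell_size`.
    """
    # Clamp so that a point lying exactly on the far edge maps to the last cell.
    row = min(int(y // cell_size), len(raster) - 1)
    col = min(int(x // cell_size), len(raster[0]) - 1)
    return raster[row][col]


def build_training_rows(gas, gas_cell, bands, band_cell, extent, n_points, seed=0):
    """Sample random points inside `extent` and pair the fine-grid band
    values (features) with the coarse gas value (target) at each point."""
    rng = random.Random(seed)  # seeded for reproducible sampling
    width, height = extent
    rows = []
    for _ in range(n_points):
        x = rng.random() * width
        y = rng.random() * height
        features = [cell_value(band, x, y, band_cell) for band in bands]
        target = cell_value(gas, x, y, gas_cell)
        rows.append({"x": x, "y": y, "features": features, "target": target})
    return rows


# Hypothetical 4 x 4 unit area: a 2 x 2 gas raster (cell size 2) and
# two 4 x 4 spectral bands (cell size 1). All values are made up.
gas = [[30.0, 42.0],
       [35.0, 50.0]]
band_a = [[0.1, 0.2, 0.3, 0.4],
          [0.2, 0.3, 0.4, 0.5],
          [0.3, 0.4, 0.5, 0.6],
          [0.4, 0.5, 0.6, 0.7]]
band_b = [[row[3 - i] for i in range(4)] for row in band_a]

rows = build_training_rows(gas, 2.0, [band_a, band_b], 1.0, (4.0, 4.0), n_points=5)
for r in rows:
    print(round(r["x"], 2), round(r["y"], 2), r["features"], r["target"])
```

Each row pairs finer-resolution explanatory values with the coarser gas value observed at the same location. A model fitted on such rows could then be evaluated at every fine-grid point, which is one possible reading of how output resolution can exceed input resolution.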
Inventors
- Kalaivani Ramea KUBENDRAN
- Md Nurul Huda
- David Schwartz
- Jeyasri Subramanian
Assignees
- PALO ALTO RESEARCH CENTER INCORPORATED
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2022-08-02
Claims (20)
- 1 . A method for generating one or more high-resolution atmospheric gas concentration maps using geography-informed machine learning, the method comprising: obtaining, by an information processing system, a remote sensing dataset constrained by at least one temporal window and at least one spatial window defining a first geographic area, the remote sensing dataset comprising at least a first set of atmospheric gas concentration data for a plurality of atmospheric gases; generating, by the information processing system, a training dataset by: spatially and temporally aligning subsets of the remote sensing dataset, and extracting, for each raster point of a plurality of raster points in the first geographic area, a set of remote sensing features; training, by the information processing system, a machine learning model with the training dataset to predict a plurality of atmospheric gas concentration values for at least one atmospheric gas of the plurality of atmospheric gases in a given geographic area and with a spatial resolution that is greater than a spatial resolution of atmospheric gas concentration data provided as an input to the machine learning module; and generating, by the trained machine learning model, a high-resolution atmospheric gas concentration map for the given geographic area based on the plurality of predicted atmospheric gas concentration values, wherein the high-resolution atmospheric gas concentration map is georeferenced and configured for real-time monitoring of atmospheric pollution.
- 2 . The method of claim 1 , further comprising: processing, by the trained machine learning model, an input dataset comprising a second set of atmospheric gas concentration data for the at least one atmospheric gas, the second set of atmospheric gas concentration data being associated with a second geographic area and having a first spatial resolution; and generating, by the trained machine learning model based on processing the input dataset, a plurality of predicted atmospheric gas concentration values for the at least one atmospheric gas, wherein the plurality of predicted atmospheric gas concentration values has a second spatial resolution that is greater than the first spatial resolution.
- 3 . The method of claim 1 , wherein the remote sensing dataset further comprises a first set of multispectral data associated with the first geographic area and at least one of a synthetic aperture radar data or a nighttime radiance data associated with the first geographic area.
- 4 . The method of claim 3 , wherein the remote sensing dataset comprises a plurality of rasters including a first set of rasters representing the first set of atmospheric gas concentration data, a second set of rasters representing the first set of multispectral data, and at least one of a third set of rasters representing the synthetic aperture radar data or a fourth set of rasters representing the nighttime radiance data.
- 5 . The method of claim 1 , wherein generating the training dataset comprises: extracting, for each raster point of the plurality of raster points, remote sensing data of interest, the extracted remote sensing data including extracted atmospheric gas concentration data for the at least one atmospheric gas, extracted multispectral data, and at least one of extracted synthetic aperture radar data or extracted nighttime radiance data; and storing at least the extracted atmospheric gas concentration data and the extracted multispectral data as the training dataset.
- 6 . The method of claim 5 , wherein generating the training dataset further comprises: determining one or more local spatial association indicators for the at least one of the extracted synthetic aperture radar data or the extracted nighttime radiance data, wherein the one or more local spatial association indicators provide a set of spatially autocorrelated land use classifications for each raster point of the plurality of raster points; and storing the set of spatially autocorrelated land use classifications as part of the training dataset.
- 7 . The method of claim 6 , wherein determining one or more local spatial association indicators comprises calculating a local Moran's Index for the at least one of the extracted synthetic aperture radar data or the extracted nighttime radiance data.
- 8 . The method of claim 6 , wherein training the machine learning model comprises performing gradient boosting using the extracted multispectral data and the set of spatially autocorrelated land use classifications as explanatory variables and the extracted atmospheric gas concentration data as a target feature to be predicted by the machine learning model.
- 9 . An information processing system for generating one or more high-resolution atmospheric gas concentration maps using geography-informed machine learning, the information processing system comprising: a processor; memory communicatively coupled to the processor; and an atmospheric gas mapping unit communicatively coupled to the processor and the memory, wherein the atmospheric gas mapping unit is configured to: obtain a remote sensing dataset constrained by at least one temporal window and at least one spatial window defining a first geographic area, the remote sensing dataset comprising at least a first set of atmospheric gas concentration data for a plurality of atmospheric gases; generate a training dataset by: spatially and temporally aligning subsets of the remote sensing dataset, and extracting, for each raster point of a plurality of raster points in the first geographic area, a set of remote sensing features; train a machine learning model with the training dataset to predict a plurality of atmospheric gas concentration values for at least one atmospheric gas of the plurality of atmospheric gases in a given geographic area and with a spatial resolution that is greater than a spatial resolution of atmospheric gas concentration data provided as an input to the machine learning module; and generate, by the trained machine learning model, a high-resolution atmospheric gas concentration map for the given geographic area based on the plurality of predicted atmospheric gas concentration values, wherein the high-resolution atmospheric gas concentration map is georeferenced and configured for real-time monitoring of atmospheric pollution.
- 10 . The information processing system of claim 9 , wherein the trained machine learning model: processes an input dataset comprising a second set of atmospheric gas concentration data for the at least one atmospheric gas, the second set of atmospheric gas concentration data being associated with a second geographic area and having a first spatial resolution; and generates, based on processing the input dataset, a plurality of predicted atmospheric gas concentration values for the at least one atmospheric gas, wherein the plurality of predicted atmospheric gas concentration values has a second spatial resolution that is greater than the first spatial resolution.
- 11 . The information processing system of claim 9 , wherein the remote sensing dataset further comprises a first set of multispectral data associated with the first geographic area and at least one of synthetic aperture radar data or nighttime radiance data associated with the first geographic area.
- 12 . The information processing system of claim 11 , wherein the remote sensing dataset comprises a plurality of rasters including a first set of rasters representing the first set of atmospheric gas concentration data, a second set of rasters representing the first set of multispectral data, and at least one of a third set of rasters representing the synthetic aperture radar data or a fourth set of rasters representing the nighttime radiance data.
- 13 . The information processing system of claim 9 , wherein the atmospheric gas mapping unit generates the training dataset by: for each raster point of the plurality of raster points, extracting remote sensing data of interest, the extracted remote sensing data including extracted atmospheric gas concentration data for the at least one atmospheric gas, extracted multispectral data, and at least one of extracted synthetic aperture radar data or extracted nighttime radiance data; and storing at least the extracted atmospheric gas concentration data and the extracted multispectral data as the training dataset.
- 14 . The information processing system of claim 13 , wherein the atmospheric gas mapping unit generates the training dataset further by: determining one or more local spatial association indicators for the at least one of the extracted synthetic aperture radar data or the extracted nighttime radiance data, wherein the one or more local spatial association indicators provide a set of spatially autocorrelated land use classifications for each raster point of the plurality of raster points; and storing the set of spatially autocorrelated land use classifications as part of the training dataset.
- 15 . The information processing system of claim 14 , wherein determining one or more local spatial association indicators comprises calculating a local Moran's Index for the at least one of the extracted synthetic aperture radar data or the extracted nighttime radiance data.
- 16 . The information processing system of claim 14 , wherein the atmospheric gas mapping unit trains the machine learning model by performing gradient boosting using the extracted multispectral data and the set of spatially autocorrelated land use classifications as explanatory variables and the extracted atmospheric gas concentration data as a target feature to be predicted by the machine learning model.
- 17 . A method for generating one or more high-resolution atmospheric gas concentration maps using geography-informed machine learning, the method comprising: obtaining, by an information processing system, a remote sensing dataset constrained by at least one temporal window and at least one spatial window defining a first geographic area, the remote sensing dataset comprising at least a first set of atmospheric gas concentration data for a plurality of atmospheric gases; generating, by the information processing system, a training dataset by: spatially and temporally aligning subsets of the remote sensing dataset, and extracting, for each raster point of a plurality of raster points in the first geographic area, a set of remote sensing features; training a machine learning model with the training dataset; processing, by the trained machine learning model, an input dataset comprising a second set of atmospheric gas concentration data for at least one atmospheric gas of the plurality of atmospheric gases, the second set of atmospheric gas concentration data being associated with a second geographic area and having a first spatial resolution; predicting, by the trained machine learning model based on processing the input dataset, a plurality of atmospheric gas concentration values for the at least one atmospheric gas, wherein the plurality of predicted atmospheric gas concentration values has a second spatial resolution that is greater than the first spatial resolution; and generating, by the trained machine learning model, a high-resolution atmospheric gas concentration map for the given geographic area based on the plurality of predicted atmospheric gas concentration values, wherein the high-resolution atmospheric gas concentration map is georeferenced and configured for real-time monitoring of atmospheric pollution.
- 18 . The method of claim 17 , wherein the remote sensing dataset further comprises a first set of multispectral data associated with the first geographic area and at least one of synthetic aperture radar data or nighttime radiance data associated with the first geographic area.
- 19 . The method of claim 18 , wherein the remote sensing dataset comprises a plurality of rasters including a first set of rasters representing the first set of atmospheric gas concentration data, a second set of rasters representing the first set of multispectral data, and at least one of a third set of rasters representing the synthetic aperture radar data or a fourth set of rasters representing the nighttime radiance data.
- 20 . The method of claim 19 , wherein generating the training dataset comprises: based on the at least one temporal window and the at least one spatial window, spatially and temporally aligning a plurality of raster subsets including a subset of the first set of rasters, a subset of the second set of rasters, and a subset of at least one of the third set of rasters or the fourth set of rasters; for each raster point in a set of randomly selected raster points constrained by the at least one spatial window, extracting remote sensing data of interest from the plurality of raster subsets, the extracted remote sensing data including extracted atmospheric gas concentration data for the at least one atmospheric gas, extracted multispectral data, and at least one of extracted synthetic aperture radar data or extracted nighttime radiance data; and storing at least the extracted atmospheric gas concentration data and the extracted multispectral data as the training dataset.
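Claims 7 and 15 recite calculating a local Moran's Index for the extracted synthetic aperture radar or nighttime radiance data. As a hedged illustration only, the sketch below computes one common form of the statistic, I_i = (z_i / m2) * sum_j w_ij z_j, where z_i is the deviation from the mean and m2 is the mean squared deviation. The choice of rook (edge-sharing) neighbours with row-standardized weights is an assumption, as is the function name `local_morans_i`; the claims do not specify a weighting scheme. The grids are made-up values.

```python
def local_morans_i(grid):
    """Local Moran's I for every cell of a rectangular grid.

    Assumes rook contiguity (up, down, left, right) with row-standardized
    weights, so the spatial lag is the mean deviation of the neighbours.
    """
    n_rows = len(grid)
    n_cols = len(grid[0])
    n = n_rows * n_cols
    mean = sum(v for row in grid for v in row) / n
    # Deviations from the global mean.
    z = [[v - mean for v in row] for row in grid]
    # Second moment: mean squared deviation.
    m2 = sum(d * d for row in z for d in row) / n
    if m2 == 0:
        raise ValueError("grid is constant; local Moran's I is undefined")

    result = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            neighbours = [
                z[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
            lag = sum(neighbours) / len(neighbours)
            out_row.append(z[r][c] / m2 * lag)
        result.append(out_row)
    return result


# Made-up radiance-like grid: a bright block next to a dark block.
clustered = [[10, 10, 0, 0],
             [10, 10, 0, 0],
             [10, 10, 0, 0]]
# Made-up checkerboard: every cell differs from all of its neighbours.
checker = [[1, 0],
           [0, 1]]

clustered_i = local_morans_i(clustered)
checker_i = local_morans_i(checker)
```

Positive values indicate a cell surrounded by similar values, and negative values indicate a cell that differs from its neighbours. How such values would be mapped to the spatially autocorrelated land use classifications recited in claims 6 and 14 is not shown here.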
Description
TECHNICAL FIELD
The present invention is directed to systems and methods for generating high-resolution concentration maps for atmospheric gases using geography-informed machine learning.
BACKGROUND
As more policies are stipulated for air quality and climate change, there is an increasing need to monitor atmospheric gases, such as nitrogen dioxide (NO2). Monitoring atmospheric gases provides important data from air quality and public health perspectives, such as atmospheric gas concentrations, the location of “hot spots”, and the like. One technique for monitoring NO2 and other atmospheric gases includes using ground sensors and on-demand measurements, such as cars, drones, or aircraft equipped with gas sensors. Although the use of ground sensors and on-demand measurements for monitoring NO2 and other atmospheric gases may reliably quantify surface-level atmospheric gases, this technique typically does not provide dense sensor networks, which are needed for developing fine-scale maps of gas concentrations. Also, ground sensor observations are difficult to scale to larger regions.
Another technique for monitoring atmospheric gas concentrations includes indirect observation through accounting models. In this technique, modelers use, for example, traffic datasets and the like as a proxy to measure atmospheric gas concentration. However, with increasing adoption of electric vehicles and other types of zero-emission vehicles, there is a risk of divergence from the metrics provided by this technique and a possibility of providing misleading gas concentrations. Also, this technique typically requires surveys of the land along with human labeling, which can be expensive, time consuming, and prone to human error.
BRIEF SUMMARY
In one embodiment, a method for generating one or more high-resolution atmospheric gas concentration maps using geography-informed machine learning includes: obtaining a remote sensing dataset constrained by at least one temporal window and at least one spatial window defining a first geographic area, the remote sensing dataset comprising at least a first set of atmospheric gas concentration data for a plurality of atmospheric gases; generating a training dataset based on the remote sensing dataset; and training a machine learning model with the training dataset to predict a plurality of atmospheric gas concentration values for at least one atmospheric gas of the plurality of atmospheric gases in a given geographic area and with a spatial resolution that is greater than a spatial resolution of atmospheric gas concentration data provided as an input to the machine learning module.
In another embodiment, an information processing system for generating one or more high-resolution atmospheric gas concentration maps using geography-informed machine learning includes: a processor; memory communicatively coupled to the processor; and an atmospheric gas mapping unit communicatively coupled to the processor and the memory. The atmospheric gas mapping unit obtains a remote sensing dataset constrained by at least one temporal window and at least one spatial window defining a first geographic area. The remote sensing dataset comprises at least a first set of atmospheric gas concentration data for a plurality of atmospheric gases.
The atmospheric gas mapping unit further generates a training dataset based on the remote sensing dataset, and trains a machine learning model with the training dataset to predict a plurality of atmospheric gas concentration values for at least one atmospheric gas of the plurality of atmospheric gases in a given geographic area and with a spatial resolution that is greater than a spatial resolution of atmospheric gas concentration data provided as an input to the machine learning module.
In a further embodiment, a method for generating one or more high-resolution atmospheric gas concentration maps using geography-informed machine learning includes: obtaining a remote sensing dataset constrained by at least one temporal window and at least one spatial window defining a first geographic area, the remote sensing dataset comprising at least a first set of atmospheric gas concentration data for a plurality of atmospheric gases; generating a training dataset based on the remote sensing dataset; training a machine learning model with the training dataset; processing, by the trained machine learning model, an input dataset comprising a second set of atmospheric gas concentration data for at least one atmospheric gas of the plurality of atmospheric gases, the second set of atmospheric gas concentration data being associated with a second geographic area and having a first spatial resolution; and predicting, by the trained machine learning module based on processing the input dataset, a plurality of atmospheric gas concentration values for the at least one atmospheric gas, wherein the plurality of predicted atmospheric gas concentration values has a second spatial resolution that is gr