Physics-aware AutoML

Laurens Arp

PhD student

Supervisors:

Dr. Mitra Baratchi
Prof.dr. Holger Hoos
Prof.dr. Peter van Bodegom

Contact

l.r.arp (at) liacs (dot) leidenuniv (dot) nl

Publications

Laurens Arp, Mitra Baratchi and Holger Hoos. VPint: value propagation-based spatial interpolation. Data Mining and Knowledge Discovery, 36, 2022.

Laurens Arp, Dyon van Vreumingen, Daniela Gawehns and Mitra Baratchi. Dynamic macro scale traffic flow optimisation using crowd-sourced urban movement data. In 2020 21st IEEE International Conference on Mobile Data Management (MDM), 168–177. IEEE, 2020.

The topic of Laurens’ PhD project is “Physics-Aware Automated Machine Learning for Earth Observation Data”. Physics-Aware Machine Learning is an emerging field in which traditional physics-based models are typically combined with data-driven machine learning models to create hybrid models. A broader interpretation of this field takes a holistic view of the entire problem pipeline, from data collection to corrections to model construction to actual parameter retrieval, and injects domain knowledge or data-driven modules into it as needed. The Automated Machine Learning (AutoML) angle of this project takes this principle one step further: just as domain knowledge can be injected at different stages, each part of the pipeline can also be examined for opportunities for automation. On the one hand, traditional AutoML methods are harder to apply in a physics-aware setup. On the other hand, the addition of domain knowledge can open exciting new paths to explore, e.g., by reducing the size of the search space or by guiding the optimisation process. This can result in automatically generated, specialised models that take advantage of domain knowledge codified in, for example, physical models or physical constraints. This is particularly valuable for Earth Observation-based applications, since the Earth is too diverse for conventional models to generalise well, while transferring specialised models typically requires substantial human effort.

Within this larger project, Laurens has worked on the following sub-projects:

Automated Instance Generation for LAI Retrieval Using Inverted Simulation
Automated Configuration of Radiative Transfer Models For LAI Retrieval

Our approach aims to automatically create a Lookup Table (LUT) with the appropriate data for a specific study area, without requiring ground truth data. This allows users to train a specialised machine learning model on this customised dataset, avoiding the problems of geographical diversity and non-transferability that make retrieval applications challenging when applied at a global scale.

To generate a LUT, Radiative Transfer Models (RTMs) are first used in a forward pass to simulate the data. The problem is that inverting these RTMs is ill-posed: many combinations of their (often real-valued) parameters can produce similar spectral output. This causes problems for machine learning training, because the same input (spectral reflectance) can have multiple different labels (biophysical parameters) associated with it. To avoid this problem, domain experts tend to use highly specific knowledge about the distribution of plant species in their study area to constrain parameters to limited ranges. This reduces the ill-posedness of the problem, making it more likely that a value close to the correct parameter of interest is matched to the spectral input. However, models trained on this data cannot be transferred to other ecosystems with a completely different balance of biophysical parameters. Therefore, we aim to automate this process, so that a new study area no longer requires a domain expert to spend many valuable hours researching it to constrain the parameter space. Instead, the constraints are derived automatically, leveraging the research experts have already done (and allowing new research to further enhance the method).
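The ill-posedness described above, and the effect of constraining parameter ranges, can be illustrated with a toy example. This is only a sketch under stated assumptions: `toy_forward_model` is a hypothetical one-band stand-in for a real RTM, and the parameter `k`, the ranges, and the helper names are invented for illustration; none of them come from the project itself.

```python
import math
import random


def toy_forward_model(lai, k):
    """Hypothetical stand-in for an RTM (an assumption, not a real model).

    Maps a leaf area index (LAI) and an extinction-like coefficient k to a
    single reflectance value. Only the product k * lai matters, so many
    (lai, k) pairs give the same output: the inverse problem is ill-posed.
    """
    return 1.0 - math.exp(-k * lai)


def build_lut(n, lai_range, k_range, seed=0):
    """Forward pass: sample parameters and store (reflectance, lai) pairs."""
    rng = random.Random(seed)
    lut = []
    for _ in range(n):
        lai = rng.uniform(*lai_range)
        k = rng.uniform(*k_range)
        lut.append((toy_forward_model(lai, k), lai))
    return lut


def lai_spread(lut, target, tol=0.01):
    """Range of LAI labels among entries whose reflectance is near target."""
    labels = [lai for refl, lai in lut if abs(refl - target) <= tol]
    return max(labels) - min(labels) if labels else 0.0


# Unconstrained k: one reflectance value maps to a wide range of LAI labels.
wide_lut = build_lut(20000, (0.0, 6.0), (0.3, 1.0))
# k constrained to a narrow range, as an expert might do for a study area.
narrow_lut = build_lut(20000, (0.0, 6.0), (0.5, 0.55))

print("LAI spread, wide k range:  ", lai_spread(wide_lut, 0.6))
print("LAI spread, narrow k range:", lai_spread(narrow_lut, 0.6))
```

In this toy setting, the narrow range leaves far less ambiguity in the labels attached to a given reflectance, which is the effect that manual constraining by experts is meant to achieve.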

Training-Free Cloud Removal for Sentinel-2 Optical Imagery Using Value Propagation Interpolation

Our approach, based on our interpolation algorithm VPint, allows users to remove clouds from their optical imagery using a previously sensed reference image. Unlike mosaicking approaches, the reference image is only used to extract the spatial structure of the image, while the up-to-date information from non-cloudy pixels is propagated. Unlike neural network-based approaches, no training phase is necessary; VPint can be run on any image using a reference image acquired at any earlier time. This makes VPint a particularly attractive alternative to neural networks for tasks such as cloud removal, since neural networks cannot learn functions for every possible time difference between the two images.
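The idea of propagating up-to-date values along the spatial structure of a reference image can be illustrated with a toy sketch. This is not the VPint implementation: the function name `propagate_with_reference`, the ratio-based weights, the four-neighbour update, and the fixed iteration count are all assumptions made for illustration.

```python
def propagate_with_reference(target, reference, iterations=100, eps=1e-6):
    """Fill missing pixels by propagating known values (illustrative only).

    target:    2D list of floats, with None marking cloudy pixels.
    reference: 2D list of floats with the same shape, fully cloud-free.
    The reference contributes only ratios between neighbouring pixels
    (spatial structure); the values themselves come from the target.
    """
    rows, cols = len(target), len(target[0])
    known = [[v is not None for v in row] for row in target]
    known_vals = [v for row in target for v in row if v is not None]
    init = sum(known_vals) / len(known_vals)
    # Start cloudy pixels at the mean of the known pixels.
    est = [[v if v is not None else init for v in row] for row in target]

    for _ in range(iterations):
        new = [row[:] for row in est]
        for r in range(rows):
            for c in range(cols):
                if known[r][c]:
                    continue  # up-to-date pixels are never overwritten
                contributions = []
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        # Assumed weight: reference ratio of pixel to neighbour.
                        weight = (reference[r][c] + eps) / (reference[nr][nc] + eps)
                        contributions.append(est[nr][nc] * weight)
                new[r][c] = sum(contributions) / len(contributions)
        est = new
    return est


# Toy scene: the current image is 1.5x brighter than the reference everywhere.
reference = [[0.2 + 0.1 * r + 0.05 * c for c in range(5)] for r in range(5)]
target = [[1.5 * v for v in row] for row in reference]
for r, c in ((1, 1), (1, 2), (2, 1), (2, 2)):
    target[r][c] = None  # simulated cloud

filled = propagate_with_reference(target, reference)
print(filled[1][1], 1.5 * reference[1][1])
```

Because only ratios from the reference are used, a uniform brightness change between the reference and the current image does not bias the filled-in pixels in this toy setting.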