US-12619896-B2 - Bayesian optimal model system (BOMS) for predicting equilibrium ripple geometry and evolution

Abstract

A method of training a machine learning model to predict seafloor ripple geometry includes receiving one or more input values, each input value being based on an observation associated with ocean wave and seafloor conditions, and preprocessing the one or more input values. The method includes generating a training data set based on the preprocessed data set, splitting the training data set into a plurality of folds, and training, via stacked generalization, the machine learning model by performing a cross validation of each fold of training data based on at least one deterministic equilibrium ripple predictor model and on at least one machine learning algorithm. The method may include generating, via the trained machine learning model, a set of one or more seafloor ripple geometry predictions, and performing Bayesian regression on the set of one or more seafloor ripple geometry predictions to generate a probabilistic distribution of predicted seafloor ripple geometry.
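The training loop in the abstract (split the data into folds, produce base-model predictions per fold, combine them by stacked generalization) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the BOMS implementation. The deterministic predictor `deterministic_predictor` is a hypothetical stand-in that applies an orbital-scaling rule of the form wavelength = 0.62 * d0 (d0 being an assumed orbital-diameter input). The machine learning base model `fit_ml_base` is a hypothetical 1-nearest-neighbour regressor rather than the XGBoost or gradient boosting models named in the claims. The meta-learner `fit_meta` is an assumed no-intercept least-squares combiner. All function names and data are illustrative.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle range(n) with a fixed seed and deal it into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def deterministic_predictor(d0):
    # Hypothetical stand-in for a deterministic equilibrium ripple predictor:
    # an orbital-scaling rule, wavelength = 0.62 * d0 (illustrative only).
    return 0.62 * d0


def fit_ml_base(train_x, train_y):
    # Hypothetical stand-in for an ML base learner (the claims name XGBoost and
    # gradient boosting): a 1-nearest-neighbour regressor on d0.
    pairs = list(zip(train_x, train_y))

    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def out_of_fold_features(xs, ys, k=5):
    """Level-0 features: for each sample, predictions from models that never saw it."""
    features = [None] * len(xs)
    for test in kfold_indices(len(xs), k):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        ml = fit_ml_base([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            # The deterministic rule has no fitted parameters, so it needs no training fold.
            features[i] = (deterministic_predictor(xs[i]), ml(xs[i]))
    return features


def fit_meta(features, ys):
    """Level-1 learner: least-squares weights for y ~ w1*f1 + w2*f2 (no intercept)."""
    s11 = sum(f1 * f1 for f1, _ in features)
    s22 = sum(f2 * f2 for _, f2 in features)
    s12 = sum(f1 * f2 for f1, f2 in features)
    b1 = sum(f1 * y for (f1, _), y in zip(features, ys))
    b2 = sum(f2 * y for (_, f2), y in zip(features, ys))
    det = s11 * s22 - s12 * s12  # solve the 2x2 normal equations by Cramer's rule
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det


xs = [0.2 + 0.1 * i for i in range(20)]  # synthetic orbital diameters (m)
ys = [0.62 * x for x in xs]              # synthetic wavelengths that follow the rule exactly
w1, w2 = fit_meta(out_of_fold_features(xs, ys, k=5), ys)
# w1 comes out close to 1 and w2 close to 0, because the deterministic feature fits exactly.
```

Because the meta-learner is fitted only on out-of-fold predictions, its weights reflect how each base model performs on data it did not see during fitting, which is the core idea of stacked generalization.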

Inventors

  • Ryan E. Phillip
  • Allison M. Penko

Assignees

  • THE GOVERNMENT OF THE UNITED STATES OF AMERICA, AS REPRESENTED BY THE SECRETARY OF THE NAVY

Dates

Publication Date
2026-05-05
Application Date
2022-06-28

Claims (19)

  1. A method of training a machine learning model to predict seafloor ripple geometry, the method comprising: receiving, by a processing device, one or more input values, each input value being based on an observation associated with ocean wave and seafloor conditions; generating, by the processing device, a preprocessed data set by performing at least one preprocessing step on the one or more input values, wherein the at least one preprocessing step comprises imputing one or more null values into the one or more input values, filtering the one or more input values for equilibrium ripples, or scaling one or more of the one or more input values; generating, by the processing device, a training data set based on the preprocessed data set; splitting, by the processing device, the training data set into a plurality of folds, wherein each of the folds comprises a distinct test set for evaluating predictions; training, by the processing device, via stacked generalization, the machine learning model by performing a cross validation of each of the folds of training data based on at least one deterministic equilibrium ripple predictor model and on at least one machine learning algorithm; generating, by the processing device, via the trained machine learning model, a set of one or more seafloor ripple geometry predictions; performing, by the processing device, Bayesian regression on the set of one or more seafloor ripple geometry predictions, wherein the Bayesian regression is based on a posterior distribution generated using Markov Chain Monte Carlo sampling; responsive to performing the Bayesian regression, generating, by the processing device, a probabilistic distribution of predicted seafloor ripple geometry; and performing one or more underwater operations based on the generated probabilistic distribution of predicted seafloor ripple geometry.
  2. The method of claim 1, wherein the one or more input values are based on the preprocessed data set collected over a period of time.
  3. The method of claim 2, wherein the one or more input values further comprise a set of additional training data separate from the preprocessed data set.
  4. The method of claim 1, wherein the trained machine learning model comprises one or more hyperparameters.
  5. The method of claim 1, wherein the trained machine learning model comprises at least one hyperparameter, wherein the method further comprises optimizing the at least one hyperparameter for an optimal value via a grid search operation, wherein the optimal value minimizes model overfitting.
  6. The method of claim 5, wherein the optimizing comprises optimizing one or more hyperparameters for a gradient boosting regression base model, and wherein one or more of the at least one hyperparameter comprises a hyperparameter for an XGBoost Regressor base model that is set to a default value.
  7. The method of claim 1, wherein the at least one machine learning algorithm comprises an XGBoost Regressor base model or a gradient boosting regression base model.
  8. The method of claim 7, further comprising optimizing one or more hyperparameters for the gradient boosting regression base model.
  9. The method of claim 1, further comprising determining a number of the plurality of folds based on a bias-variance tradeoff, wherein bias is associated with model generalization, and variance is associated with model overfitting or model underfitting.
  10. The method of claim 1, wherein training the machine learning model comprises evaluating model performance based on at least one of adjusted R-squared (R²_adj), root-mean-square error (RMSE), or bias, wherein R² is a percentage of a dependent feature variation of the machine learning model, adjusted R-squared (R²_adj) is a modified version of R² that takes one or more predictors into account, RMSE is a square root of an average of squared errors, and bias is associated with a difference between a predicted model value and a corresponding observed value.
  11. The method of claim 1, wherein the one or more input values comprise ripple height or ripple wavelength.
  12. The method of claim 1, wherein the one or more input values exclude ripple height or ripple wavelength.
  13. The method of claim 1, wherein the generated set of one or more seafloor ripple geometry predictions comprises a generated set of a plurality of seafloor ripple geometry predictions, wherein performing the Bayesian regression further comprises: determining biases associated with the at least one deterministic equilibrium ripple predictor model and the at least one machine learning algorithm, and combining the generated set of the plurality of seafloor ripple geometry predictions based on the determined biases.
  14. The method of claim 1, wherein performing the Bayesian regression further comprises performing the Bayesian regression on observed true values associated with the at least one deterministic equilibrium ripple predictor model and the at least one machine learning algorithm.
  15. The method of claim 1, wherein performing one or more underwater operations further comprises developing a mission route plan associated with operating a vessel based on the generated probabilistic distribution of predicted seafloor ripple geometry.
  16. The method of claim 1, wherein performing one or more underwater operations further comprises calculating a performance of an acoustical device based on the generated probabilistic distribution of predicted seafloor ripple geometry.
  17. The method of claim 1, wherein performing one or more underwater operations further comprises detecting an obscured object based on the generated probabilistic distribution of predicted seafloor ripple geometry.
  18. The method of claim 1, wherein performing one or more underwater operations further comprises inputting one or more seafloor roughness values into a coastal hydrodynamic model or a morphological model based on the generated probabilistic distribution of predicted seafloor ripple geometry.
  19. The method of claim 1, wherein the probabilistic distribution of predicted seafloor ripple geometry comprises one or more seafloor ripple wavelengths at a specified location.
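Claim 1 recites Bayesian regression whose posterior distribution is generated by Markov Chain Monte Carlo sampling, and claim 13 ties that regression to the biases of the individual predictors. One way to illustrate the idea is a random-walk Metropolis sampler for an assumed linear bias-correction model, obs ~ Normal(b0 + b1 * pred, sigma), with assumed weak Normal(0, 10) priors on b0, b1 and log(sigma). The model form, the priors, the step size, the burn-in length and the names `log_posterior`, `metropolis` and `predictive_interval` are all hypothetical choices for this sketch; the claims do not specify them. The chain is started at the ordinary least-squares fit to shorten burn-in.

```python
import math
import random


def log_posterior(theta, preds, obs, prior_sd=10.0):
    """Log posterior (up to a constant) for obs ~ Normal(b0 + b1 * pred, sigma).

    theta = (b0, b1, log_sigma); each gets an assumed weak Normal(0, prior_sd) prior.
    """
    b0, b1, log_sigma = theta
    sigma = math.exp(log_sigma)
    log_prior = -(b0 ** 2 + b1 ** 2 + log_sigma ** 2) / (2.0 * prior_sd ** 2)
    log_lik = 0.0
    for p, y in zip(preds, obs):
        r = (y - (b0 + b1 * p)) / sigma
        log_lik += -0.5 * r * r - log_sigma
    return log_prior + log_lik


def metropolis(preds, obs, n_samples=20000, step=0.02, seed=1):
    """Random-walk Metropolis sampler for (b0, b1, log_sigma), started at the OLS fit."""
    n = len(preds)
    mp, my = sum(preds) / n, sum(obs) / n
    sxy = sum((p - mp) * (y - my) for p, y in zip(preds, obs))
    b1 = sxy / sum((p - mp) ** 2 for p in preds)
    b0 = my - b1 * mp
    s = math.sqrt(sum((y - b0 - b1 * p) ** 2 for p, y in zip(preds, obs)) / (n - 2))
    theta = (b0, b1, math.log(s))
    current = log_posterior(theta, preds, obs)
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        proposal = tuple(t + rng.gauss(0.0, step) for t in theta)
        candidate = log_posterior(proposal, preds, obs)
        # Accept with probability min(1, posterior ratio); exp of a value <= 0 avoids log(0).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


def predictive_interval(samples, new_pred, burn_in=5000, seed=2):
    """5th, 50th and 95th percentiles of posterior predictive draws at new_pred."""
    rng = random.Random(seed)
    draws = sorted(
        b0 + b1 * new_pred + rng.gauss(0.0, math.exp(log_sigma))
        for b0, b1, log_sigma in samples[burn_in:]
    )
    n = len(draws)
    return draws[int(0.05 * n)], draws[n // 2], draws[int(0.95 * n)]


rng = random.Random(0)
preds = [0.5 + 0.1 * i for i in range(30)]                   # synthetic model predictions (m)
obs = [0.1 + 0.9 * p + rng.gauss(0.0, 0.1) for p in preds]   # synthetic biased observations
samples = metropolis(preds, obs)
low, median, high = predictive_interval(samples, new_pred=1.5)
```

Here b0 and b1 play the role of additive and multiplicative bias corrections, and the 5th to 95th percentile spread expresses the predicted wavelength as a distribution rather than a single deterministic value. In practice one would also check convergence (for example, with multiple chains) before trusting the posterior.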

Description

CROSS-REFERENCE

This Application is a nonprovisional application of and claims the benefit of priority under 35 U.S.C. § 119 based on U.S. Provisional Patent Application No. 63/216,522 filed on Jun. 29, 2021. The Provisional Application and all references cited herein are hereby incorporated by reference into the present disclosure in their entirety.

FEDERALLY-SPONSORED RESEARCH AND DEVELOPMENT

The United States Government has ownership rights in this invention. Licensing inquiries may be directed to Office of Technology Transfer, US Naval Research Laboratory, Code 1004, Washington, DC 20375, USA; +1.202.767.7230; techtran@nrl.navy.mil, referencing Navy Case #109643.

TECHNICAL FIELD

The present disclosure relates to predicting seafloor ripple wavelengths and, more specifically but without limitation, to the use of environmental and machine learning algorithms that provide probabilistic predictions of seafloor ripple wavelengths at a specific location given surface wave conditions.

BACKGROUND

Existing ripple geometry prediction equations, which resulted from decades of field and laboratory research on seafloor ripples, have traditionally been developed using least squares fits to equilibrium ripple observations (see, e.g., Pedocchi and Garcia (2009), Soulsby and Whitehouse (2005), Grasmeijer and Kleinhans (2004), Faraci and Foti (2002), Styles and Glenn (2002), Wiberg and Harris (1994), Mogridge et al. (1994), Van Rijn et al. (1993), Grant and Madsen (1982), Nielsen (1981), and others). While the skill of these prediction equations has increased as new data have been collected, the method of fitting a deterministic equation to ripple geometry observations has remained largely unchanged. Improvements to these models have been made with new data and, for example, the implementation of time dependency (Traykovski, 2007); however, significant uncertainty remains in the deterministic estimates.
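The least-squares approach described in the Background can be illustrated with an assumed power-law form, wavelength = a * d0**b, where d0 is taken to be the near-bed wave orbital diameter. The functional form, the name `fit_power_law` and the synthetic data are assumptions for illustration only; the cited predictors use a variety of forms and inputs.

```python
from math import exp, log


def fit_power_law(d0, wavelength):
    """Least-squares fit of wavelength = a * d0**b (assumed form), done as a
    straight-line fit of log(wavelength) against log(d0). Inputs must be positive."""
    xs = [log(x) for x in d0]
    ys = [log(y) for y in wavelength]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope in log space = exponent
    a = exp(my - b * mx)       # intercept in log space = log(coefficient)
    return a, b


# Synthetic check: data generated from wavelength = 0.62 * d0 should give a near 0.62, b near 1.
a, b = fit_power_law([0.5, 1.0, 2.0, 4.0], [0.31, 0.62, 1.24, 2.48])
```

Fitting in log space minimizes squared relative error rather than squared absolute error, and the result is a single deterministic curve with no uncertainty estimate attached, which is the limitation the Background points to.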
Some more recent models have been built using various machine learning techniques, such as artificial neural networks (ANNs) (Yan et al., 2008) and genetic programming (Goldstein et al., 2013). However, these models do not apply preprocessing techniques, rely on limited training sets, and/or do not perform cross validation. Because none of the aforementioned models provides probabilistic predictions, there is a need for a robust system that produces probabilistic seafloor ripple wavelength predictions.

SUMMARY

This summary is intended to introduce, in simplified form, a selection of concepts that are further described in the Detailed Description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Instead, it is merely presented as a brief overview of the subject matter described and claimed herein.

Disclosed aspects provide novel systems and methods that use the ensemble machine learning method of stacked generalization to develop the Bayesian Optimal Model System (BOMS). Disclosed aspects provide a robust system that produces probabilistic seafloor ripple wavelength predictions by combining the predictive capabilities of multiple algorithms. The present disclosure provides a method of training a machine learning model to predict seafloor ripple geometry. The method may include receiving, by a processing device, one or more input values, each input value being based on an observation associated with ocean wave and seafloor conditions, and generating, by the processing device, a preprocessed data set by performing at least one preprocessing step on the one or more input values, wherein the at least one preprocessing step comprises imputing one or more null values into the input values, filtering the input values for equilibrium ripples, or scaling one or more of the input values.
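The three preprocessing options named in the summary (imputing missing values, filtering for equilibrium ripples, scaling) can be sketched as follows. This sketch assumes mean imputation, a hypothetical boolean `equilibrium` flag on each record (this section does not give the criterion used to identify equilibrium ripples), and z-score scaling; the feature names `d0` and `period` are likewise hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed entries (assumed strategy)."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Z-score scaling; assumes the column is not constant (nonzero spread)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def preprocess(records, feature_names):
    """Filter, impute and scale a list of observation dicts."""
    # Hypothetical 'equilibrium' flag: keep only observations marked as equilibrium ripples.
    kept = [r for r in records if r.get("equilibrium", False)]
    columns = {
        name: standardize(impute_mean([r.get(name) for r in kept]))
        for name in feature_names
    }
    return [{name: columns[name][i] for name in feature_names} for i in range(len(kept))]


records = [
    {"d0": 1.0, "period": 8.0, "equilibrium": True},
    {"d0": None, "period": 10.0, "equilibrium": True},   # missing value to impute
    {"d0": 3.0, "period": 12.0, "equilibrium": True},
    {"d0": 9.0, "period": 5.0, "equilibrium": False},    # filtered out
]
clean = preprocess(records, ["d0", "period"])
```

Filtering runs before imputation here so that non-equilibrium observations do not influence the fill values or the scaling statistics; that ordering is a design choice of this sketch.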
The method may include generating, by the processing device, a training data set based on the preprocessed data set; splitting, by the processing device, the training data set into a plurality of folds, each of the folds comprising a distinct test set for evaluating predictions; and training, by the processing device, via stacked generalization, the machine learning model by performing a cross validation of each fold of training data based on at least one deterministic equilibrium ripple predictor model and on at least one machine learning algorithm. The method may include generating, by the processing device, via the trained machine learning model, a set of one or more seafloor ripple geometry predictions, and performing, by the processing device, Bayesian regression on the set of one or more seafloor ripple geometry predictions, wherein the Bayesian regression is based on a posterior distribution generated using Markov Chain Monte Carlo sampling. The method may include, responsive to performing the Bayesian regression, generating, by the processing device, a probabilistic distribution of pre