Sonic Log Prediction using XGBoost

Python 3.12+ | License: MIT

This repository contains the complete source code, data, and results for the manuscript:

"Predicting sonic log from well log in the Sergipe-Alagoas Basin using XGBoost Regressor algorithm"
Rodrigo Brunetta, Carolina Danielski Aquino
Brazilian Journal of Geology (under review)


Overview

This study predicts the compressional sonic log (DT) from conventional well logs — Gamma Ray (GR), Deep Resistivity (RT90), Bulk Density (RHOB), and Neutron Porosity (NPHI) — using the XGBoost Regressor algorithm across 28 offshore wells in the Sergipe-Alagoas Basin, Brazil.

Model generalization was assessed using a Leave-One-Well-Out (LOWO) protocol, in which each of the 28 wells was held out once as a blind test set while the remaining 27 wells were used for training. XGBoost was benchmarked against Ridge Regression and Random Forest under the same protocol.

LOWO Results (28 wells):

| Model | Median R² | IQR R² | Median RMSE (µs/ft) | Median MAE (µs/ft) |
|---|---|---|---|---|
| Ridge Regression | 0.621 | 0.259 | 6.06 | 4.64 |
| Random Forest | 0.815 | 0.160 | 4.38 | 3.12 |
| XGBoost | 0.818 | 0.152 | 4.36 | 3.14 |

Repository Structure

sonic-log-prediction/
│
├── data/
│   ├── LAS/                          # Individual LAS files — one per well (28 files)
│   │   ├── 1-BRSA-1013-SES.las
│   │   ├── 1-BRSA-1088-SES.las
│   │   └── ... (28 wells total)
│   ├── processed/
│   │   └── wells_processed.txt       # Quality-controlled dataset (output of Notebook 1)
│   └── README.md                     # Data description and provenance
│
├── notebooks/
│   ├── 1_EDA_SEAL.ipynb              # Exploratory data analysis and quality control
│   └── 2_Sonic_Log_Prediction.ipynb  # Hyperparameter optimization and LOWO validation
│
├── results/
│   ├── figures/                      # All output figures
│   │   ├── benchmark_boxplot.png
│   │   ├── benchmark_perwell_r2.png
│   │   ├── xgboost_scatter_all_wells.png
│   │   ├── xgboost_residuals.png
│   │   ├── xgboost_feature_importance.png
│   │   └── xgboost_profile_<well>.png  # Per-well LOWO prediction profiles (28 files)
│   ├── metrics/
│   │   ├── benchmark_comparison.csv  # Summary metrics for all three models
│   │   ├── lowo_xgboost_metrics.csv  # Per-well LOWO metrics — XGBoost
│   │   ├── lowo_rf_metrics.csv       # Per-well LOWO metrics — Random Forest
│   │   └── lowo_ridge_metrics.csv    # Per-well LOWO metrics — Ridge Regression
│   ├── params/
│   │   ├── xgboost_best_params.json  # Optimized hyperparameters — XGBoost
│   │   ├── rf_best_params.json       # Optimized hyperparameters — Random Forest
│   │   └── ridge_best_params.json    # Optimized hyperparameters — Ridge Regression
│   └── predicts/
│       ├── lowo_xgboost_predicts.csv # LOWO predictions — XGBoost
│       ├── lowo_rf_predicts.csv      # LOWO predictions — Random Forest
│       └── lowo_ridge_predicts.csv   # LOWO predictions — Ridge Regression
│
├── LICENSE                           # MIT License
├── README.md                         # This file
└── requirements.txt                  # Python dependencies

Methodology

1. Data and Quality Control (Notebook 1)

  • Raw LAS files from 28 offshore wells loaded and standardized
  • Mnemonic standardization across acquisition vendors
  • Physical plausibility filter applied (GR: 0–300 gAPI; RT90: 0.01–30,000 ohm.m; RHOB: 1.0–3.5 g/cm³; NPHI: −5–100%; DT: 40–250 µs/ft; CALI: 4–30 in)
  • Caliper-based IQR filter to remove intervals affected by borehole washout
  • Linear interpolation of isolated DT gaps (max 10 consecutive samples)
  • The DT curve was not used as a removal criterion at any stage
  • Final dataset: 231,816 depth samples across 28 wells
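The quality-control steps listed above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual code: the function names, the choice to mask out-of-range values as NaN, and the 1.5 × IQR fences for the caliper filter are assumptions, since the README does not state them. Only the numeric limits and the 10-sample gap limit come from the list above.

```python
import numpy as np
import pandas as pd

# Plausibility limits quoted in the list above (units as stated there).
LIMITS = {
    "GR": (0.0, 300.0),
    "RT90": (0.01, 30000.0),
    "RHOB": (1.0, 3.5),
    "NPHI": (-5.0, 100.0),
    "DT": (40.0, 250.0),
    "CALI": (4.0, 30.0),
}


def mask_implausible(df: pd.DataFrame) -> pd.DataFrame:
    """Replace values outside the plausibility limits with NaN (assumed behaviour)."""
    out = df.copy()
    for col, (lo, hi) in LIMITS.items():
        if col in out.columns:
            out[col] = out[col].where(out[col].between(lo, hi))
    return out


def caliper_iqr_filter(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose CALI lies outside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is an assumption; rows with missing CALI are dropped as well.
    """
    q1, q3 = df["CALI"].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df["CALI"].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


def fill_short_gaps(s: pd.Series, max_gap: int = 10) -> pd.Series:
    """Linearly interpolate interior NaN runs of at most max_gap samples."""
    is_nan = s.isna()
    # Label each run of consecutive equal flags, then measure NaN-run lengths.
    run_id = (is_nan != is_nan.shift()).cumsum()
    run_len = is_nan.groupby(run_id).transform("sum")
    filled = s.interpolate(method="linear", limit_area="inside")
    # Keep interpolated values only where the gap is short enough.
    return filled.where(~is_nan | (run_len <= max_gap))


# Tiny synthetic example (not real well data).
demo = pd.DataFrame({"GR": [50.0, 500.0, 80.0], "DT": [90.0, np.nan, 110.0]})
demo = mask_implausible(demo)
demo["DT"] = fill_short_gaps(demo["DT"])
print(demo)
```

Note that `Series.interpolate(limit=10)` alone would also fill the first 10 samples of longer gaps, which is why the sketch measures gap lengths explicitly. In a multi-well table, each function would presumably be applied per well (for example via `groupby`), so that interpolation never bridges two wells.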

2. Hyperparameter Optimization (Notebook 2)

  • RandomizedSearchCV with 50 random combinations per model
  • 10-fold GroupKFold with well identity as the grouping variable, so that all samples from a given well fall in the same fold; this prevents leakage between correlated adjacent depth samples
  • Models optimized: Ridge Regression, Random Forest, XGBoost
  • Selection criterion: mean R² across validation folds
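The search setup described above can be sketched with scikit-learn as follows. The helper `tune_grouped`, the synthetic data, and the `alpha` search range are hypothetical; the README does not list the parameter distributions used in the notebook. Ridge is used here only to keep the example light. Any scikit-learn-compatible regressor, such as `xgboost.XGBRegressor`, could be passed in its place with its own parameter distributions.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, RandomizedSearchCV


def tune_grouped(estimator, param_distributions, X, y, wells,
                 n_iter=50, n_splits=10, seed=0):
    """Randomized search scored by mean R² over well-grouped folds."""
    search = RandomizedSearchCV(
        estimator,
        param_distributions=param_distributions,
        n_iter=n_iter,                       # random combinations to try
        scoring="r2",                        # selection criterion
        cv=GroupKFold(n_splits=n_splits),    # folds never split a well
        random_state=seed,
    )
    search.fit(X, y, groups=wells)
    return search


# Synthetic stand-in data: 28 "wells", 4 features (not real log data).
rng = np.random.default_rng(0)
n_wells, n_per_well = 28, 40
wells = np.repeat(np.arange(n_wells), n_per_well)
X = rng.normal(size=(n_wells * n_per_well, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=len(X))

# n_iter is reduced to 5 here so the example runs quickly.
search = tune_grouped(Ridge(), {"alpha": loguniform(1e-3, 1e2)},
                      X, y, wells, n_iter=5)
print(search.best_params_)
```

Passing `groups=wells` to `fit` is what makes the grouping take effect. Without it, `GroupKFold` raises an error rather than silently falling back to ungrouped folds.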

3. LOWO Validation (Notebook 2)

  • Each of the 28 wells held out once as a blind test set
  • Model retrained from scratch on the remaining 27 wells in each iteration
  • Performance metrics (R², RMSE, MAE) computed for each held-out well
  • Results summarized as median and IQR across 28 independent evaluations
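The LOWO protocol above maps onto scikit-learn's `LeaveOneGroupOut` splitter. The sketch below is an assumption about how such a loop could look, not the notebook's code: `lowo_evaluate`, `summarize`, and the synthetic data are hypothetical, Ridge stands in for the tuned models, and the IQR is taken as Q3 minus Q1.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneGroupOut


def lowo_evaluate(estimator, X, y, wells):
    """Hold out each well once; refit a fresh model on the remaining wells."""
    rows = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=wells):
        # clone() gives an unfitted copy, so nothing carries over between wells.
        model = clone(estimator).fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        rows.append({
            "well": wells[test_idx][0],
            "r2": r2_score(y[test_idx], pred),
            "rmse": np.sqrt(mean_squared_error(y[test_idx], pred)),
            "mae": mean_absolute_error(y[test_idx], pred),
        })
    return pd.DataFrame(rows)


def summarize(metrics: pd.DataFrame) -> pd.DataFrame:
    """Median and interquartile range (Q3 - Q1) of each metric across wells."""
    q = metrics[["r2", "rmse", "mae"]].quantile([0.25, 0.5, 0.75])
    return pd.DataFrame({"median": q.loc[0.5], "iqr": q.loc[0.75] - q.loc[0.25]})


# Synthetic stand-in data: 28 "wells", 4 features (not real log data).
rng = np.random.default_rng(0)
n_wells, n_per_well = 28, 40
wells = np.repeat(np.arange(n_wells), n_per_well)
X = rng.normal(size=(n_wells * n_per_well, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=len(X))

metrics = lowo_evaluate(Ridge(alpha=1.0), X, y, wells)
summary = summarize(metrics)
print(summary)
```

Reporting the median and IQR rather than the mean and standard deviation keeps one or two poorly predicted wells from dominating the summary.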

Quick Start

Prerequisites

  • Python 3.12 or higher
  • 8 GB RAM minimum

Installation

git clone https://github.com/rbrunetta/sonic-log-prediction.git
cd sonic-log-prediction
pip install -r requirements.txt

Running the Analysis

Step 1 — Exploratory Data Analysis and Quality Control:

jupyter notebook notebooks/1_EDA_SEAL.ipynb

This notebook loads the 28 LAS files from data/LAS/, performs quality control, and saves the processed dataset to data/processed/wells_processed.txt.

Step 2 — Hyperparameter Optimization and LOWO Validation:

jupyter notebook notebooks/2_Sonic_Log_Prediction.ipynb

This notebook loads data/processed/wells_processed.txt, optimizes hyperparameters for the three models, runs the full LOWO validation, and saves all results to results/.


Dataset

Well log data from 28 offshore wells in the Sergipe-Alagoas Basin were provided by ANP (Agência Nacional do Petróleo, Gás Natural e Biocombustíveis). Each well contains GR, RT90, RHOB, NPHI, DT, and CALI logs in standard LAS format.

| Property | Value |
|---|---|
| Number of wells | 28 |
| Total samples (after QC) | 231,816 |
| Depth samples per well | 3,436 – 21,838 |
| Input features | GR, RT90, RHOB, NPHI |
| Target variable | DT (µs/ft) |
| Data format | LAS (raw) / CSV (processed) |

See data/README.md for full data description.


Authors

Rodrigo Brunetta, Carolina Danielski Aquino

Acknowledgments

The authors thank the Universidade Federal do Paraná (UFPR), the Programa de Pós-Graduação em Geologia (PPGEOL), the Laboratório de Análises de Bacias e Petrofísica (LABAP), CAPES (Finance Code 001), ANP for providing the well log data, and Marina Martins (PRIO), Tiago de Bittencourt Rossi (Petrobras), and Marcelo Guarido de Andrade (CREWES) for technical assistance.


Citation

If you use this code or dataset, please cite:

Brunetta, R., & Aquino, C. D. (2025). Predicting sonic log from well log in the
Sergipe-Alagoas Basin using XGBoost Regressor algorithm.
Brazilian Journal of Geology. https://github.com/rbrunetta/sonic-log-prediction

License

This project is licensed under the MIT License — see the LICENSE file for details.