Official Repository for "Thermal Chameleon Net: Task-Adaptive Tone-mapping for Thermal-Infrared images", Robotics and Automation Letters (RA-L).
Above is a picture of a thermal chameleon that we made using DALL·E.
TLDR: We propose a new task-adaptive learnable tone-mapping network for 14-bit (RAW) thermal infrared images.
Thermal Infrared (TIR) imaging provides robust perception for navigation in challenging outdoor environments, but suffers from poor texture and low image contrast due to its 14/16-bit format. Conventional pipelines apply various tone-mapping methods to enhance the contrast and photometric consistency of TIR images; however, choosing a tone-mapping method that works well depends heavily on knowing the task and temperature-dependent priors. In this paper, we present the Thermal Chameleon Network (TCNet), a task-adaptive tone-mapping approach for RAW 14-bit TIR images. Given the same image, TCNet tone-maps different representations of TIR images tailored to each specific task, eliminating heuristic image-rescaling preprocessing and the reliance on extensive prior knowledge of the scene temperature or task-specific characteristics. TCNet exhibits improved generalization performance across object detection and monocular depth estimation, with minimal computational overhead and modular integration into existing architectures for various tasks.
Too long to read? Here's a TL;DR
Don't spend time searching for a single tone-mapping of thermal images that works well for all tasks; let the network do it for you, optimized for each task!
Just like the name states, our work aims to create a task-adaptive network that operates directly on 14-bit thermal images.
Our method is divided into two stages:
- Multichannel thermal embedding: essentially a tool that represents each absolute temperature value (in Celsius) as a set of feature vectors.
- Adaptive channel compression network: feeding many multichannel embeddings directly does not always work well and incurs a high computational cost. More importantly, transfer learning cannot be used this way, since pretrained backbones are optimized for 3-channel inputs. This module enables both by compressing only the features valid for the downstream task into a three-channel representation.
In essence, the network assigns task-adaptive weights to each thermal embedding channel, optimized and controlled by the loss functions of the downstream task.
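The two stages above can be sketched in PyTorch. This is our own illustrative reconstruction, not the repository's code: the function names, the Gaussian temperature-bin encoding, the number of embedding channels, and the temperature range are all assumptions chosen for clarity.

```python
import torch


def thermal_embedding(raw14, num_channels=8, t_min=-20.0, t_max=120.0):
    """Map each 14-bit raw thermal pixel to a multichannel embedding.

    Hypothetical sketch: converts raw counts to Celsius with an assumed
    linear radiometric calibration (sensor dependent), then encodes each
    temperature as soft memberships to `num_channels` Gaussian bins.
    Input: (B, H, W) integer tensor; output: (B, num_channels, H, W).
    """
    temp_c = raw14.float() * (t_max - t_min) / (2**14 - 1) + t_min
    centers = torch.linspace(t_min, t_max, num_channels)  # bin centers (C,)
    sigma = (t_max - t_min) / num_channels
    diff = temp_c.unsqueeze(1) - centers.view(1, -1, 1, 1)
    return torch.exp(-0.5 * (diff / sigma) ** 2)  # soft one-hot over bins


class ChannelCompression(torch.nn.Module):
    """Compress C embedding channels into a 3-channel tone-mapped image so
    standard ImageNet backbones apply. The 1x1 conv weights are learned
    end-to-end from the downstream task loss, making the tone-mapping
    task-adaptive."""

    def __init__(self, in_ch=8):
        super().__init__()
        self.conv = torch.nn.Conv2d(in_ch, 3, kernel_size=1)

    def forward(self, x):
        return torch.sigmoid(self.conv(x))  # keep output in [0, 1]
```

The compressed 3-channel output can then be fed to any off-the-shelf detector or depth network in place of an RGB image.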
- All settings use a ResNet-50 backbone unless specified otherwise.
- All models were trained on an NVIDIA RTX 4090 (RTX A6000 for YOLOX).
- All models were trained for 500 epochs, with weights saved after each epoch; we report the best epoch on the validation set.
RetinaNet
- Warm up epoch: 10
- Batch size: 16
- Optimizer: AdamW
- Base lr: $1.5 \times 10^{-4}$
- Scheduler: Cosine annealing
- Data augmentation: Random horizontal flip
- Pretraining?: No (Trained from scratch)
YOLOX
- Warm up epoch: 5
- Batch size: 32
- Optimizer: SGD with momentum of 0.9
- Weight decay: 0.05
- Base lr: $1.5625 \times 10^{-4}$
- Scheduler: Cosine annealing
- Data augmentation: Random horizontal flip, Random mosaic, Random mixup
- Pretraining?: No (trained from scratch). All other settings are essentially identical to the original YOLOX implementation.
Sparse-RCNN
Implemented on MMDetection
- Warm up iterations: 1000 iterations
- Batch size: 16
- Optimizer: AdamW
- Weight decay: 0.0001
- Base lr: $2.5 \times 10^{-4}$
- Scheduler: Cosine annealing
- Data augmentation: Random horizontal flip, Random mosaic, Random mixup
- Pretraining?: Yes (ImageNet pretraining). For the thermal embedding, we averaged the three-channel weights of the first conv layer and copied the average to all input channels.
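The first-conv adaptation described above can be sketched as follows. This is a hedged reconstruction: the function name `inflate_first_conv` and the assumption that the backbone exposes its first conv as `conv1` (as torchvision's ResNet does) are ours, not the repository's.

```python
import torch


def inflate_first_conv(model, in_channels):
    """Adapt an ImageNet-pretrained backbone's first conv layer to accept
    N-channel thermal embeddings: average the pretrained RGB weights and
    replicate the mean across all input channels, preserving the response
    magnitude of the pretrained filters.

    Assumes the first conv is exposed as `model.conv1` (true for
    torchvision ResNets); adjust the attribute name for other backbones.
    """
    old = model.conv1  # e.g. ResNet-50: weight shape (64, 3, 7, 7)
    new = torch.nn.Conv2d(
        in_channels, old.out_channels,
        kernel_size=old.kernel_size, stride=old.stride,
        padding=old.padding, bias=old.bias is not None,
    )
    with torch.no_grad():
        mean_w = old.weight.mean(dim=1, keepdim=True)  # (out, 1, k, k)
        new.weight.copy_(mean_w.repeat(1, in_channels, 1, 1))
        if old.bias is not None:
            new.bias.copy_(old.bias)
    model.conv1 = new
    return model
```

After this swap, the rest of the pretrained backbone is reused unchanged and fine-tuned with the task loss.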
Monodepth-Thermal
- Batch size: 4
- Optimizer: Adam
- Base lr: $1.5 \times 10^{-4}$
- Scheduler: Cosine annealing
- Data augmentation: Random horizontal flip/Random crop
- Pretraining?: Yes (ImageNet Pretraining)
We followed all protocols and most settings used in this repo: https://github.com/UkcheolShin/ThermalMonoDepth
Please consider citing the paper as:
@article{dglee-2024-tcnet,
  author  = {Dong-Guw Lee and Jeongyun Kim and Younggun Cho and Ayoung Kim},
  title   = {Thermal Chameleon: Task-Adaptive Tone-mapping for Radiometric Thermal-Infrared images},
  journal = {IEEE Robotics and Automation Letters (RA-L)},
  year    = {2024},
}
If you have any urgent questions or issues that need to be resolved, please contact me by email.
donkeymouse@snu.ac.kr