A modular Coffea-based analysis framework for CMS Run 3 bbMET analysis.
DarkBottomLine processes NanoAOD datasets with Coffea, producing flat output (ROOT or Parquet) containing analysis-level variables. The framework is generic and is configured for each Run 3 year (2022–2024) via metaconditions.
- Modular Design: Separate modules for objects, selections, corrections, weights, and histograms
- Config-Driven: Year-specific parameters in YAML configuration files
- Coffea Integration: Uses Coffea NanoEvents for efficient event processing
- Correction Support: Integration with correctionlib for scale factors
- Multiple Executors: Support for iterative, futures, and Dask execution backends
- Flexible Output: Support for ROOT, Parquet, and pickle output formats
- Validation Tools: Jupyter notebook for framework validation and plotting
- Python 3.9+
- Conda or pip package manager
- Clone the repository:

```bash
git clone <repository-url>
cd DarkBottomLine
```

- Create a conda environment:

```bash
conda create -n darkbottomline python=3.9
conda activate darkbottomline
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Install the package in development mode:

```bash
pip install -e .
```

- Source the LCG environment:

```bash
source /cvmfs/sft.cern.ch/lcg/views/LCG_105/x86_64-el9-gcc11-opt/setup.sh
```

If the above method does not work, install and load a CMSSW release instead:

- Set up the CMSSW release:

```bash
cmsrel CMSSW_15_0_17
cd CMSSW_15_0_17/src
cmsenv
```

- Clone the repository:

```bash
git clone https://github.com/tiwariPC/DarkBottomLine.git
cd DarkBottomLine
```

- Check the packages pre-installed with the CMSSW release:

```bash
python3 check_requirements.py
# To install the missing packages
python3 check_requirements.py --install --local-dir ./.local
# Add the PYTHONPATH suggested in the output of the above installation to your Python path
```

- Run the final installation script:

```bash
chmod +x install_lxplus.sh
./install_lxplus.sh
```

After installation, you need to set up your environment before using DarkBottomLine. The install_lxplus.sh script sets up the environment for the current session automatically, but for future logins you'll need the start.sh script.
When you run install_lxplus.sh, it automatically:
- Sets up `PYTHONPATH` to include the installed packages
- Adds the `darkbottomline` command to your `PATH`
- Exports these paths for the current session

After running install_lxplus.sh, you can immediately use DarkBottomLine in that session (after sourcing the LCG environment).
Every time you start a new shell session on lxplus, you need to:

- Source the LCG environment first (critical):

```bash
source-lcg
# Or if you don't have the function:
source /cvmfs/sft.cern.ch/lcg/views/LCG_105/x86_64-el9-gcc11-opt/setup.sh
```

- Source the start.sh script to set up the DarkBottomLine environment:

```bash
cd /path/to/DarkBottomLine
source start.sh
# Or:
. start.sh
```

- Verify the setup:

```bash
darkbottomline --help
# Or:
python3 -c "from darkbottomline import DarkBottomLineProcessor; print('✓ Import successful')"
```

To submit jobs with HTCondor:

```bash
cd condorJobs
# Edit submit.sub: change the user letter and username in line 3
# Change the <full_path> in runanalysis.sh and relevant commands as needed
voms-proxy-init --voms cms --valid 192:00 && cp /tmp/x509up_u$(id -u) /afs/cern.ch/user/u/username/private/
condor_submit submit.sub
```

The DarkBottomLine framework supports a complete analysis workflow from NanoAOD processing to plot generation. Here's how to run the entire analysis:
Run the analysis on your input files (data, MC backgrounds, signal). You can provide input files one by one, as multiple arguments, or listed in a .txt file.
```bash
# Activate virtual environment
source venv/bin/activate

# Run analysis on a single data file
darkbottomline analyze \
    --config configs/2024.yaml \
    --regions-config configs/regions.yaml \
    --input /path/to/data/nano_data.root \
    --output outputs/hists/regions_data.pkl \
    --max-events 10000

# Run analysis on MC backgrounds from a list of files in a .txt file
darkbottomline analyze \
    --config configs/2024.yaml \
    --regions-config configs/regions.yaml \
    --input my_background_files.txt \
    --output outputs/hists/regions_dy.pkl

# Run analysis on multiple signal files directly
darkbottomline analyze \
    --config configs/2024.yaml \
    --regions-config configs/regions.yaml \
    --input /path/to/signal/nano_signal_1.root /path/to/signal/nano_signal_2.root \
    --output outputs/hists/regions_signal.pkl
```

When using a .txt file for input, list one file path per line. Empty lines and lines starting with # will be ignored.
Analysis Options:
- `--config`: Base configuration file (e.g., `configs/2024.yaml`)
- `--regions-config`: Regions configuration file (e.g., `configs/regions.yaml`)
- `--input`: Input NanoAOD ROOT file(s). Can be a single file, multiple files, or a `.txt` file containing a list of file paths.
- `--output`: Output pickle file path
- `--executor`: Execution backend (iterative, futures, dask) - default: iterative
- `--workers`: Number of parallel workers (for futures/dask) - default: 4
- `--chunk-size`: Number of events per chunk for futures/dask executors (default: 50000 for futures, 200000 for dask). Useful for managing memory with large files.
- `--max-events`: Maximum number of events to process (optional, for testing). For futures/dask executors, this is converted to maxchunks based on chunk-size (see the sketch after this list).
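The `--max-events` to maxchunks conversion mentioned above amounts to a ceiling division; a sketch, assuming this is how the CLI derives it:

```python
import math

def to_maxchunks(max_events: int, chunk_size: int) -> int:
    """Smallest number of chunks that covers max_events events."""
    return math.ceil(max_events / chunk_size)

# Example: to_maxchunks(10_000, 50_000) == 1 (one 50k-event chunk is enough)
```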
Generate data/MC plots from the analysis results:
```bash
# Generate plots from analysis results
darkbottomline make-plots \
    --input outputs/hists/regions_data.pkl \
    --save-dir outputs \
    --show-data

# With custom plotting configuration
darkbottomline make-plots \
    --input outputs/hists/regions_data.pkl \
    --save-dir outputs \
    --show-data \
    --plot-config configs/plotting.yaml \
    --version 20251105_1100
```

Plotting Options:
- `--input`: Input results pickle file
- `--save-dir`: Base output directory (default: `outputs`)
- `--show-data`: Include data points on plots
- `--plot-config`: Plotting configuration file (default: `configs/plotting.yaml`)
- `--version`: Version string for output directory (default: auto-generated timestamp)
- `--regions`: Specific regions to plot (optional, default: all regions)
```bash
# 1. Setup
source venv/bin/activate
cd /path/to/DarkBottomLine

# 2. Run analysis on all samples (using a .txt file for inputs)
# Create a file, e.g. dy_inputs.txt, with your list of ROOT files.
darkbottomline analyze \
    --config configs/2024.yaml \
    --regions-config configs/regions.yaml \
    --input dy_inputs.txt \
    --output outputs/hists/regions_dy.pkl

# 3. Generate plots
darkbottomline make-plots \
    --input outputs/hists/regions_data.pkl \
    --save-dir outputs \
    --show-data

# 4. Plots are saved in: outputs/plots/{version}/
#    - PNG:  outputs/plots/{version}/png/{category}/{region}/
#    - PDF:  outputs/plots/{version}/pdf/{category}/{region}/
#    - ROOT: outputs/plots/{version}/root/
#    - Text: outputs/plots/{version}/text/{category}/{region}/
#    - Summary: outputs/plots/{version}/region_summary.{png,pdf}
```

For a simple single-region analysis without the multi-region framework:
```bash
darkbottomline run \
    --config configs/2024.yaml \
    --input /path/to/nanoaod_or_file_list.txt \
    --output results.pkl \
    --executor iterative \
    --event-selection-output output/event_selected.pkl  # optional: save events passing event-level selection
```

Analysis Commands:
- `analyze`: Multi-region analysis with full region definitions
- `run`: Simple single-region analysis
- `--config`: Path to YAML configuration file
- `--regions-config`: Path to regions configuration file (for the `analyze` command)
- `--input`: Path to input NanoAOD file(s). Can be a single file, multiple files, or a `.txt` file containing a list of file paths.
- `--output`: Path to output file (supports .parquet, .root, .pkl)
- `--executor`: Execution backend (iterative, futures, dask)
- `--workers`: Number of parallel workers (for futures/dask)
- `--chunk-size`: Number of events per chunk for futures/dask executors (default: 50000 for futures, 200000 for dask). Helps manage memory with large files.
- `--max-events`: Maximum number of events to process. For futures/dask executors, converted to maxchunks based on chunk-size.
- `--event-selection-output`: Optional path to save events that pass the event-level selection (supports `.pkl` and `.root`); see the loading sketch after this list.
  - If you provide a `.pkl` path, a plain-Python-serializable pickle will be saved, and a raw awkward backup `*.awk_raw.pkl` will also be created.
  - If you provide a `.root` path, a small ROOT TTree `Events` will be written containing scalar branches (event identifiers, MET scalars, and object multiplicities).
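Reading these outputs back is straightforward; a minimal sketch using the standard pickle and uproot APIs (the exact pickle layout depends on the framework version):

```python
import pickle
import uproot

# Load the plain-Python pickle of selected events
with open("output/event_selected.pkl", "rb") as f:
    selected = pickle.load(f)

# Or read the scalar branches from the ROOT TTree
with uproot.open("output/event_selected.root") as f:
    events = f["Events"].arrays()  # event identifiers, MET scalars, multiplicities
```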
Plotting Commands:
- `make-plots`: Generate individual variable plots and grouped plots
- `make-stacked-plots`: Generate stacked Data/MC plots with ratio
- `--show-data`: Show data points on plots
- `--plot-config`: Plotting configuration file
- `--version`: Version string for output directory
The input flexibility works with all executors. For example:
```bash
# Iterative execution (single-threaded, good for debugging)
python run_analysis.py --config configs/2023.yaml --input my_files.txt --output results.pkl --executor iterative

# Futures execution (multi-threaded, good for local parallelization)
python run_analysis.py --config configs/2023.yaml --input file1.root file2.root --output results.parquet --executor futures --workers 4

# Futures execution with custom chunk size (for large files)
python run_analysis.py --config configs/2023.yaml --input large_file.root --output results.pkl --executor futures --workers 8 --chunk-size 100000

# Dask execution (distributed, good for production)
python run_analysis.py --config configs/2023.yaml --input nanoaod.root --output results.root --executor dask --workers 8

# Dask execution with custom chunk size (default is 200000 for dask)
python run_analysis.py --config configs/2023.yaml --input large_file.root --output results.root --executor dask --workers 8 --chunk-size 500000
```

Chunk Size Notes:
- Chunk size controls how many events are processed per chunk, helping manage memory usage
- Smaller chunks (e.g., 50000) use less memory but incur more overhead
- Larger chunks (e.g., 200000+) are more efficient but require more memory
- Default: 50000 for the futures executor, 200000 for the dask executor
- Only applies to the `futures` and `dask` executors (which use `run_uproot_job` internally)
- The `iterative` executor loads all events at once and doesn't use chunking
The framework uses YAML configuration files for year-specific parameters. Configuration files are located in the configs/ directory:
- `configs/2022.yaml`: 2022 data-taking parameters
- `configs/2023.yaml`: 2023 data-taking parameters
- `configs/2024.yaml`: 2024 data-taking parameters
- `configs/regions.yaml`: Region definitions with categories and channels
- `configs/plotting.yaml`: Plotting configuration and exclusions
Regions are defined in configs/regions.yaml with the format: `{category}:{region_type}_{channel}`

Categories:
- `1b`: 1 b-tag category (≤2 jets, 1 b-jet)
- `2b`: 2 b-tag category (3 jets, 2 b-jets)

Region Types:
- `SR`: Signal region
- `CR_Wlnu`: W+jets control region
- `CR_Top`: Top control region
- `CR_Zll`: Z+jets control region

Channels:
- `mu`: Muon channel
- `el`: Electron channel

Example Regions:
- `1b:SR` - Signal region, 1 b-tag
- `2b:SR` - Signal region, 2 b-tags
- `1b:CR_Wlnu_mu` - W+jets CR, 1b, muon channel
- `2b:CR_Top_el` - Top CR, 2b, electron channel
- `1b:CR_Zll_mu` - Z+jets CR, 1b, muon channel
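The naming format is mechanical, so region names can be decomposed programmatically; a small illustrative helper (hypothetical, not part of the framework):

```python
def parse_region(name: str):
    """Split '{category}:{region_type}_{channel}' into its parts."""
    category, rest = name.split(":", 1)
    if rest == "SR":
        return category, "SR", None  # signal regions carry no channel suffix
    region_type, channel = rest.rsplit("_", 1)
    return category, region_type, channel

# parse_region("1b:CR_Wlnu_mu") -> ("1b", "CR_Wlnu", "mu")
# parse_region("2b:SR")         -> ("2b", "SR", None)
```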
A year configuration file looks like this:

```yaml
year: 2023
lumi: 35.9  # fb^-1

# Correction file paths
corrections:
  pileup: data/corrections/pileup_2023.json.gz
  btagSF: data/corrections/btagging_2023.json.gz
  muonSF: data/corrections/muonSF_2023.json.gz
  electronSF: data/corrections/electronSF_2023.json.gz

# Trigger paths
triggers:
  MET: ["HLT_PFMET120_PFMHT120_IDTight"]
  SingleMuon: ["HLT_IsoMu24", "HLT_IsoMu27"]

# Object selection cuts
objects:
  muons:
    pt_min: 20.0
    eta_max: 2.4
    id: "tight"
    iso: "tight"
  # ... more object configurations

# Event selection
event_selection:
  min_muons: 0
  max_muons: 2
  min_jets: 2
  min_bjets: 1
  met_min: 50.0
```
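Assuming the framework reads these files with PyYAML (a reasonable but unverified assumption), loading one is a one-liner:

```python
import yaml

# Parse the year configuration into a plain dict
with open("configs/2023.yaml") as f:
    config = yaml.safe_load(f)

print(config["event_selection"]["met_min"])  # 50.0
```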
The analysis uses a category-based region structure with channel separation:

- 1b Category: 1 b-tag, ≤2 jets
  - SR: `1b:SR`
  - W CR: `1b:CR_Wlnu_mu`, `1b:CR_Wlnu_el`
  - Z CR: `1b:CR_Zll_mu`, `1b:CR_Zll_el`
  - No Top CR (removed as per requirements)
- 2b Category: 2 b-tags, 3 jets (Top CR may have >3 jets)
  - SR: `2b:SR`
  - Top CR: `2b:CR_Top_mu`, `2b:CR_Top_el`
  - Z CR: `2b:CR_Zll_mu`, `2b:CR_Zll_el`
  - No W CR (removed as per requirements)
Z CR Separation:
- Z_1b: `(njet <= 2) and (jet1Pt > 100.)`
- Z_2b: `(njet <= 3 and njet > 1) and (jet1Pt > 100.)`
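In columnar form these two cuts are simple boolean masks; a sketch with awkward arrays, assuming per-event `njet` and `jet1_pt` columns (names illustrative):

```python
import awkward as ak

# Illustrative per-event columns
njet = ak.Array([1, 2, 3, 4])
jet1_pt = ak.Array([120.0, 90.0, 150.0, 200.0])

z_1b_mask = (njet <= 2) & (jet1_pt > 100.0)
z_2b_mask = (njet > 1) & (njet <= 3) & (jet1_pt > 100.0)
# z_1b_mask -> [True, False, False, False]
# z_2b_mask -> [False, False, True, False]
```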
Channel Separation:
- All CRs (Top, W, Z) have separate muon and electron channels
- Taus are vetoed for the full analysis
Physics object selection and cleaning functions:
- `select_muons()`: Muon selection with ID and isolation cuts
- `select_electrons()`: Electron selection with ID and isolation cuts
- `select_taus()`: Tau selection with ID and decay-mode cuts
- `select_jets()`: AK4 jet selection with jet ID cuts
- `select_fatjets()`: AK8 fat jet selection
- `clean_jets_from_leptons()`: Delta-R based overlap removal
- `get_bjet_mask()`: B-tagging working-point selection
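Delta-R overlap removal follows a standard columnar pattern; a minimal sketch assuming NanoEvents collections with coffea vector behaviors (not necessarily the exact body of `clean_jets_from_leptons()`):

```python
import awkward as ak

def clean_jets_from_leptons(jets, leptons, dr_min=0.4):
    """Keep jets separated from every selected lepton by at least dr_min."""
    # All jet-lepton pairs per event, nested so axis=-1 runs over leptons
    pairs = ak.cartesian({"jet": jets, "lep": leptons}, nested=True)
    dr = pairs.jet.delta_r(pairs.lep)
    # A jet survives if it is far from all leptons (vacuously true if none)
    keep = ak.all(dr > dr_min, axis=-1)
    return jets[keep]
```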
Multi-region analysis with category and channel separation:
- `RegionManager`: Manages multiple analysis regions
- `Region`: Single region with cuts and properties
- `apply_regions()`: Apply region cuts to events
- Supports category-based regions (1b, 2b) with channel separation
Multi-region analysis processor:
- `DarkBottomLineAnalyzer`: Extends the base processor for multi-region analysis
- `process()`: Process events through all defined regions
- `_fill_region_histograms()`: Fill histograms for each region
- `_calculate_region_cutflow()`: Calculate the cutflow per region
- `save_results()`: Save results with full region names preserved
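For orientation, a coffea processor of this shape follows the standard `ProcessorABC` pattern; a stripped-down sketch (method names from the list above, bodies illustrative only):

```python
import awkward as ak
from coffea import processor

class DarkBottomLineAnalyzer(processor.ProcessorABC):
    def __init__(self, config, regions):
        self.config = config
        self.regions = regions

    def process(self, events):
        out = {}
        for region in self.regions:
            mask = region.apply(events)  # illustrative: boolean region mask
            out[region.name] = {"nevents": int(ak.sum(mask))}
        return out

    def postprocess(self, accumulator):
        return accumulator
```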
Data/MC plotting with CMS styling:
- `PlotManager`: Manages plot creation and styling
- `create_all_plots()`: Generate all plots for all regions
- `_get_excluded_variables_for_region()`: Region-specific plot exclusions
- Supports multiple formats: PNG, PDF, ROOT, TXT
- CMS plotting style with `mplhep`
- Configurable exclusions via `configs/plotting.yaml`
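The CMS styling via `mplhep` typically amounts to a style call plus a label; a minimal sketch using the public mplhep API (the label text is an assumption, not the framework's exact choice):

```python
import matplotlib.pyplot as plt
import mplhep as hep

hep.style.use("CMS")  # apply the CMS plotting style

fig, ax = plt.subplots()
ax.hist([10, 20, 20, 30], bins=5)
hep.cms.label("Preliminary", ax=ax)  # illustrative label text
fig.savefig("example.png")
```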
Histogram definitions and filling:
- `HistogramManager`: Manages histogram creation and filling
- Histogram types: MET, jet kinematics, lepton kinematics, b-tagging, derived variables
- Support for both the hist library and a fallback implementation
- 40+ histogram definitions matching StackPlotter variables
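With the hist library, a definition-plus-fill cycle looks like this; a generic sketch, not the framework's actual axis choices:

```python
import hist
import numpy as np

# Define a MET histogram: 50 regular bins from 0 to 500 GeV
h_met = hist.Hist(hist.axis.Regular(50, 0, 500, name="met", label="MET [GeV]"))

# Fill with an array of per-event MET values
h_met.fill(met=np.array([75.0, 120.0, 310.0]))

print(h_met.sum())  # 3 entries
```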
Weight calculation and combination:
- `WeightCalculator`: Combines all weights using Coffea's Weights class
- `add_generator_weight()`: Generator weight handling
- `add_corrections()`: Correction weight application
- `get_weight()`: Final weight calculation with systematic variations
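Coffea's `Weights` class, which `WeightCalculator` builds on, accumulates named weights with optional up/down variations; a short sketch of its public API:

```python
import numpy as np
from coffea.analysis_tools import Weights

n_events = 4
weights = Weights(n_events)

# Nominal weight with up/down systematic variations
nominal = np.ones(n_events)
weights.add("pileup", nominal, weightUp=nominal * 1.05, weightDown=nominal * 0.95)

w_nom = weights.weight()           # product of all nominal weights
w_up = weights.weight("pileupUp")  # with the pileup up-variation applied
```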
```python
# Skimmed events with selected objects
skimmed_events = {
    "event": events.event,
    "run": events.run,
    "luminosityBlock": events.luminosityBlock,
    "MET": {"pt": events.MET.pt, "phi": events.MET.phi},
    "weights": event_weights,
    "muons": selected_muons,
    "electrons": selected_electrons,
    "jets": selected_jets,
    "bjets": selected_bjets,
}
```

Histograms are saved as ROOT histograms with metadata.
Complete analysis results including histograms, cutflow, and metadata.
Analysis results are saved as pickle files with the following structure:

```
outputs/hists/
├── regions_data.pkl
├── regions_dy.pkl
├── regions_signal.pkl
└── ...
```
Each pickle file contains:
- `region_histograms`: Dictionary of histograms per region
- `regions`: Region processing results
- `region_cutflow`: Cutflow statistics per region
- `region_validation`: Region validation results
- `metadata`: Analysis metadata
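A quick way to inspect one of these files from Python; a sketch assuming the keys listed above:

```python
import pickle

with open("outputs/hists/regions_data.pkl", "rb") as f:
    results = pickle.load(f)

print(sorted(results.keys()))
# e.g. ['metadata', 'region_cutflow', 'region_histograms', 'region_validation', 'regions']
for region, hists in results["region_histograms"].items():
    print(region, len(hists), "histograms")
```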
Plots are organized in a versioned directory structure:

```
outputs/plots/{version}/
├── png/
│   ├── 1b/
│   │   ├── SR/
│   │   │   ├── 1b_SR_met.png
│   │   │   ├── 1b_SR_met_log.png
│   │   │   └── ...
│   │   ├── Wlnu_mu/
│   │   │   ├── 1b_Wlnu_mu_lep1_pt.png
│   │   │   └── ...
│   │   ├── Wlnu_el/
│   │   ├── Zll_mu/
│   │   └── Zll_el/
│   └── 2b/
│       ├── SR/
│       ├── Top_mu/
│       ├── Top_el/
│       ├── Zll_mu/
│       └── Zll_el/
├── pdf/ (same structure as png/)
├── text/ (same structure as png/)
├── root/
│   ├── met.root (one file per variable)
│   └── ...
└── region_summary.{png,pdf}
```
File Naming Convention:
- Format: `{category}_{region_dir}_{variable_name}.{format}`
- Examples: `1b_SR_met.png`, `2b_Top_mu_lep1_pt.png`, `1b_Zll_mu_z_mass.png`
Plot Exclusions:
- 1b SR: Excludes jet3 plots and all lepton plots
- 2b SR: Excludes lepton plots (includes jet3)
- Top/W CRs: Exclude `z_mass` and `z_pt` plots
- Z CRs: Include `z_mass` and `z_pt` plots

See configs/plotting.yaml for configurable exclusions.
Use the validation notebooks to test and verify the framework:

```bash
jupyter notebook notebooks/
```

Available Validation Notebooks:
- `01_plot_exclusions_validation.ipynb` - Test plot exclusions
- `02_region_definitions_validation.ipynb` - Validate region definitions
- `03_histogram_structure_validation.ipynb` - Check histogram structure
- `04_plot_output_structure_validation.ipynb` - Verify plot directory structure
- `05_configuration_validation.ipynb` - Validate configuration files
- `06_data_mc_comparison_validation.ipynb` - Compare data/MC yields
See notebooks/README.md for detailed documentation.
To add a new year:
- Create a new YAML configuration file in `configs/`
- Update luminosity values and trigger paths
- Adjust object selection cuts if needed
To add a new correction:
- Add the correction file path to the configuration
- Implement the correction method in `CorrectionManager`
- Add the correction to the weight calculation
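Since corrections go through correctionlib, the evaluation step looks like this; a sketch using the public correctionlib API (the correction name and argument list are illustrative and depend on the file's schema):

```python
import correctionlib

# Load a gzipped JSON correction set, as referenced in the year config
cset = correctionlib.CorrectionSet.from_file("data/corrections/pileup_2023.json.gz")

# Evaluate one correction by name
corr = cset["Collisions2023_goldenJSON"]  # illustrative key
sf = corr.evaluate(35.0, "nominal")       # e.g. (nTrueInt, systematic)
```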
To add a new histogram:
- Define the histogram in `HistogramManager.define_histograms()`
- Add filling logic in `fill_histograms()`
- Update the validation notebooks if needed
- Core: coffea, awkward, uproot, correctionlib
- Execution: dask, distributed
- Output: pyarrow, pandas
- Visualization: matplotlib, jupyter, mplhep
- Histogramming: hist
- ROOT: pyroot (optional, for ROOT file output)
Install all dependencies:

```bash
pip install -r requirements.txt
```

For ROOT support (optional):

```bash
# Install ROOT via conda or the system package manager
conda install -c conda-forge root
```
- Missing correction files: Ensure correction files are at the paths specified in the configuration. Warnings are acceptable if corrections are not needed for testing.
- Import errors: Check that all dependencies are installed correctly:

  ```bash
  pip install -r requirements.txt
  pip install -e .
  ```

- Memory issues:
  - Use `--max-events` to limit events for testing: `darkbottomline analyze ... --max-events 10000`
  - For futures/dask executors, reduce `--chunk-size` to process smaller chunks: `darkbottomline analyze ... --executor futures --chunk-size 25000`
- Executor issues: Try a different executor (iterative, futures, dask)
- ROOT not available: ROOT files won't be generated if the ROOT library is not installed. Other formats (PNG, PDF, TXT) will still be created.
- Old region format: If you see regions like `CR_Zll` instead of `1b:CR_Zll_mu`, the data file was created with an old version. Re-run the analysis with the updated `regions.yaml`.
- Plot exclusions not working: Check that `configs/plotting.yaml` is loaded correctly and that the exclusion patterns match the variable names.
Run with debug logging to see detailed information:

```bash
darkbottomline analyze ... --log-level DEBUG
```

Run the validation notebooks to check the framework setup:

```bash
jupyter notebook notebooks/01_plot_exclusions_validation.ipynb
```

Here's a complete example running the full analysis workflow:
```bash
#!/bin/bash
# Complete analysis workflow

# Setup
source venv/bin/activate
cd /path/to/DarkBottomLine

# Configuration
CONFIG="configs/2024.yaml"
REGIONS_CONFIG="configs/regions.yaml"
INPUT_DIR="/path/to/nanoaod"
OUTPUT_DIR="outputs/hists"

# 1. Run analysis on all samples
echo "Running analysis on data..."
darkbottomline analyze \
    --config $CONFIG \
    --regions-config $REGIONS_CONFIG \
    --input ${INPUT_DIR}/nano_data.root \
    --output ${OUTPUT_DIR}/regions_data.pkl

echo "Running analysis on DY..."
darkbottomline analyze \
    --config $CONFIG \
    --regions-config $REGIONS_CONFIG \
    --input ${INPUT_DIR}/nano_dy.root \
    --output ${OUTPUT_DIR}/regions_dy.pkl

echo "Running analysis on signal..."
darkbottomline analyze \
    --config $CONFIG \
    --regions-config $REGIONS_CONFIG \
    --input ${INPUT_DIR}/nano_signal.root \
    --output ${OUTPUT_DIR}/regions_signal.pkl

# 2. Generate plots
echo "Generating plots..."
darkbottomline make-plots \
    --input ${OUTPUT_DIR}/regions_data.pkl \
    --save-dir outputs \
    --show-data \
    --plot-config configs/plotting.yaml

echo "Analysis complete! Plots saved to outputs/plots/{version}/"
```
- Analysis Structure: See `docs/analysis_structure.md` for region naming conventions and structure flow
- Plotting Configuration: See `docs/plotting_configuration.md` for plot exclusion configuration
- Validation Notebooks: See `notebooks/README.md` for validation notebook documentation
- Developer Guide: See `DEVELOPER_GUIDE.md` for a comprehensive guide on where to make changes (plotting, variables, histograms, regions, etc.)
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built on top of the Coffea framework
- Uses correctionlib for scale factor corrections
- Inspired by CMS analysis workflows
- Plotting style follows CMS figure guidelines using `mplhep`