This package is in early development and the API may change without deprecation. Feedback and contributions are very welcome!
The INterpretability Interchange Format (INIF) is a JSON-based format for tokenized LLM generation traces with support for contiguous token annotations, position selection, and efficient storage of interpretability outputs.
Designed as the interchange layer between generation and evaluation frameworks (e.g. Inspect AI) and interpretability tools in the NDIF ecosystem (nnsight, nnterp and workbench).
```bash
pip install inif
```

With Inspect AI converter support:

```bash
pip install "inif[inspect]"
```

```python
from inif.converters.text import from_texts

doc = from_texts(
    ["The capital of France is Paris.", "Hello world!"],
    tokenizer="gpt2",
)
doc.save("traces.inif.json")
```

```python
from inif.converters.inspect_ai import from_eval_file

doc = from_eval_file("logs/my_eval.eval")
```

```python
from inif import InifDocument

doc = InifDocument.load("traces.inif.json")
doc.show()                    # in Jupyter
doc.save_html("traces.html")  # self-contained HTML
```

```bash
# Convert text files to inif
inif convert txt input.txt -m gpt2

# Convert Inspect AI eval logs
inif convert eval logs/my_eval.eval

# View as interactive HTML in the browser
inif view traces.inif.json
```

An .inif.json file contains:
```
InifDocument
├── metadata      — model info, source eval, packages, timestamps
├── sequences[]   — deduplicated token patterns shared across samples
└── samples[]     — tokenized generation traces
    ├── tokens[]      — token id + string, plus sparse extras (logprob, logit lens, probes, ...)
    ├── annotations[] — named token ranges with optional metadata
    ├── texts[]       — named text segments ({name, value, start, end, children, metadata})
    ├── spans[]       — named position ranges
    └── scores[]      — evaluation scores (scorer, value, answer)
```
Token convention: each TokenOrSeqRef has token: str and an optional id: int. Vocabulary tokens use the integer id; sequence references set id to None and carry the target Sequence.id in the token field.
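As a sketch, the two cases can be told apart by the id field. The dict shapes below are inferred from the convention above, not copied from the schema:

```python
# Illustrative shapes only, inferred from the token convention described above.
vocab_token = {"token": " Paris", "id": 6342}  # ordinary vocabulary token
seq_ref = {"token": "seq_0", "id": None}       # token field carries Sequence.id "seq_0"

def is_sequence_ref(tok: dict) -> bool:
    """A token entry is a sequence reference when its id is None."""
    return tok.get("id") is None

print(is_sequence_ref(vocab_token))  # False
print(is_sequence_ref(seq_ref))      # True
```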
Annotations: repeated labels such as chat roles, generated output, reasoning traces, and regex matches live in Sample.annotations as named half-open ranges. This avoids repeating "role": "assistant" or "tags": [...] on every token in a long contiguous region.
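For example, a single half-open range can stand in for a per-token label. The annotation dict below is illustrative; only the half-open convention comes from the description above:

```python
# Hypothetical annotation covering tokens 12..46 (half-open: end is excluded).
annotation = {"name": "assistant", "start": 12, "end": 47}

def covered_positions(ann: dict) -> range:
    """Half-open convention: start is included, end is excluded."""
    return range(ann["start"], ann["end"])

print(len(covered_positions(annotation)))  # 35 tokens share one label
```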
Texts: Sample.texts is a list of Text objects ({name, value, start, end, children, metadata}). Each entry covers one chat message (with role-based names such as "system_0", "user_0", "assistant_0", "user_1", ..., system prompt included) or one plain-text input ("text_0", ...). Token offsets locate the message inside tokens; assistant turns can carry children for the reasoning / content / tool-call sub-sections.
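A hypothetical assistant turn might look like this. The values are invented for illustration; only the {name, value, start, end, children, metadata} shape comes from the description above:

```python
# Invented example of one Sample.texts entry for an assistant turn.
assistant_turn = {
    "name": "assistant_0",
    "value": "The capital of France is Paris.",
    "start": 20,  # token offset of the message inside Sample.tokens
    "end": 29,
    "children": [
        # Sub-sections of the turn; a reasoning model could also carry
        # a "reasoning" child alongside "content".
        {"name": "content", "value": "The capital of France is Paris.",
         "start": 20, "end": 29, "children": [], "metadata": {}},
    ],
    "metadata": {},
}
```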
Extensible tokens: sparse per-token values such as logprobs and interpretability outputs (logit lens, probes, etc.) are stored as token extras.
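As an illustration, a token dict carrying sparse extras might look like the following. The extra keys here are invented; which extras actually appear depends on what the producer stored:

```python
# Invented per-token extras; only "token" and "id" come from the core convention.
token = {
    "token": " Paris",
    "id": 6342,
    "logprob": -0.42,                      # generation log-probability
    "logit_lens": {"layer_12": " Paris"},  # hypothetical interpretability extra
}

# Sparse storage: tokens without extras simply omit those keys.
extras = {k: v for k, v in token.items() if k not in ("token", "id")}
print(sorted(extras))  # ['logit_lens', 'logprob']
```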
Use .inif.json for plain JSON and .inif for the indexed archive format. The
archive keeps per-sample text previews in the manifest and stores each full
sample payload as a separate compressed member, so callers can browse summaries
without inflating token dictionaries.
The same unified read API works on both formats: pass a path with either suffix and the reader dispatches to the indexed-archive path or falls back to a full load.
```python
from inif import (
    InifDocument,
    IndexedInifWriter,
    iter_samples,
    read_info,
    read_samples,
)

doc.save("traces.inif")       # indexed archive
doc.save("traces.inif.json")  # plain JSON

info = read_info("traces.inif")  # metadata + per-sample summaries
sample = read_samples("traces.inif", "sample_42")[0]  # single id
subset = read_samples("traces.inif", ["sample_1", "sample_7"])

for sample in iter_samples("traces.inif.json"):  # works for both formats
    ...

with IndexedInifWriter("streaming.inif", doc.metadata, doc.sequences) as writer:
    for sample in doc.samples:
        writer.write_sample(sample)
        writer.flush()  # make the partial archive readable
```

The indexed archive stores metadata and sequences once, then stores each sample as a separate compressed member with an uncompressed preview summary. This supports incremental writes, header-only reads, per-sample random access, and streaming iteration while preserving the same InifDocument model.
All tagging is exposed as methods on Sample (single-sample) and InifDocument (whole-document fan-out). The two surfaces share names so the receiver disambiguates the scope.
```python
# Annotate every matching token across the whole document
doc.tag_by_regex(r"^\d+$", "number")

# Annotate by concatenated text (multi-token matches) on one sample
sample.tag_by_text_regex(r"Paris", "city")

# Convert an annotation into a named span on a sample
sample.create_span_from_tag("city", "answer_span")
```

```python
selection = sample.select_by_annotation("number")
selection = sample.select_by_span("answer_span")
selection = sample.select_by_position(slice(5, 10))
```

Common token sequences across samples (e.g. shared system prompts) are automatically deduplicated via set-intersection and stored as Sequence objects referenced by tokens.
```python
deduped = doc.deduplicate_sequences()  # default min_length=5
flat = deduped.expand_sequences()      # flatten back
```

InifDocument.save_html / InifDocument.show produce a self-contained HTML page with:
- Collapsible sidebar with sample list and pass/fail indicators
- Per-message panels driven by Sample.texts (with reasoning / content / tool-call sub-sections for assistant turns)
- Token-level display with hover tooltips showing all extra fields
- Toggleable annotation highlighting with color legends
- Span border annotations and extra-field underline indicators
- Newline-aware token wrapping
```bash
make dev        # install dev environment
make test       # run tests
make format     # ruff format
make lint       # ruff check
make typecheck  # ty check
make schema     # regenerate JSON schema
```

MIT