This package is in early development and the API may change without deprecation. Feedback and contributions are very welcome!
The INterpretability Interchange Format (INIF) is a JSON-based format for tokenized LLM generation traces with support for contiguous token annotations, position selection, and efficient storage of interpretability outputs.
Designed as the interchange layer between generation and evaluation frameworks (e.g. Inspect AI) and interpretability tools in the NDIF ecosystem (nnsight, nnterp and workbench).
```bash
pip install inif
```

With Inspect AI converter support:

```bash
pip install "inif[inspect]"
```

```python
from inif.converters.text import from_texts

doc = from_texts(
    ["The capital of France is Paris.", "Hello world!"],
    tokenizer="gpt2",
)
doc.save("traces.inif.json")
```

```python
from inif.converters.inspect_ai import from_eval_file

doc = from_eval_file("logs/my_eval.eval")
```

```python
from inif import InifDocument

doc = InifDocument.load("traces.inif.json")
doc.show()                    # in Jupyter
doc.save_html("traces.html")  # self-contained HTML
```

```bash
# Convert text files to inif
inif convert txt input.txt -m gpt2

# Convert Inspect AI eval logs
inif convert eval logs/my_eval.eval

# View as interactive HTML in the browser
inif view traces.inif.json
```

An .inif.json file contains:
```
InifDocument
├── metadata      — model info, source eval, packages, timestamps
├── sequences[]   — deduplicated token patterns shared across samples
└── samples[]     — tokenized generation traces
    ├── tokens[]      — token id + string, plus sparse extras (logprob, logit lens, probes, ...)
    ├── annotations[] — named token ranges with optional metadata
    ├── texts[]       — named text segments ({name, value, start, end, children, metadata})
    ├── spans[]       — named position ranges
    └── scores[]      — evaluation scores (scorer, value, answer)
```
Token convention: each TokenOrSeqRef has token: str and an optional id: int. Vocabulary tokens use the integer id; sequence references set id to None and carry the target Sequence.id in the token field.
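As a sketch, the two cases can be told apart by the id field. The dict shapes below are inferred from the convention above, not copied from the schema:

```python
# Illustrative shapes only, inferred from the token convention described above.
vocab_token = {"token": " Paris", "id": 6342}  # ordinary vocabulary token
seq_ref = {"token": "seq_0", "id": None}       # token field carries Sequence.id "seq_0"

def is_sequence_ref(tok: dict) -> bool:
    """A token entry is a sequence reference when its id is None."""
    return tok.get("id") is None

print(is_sequence_ref(vocab_token))  # False
print(is_sequence_ref(seq_ref))      # True
```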
Annotations: repeated labels such as chat roles, generated output, reasoning traces, and regex matches live in Sample.annotations as named half-open ranges. This avoids repeating "role": "assistant" or "tags": [...] on every token in a long contiguous region.
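For example, a single half-open range can stand in for a per-token label. The annotation dict below is illustrative; only the half-open convention comes from the description above:

```python
# Hypothetical annotation covering tokens 12..46 (half-open: end is excluded).
annotation = {"name": "assistant", "start": 12, "end": 47}

def covered_positions(ann: dict) -> range:
    """Half-open convention: start is included, end is excluded."""
    return range(ann["start"], ann["end"])

print(len(covered_positions(annotation)))  # 35 tokens share one label
```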
Texts: Sample.texts is a list of Text objects ({name, value, start, end, children, metadata}). Each entry covers one chat message (with role-based names such as "system_0", "user_0", "assistant_0", "user_1", ..., system prompt included) or one plain-text input ("text_0", ...). Token offsets locate the message inside tokens; assistant turns can carry children for the reasoning / content / tool-call sub-sections.
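A hypothetical assistant turn might look like this. The values are invented for illustration; only the {name, value, start, end, children, metadata} shape comes from the description above:

```python
# Invented example of one Sample.texts entry for an assistant turn.
assistant_turn = {
    "name": "assistant_0",
    "value": "The capital of France is Paris.",
    "start": 20,  # token offset of the message inside Sample.tokens
    "end": 29,
    "children": [
        # Sub-sections of the turn; a reasoning model could also carry
        # a "reasoning" child alongside "content".
        {"name": "content", "value": "The capital of France is Paris.",
         "start": 20, "end": 29, "children": [], "metadata": {}},
    ],
    "metadata": {},
}
```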
Extensible tokens: sparse per-token values such as logprobs and interpretability outputs (logit lens, probes, etc.) are stored as token extras.
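As an illustration, a token dict carrying sparse extras might look like the following. The extra keys here are invented; which extras actually appear depends on what the producer stored:

```python
# Invented per-token extras; only "token" and "id" come from the core convention.
token = {
    "token": " Paris",
    "id": 6342,
    "logprob": -0.42,                      # generation log-probability
    "logit_lens": {"layer_12": " Paris"},  # hypothetical interpretability extra
}

# Sparse storage: tokens without extras simply omit those keys.
extras = {k: v for k, v in token.items() if k not in ("token", "id")}
print(sorted(extras))  # ['logit_lens', 'logprob']
```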
Use .inif.json for plain JSON and .inif for the indexed archive format. The
archive keeps per-sample text previews in the manifest and stores each full
sample payload as a separate compressed member, so callers can browse summaries
without inflating token dictionaries.
The same unified read API works on both formats: pass a path with either suffix and the reader dispatches to the indexed-archive path or falls back to a full load.
```python
from inif import (
    InifDocument,
    IndexedInifWriter,
    iter_samples,
    read_info,
    read_samples,
)

doc.save("traces.inif")       # indexed archive
doc.save("traces.inif.json")  # plain JSON

info = read_info("traces.inif")  # metadata + per-sample summaries
sample = read_samples("traces.inif", "sample_42")[0]  # single id
subset = read_samples("traces.inif", ["sample_1", "sample_7"])

for sample in iter_samples("traces.inif.json"):  # works for both formats
    ...

with IndexedInifWriter("streaming.inif", doc.metadata, doc.sequences) as writer:
    for sample in doc.samples:
        writer.write_sample(sample)
        writer.flush()  # make the partial archive readable
```

The indexed archive stores metadata and sequences once, then stores each sample as a separate compressed member with an uncompressed preview summary. This supports incremental writes, header-only reads, per-sample random access, and streaming iteration while preserving the same InifDocument model.
All tagging is exposed as methods on Sample (single-sample) and InifDocument (whole-document fan-out). The two surfaces share names so the receiver disambiguates the scope.
```python
# Annotate every matching token across the whole document
doc.tag_by_regex(r"^\d+$", "number")

# Annotate by concatenated text (multi-token matches) on one sample
sample.tag_by_text_regex(r"Paris", "city")

# Convert an annotation into a named span on a sample
sample.create_span_from_tag("city", "answer_span")
```

```python
selection = sample.select_by_annotation("number")
selection = sample.select_by_span("answer_span")
selection = sample.select_by_position(slice(5, 10))
```

Common token sequences across samples (e.g. shared system prompts) are automatically deduplicated via set-intersection and stored as Sequence objects referenced by tokens.
```python
deduped = doc.deduplicate_sequences()  # default min_length=5
flat = deduped.expand_sequences()      # flatten back
```

InifDocument.save_html / InifDocument.show produce a self-contained HTML page with:
- Collapsible sidebar with sample list and pass/fail indicators
- Per-message panels driven by Sample.texts (with reasoning / content / tool-call sub-sections for assistant turns)
- Token-level display with hover tooltips showing all extra fields
- Toggleable annotation highlighting with color legends
- Span border annotations and extra-field underline indicators
- Newline-aware token wrapping
```bash
make dev        # install dev environment
make test       # run tests
make format     # ruff format
make lint       # ruff check
make typecheck  # ty check
make schema     # regenerate JSON schema
```

MIT