Anthony Lewallen LewallenAE

Anthony Lewallen

AI Research Engineer — Python • Systems • EvalOps/RLHF Tooling

I build RLHF evaluation systems, multi-agent orchestration workflows, and production-minded tooling for LLM post-training.

My Typical Weekend

Friday: "I should learn RLHF infrastructure"

Sunday night:

✅ 6 data-quality detectors running on 160K preference pairs
✅ PostgreSQL pipeline storing every signal
✅ Trained two reward models (clean vs unfiltered data)
✅ Orchestrated 6 LLM agents in parallel to build an RLHF data-quality pipeline
✅ Started a GPT implementation from scratch

I don't do tutorials. I ship.

Featured Projects

Project	What It Does	Stack
RLHF Data Quality System	Detects problematic preference pairs in RLHF training data. Flagged 12,693 examples (7.9%) in Anthropic's HH-RLHF dataset.	PyTorch, PostgreSQL, sentence-transformers
GPT From Scratch	Transformer implementation from bigram → attention → CUDA kernels. Training on War and Peace, not TinyShakespeare.	PyTorch, CUDA
Multi-Agent Orchestration	First iteration: A multi-agent coding system using two Claude agents plus Gemini and Codex to build production code in parallel with shared contracts and coordination protocols. I designed the orchestration and approval loop.	Claude, Gemini, Codex API, Python
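
To make "detects problematic preference pairs" concrete, here is a minimal sketch of what one such detector might look like. It is an illustration only, not the detectors from the RLHF Data Quality System: the `PreferencePair` type, the check names, and the 0.95 similarity threshold are all assumptions, and it uses the standard library's `difflib` rather than sentence-transformers to stay dependency-free.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class PreferencePair:
    """Hypothetical record shape: one prompt with a chosen and a rejected reply."""
    prompt: str
    chosen: str
    rejected: str


def flag_pair(pair: PreferencePair, near_dup_threshold: float = 0.95) -> list[str]:
    """Return the names of the (illustrative) checks this pair fails."""
    flags: list[str] = []
    chosen, rejected = pair.chosen.strip(), pair.rejected.strip()
    if not chosen or not rejected:
        # A missing side carries no usable preference signal.
        flags.append("empty_response")
    elif chosen == rejected:
        # Identical replies cannot express a preference.
        flags.append("identical_responses")
    elif SequenceMatcher(None, chosen, rejected).ratio() >= near_dup_threshold:
        # Character-level similarity as a cheap stand-in for embedding similarity.
        flags.append("near_duplicate")
    return flags


def flag_rate(pairs: list[PreferencePair]) -> float:
    """Fraction of pairs with at least one flag (0.0 for an empty list)."""
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if flag_pair(p)) / len(pairs)


if __name__ == "__main__":
    sample = [
        PreferencePair("Capital of France?", "Paris.", "Paris."),
        PreferencePair("Capital of France?", "", "Lyon."),
        PreferencePair("Capital of France?", "Paris is the capital of France.", "I don't know."),
    ]
    for p in sample:
        print(flag_pair(p))
    print(f"flag rate: {flag_rate(sample):.1%}")
```

A real pipeline would likely swap the character-level comparison for embedding similarity and persist each flag as its own signal, but the shape stays the same: small, independent checks over a pair, aggregated into a flag rate.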

Open Source Contributions

Project	Description	Link
NLWeb (Microsoft Open Source)	Identified an explicitly documented CI/CD gap in NLWeb and implemented the pipeline (Ruff linting, mypy checks, pytest matrix, Docker validation, Dependabot automation).	PR #397

Background

Three years as an RLHF contractor. Selected into Alignerr’s elite “Alignerrd” group of top-tier programmers after standout evaluation performance. I’ve created rubrics, graded models, and seen exactly how preference data breaks.

Now I build systems to catch those problems automatically.

Tech Stack

Category	Tools
Languages	Python, TypeScript/JavaScript, Java, SQL
ML/LLM	PyTorch, Hugging Face, sentence-transformers
Backend	FastAPI, REST APIs, PostgreSQL, SQLite
Infra	Docker, GitHub Actions, Google Colab, Linux

How I Build

Ship first, polish later
Interfaces + invariants before implementation
Tests that prove behavior
Logging/metrics as first-class citizens
If it takes more than a weekend, break it down

Open To

Research Engineer • Applied Evals • EvalOps • Data Quality Engineering • ML Systems

Email: anthonylewallen.dev@gmail.com
