
# Anthony Lewallen

**AI Research Engineer — Python • Systems • EvalOps/RLHF Tooling**

LinkedIn • Email

I build RLHF evaluation systems, multi-agent orchestration workflows, and production-minded tooling for LLM post-training.


## My Typical Weekend

**Friday:** "I should learn RLHF infrastructure."

**Sunday night:**

- ✅ 6 data-quality detectors running on 160K preference pairs
- ✅ PostgreSQL pipeline storing every signal
- ✅ Two reward models trained (clean vs. unfiltered data)
- ✅ 6 LLM agents orchestrated in parallel to build an RLHF data-quality pipeline
- ✅ A from-scratch GPT implementation underway

I don't do tutorials. I ship.
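To make "data-quality detector" concrete, here is a minimal, dependency-free sketch of one kind of check such a system might run: flagging preference pairs whose chosen and rejected responses are nearly identical, since those carry almost no preference signal. The names (`near_identical_pair`, the sample pairs) are hypothetical illustrations, not the project's actual API.

```python
# Illustrative detector: flag preference pairs whose "chosen" and "rejected"
# responses are nearly identical, leaving almost no signal to learn from.
from difflib import SequenceMatcher

def near_identical_pair(chosen: str, rejected: str, threshold: float = 0.95) -> bool:
    """Return True when the two responses are too similar to be a useful pair."""
    ratio = SequenceMatcher(None, chosen, rejected).ratio()
    return ratio >= threshold

pairs = [
    {"chosen": "Sure, here is a recipe for pancakes.",
     "rejected": "Sure, here is a recipe for pancakes!"},
    {"chosen": "The capital of France is Paris.",
     "rejected": "I refuse to answer that."},
]
flagged = [p for p in pairs if near_identical_pair(p["chosen"], p["rejected"])]
print(len(flagged))  # → 1 (only the one-character-different pair is flagged)
```

A real detector suite would layer checks like this with embedding-based similarity, length-bias, and refusal-pattern signals, then persist every flag for later filtering.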


## Featured Projects

| Project | What It Does | Stack |
| --- | --- | --- |
| RLHF Data Quality System | Detects problematic preference pairs in RLHF training data; flagged 12,693 examples (7.9%) in Anthropic's HH-RLHF dataset. | PyTorch, PostgreSQL, sentence-transformers |
| GPT From Scratch | Transformer implementation built up from bigram → attention → CUDA kernels. Training on *War and Peace*, not TinyShakespeare. | PyTorch, CUDA |
| Multi-Agent Orchestration | First iteration of a multi-agent coding system: two Claude agents plus Gemini and Codex building production code in parallel with shared contracts and coordination protocols. I designed the orchestration and approval loop. | Claude, Gemini, Codex API, Python |
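The bigram → attention step in the GPT project centers on scaled dot-product attention. A dependency-free sketch of a single attention head (illustrative only, not the repo's code, which uses PyTorch tensors):

```python
# Scaled dot-product attention for one head, in pure Python lists.
# Each output position is a softmax-weighted mix of the value vectors.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    d = len(keys[0])  # key dimension used for the 1/sqrt(d) scaling
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# One query, two key/value positions in 2 dimensions: the query aligns with
# the first key, so the output leans toward the first value vector.
result = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

The real implementation adds the causal mask, multiple heads, and batched matrix multiplies, but the weighted-average structure is exactly this.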

## Open Source Contributions

| Project | Description | Link |
| --- | --- | --- |
| NLWeb (Microsoft Open Source) | Identified an explicitly documented CI/CD gap in NLWeb and implemented the pipeline: Ruff linting, mypy checks, a pytest matrix, Docker validation, and Dependabot automation. | PR #397 |

## Background

Three years as an RLHF contractor. Selected into Alignerr's "Alignerrd" group of top-rated programmers on the strength of evaluation performance. I've written rubrics, graded model outputs, and seen exactly how preference data breaks.

Now I build systems to catch those problems automatically.


## Tech Stack

| Category | Tools |
| --- | --- |
| Languages | Python, TypeScript/JavaScript, Java, SQL |
| ML/LLM | PyTorch, Hugging Face, sentence-transformers |
| Backend | FastAPI, REST APIs, PostgreSQL, SQLite |
| Infra | Docker, GitHub Actions, Google Colab, Linux |

## How I Build

- Ship first, polish later
- Interfaces and invariants before implementation
- Tests that prove behavior
- Logging and metrics as first-class citizens
- If it takes more than a weekend, break it down
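"Interfaces and invariants before implementation" can be made concrete with a small sketch (hypothetical names, not code from any repo above): define the contract first as a `Protocol`, then assert the invariant at the boundary before any implementation runs.

```python
# Contract-first sketch: the Detector interface is fixed before any detector
# is written, and the uniqueness invariant is checked at the boundary.
from typing import Protocol

class Detector(Protocol):
    name: str
    def flag(self, chosen: str, rejected: str) -> bool: ...

def run_detectors(detectors: list[Detector], chosen: str, rejected: str) -> dict[str, bool]:
    names = [d.name for d in detectors]
    # Invariant: detector names must be unique, or results silently overwrite.
    assert len(names) == len(set(names)), "duplicate detector names"
    return {d.name: d.flag(chosen, rejected) for d in detectors}

class EmptyRejected:
    name = "empty_rejected"
    def flag(self, chosen: str, rejected: str) -> bool:
        return not rejected.strip()

results = run_detectors([EmptyRejected()], "Paris.", "")
print(results)  # → {'empty_rejected': True}
```

Because the `Protocol` is structural, any class with a matching `name` and `flag` satisfies it; implementations can be swapped without touching the orchestration code.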

## Open To

Research Engineer • Applied Evals • EvalOps • Data Quality Engineering • ML Systems


Email: anthonylewallen.dev@gmail.com
