Rather than skimming yet another abstract explanation or relying on an AI tool to summarize it for me, I went hands-on: I rebuilt self-attention from scratch using just Excel, math, and a stubborn need to actually understand attention.
From tokenizing a simple sentence, to manually crafting the query, key, and value matrices, to computing attention weights and watching how focus shifts across words: it's a deceptively elegant process once you sit with it long enough.
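For anyone who prefers code to spreadsheet cells, here's a minimal NumPy sketch of those same steps. The dimensions, random seed, and random weight matrices are illustrative assumptions, stand-ins for the hand-crafted values in my spreadsheet, not the actual numbers I used.

```python
import numpy as np

# Toy setup: 4 tokens, embedding size 8 (illustrative choices, not
# the values from the spreadsheet).
np.random.seed(0)
n_tokens, d_model = 4, 8

X = np.random.randn(n_tokens, d_model)   # token embeddings, one row per word
W_q = np.random.randn(d_model, d_model)  # query projection
W_k = np.random.randn(d_model, d_model)  # key projection
W_v = np.random.randn(d_model, d_model)  # value projection

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
# (here d_k equals d_model, since there's a single head)
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1

output = weights @ V
print(weights.round(2))  # each row shows where one token "focuses"
```

Printing the weight matrix is the code equivalent of watching the spreadsheet: each row is one token's attention distribution over every word in the sentence, and that's where you can literally see the focus shift.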
It's easy to take models like Transformers for granted when they're wrapped up in pre-trained APIs and high-level libraries. But peeling back the abstraction layer reminded me: these systems aren't magic; they're clever math, stacked purposefully.