Very intuitive and simple, @karpathy! I was coding in parallel while watching your video and took a slightly different approach.
- To compute the gradients of `+` and `*`, I used the limit definition of the derivative:
$$ \frac{df}{da} = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} $$
- Instead of using topological sorting for backpropagation, I implemented a recursive approach in which each parent node visits its child nodes and computes their gradients. This is probably less efficient, since a child node's gradient can be recomputed several times when gradients reach it through multiple paths, but it is still a valid alternative that produces the same results.
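
To make the two points above concrete, here is a minimal sketch of the idea. It is not the code from the repo: the `Value` class, its attribute names, and the `numerical_derivative` helper are all hypothetical, written only to illustrate the approach. Applying the limit to `a + b` gives a local gradient of 1 for each operand, and applying it to `a * b` gives `b` for `a` and `a` for `b`. The recursive `backward` then pushes the upstream gradient along every path, so a shared node is visited once per path.

```python
class Value:
    """Scalar node in a computation graph (hypothetical, micrograd-style sketch)."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # inputs of the op that produced this node
        self._local_grads = local_grads  # d(self)/d(child), one entry per child

    def __add__(self, other):
        # lim h->0 of ((a + h) + b - (a + b)) / h = 1, and likewise for b
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # lim h->0 of ((a + h) * b - a * b) / h = b, and symmetrically a for b
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self, upstream=1.0):
        # Accumulate the gradient arriving along this path, then recurse.
        # No topological sort: a node reachable via k paths is visited k times.
        self.grad += upstream
        for child, local in zip(self._children, self._local_grads):
            child.backward(upstream * local)


def numerical_derivative(f, a, h=1e-6):
    # Finite-difference approximation of the limit definition, for checking.
    return (f(a + h) - f(a)) / h


a, b = Value(2.0), Value(3.0)
d = a * b + a   # a feeds into d through two paths
d.backward()
# a.grad == 4.0 (b + 1), b.grad == 2.0 (a)
```

The per-path recursion is simple, but its cost grows with the number of paths through the graph, which is the inefficiency mentioned above; a topological sort visits each node once.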
Link to my repo:
https://github.com/wahabaftab/micrograd/