Build A Large Language Model From Scratch Pdf < Plus >

This scales the logits before the softmax:

$$ \text{logits}_{\text{new}} = \frac{\text{logits}}{T} $$
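The scaling above can be sketched in a few lines of plain Python. This is a minimal illustration, not the source's own implementation: it assumes that T is a positive sampling temperature, and the function names `softmax` and `scale_logits` as well as the example logit values are hypothetical, chosen only for this sketch.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scale_logits(logits, temperature):
    """Divide every logit by T (assumed here to be a positive temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [x / temperature for x in logits]


# Illustrative logits, not taken from the source.
logits = [2.0, 1.0, 0.1]

for T in (0.5, 1.0, 2.0):
    probs = softmax(scale_logits(logits, T))
    print(f"T={T}: {[round(p, 3) for p in probs]}")
```

Under these assumptions, a value of T below 1 sharpens the resulting distribution toward the largest logit, a value above 1 flattens it, and T = 1 leaves the logits unchanged.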