Skip to main content

One post tagged with "transformer"

View All Tags

LLM basics from scratch

· 14 min read
Ordinary Magician | Half stack developer


The main purpose of this article is to use basic self-attention blocks to build a simple large language model for learning purposes. Due to limitations in model scale and embedding methods, the model built in this article will not be very effective, but this does not affect the ability to learn various basic concepts of language models similar to Transformer through the code provided in this article.

What happens on this page:

  • get full code of a basic Large(?) Language Model (data preparation, model architecture, model training and predicting)
  • understand the general architecture of transformer with a few illustrations
  • understand how self regressive training works
  • understand how to load very large text dataset into limited memory (OpenWebTextCorpus)
  • train and observe the training procedure
  • load the trained model into a simple ask-and-answer interactive script

Get full code

Code available at


Download the code via git clone before continue.