The Tokenizer: How AI Reads
You type “Apple”. The AI doesn’t see “Apple”. It sees [23405].
To a computer, words are just numbers. Before an AI can understand your prompt, it must chop it up into bite-sized pieces called Tokens.
In this guide, we will look at the Atomic Structure of language models.
1. What is a Token?
A token is not always a word. It can be part of a word.
- “Cat” = 1 Token.
- “Transformation” = 2 Tokens (“Transform” + “ation”).
- “12345” = 3 Tokens (“12” + “34” + “5”).
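You can watch these splits happen yourself. Here is a minimal sketch using OpenAI’s open-source tiktoken library with its cl100k_base encoding; exact splits and IDs vary from tokenizer to tokenizer, so treat the counts above as illustrative rather than universal.

```python
import tiktoken

# Load the cl100k_base encoding (used by GPT-4-era models).
# Other models use other encodings, so the splits will differ.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Cat", "Transformation", "12345"]:
    ids = enc.encode(text)                      # text -> list of token IDs
    pieces = [enc.decode([i]) for i in ids]     # each ID -> its text chunk
    print(f"{text!r} -> {len(ids)} token(s): {pieces}")
```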
2. Why It Matters
- Cost: You pay per token.
- Math: AI models often stumble on arithmetic because they see numbers as tokens, not values. “100” and “1” arrive as symbols to predict, not quantities to compute with.
- Memory: The “Context Window”, the amount of text a model can consider at once, is measured in tokens, not words or characters.
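To make the cost and memory points concrete, here is a sketch that counts the tokens in a prompt. The price and context-window figures are made-up placeholders for illustration, not real rates or limits; substitute your provider’s actual numbers.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

PRICE_PER_1K_TOKENS = 0.01  # hypothetical placeholder, NOT a real price
CONTEXT_WINDOW = 8192       # hypothetical model limit, in tokens

prompt = "Summarize the following meeting notes in three bullet points."
n_tokens = len(enc.encode(prompt))
n_words = len(prompt.split())

print(f"{n_words} words -> {n_tokens} tokens")
print(f"Estimated cost: ${n_tokens / 1000 * PRICE_PER_1K_TOKENS:.5f}")
print(f"Fits in context window: {n_tokens <= CONTEXT_WINDOW}")
```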
3. The “Glitch” Words
Some words are notoriously hard for AI models to spell or reverse (like “Lollipop”) because of how they are tokenized: the model sees whole chunks, not the individual letters “L-o-l-l-i…”.
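You can reproduce the effect by asking the tokenizer what chunks it actually hands the model. A sketch, again assuming the cl100k_base encoding (chunk boundaries will differ for other tokenizers):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

word = "Lollipop"
ids = enc.encode(word)

# Decode each token ID individually to reveal the chunks
# the model receives in place of individual letters.
chunks = [enc.decode_single_token_bytes(i).decode("utf-8", errors="replace")
          for i in ids]
print(f"{word!r} is seen as {len(ids)} chunk(s): {chunks}")
print("The model never receives the letters:", "-".join(word))
```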
4. Visualizing the Slice
Look at the Token Slicer on the right.
Type a sentence. Watch how the machine chops it up. Common words remain whole. Rare words get pulverized. This is the first step of the AI’s digestion process.
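No slicer on your screen? This small sketch approximates it in a terminal, marking each token boundary with a “|”. Watch a common word stay whole while a rare one shatters (again assuming the cl100k_base encoding):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def slice_view(sentence: str) -> str:
    """Render a sentence with '|' between token boundaries."""
    ids = enc.encode(sentence)
    return "|".join(enc.decode([i]) for i in ids)

print(slice_view("the cat sat on the mat"))        # common words stay whole
print(slice_view("antidisestablishmentarianism"))  # rare word gets pulverized
```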
Turn Up the Heat
Now that you know how it reads, let’s learn how it decides what to write next. Control the chaos in: Temperature Check.