What are tokens in the context of Large Language Models?
Computers have come a long way. But one thing remains fundamentally true even today: they only understand 0 and 1, the two bits that run the whole tech world. Isn't it crazy? That is why computers don't understand words, characters, or language directly. Yes, the last request you made to ChatGPT, Claude, or any other LLM went through a lot of work behind the scenes before the model could make sense of what you wanted to tell it.

This is where tokens come into the picture. Tokens are nothing but numbers, which a computer ultimately stores as bits. But bits on their own make for very long representations, so they are grouped into bytes of 8 bits each. As a first approximation, you could imagine each word being represented by a single byte, that is, a number from 0 to 255 (256 possible values). In practice, 256 values are nowhere near enough, so far more numbers are used. A typical modern model today can have a vocabulary on the order of 100k unique tokens (numbers, duh), each representing a word or a piece of a word that shows up repeatedly in the text data the LLM is trained on.

I will cover this in detail in a follow-up post. Till then, think about it!
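
If you want something to poke at in the meantime, here is a tiny Python sketch of the idea. It is not how any real LLM tokenizer works. The vocabulary `TOY_VOCAB` and the function `toy_tokenize` are made up for illustration, and the greedy longest-match lookup is a simplification I am swapping in for the learned tokenizers that real models use. It only shows the two steps described above: text becomes bytes (numbers from 0 to 255, stored as bits), and known chunks of text become token numbers.

```python
# Toy illustration only. Real tokenizers learn their vocabulary from training
# data; everything named "toy" here is a hypothetical stand-in.

def to_bytes(text: str) -> list[int]:
    """Return the UTF-8 byte values of the text, each between 0 and 255."""
    return list(text.encode("utf-8"))


def to_bits(text: str) -> str:
    """Return the same bytes written out as 8-bit binary strings."""
    return " ".join(format(b, "08b") for b in text.encode("utf-8"))


# Hypothetical hand-written vocabulary: text chunk -> token number.
# A real model would have on the order of 100k entries.
TOY_VOCAB = {"hello": 0, " world": 1, "!": 2}


def toy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Split text into the longest chunks found in vocab (greedy, left to right)."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token in vocab for text starting at {text[i:]!r}")
    return ids


if __name__ == "__main__":
    print(to_bytes("Hi"))                          # [72, 105]
    print(to_bits("Hi"))                           # 01001000 01101001
    print(toy_tokenize("hello world!", TOY_VOCAB)) # [0, 1, 2]
```

Notice that the token numbers (0, 1, 2) have nothing to do with the byte values. They are just positions in a lookup table, which is why a model can have far more than 256 of them.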