# I Got Tired of Not Understanding AI Jargon. Here’s What Actually Happens When You Hit Enter.

There's a massive gap between knowing how to use AI and knowing what you're actually using. I lived comfortably in that gap for a long time. I could prompt ChatGPT with the best of them, write the cleverest system prompts, get the outputs I wanted. The machinery underneath all of it stayed a black box.

I got tired of nodding along to jargon I didn't actually understand. So I started digging for real. I practically moved into research papers and documentation, chasing the why behind every concept instead of just the what. I wanted to strip away the buzzwords and see the actual digital anatomy of what happens in the exact second you hit Enter.

If you've ever closed a tab in frustration because an AI explanation got too academic too fast, this is for you. What follows is the map of that journey, from someone who just used the thing to someone who finally understands the engine underneath it.

* * *

## Okay but why does this even exist

Before getting to how it works, I needed to know why anyone built it. Computers have been around for seventy-plus years, search engines for about thirty. What was actually missing that made someone build a thing that talks back?

A search engine gives you links. You type "how to center a div" and Google hands you ten of them, and you still have to read each one, decide which applies to your situation, and translate it yourself. The computer found information for you. It did not understand your question, and it definitely did not write an answer for you.

What people actually wanted was something that could read your specific question, follow an oddly phrased or incomplete sentence, and respond the way a knowledgeable person would, in your context, not someone else's blog post from 2015. That is a different problem entirely from "find documents containing these words." Search engines are librarians pointing at shelves. People wanted someone who had already read every book on the shelf and could just tell them the answer.

That gap, between finding information and understanding it, is the reason LLMs exist. Everything else here is the story of how engineers closed it.

* * *

## Where you're already using one of these

You've probably used an LLM today without thinking of it as AI. That gap didn't get closed by ChatGPT alone. It got closed everywhere.

Gmail finishing your sentence as you type is a small language model predicting your next words. Your phone's keyboard suggesting the next word is a tiny cousin of the same idea. The customer support chat bubble on a shopping site that actually answers instead of dumping a useless FAQ link is often an LLM now. GitHub Copilot suggesting your next line of code while you type is a language model trained on code instead of English. Even Google sometimes shows an AI-written summary above the search results, generated rather than copied from a page.

The point isn't to impress you with how widespread this is. It's that an LLM is not one product called ChatGPT. It's a type of system, and ChatGPT is just the most famous house built on it. Once you understand the engine, you start recognising it everywhere.

* * *

## So what actually is the thing

Here's the least textbook way I can put it: an LLM is a system that has read an enormous amount of text and, from that, learned to guess what word is statistically likely to come next, given the words before it.

That sounds insultingly simple. Surely something that writes working code and explains quantum physics is doing more than guessing the next word? Hold that thought. It's the single most important and most misunderstood fact about how these systems work, and the rest of this post is earning the answer to why "guess the next word" turns out to be enough.

For now, sit with the claim. Everything ChatGPT writes, piece by piece, is the model predicting one next piece of text at a time, over and over, until it decides to stop.
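To make "guess the next word" concrete, here's a deliberately tiny sketch in Python. It counts which word follows which in a made-up corpus and predicts the most frequent follower. The corpus and the `train_bigrams`/`predict_next` names are hypothetical, and real LLMs learn these statistics with a neural network over tokens rather than a table of word pairs, but the job has the same shape: given what came before, score what comes next.

```python
from collections import Counter, defaultdict

# A made-up toy corpus; real models train on vastly more text.
corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug ."

def train_bigrams(text):
    """Count how often each word is immediately followed by each other word."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the word seen most often right after `word`, or None if never seen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

model = train_bigrams(corpus)
print(predict_next(model, "the"))  # → cat  ("cat" follows "the" twice; others once)
print(predict_next(model, "sat"))  # → on
```

Nothing in that table "knows" what a cat is. It only knows what tends to come next, which is exactly the claim this post is going to stretch a very long way.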

* * *

## What happens between you pressing Enter and an answer appearing

Let's actually trace this, because most explanations skip straight past it.

![](https://cdn.hashnode.com/uploads/covers/678b775e773554ab7117f20a/936757ec-8603-4ab5-84cf-87a53f893c7b.png align="center")

Notice the loop in the middle. The model isn't writing the whole answer in one go, the way you'd draft an email and hit send. It produces the response piece by piece, and each new piece depends on everything written so far, including what it just generated itself. That's why text sometimes streams in word by word on screen. It's not a fake typing animation; you're watching each next piece arrive roughly as fast as it gets computed.

This detail explains something that confuses a lot of people early on: why ChatGPT occasionally contradicts itself mid-answer, or trails off oddly near the end of a long response. It isn't pulling a pre-written answer off a shelf. It's composing one piece at a time, never fully knowing in advance where the sentence will land.
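The loop itself fits in a few lines. This sketch assumes a hypothetical hand-written `NEXT` table standing in for a trained model, plus a made-up `<end>` marker meaning "stop here." Each step looks at the text so far, picks one next piece, appends it, and feeds the longer text back in.

```python
END = "<end>"  # hypothetical stop marker; real models use a special end-of-text token

# Hypothetical stand-in for a trained model: maps the latest piece to the next one.
# A real LLM conditions on the entire context, not just the last piece.
NEXT = {"The": " sky", " sky": " is", " is": " blue", " blue": ".", ".": END}

def fake_model(context):
    """Return the next piece given everything generated so far."""
    return NEXT.get(context[-1], END)

def generate(prompt_pieces, max_pieces=20):
    """Autoregressive loop: predict one piece, append it, feed it back in."""
    context = list(prompt_pieces)
    for _ in range(max_pieces):  # length cap so the loop always ends
        piece = fake_model(context)
        if piece == END:
            break
        context.append(piece)  # the new piece becomes input for the next step
    return "".join(context)

print(generate(["The"]))  # → The sky is blue.
```

A real system swaps `fake_model` for a trained network, but the outer loop works the same way: nothing is planned past the next piece.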

Which led me to my next question.

* * *

## Wait, it's not searching the internet for my answer?

I genuinely believed this for longer than I want to admit, that ChatGPT was quietly googling things and rephrasing what it found. It isn't, and the distinction matters.

Ask a search engine a question and it goes out, fetches live documents, and hands you pieces of them. Ask ChatGPT and nothing goes out to the internet by default. It generates a response from patterns it learned during training, which happened months or years before your conversation started. No live lookup, unless the product specifically gives the model a browsing tool, which is a separate feature bolted on top, not how the core model works.

This is also why an LLM can be confidently wrong. It isn't lying and it isn't pulling from a bad source. It's doing exactly what it always does, predicting the most statistically likely next words, and sometimes the most fluent-sounding answer is simply incorrect. People call this hallucination. Once you know the model is generating rather than retrieving, hallucination stops being mysterious. A system built to compose fluent text will occasionally compose something fluent and wrong, because it was never checking a fact. It was completing a pattern.

So the real question becomes: how does a machine learn patterns from text well enough to write fluent, mostly accurate sentences at all? For that we need to back up further.

* * *

## Why can't a computer just understand English the way you do

Here's a question that sounds almost silly until you try to answer it: why does a computer need anything special to process language? You read this sentence and understand it instantly. Why is that hard for a machine?

A computer doesn't "read" in any sense close to what you do. At the most basic level it does one thing: move electrical signals through circuits representing numbers. That's genuinely it. A processor has no concept of "the," no innate sense of grammar, and has never felt anything a word like "lonely" describes. Every operation a CPU performs is arithmetic on numbers, full stop.

So before a computer can do anything with your sentence, that sentence has to become numbers. Not figuratively, literally. The letter "A" has been the number 65 since long before LLMs existed; that's how text has been stored on computers since ASCII, a character-encoding standard from the 1960s.
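You can check this yourself in Python: the built-in `ord` returns a character's number, and encoding a string as ASCII gives the byte values the computer actually stores. `to_numbers` is just a hypothetical helper name for this demo.

```python
def to_numbers(text):
    """Return the ASCII code for each character in `text`."""
    return list(text.encode("ascii"))

print(ord("A"))           # → 65
print(to_numbers("Hi!"))  # → [72, 105, 33]
```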

That fact, that everything inside a computer is ultimately numbers, isn't trivia here. It's the whole reason the rest of this story unfolds the way it does. If a computer can only do arithmetic, then "understanding language" has to somehow become "doing arithmetic on numbers that represent language." Everything from here is the story of how that translation works.

* * *

## So just turn each word into a number?

This was my instinct, and it seems like the obvious fix. Give every word a number. Cat is 1, dog is 2, the is 3, and so on for every word in English. Problem solved?

Try it for thirty seconds and the cracks show.

English has an enormous, constantly growing number of words. "Rizz" wasn't a word five years ago. Do you stop and renumber the whole system every time language shifts? Words also change shape constantly. Run, running, ran, and runner are clearly related, but as four separate numbers they'd look completely unrelated to a model, like four strangers instead of one family. And the real problem: what happens with a word the model has genuinely never seen, a typo, a made-up brand name, a word from another language dropped into an English sentence? A whole-word system just breaks. There's no number for it.
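Here's that naive scheme as a quick sketch, using a tiny hypothetical whole-word vocabulary. It shows both problems at once: related words get unrelated numbers, and an unseen word gets nothing at all.

```python
# Hypothetical whole-word vocabulary: one arbitrary number per word.
word_to_id = {"cat": 1, "dog": 2, "the": 3, "run": 4, "running": 5, "ran": 6, "runner": 7}

def encode_words(sentence):
    """Map each word to its ID, or None if the word isn't in the vocabulary."""
    return [word_to_id.get(word) for word in sentence.lower().split()]

# "run" and "running" are just 4 and 5; nothing in the numbers says they're related.
print(encode_words("the dog ran"))       # → [3, 2, 6]
# "has" and "rizz" were never given numbers, so the scheme has nothing to offer.
print(encode_words("the dog has rizz"))  # → [3, 2, None, None]
```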

What you actually want is something smaller than a word but bigger than a single letter, flexible enough to represent any word, including ones nobody has typed before, by building it out of reusable pieces. That unit is called a **token**. This is the point where the word finally earns its place, because now you can feel why it had to exist.

* * *

## What a token actually looks like

A token is a chunk of text, sometimes a whole word, sometimes part of one, sometimes a single character or even a space, that the model treats as one unit. Breaking text into these chunks is tokenization.

Here's the part that surprised me: tokens and words aren't the same thing, and they often don't line up the way you'd guess. The splits below are illustrative; the exact pieces depend on which tokenizer a given model uses.

```plaintext
"I love debugging."        → ["I", " love", " debug", "ging", "."]
"unbelievable"              → ["un", "believ", "able"]
"ChatGPT"                   → ["Chat", "G", "PT"]
"hello"                     → ["hello"]
```

"Debugging" split into "debug" and "ging." Not random. It means the model already has a reusable chunk for "debug" from seeing it constantly in programming text, and can attach different endings without needing separate numbers for debugging, debugged, and debugger as three unrelated mystery words. A short common word like "hello" often stays whole, since splitting it buys nothing. A rare or made-up word gets broken into smaller, more familiar pieces so the model can still make sense of something it has technically never seen whole.
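Here's a minimal sketch of building words out of reusable pieces. It uses greedy longest-match against a tiny hypothetical vocabulary, which is closer in spirit to WordPiece than to the byte-pair encoding (BPE) that GPT models use, so treat it as an illustration of the idea rather than how any production tokenizer works. Any single character is accepted as a last resort, so even a never-seen word still gets split into something.

```python
# Hypothetical subword vocabulary; real ones hold tens of thousands of entries or more.
VOCAB = {"debug", "ging", "ged", "ger", "un", "believ", "able", "hello"}

def tokenize_word(word, vocab=VOCAB):
    """Split a word into the longest vocabulary pieces, left to right.
    Falls back to single characters for anything the vocabulary can't cover."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, shrinking until one matches.
        for end in range(len(word), start, -1):
            candidate = word[start:end]
            if candidate in vocab or end - start == 1:
                pieces.append(candidate)
                start = end
                break
    return pieces

print(tokenize_word("debugging"))     # → ['debug', 'ging']
print(tokenize_word("debugged"))      # → ['debug', 'ged']
print(tokenize_word("unbelievable"))  # → ['un', 'believ', 'able']
print(tokenize_word("hello"))         # → ['hello']
print(tokenize_word("zorp"))          # → ['z', 'o', 'r', 'p']
```

Notice that "debugging" and "debugged" share the "debug" piece, which is exactly the reuse described above.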

It's also why ChatGPT and similar tools measure usage in tokens rather than words: the token, not the word, is the smallest unit the model actually works with. A rough rule of thumb is about four characters of English text per token.

Once text is tokenized, each unique token gets a number, the same way "cat" almost got one a few sections ago, except now the units are flexible enough to cover essentially any text you could type.

![](https://cdn.hashnode.com/uploads/covers/678b775e773554ab7117f20a/df520de0-d550-444a-acc1-ee488c95e329.png align="center")
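In code, that step is just a lookup table in each direction. The IDs below are made up for illustration (a real tokenizer ships with a fixed vocabulary of tens of thousands of entries or more), and `encode`/`decode` are hypothetical names.

```python
# Hypothetical vocabulary assigning each token an arbitrary integer ID.
token_to_id = {"I": 40, " love": 3021, " debug": 7142, "ging": 3252, ".": 13}
id_to_token = {i: t for t, i in token_to_id.items()}  # reverse lookup

def encode(tokens):
    """Turn a list of tokens into their integer IDs."""
    return [token_to_id[t] for t in tokens]

def decode(ids):
    """Turn IDs back into text by gluing the tokens together."""
    return "".join(id_to_token[i] for i in ids)

ids = encode(["I", " love", " debug", "ging", "."])
print(ids)          # → [40, 3021, 7142, 3252, 13]
print(decode(ids))  # → I love debugging.
```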

That solves the first problem. Text becomes numbers, in flexible chunks, without needing a dictionary entry for every word that has ever existed or will ever exist. But solving it opens a much harder one, and this is the question that actually stopped me for a day.

* * *

## Once everything is just numbers, how does the model know what those numbers mean?

This is the crux of the whole thing, worth sitting with before jumping to the answer.

Say "debug" became token number 25033. That number by itself means nothing. It's a label, the way your locker number in school told you nothing about what was inside. 25033 doesn't look "more like a verb" than 4. It doesn't sit "close" to the token for "fix" or "repair" the way the words themselves feel related to you. As plain numbers, 25033 and 4 are exactly as unrelated as 25033 and 99999.

But the model clearly does treat "debug" and "fix" as related, because if you ask it to fix a bug it doesn't respond like you asked about gardening. So something has to let the model capture meaning and relationships between tokens, not just their raw ID numbers.

The answer: each token doesn't get one number; it gets a whole list of numbers, often hundreds or thousands, called a vector. Think of each number as some abstract quality of meaning, learned automatically during training, not designed by a human. One number might loosely track how technical a word feels, another how positive or negative it reads, others with no clean human name at all, just patterns the model found useful. Tokens with similar meanings end up with similar lists of numbers, the way two people who like similar things end up answering a personality quiz with similar scores.

That's genuinely clever, because "closeness in meaning" becomes "closeness in numbers," and closeness in numbers is something a computer can actually compute.
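One common way to measure "closeness in numbers" is cosine similarity, which checks whether two vectors point in roughly the same direction. The three-number vectors below are invented for illustration; real embeddings are learned during training and much longer.

```python
import math

# Made-up 3-number vectors; real embeddings are learned and have no clean labels.
embeddings = {
    "debug":  [0.9, 0.1, 0.2],
    "fix":    [0.8, 0.2, 0.3],
    "garden": [0.1, 0.9, 0.7],
}

def cosine_similarity(a, b):
    """Near 1.0 means the vectors point the same way; lower means less related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(embeddings["debug"], embeddings["fix"]), 2))     # high
print(round(cosine_similarity(embeddings["debug"], embeddings["garden"]), 2))  # much lower
```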

But it still doesn't fully answer the question. Take the word "bank." In "I deposited money at the bank," it means a financial institution. In "we sat by the river bank," it means the edge of a river. Same token, wildly different meaning, entirely dependent on the words around it. If every token had one fixed list of numbers regardless of context, the model would treat both sentences as if they were about the same thing.

So the real requirement is sharper than I first thought: the model needs a way to look at a token and adjust its understanding of that token based on every other token around it, dynamically, every single sentence, every single time. That's a genuinely hard computational problem. How it finally got solved is the actual breakthrough this whole post has been building toward.

* * *

## The idea that changed everything: paying attention to the right words

Before 2017, language models existed, but they read text basically one word at a time, in order, like reading through a narrow tube, trying to remember everything important from earlier words as they moved forward. The further back a relevant word sat, the more likely the model was to simply forget it mattered. Long sentences and long documents were genuinely hard for these older systems, the same way you might lose the start of a long sentence by the time you reach its end.

In 2017, a group of researchers published a paper with a bluntly literal title: "Attention Is All You Need." The architecture it introduced is called the **Transformer**, and nearly every modern LLM is built on it, including GPT, where the T literally stands for Transformer.

Stripped of the math, the core idea is this: instead of reading a sentence strictly in order and trying to remember everything, let every word look directly at every other word at once, and learn how much attention to pay to each one when figuring out its own meaning.

Back to "bank." In a Transformer, while processing the token "bank," the model doesn't just look at the word right before it. It looks at every word in the sentence at once and learns to weigh them. In "I deposited money at the bank," "bank" pays heavy attention to "deposited" and "money," and that pulls its meaning toward financial institution. In "we sat by the river bank," the same token pays heavy attention to "river" and "sat," pulling its meaning toward the edge of water. Same starting token, same starting numbers, but the surrounding context reshapes what those numbers mean in that specific sentence. This mechanism is called **self-attention**, because each token attends to the other tokens to update its own understanding of itself.

![](https://cdn.hashnode.com/uploads/covers/678b775e773554ab7117f20a/88c07e75-19ec-4d60-8d5c-71ad5689b957.png align="center")
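Here's a stripped-down sketch of self-attention in Python, assuming made-up two-number vectors where the first number loosely means "finance" and the second "nature." To keep it short, each vector serves directly as its own query, key, and value; real Transformers first multiply by learned weight matrices, run many attention heads in parallel, and (in GPT-style models) only let each token look at the tokens before it. "bank" comes last in both toy sentences, so that last restriction doesn't change its result here.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified scaled dot-product self-attention (no learned projections)."""
    d = len(vectors[0])
    outputs = []
    for query in vectors:
        # How relevant is every token (including itself) to this one?
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in vectors]
        weights = softmax(scores)
        # New vector = attention-weighted blend of all the tokens' vectors.
        blended = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        outputs.append(blended)
    return outputs

# Made-up 2-number vectors: [finance-ness, nature-ness]. "bank" starts ambiguous.
money, river, bank = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]

print(self_attention([money, bank])[1])  # "bank" next to "money" → [0.75, 0.25]
print(self_attention([river, bank])[1])  # "bank" next to "river" → [0.25, 0.75]
```

With these toy numbers the attention weights happen to come out equal, yet the blended "bank" vector still leans toward whichever word sits beside it. In a trained model, the learned matrices are what let those weights differ, so a token can lean hard on the words that matter most.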

This is also why Transformers scaled so well compared to older approaches. Since every word can look at every other word at once instead of one after another in sequence, the computation runs in parallel, which happens to be exactly what GPUs are good at. Not a small detail. It's a big part of why we suddenly got models trained on internet-scale text in the last several years. The architecture and the hardware turned out to be a near-perfect match right as enough computing power became affordable.

So here's where the pieces connect. Your text becomes tokens. Tokens become numbers. Those numbers get reshaped by self-attention based on context, layer after layer, dozens of times, each layer refining what every token means given everything around it. After all those layers, the model has a rich, context-aware numerical picture of your entire prompt, and from that picture it calculates a probability for every possible next token and picks one. Then it does the whole thing again with that new token added, and again, until it produces a special end-of-response token or hits a length limit.

* * *

## One more thing: why the same question gets different answers

If the model is just predicting likely next words, why does asking ChatGPT the same question twice sometimes give two different answers?

That comes down to a setting called **temperature**, which controls how much randomness goes into the prediction step. Instead of outputting one single next token, the model calculates a probability for many possible next tokens and then picks one. At low temperature, it almost always picks the highest-probability option, which makes responses focused, predictable, and repetitive across attempts. At high temperature, it more often picks a lower-probability but still reasonable option, which makes responses more varied and creative, at the cost of consistency.

![](https://cdn.hashnode.com/uploads/covers/678b775e773554ab7117f20a/a9190d10-8b86-4172-b32b-70516c2f438a.png align="center")

That's also why creative writing tools tend to run hotter, and tools meant for code or factual answers run cooler. You want a SQL query generator boring and consistent, not creatively wrong.
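Here's a minimal sketch of temperature in Python. It assumes the model has already produced a raw score (a "logit") for each candidate token; the candidates and scores are made up. Dividing the scores by the temperature before turning them into probabilities is the standard trick: a small temperature exaggerates the gap between the top option and the rest, a large one flattens it.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature must be greater than 0.
    Below 1 sharpens the distribution, above 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(candidates, logits, temperature, rng=random):
    """Randomly pick one candidate, weighted by its temperature-adjusted probability."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical candidates and scores for the word after "The sky is".
candidates = [" blue", " clear", " grey", " purple"]
logits = [4.0, 3.0, 2.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = apply_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])  # top option's share shrinks as t rises

print(sample_next(candidates, logits, temperature=1.0))  # varies from run to run
```

At 0.2 the top candidate takes nearly all the probability, which is why a cool setting keeps producing the same word; at 2.0 the others get a real chance.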

There's also a cap on how much text the model can consider at once, called the **context window**. Every token in your conversation, including your prompt, the model's earlier replies, and anything pasted in, counts against this limit. Once a conversation grows past it, the earliest parts typically get cut from what the model can actually see, which is why a long chat session can feel like the model "forgot" something you said earlier. It didn't forget in any human sense. It just ran out of room to keep looking at it.

![](https://cdn.hashnode.com/uploads/covers/678b775e773554ab7117f20a/151ea4a8-7cb1-43d3-8eb7-54952a268d22.png align="center")
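The simplest version of that cutoff keeps only the most recent tokens. This is a sketch of one possible policy, not how any particular product handles it (some apps may summarize older turns instead), and the 6-token window is absurdly small on purpose so the effect is visible.

```python
def fit_to_context(tokens, context_window):
    """Keep only the most recent tokens that fit; the oldest ones fall off."""
    if context_window <= 0:
        return []
    return tokens[-context_window:]

conversation = ["My", " name", " is", " Sam", ".", " What", "'s", " my", " name", "?"]
visible = fit_to_context(conversation, 6)
print(visible)  # " Sam" has fallen outside the window, so the model can't see it
```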

* * *

## Putting the whole journey together

Let me walk back through this once, now that every piece has a name, because seeing the full chain in one place is satisfying after building it up bit by bit.

You type a question. That text breaks into tokens, small reusable chunks rather than whole words. Each token becomes a list of numbers representing its meaning. As the model processes your prompt, self-attention lets every token look at every other token and reshape its meaning based on context, layer after layer. After enough of these layers, the model has a deep, context-aware numerical picture of what you actually asked. From that picture it calculates probabilities for the next token, picks one, adds it to the response, and repeats the whole process, one token at a time, until the response is complete.

![](https://cdn.hashnode.com/uploads/covers/678b775e773554ab7117f20a/9a78d93f-c85b-49bf-b7b1-5f40de00aca9.png align="center")

None of this required the model to understand your question the way you do. There's no inner voice, no comprehension in the human sense. What's there is an extraordinarily refined statistical pattern, built from reading more text than any human could read in a thousand lifetimes, good enough at predicting plausible next words that the result reads as understanding to us. Whether that distinction matters philosophically is a longer conversation than this post. But knowing exactly where the trick lives, in tokens, numbers, attention, and repeated prediction, turns "AI is magic" into "AI is a specific, learnable system." That's a far more useful place to stand once you start building things with these tools.

It still feels a little strange to me that something this mechanical produces something that reads this fluently. But it stopped feeling like magic the moment I traced every step, and I suspect that's true of most things that look like magic from the outside.

* * *

## References

*   [Vaswani et al. (2017): Attention Is All You Need, the original Transformer paper](https://arxiv.org/abs/1706.03762)
*   [OpenAI Tokenizer: see how your own text gets split into tokens](https://platform.openai.com/tokenizer)
*   [Jay Alammar: The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)
*   [Jay Alammar: The Illustrated GPT-2](https://jalammar.github.io/illustrated-gpt2/)
*   [Hugging Face: Tokenizers documentation](https://huggingface.co/docs/tokenizers/index)
*   [Anthropic: Research](https://www.anthropic.com/research)
*   [Google Cloud: What is a large language model?](https://cloud.google.com/discover/what-is-a-large-language-model)
