What Are AI, LLMs, and Transformers?

The absolute basics: understanding Artificial Intelligence, Large Language Models, and the architecture that powers them.

The Big Picture: AI vs ML vs DL vs GenAI

Artificial Intelligence (AI) ⊃ Machine Learning (ML) ⊃ Deep Learning (DL) ⊃ Generative AI (LLMs)

Before diving into complex engineering, you must understand where Large Language Models sit in the broader landscape:

  • AI: The overarching goal of creating intelligent machines.
  • ML: A subset where computers learn from data without explicit programming (e.g., predicting house prices).
  • DL: A subset of ML using deep neural networks modeled after the human brain (e.g., image recognition).
  • Generative AI: A subset of DL capable of generating new content (text, images, audio). LLMs (Large Language Models) are GenAI models specifically trained on massive amounts of text.

What exactly is a Large Language Model (LLM)?

At its core, an LLM is a giant autocomplete engine. It is a deep neural network trained to predict the next word (or token) in a sequence.

When you ask ChatGPT "What is the capital of France?", it analyzes those words and calculates that the most statistically probable next words are "Paris is the...". It generates text one piece at a time based on patterns learned from terabytes of internet data.
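That loop (score every candidate next token, emit the likeliest, append it, repeat) can be sketched with a toy probability table. The table below is invented for illustration; a real LLM computes these probabilities with billions of learned weights, not a lookup:

```python
# Toy sketch of greedy next-token prediction.
# The probability table is made up for illustration; a real LLM
# computes these probabilities with a neural network, not a lookup.

def predict_next(context: str, table: dict) -> str:
    """Return the highest-probability next token for a known context."""
    probs = table[context]
    return max(probs, key=probs.get)

table = {
    "What is the capital of France?": {"Paris": 0.92, "The": 0.05, "It": 0.03},
    "What is the capital of France? Paris": {"is": 0.88, ",": 0.07, "has": 0.05},
}

context = "What is the capital of France?"
token = predict_next(context, table)
print(context, "->", token)  # prints: What is the capital of France? -> Paris
```

Generating a whole answer is just this step in a loop: append the chosen token to the context and predict again.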

The Transformer: The Engine of Modern AI

Before 2017, AI struggled with long text because it processed words sequentially (like reading a sentence one word at a time). Then Google researchers published a paper called "Attention Is All You Need", introducing the Transformer architecture.

The Magic of Self-Attention

The breakthrough of the Transformer was Self-Attention. Instead of reading sequentially, it looks at all words in a sentence simultaneously to understand their relationships.

In the sentence "The bank of the river" vs "I deposited money in the bank", self-attention allows the model to look at the surrounding words ("river" vs "deposited") to instantly understand which meaning of "bank" is intended.
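This mechanism can be sketched in a few lines. The code below is a simplified scaled dot-product self-attention in plain Python: it omits the learned query/key/value projection matrices a real Transformer applies, and the tiny 2-dimensional word vectors are made up purely for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into attention weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified scaled dot-product self-attention.

    Each output vector is a weighted mix of ALL input vectors,
    weighted by how similar (dot product) each pair of words is.
    Real Transformers first map inputs through learned Q/K/V matrices.
    """
    d = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this word to every word in the sentence (itself included)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)
        # Blend all vectors according to the attention weights
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(d)])
    return outputs

# Made-up 2-dimensional embeddings for three words:
emb = {"the": [0.1, 0.0], "bank": [0.5, 0.5], "river": [0.0, 0.9]}
out = self_attention([emb["the"], emb["bank"], emb["river"]])
```

Because the toy vectors for "bank" and "river" point in similar directions, the attention weights let the output vector for "bank" absorb a large share of "river". That blending of context is how the model disambiguates meaning.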

How LLMs "Read": Tokens and Embeddings

Text: "Unbelievable!" → Tokens: ["Un", "believ", "able", "!"] → Embeddings (math): [0.42, -0.17, 0.88, ... 4096 dims]

Language models don't actually process letters or words. They use math. Here is how text turns into math:

  • Tokens: Text is chopped into chunks called tokens. A token can be a whole word ("apple") or just a fragment of one ("un", "believ", "able"). As a rule of thumb, 100 tokens ≈ 75 English words.
  • Embeddings: Once text is tokenized, each token is converted into a list of numbers (a high-dimensional vector) called an Embedding. This isn't just random math; the numbers are arranged so that tokens with similar meanings (like "King" and "Queen") sit mathematically close to each other in vector space.
  • Parameters/Weights: The numeric values inside the network's interconnected nodes. A "70B" model has 70 billion weights that were adjusted during training to capture how tokens relate to one another.
  • Context Window: The short-term memory limit of the model (how many tokens it can read at once).
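The claim that similar meanings sit close together can be demonstrated with cosine similarity. The vectors below are made-up 3-dimensional toys (real embeddings have thousands of dimensions, as noted above), but the geometry works the same way:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings; values invented for illustration:
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.0, 0.9],
}

print(cosine_similarity(emb["king"], emb["queen"]))  # close to 1.0
print(cosine_similarity(emb["king"], emb["apple"]))  # much lower
```

"King" and "queen" point in nearly the same direction, so their similarity is high; "apple" points elsewhere. Real models learn this geometry automatically from the text they are trained on.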

Interview Insight

Relevance: Medium. Foundational knowledge expected of all engineers entering the AI space.
