What is a Neural Network?
The phrase gets thrown around constantly. Here's what's actually happening inside — told through the one analogy that gets closest to the truth, and where that analogy falls apart.
Before you can understand transformers, attention, RLHF, or any of the machinery that makes modern AI work, you need one thing: an honest picture of what a neural network actually is. Not what it feels like to use one — what it literally does.
The short answer: a neural network is a function that maps inputs to outputs by passing numbers through layers of simple mathematical operations, where the behaviour of those operations is shaped by billions of learned parameters called weights.
That sentence is precise but not yet useful. Let's build up to it.
The analogy: a panel of judges
Imagine you're trying to decide whether a photo contains a cat. You assemble a panel of judges — say, five of them. Each judge looks at the photo and gives a score between 0 and 1, representing their confidence. But here's the twist: each judge only pays attention to a small patch of the image, and each has a different sensitivity. One notices whiskers. One notices pointed ears. One notices the general shape of a sitting animal.
You then combine their scores — weighting some judges more heavily than others based on how reliable they've proven to be — and produce a final verdict.
Now imagine instead of five judges, you have millions. And instead of one panel, you have dozens of panels stacked on top of each other — the first panel looks at raw pixels, the second looks at the outputs of the first (edges, shapes), the third looks at the outputs of the second (textures, parts), and so on. By the time you reach the final layer, the network has built a rich, abstract representation of the image from the ground up.
That's a neural network. The "judges" are neurons. Their individual sensitivities are weights. The stacked panels are layers.
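The panel analogy translates almost directly into code. Here's a minimal sketch of one panel as a single artificial neuron: each judge's confidence is multiplied by a learned reliability weight, the results are summed, and a sigmoid squashes the total back into a 0-to-1 verdict. All the specific numbers below are invented for illustration; the sigmoid squashing step is also an assumption (the article introduces activation functions properly later on).

```python
import math

def panel_verdict(scores, weights, bias=0.0):
    """Combine judge scores by reliability weight, squash to (0, 1)."""
    total = sum(s * w for s, w in zip(scores, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid: maps any number into (0, 1)

# Five judges' confidences (whiskers, ears, shape, ...) -- made-up values.
scores = [0.9, 0.8, 0.6, 0.2, 0.7]

# How reliable each judge has proven to be -- also made-up values.
weights = [2.0, 1.5, 1.0, -0.5, 0.8]

print(panel_verdict(scores, weights))  # close to 1.0: confident "cat"
```

Stack millions of these per layer, and dozens of layers deep, and you have the real thing.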
*Figure: highlighted neurons fired strongly for this input. Output: 91% cat, 2% not-cat.*
What a weight actually is
Every connection between neurons has a weight — a single number that determines how much one neuron's output influences the next neuron's input. A high positive weight means "if this fires, strongly push the next one to fire too." A negative weight means "if this fires, suppress the next one." A weight near zero means "this connection barely matters."
A large modern neural network might have tens of billions of these weights. The entire character of the network — what it knows, what it's good at, how it behaves — lives in those numbers.
| Connection | Weight | Meaning |
|---|---|---|
| whisker-detector → cat-output | +2.8 | Strong positive signal |
| round-ear-detector → cat-output | +1.9 | Positive signal |
| beak-detector → cat-output | −3.1 | Strong suppressor |
| fur-texture → cat-output | +0.4 | Weak — not decisive |
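The table above can be turned directly into a toy computation. The weights come from the table; the detector activations for each photo are invented inputs, purely for illustration.

```python
# Toy "cat" output neuron using the weights from the table above.
weights = {
    "whisker_detector":   2.8,
    "round_ear_detector": 1.9,
    "beak_detector":     -3.1,
    "fur_texture":        0.4,
}

def cat_score(activations):
    """Raw evidence for 'cat': activation x weight, summed over connections."""
    return sum(activations[name] * w for name, w in weights.items())

# Invented activations: 0 = feature absent, 1 = strongly present.
cat_photo  = {"whisker_detector": 1.0, "round_ear_detector": 0.9,
              "beak_detector": 0.0, "fur_texture": 0.8}
bird_photo = {"whisker_detector": 0.0, "round_ear_detector": 0.1,
              "beak_detector": 1.0, "fur_texture": 0.1}

print(cat_score(cat_photo))   # strongly positive: evidence for "cat"
print(cat_score(bird_photo))  # negative: the beak detector suppresses the verdict
```

Notice how the negative weight does its job: a strong beak signal single-handedly drags the score below zero, no matter what the fur detector says.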
These weights aren't hand-coded. Nobody sat down and decided that whiskers should score +2.8. The network learned those values by being shown millions of labelled examples and gradually adjusting every weight to reduce its mistakes. That process is called training — and it's the subject of the next article.
The activation function: giving neurons opinions
There's one more piece. Each neuron doesn't just pass its input straight through — it applies a small function called an activation function that introduces non-linearity. The most common one used today is called ReLU (Rectified Linear Unit): if the input is negative, output zero. If it's positive, pass it through unchanged.
Why does this matter? Without non-linearity, stacking dozens of layers would be mathematically equivalent to having just one layer. Non-linear activations are what let deep networks learn curves, corners, and complexity.
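Both claims can be checked in a few lines. The sketch below uses 1-D "layers" (a single scalar multiply-and-add each) with made-up weight values, which is enough to show the algebra: two purely linear layers collapse into one, and inserting a ReLU between them breaks that collapse.

```python
def relu(x):
    """ReLU: zero for negative inputs, identity for positive inputs."""
    return max(0.0, x)

# Two stacked *linear* 1-D layers: y = w2 * (w1 * x + b1) + b2.
# Algebra collapses them to a single layer: y = (w2*w1)*x + (w2*b1 + b2).
w1, b1 = 3.0, 1.0   # made-up weights and biases
w2, b2 = -2.0, 0.5

def two_linear_layers(x):
    return w2 * (w1 * x + b1) + b2

def one_linear_layer(x):
    return (w2 * w1) * x + (w2 * b1 + b2)

for x in (-1.0, 0.0, 2.5):
    # Depth bought us nothing: both compute the same function.
    assert two_linear_layers(x) == one_linear_layer(x)

# With a ReLU between the layers, the collapse no longer holds:
def two_layers_with_relu(x):
    return w2 * relu(w1 * x + b1) + b2

print(two_layers_with_relu(-1.0))  # differs from the collapsed linear layer
print(one_linear_layer(-1.0))
```

The ReLU version bends the function at the point where the first layer's output crosses zero — that kink is exactly the kind of corner a single linear layer can never produce.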
Where the analogy breaks down
Neurons don't "pay attention" to specific things by design. Real neurons in a trained network activate in response to patterns that often don't have clean human-interpretable labels like "whiskers." The features a network learns are frequently distributed, abstract, and hard to describe.
Neurons don't work independently. The analogy implies discrete, separable opinions. In reality, meaning is distributed across many neurons simultaneously — no single neuron reliably represents a single concept.
Biological neurons are nothing like this. The name "neural network" is a loose analogy to the brain. These are not simulations of the brain — they're mathematical functions inspired, very loosely, by its architecture.
What this means in practice
When you interact with GPT-4, Claude, or any modern language model, you're talking to a neural network with hundreds of billions of weights arranged in a specific architecture called a transformer. All of its knowledge — about history, code, language, reasoning — lives in those numbers. There's no database being queried. No rules being followed. Just a very large, very well-trained function mapping your input to a probable output.
That's the foundation. Everything else — attention, training, alignment, agents — is built on top of this.