Modern AI systems are usually described in terms of numbers. A neural network takes an input, runs it through layers of mathematical operations, and produces an output. At first, this sounds mechanical and lifeless, as if the model is only pushing numbers around without any real structure inside.
But when researchers look inside these systems, something more interesting appears. Neural networks often develop internal representations that seem to correspond to recognizable concepts. Some neurons respond strongly to edges or textures. Others activate for faces, objects, animals, or even more abstract features. In language models, words and ideas arrange themselves in high-dimensional spaces where related concepts sit near each other.
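To make the idea of related concepts "sitting near each other" concrete, here is a minimal sketch of measuring distance in an embedding space. The vectors are tiny, hand-picked toys invented purely for illustration; a real language model learns vectors with hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: values near 1.0 mean the vectors point the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings", chosen by hand for illustration only.
embeddings = {
    "dog":   np.array([0.9, 0.8, 0.1, 0.0]),
    "cat":   np.array([0.8, 0.9, 0.2, 0.1]),
    "truck": np.array([0.1, 0.0, 0.9, 0.8]),
}

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))    # high: related concepts
print(cosine_similarity(embeddings["dog"], embeddings["truck"]))  # low: unrelated concepts
```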
This raises a deeper question: why do neural networks form concepts at all? Are these concepts real, or are they patterns we project onto the model because we are looking for familiar structure?
From Data to Representation
A neural network does not begin with concepts. It starts with parameters, numerical weights that are initialized to random values and adjusted during training. The model is shown many examples, and over time it changes those parameters to reduce its error. In a vision model, that might mean learning to classify images correctly. In a language model, it might mean learning to predict the next word.
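That training loop can be sketched in a few lines. The example below is a deliberately tiny stand-in for real training: a one-weight "model" fitting a straight line, following the same basic pattern of random initialization, measuring error, and nudging parameters to reduce it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: learn y = 2x + 1 from examples. The "model" is a single
# weight and bias, initialized randomly, like the parameters of a
# larger network before training.
w, b = rng.normal(), rng.normal()
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1

learning_rate = 0.1
for step in range(500):
    pred = w * x + b
    error = pred - y
    loss = np.mean(error ** 2)        # how wrong the model currently is
    # Gradients of the loss with respect to each parameter.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Nudge the parameters in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # close to 2 and 1 after training
```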
During this process, the model has to find useful structure in the data. If it treats every image, word, or sentence as completely separate, it cannot generalize. To perform well, it needs to notice similarities. Dogs share certain shapes and textures. Sentences about food share certain patterns. Mathematical explanations often have recurring structures.
Concepts emerge because they are useful compressions of the world. Instead of storing every example separately, the model builds internal patterns that group similar things together. In this sense, a concept is a kind of shortcut. It helps the system organize many examples under a smaller structure.
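As a rough illustration of concepts as compression, the sketch below stores a single prototype per group instead of every example, then assigns new inputs to the nearest prototype. The data and labels are made up for the example; the point is only that many examples collapse into a much smaller structure that still generalizes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Many noisy examples drawn around two hidden "concepts".
dogs = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(200, 2))
cars = rng.normal(loc=[-1.0, -1.0], scale=0.1, size=(200, 2))

# Compression: instead of storing 400 examples, keep two prototypes.
prototypes = {
    "dog": dogs.mean(axis=0),
    "car": cars.mean(axis=0),
}

def classify(point: np.ndarray) -> str:
    """Assign a new example to the nearest stored prototype."""
    return min(prototypes, key=lambda name: np.linalg.norm(point - prototypes[name]))

print(classify(np.array([0.9, 1.2])))    # "dog"
print(classify(np.array([-1.1, -0.8])))  # "car"
```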
Concepts Without Definitions
Human concepts are often difficult to define precisely. Think about the word “game.” Some games involve competition, some do not. Some have rules, some are informal. Some involve physical movement, while others are purely mental. Philosopher Ludwig Wittgenstein used this example to argue that many concepts work through family resemblance rather than strict definitions.
Neural networks seem to behave in a similar way. They do not usually define a concept with a clean rule. Instead, they learn a region of similarity. A model may recognize cats not because it has a symbolic definition of “cat,” but because it has learned a pattern that connects many cat-like examples.
This makes neural network concepts feel less like dictionary entries and more like flexible clusters. They are fuzzy, context-dependent, and shaped by use. That may seem strange, but it also resembles how human concepts often work.
Are These Concepts Real?
There is still a philosophical problem. When we say a neural network has a concept of “dog” or “justice” or “number,” are we describing something genuinely inside the model, or are we interpreting the model through our own categories?
This is similar to debates in philosophy of mind. Daniel Dennett argued that we often understand systems by adopting the intentional stance, meaning we treat them as if they have beliefs, goals, or concepts when that helps us predict their behavior. From this view, saying a model “has a concept” may be a useful way of describing what it does, even if there is no human-like understanding inside.
A stricter view would say that a concept requires more than internal structure. It may require experience, embodiment, or the ability to use the concept across real-world situations. Under this view, a neural network may represent statistical patterns without truly possessing concepts.
The disagreement depends on what we think concepts are. If concepts are internal structures that support generalization, neural networks clearly form them. If concepts require meaning grounded in lived experience, then the answer becomes much less certain.
The Role of the Training World
Neural networks form concepts based on the world they are trained on. This matters because their concepts are not neutral. They reflect the data, labels, tasks, and objectives that shape them.
A model trained on internet text may learn associations that are useful for language prediction, but those associations also inherit cultural patterns, biases, and gaps from the data. A model trained on images may learn visual concepts that work well for common objects, while struggling with rare or unfamiliar cases.
This means neural network concepts are not pure discoveries. They are shaped by the environment of training. In a way, the model’s conceptual world is a compressed reflection of the data it has seen.
This also raises a question about human concepts. Our own ideas are shaped by language, culture, biology, and experience. Perhaps the difference between human and machine concepts is not that one is shaped and the other is pure, but that they are shaped by different kinds of worlds.
Figure: feature visualizations from the InceptionV1 neural network, showing how artificial systems develop internal representations of shapes, textures, and objects while learning from data. Interestingly, some of these generated patterns resemble abstract or surrealist artwork, raising questions about whether concepts emerge naturally whenever a system learns to organize visual structure. (Image: Carolin Lüübek, "The Psychedelicism of Feature Visualizations in InceptionV1," Medium, Jan 16, 2023, https://medium.com/@lyybek.carolin/the-psychedelicism-of-feature-visualizations-in-inceptionv1-9e82fcba6c9b)
Why This Matters
The question of neural network concepts matters because it changes how we interpret AI behavior. If models form real internal abstractions, then their outputs are not merely surface-level imitation. They involve structured representations that help the model generalize.
At the same time, concept formation does not automatically imply understanding. A system can organize information in useful ways while still lacking consciousness, intention, or lived experience. The existence of concepts inside a model does not settle the deeper question of whether the model knows what those concepts mean.
This distinction is important. Neural networks may form concepts because the world has structure, and because learning requires compressing that structure into usable representations. But whether those representations become meaning depends on what we think meaning requires.