Understanding Understanding

working draft synthesized

A metacognitive inquiry into understanding itself: what it is, how it is built, what makes it efficient, and what ultimately limits it. The literature is deep but vague and inconclusive, because "understanding" sits awkwardly between knowledge (facts) and capability (action), and most disciplines study only the slice that fits their tools. What follows is an attempt to consolidate those fragments into a single picture.

The more you understand, the more you understand that you don't understand.

This is not a paradox but a mechanism. If K is the set of understood concepts and U the unknown, learning enlarges the interface ∂K between them. As K grows, the number of adjacent unexplored questions grows with it, so ignorance becomes more visible, not less. Beginners see a small, seemingly complete map; intermediates hit contradictions and edge cases; experts perceive large networks of open problems. The same intuition recurs in the saying attributed to Socrates ("I know that I know nothing"), in the Dunning–Kruger effect, and in Feynman's comfort with uncertainty.
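The boundary claim can be made concrete with a toy model. The sketch below assumes, purely for illustration, that concepts sit on a two-dimensional grid and that K grows outward from a single seed; the grid geometry and the names `known_region` and `frontier` are assumptions, not part of the argument above.

```python
def known_region(radius):
    """Known concepts K: grid cells within Manhattan distance `radius` of the origin."""
    return {(x, y)
            for x in range(-radius, radius + 1)
            for y in range(-radius, radius + 1)
            if abs(x) + abs(y) <= radius}


def frontier(known):
    """Unknown cells adjacent to at least one known cell (a stand-in for the boundary of K)."""
    neighbors = set()
    for (x, y) in known:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbors.add((x + dx, y + dy))
    return neighbors - known


# (size of K, size of its frontier) as the known region expands.
sizes = [(len(known_region(r)), len(frontier(known_region(r)))) for r in range(5)]
# sizes == [(1, 4), (5, 8), (13, 12), (25, 16), (41, 20)]
```

In this flat toy the frontier grows from 4 to 20 cells while K grows from 1 to 41, so the boundary widens in absolute terms even though it shrinks relative to K. How fast a real boundary grows depends on the connectivity of the concept space, which this sketch does not attempt to model.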

A working definition

You understand something if you can build, manipulate, and apply a model of it across contexts.

Understanding, on this view, is not a static mental state but the possession of a manipulable internal model. Operationally it decomposes into five abilities:

| Ability | Test |
| --- | --- |
| Explanation | Explain the phenomenon in your own words |
| Prediction | Anticipate outcomes under new conditions |
| Intervention | Change a variable and reason about the effect |
| Compression | Represent the idea in simpler form without losing function |
| Transfer | Apply it in a domain where it was never learned |

A compact personal test follows from this: for any concept, ask (1) Can I derive it? (2) Can I modify it? (3) Can I apply it somewhere unexpected? If all three hold, the understanding is probably real. This aligns with Feynman's framing — you understand something when you can reconstruct and teach it — and with the mathematician's standard that you understand a theorem when you can recreate it without looking.

Levels of depth

Understanding is graded, not binary. A rough ladder:

| Level | Description |
| --- | --- |
| Recognition | "I've seen this before" |
| Recall | Can restate definitions or formulas |
| Procedural | Can apply the method when prompted |
| Structural | Grasps relationships between concepts |
| Generative | Can derive new results and variations |
| Transfer | Can deploy the idea in an entirely different domain |

Mastery generally begins around the structural → generative boundary, where knowledge stops being a list and becomes a network capable of producing new results.

Understanding as a process, not a state

The most durable structural insight is that understanding is a dynamical loop, not a fixed possession:

model → prediction → comparison with reality → error → revision → improved model
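As a minimal sketch of the loop above, assume the "model" is a single number, the "reality" is a stream of observations, and revision is a fixed fraction of the error. The function name `refine` and the `learning_rate` parameter are illustrative choices, not taken from the text.

```python
def refine(estimate, observations, learning_rate=0.5):
    """Run the loop once per observation; return the final model and the error history."""
    errors = []
    for observed in observations:
        prediction = estimate              # model -> prediction
        error = observed - prediction      # comparison with reality -> error
        estimate += learning_rate * error  # revision -> improved model
        errors.append(abs(error))
    return estimate, errors


final_estimate, errors = refine(estimate=0.0, observations=[10.0] * 6)
# errors == [10.0, 5.0, 2.5, 1.25, 0.625, 0.3125]; final_estimate == 9.84375
```

Each pass halves the error here only because the observations are constant and the update fraction is 0.5; with noisy data the same loop converges more slowly and never to zero error.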

This same loop reappears, renamed, across nearly every field that studies adaptive systems:

| Domain | Form of the loop |
| --- | --- |
| Scientific method | hypothesis → experiment → revision |
| Bayesian inference | prior → evidence → posterior |
| Machine learning | model → loss → gradient update |
| Neuroscience | prediction → sensory input → prediction error → update |
| Reinforcement learning | policy → reward → adjustment |
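The Bayesian row can be illustrated with the simplest conjugate case. This is a sketch under assumed specifics: a coin of unknown bias, a uniform Beta(1, 1) prior, and ten made-up flips. The helper names `update` and `mean` are hypothetical.

```python
from fractions import Fraction


def update(alpha, beta, heads, tails):
    """Conjugate Beta-Bernoulli update: prior Beta(alpha, beta) -> posterior."""
    return alpha + heads, beta + tails


def mean(alpha, beta):
    """Expected bias of the coin under Beta(alpha, beta)."""
    return Fraction(alpha, alpha + beta)


prior = (1, 1)                                # uniform prior over the coin's bias
posterior = update(*prior, heads=7, tails=3)  # evidence: 7 heads, 3 tails
prior_mean = mean(*prior)                     # 1/2
posterior_mean = mean(*posterior)             # 8/12 == 2/3
```

The posterior can serve as the prior for the next batch of flips, which is the same loop run again.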

Karl Friston's Free Energy Principle casts biological cognition as the continuous minimization of variational free energy, which under common simplifying assumptions amounts to minimizing prediction error; the brain is, on this account, a prediction machine, and understanding corresponds to better predictive models. Because the loops share the same structure, independent frameworks keep converging on one architecture: a system that improves a model of reality through iterative error correction. Error is therefore essential, not incidental: progress happens precisely where predictions fail.

The consequence is a sharper definition:

Understanding is the capacity to construct and refine predictive models of reality through iterative, error-driven optimization.

And mastery is not measured by what you currently know, but by how quickly your models improve when confronted with new information.

The understanding stack

Knowledge can be arranged as a hierarchy where each level compresses and organizes the one below it:

reality → observations / data → patterns → concepts → models → predictions → theories

Lower levels ground knowledge in reality; higher levels provide explanatory power. Understanding emerges when the concept → model → prediction segment works reliably, and mastery is the ability to move fluidly up and down this stack — descending to check a theory against data, ascending to compress many observations into a single law.
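Moving up and down the stack can be sketched with a deliberately small case: ascend by compressing five observations into a two-parameter line, then descend by checking that line against the same data. The observations are made-up numbers and `fit_line` is an illustrative helper; ordinary least squares is only one of many possible compression steps.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Observations (illustrative data, roughly y = 2x + 1 with noise).
observations = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.0)]

# Ascend: compress the observations into a model with two parameters.
slope, intercept = fit_line(observations)


def predict(x):
    """Model -> prediction."""
    return slope * x + intercept


# Descend: check the model against the data it was built from.
residuals = [y - predict(x) for x, y in observations]
```

Small residuals say the compression lost little; a large residual at one point would be a reason to go back down the stack and look at that observation again.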

Why understanding resists clean formalization

Concepts like mass, probability, and entropy became precise because each maps to a single measurable quantity. Understanding does not, for several compounding reasons:

The predictable result is a fragmented literature, where each discipline captures one face of the same phenomenon:

| Field | Focus |
| --- | --- |
| Philosophy of science | explanation and causality (Hempel: explain via general laws) |
| Cognitive science | mental representation (Marr's levels) |
| Expertise research | pattern recognition and chunking (Simon) |
| Machine learning | model learning from data |
| Education theory | conceptual change |
| Neuroscience | neural mechanisms |

The cognitive operations of understanding

Stripped of repetition, expert thinking across mathematics, physics, and cognitive science reduces to a small toolkit of operations applied recursively. This is the core consolidated set:

What separates experts from novices is not the operations themselves but richer internal models, faster model updating, and stronger abstraction. Simon's work showed expert knowledge is stored as large networks of structured "chunks" that enable rapid inference and transfer. The felt quality of expert intuition is largely this: an internalized map of a domain's structure that lets one jump directly toward promising regions.

A useful unifying image: deep understanding is the construction of internal simulators. A physicist runs a simulator of physical systems; an ML researcher, of learning dynamics; an engineer, of built systems. The better the simulator, the deeper the understanding.

Breakthrough moves: transformations of concept space

Major conceptual breakthroughs are not usually the product of more computation; they come from finding the right structure. A concept space is the set of all models capable of describing a system, and the breakthrough operations are transformations of that space that make useful models easier to locate.

| Operation | What it does to the space |
| --- | --- |
| Representation transformation | Moves to a different coordinate system (Fourier: time → frequency; combinatorics → linear algebra) |
| Duality | Exposes a complementary description (primal ↔ dual via Lagrangian duality; position ↔ momentum) |
| Symmetry detection | Identifies equivalent regions, removing redundancy (Noether: continuous symmetries ↔ conservation laws) |
| Dimensional reduction | Removes irrelevant degrees of freedom (PCA: keep directions of largest variance) |
| Constraint relaxation | Temporarily widens the feasible region (integer → continuous optimization, then recover) |
| Limiting-case analysis | Simplifies via extremes (learning rate → 0 gives gradient flow; width → ∞ gives the neural tangent kernel regime) |
| Compositional construction | Builds complex systems from simple, scalable parts |
| Abstraction | Collapses many specific models into one general form |
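The Fourier example in the first row can be sketched directly. The code assumes a made-up signal of two superimposed sinusoids and uses a naive discrete Fourier transform written out by hand for clarity; a real application would use an FFT library instead.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


n = 32
# Time domain: two sinusoids (3 and 7 cycles per window) superimposed.
signal = [math.sin(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 7 * t / n)
          for t in range(n)]

magnitudes = [abs(c) for c in dft(signal)]

# Frequency domain: below the Nyquist bin, only bins 3 and 7 carry energy.
peaks = [k for k in range(n // 2) if magnitudes[k] > 1.0]
# peaks == [3, 7]
```

In the time domain the structure is spread across all 32 samples; after the change of coordinates it is two numbers.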

Geometrically: a representation change can turn a jagged search landscape into a smooth one; symmetry deletes redundant regions (if A ≡ B, reasoning about A explains B for free); abstraction and dimensional reduction shrink the space; constraint manipulation reshapes its topology. Breakthroughs often chain several moves: identify symmetry → change representation → relax constraints → analyze a limiting case → discover an invariant. The recurring deep pattern:

observe phenomenon → build initial model → detect hidden structure
→ change representation → reveal a simpler underlying law

This is why understanding can be framed as efficient navigation of concept space: construct representations, identify structure, move toward compact predictive models. Experts appear intuitive because they have internalized the geometry of their domain's space.
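The limiting-case move (learning rate → 0 gives gradient flow) can be checked numerically on the simplest possible objective. This sketch assumes f(x) = x²/2, whose gradient flow dx/dt = -x has the closed-form solution x(t) = x₀·e^(-t); the objective and the function name `gradient_descent` are illustrative choices.

```python
import math


def gradient_descent(x0, learning_rate, total_time):
    """Minimize f(x) = x**2 / 2 (gradient: x) for total_time / learning_rate steps."""
    steps = round(total_time / learning_rate)
    x = x0
    for _ in range(steps):
        x -= learning_rate * x
    return x


# Gradient flow dx/dt = -x from x0 = 1, evaluated at t = 1.
flow_solution = math.exp(-1.0)

# Distance between discrete gradient descent and the continuous flow
# as the learning rate shrinks.
gaps = [abs(gradient_descent(1.0, lr, 1.0) - flow_solution)
        for lr in (0.5, 0.1, 0.01, 0.001)]
```

The gap shrinks with the learning rate, which is the sense in which the discrete algorithm has the continuous flow as its limit. For this objective the discrete iterate is (1 - η)^(t/η), and the limit is the familiar definition of e^(-t).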

What makes understanding efficient

If understanding is a model-improvement loop, why do some systems learn far faster than others? The efficiency of the loop depends on its surrounding architecture:

A training system for understanding

Treating understanding as a trainable skill, the operations above can be drilled in a deliberate sequence. For any concept:

  1. Structural mapping: identify the core primitives, their relationships, and the constraints/invariants. Build a concept graph, not a linear list.
  2. Mechanistic modeling: explain the concept across Marr's three levels. For attention: computational = select relevant information; algorithmic = similarity-based weighting; implementation = matrix multiplication on a GPU.
  3. Derivation: reconstruct results instead of memorizing them (derive backprop from the chain rule + computation graph; derive attention from similarity search). Derivation reveals assumptions, exposes hidden constraints, and compresses.
  4. Multi-representation: express the concept as equation, diagram, code, analogy, and physical intuition. Understanding that survives a change of representation is the real thing.
  5. Perturbation testing: stress the model: what if a parameter → 0? scale → ∞? noise increases? a constraint is removed? (Self-attention without positional encoding is permutation-equivariant: shuffle the input tokens and the outputs shuffle identically, which reveals what positional encoding is for.)
  6. Compression: reduce the idea to its minimal core. Per David Deutsch, a good explanation is hard to vary without breaking its predictive power.
  7. Transfer: find the concept's structural twin in another domain (gradient descent ↔ energy minimization; feedback loops in biology, cybernetics, and ML).
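Steps 2 and 5 can be sketched together for the attention example. The code below is a minimal scaled dot-product self-attention in plain Python, assuming no learned projections (queries, keys, and values are the raw token vectors) and no positional encoding; the three toy tokens are made up. It then runs the perturbation test of reordering the inputs.

```python
import math


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors, no positional encoding."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Algorithmic level: similarity-based weighting.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]  # softmax, shifted for stability
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
order = [2, 0, 1]                                    # an arbitrary reordering
shuffled = [tokens[i] for i in order]

out = attention(tokens, tokens, tokens)              # self-attention: Q = K = V
out_shuffled = attention(shuffled, shuffled, shuffled)

# Perturbation test: reordering the inputs only reorders the outputs.
max_diff = max(abs(a - b)
               for i, row in zip(order, out_shuffled)
               for a, b in zip(out[i], row))
```

`max_diff` comes out at floating-point noise level: each token receives the same output wherever it sits in the sequence, so without positional information the layer cannot tell one ordering from another.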

Run as a daily loop (encounter concept → map → model → derive → stress-test → compress → transfer) across hundreds of concepts, this is what builds mastery. Early on it feels vague, nonlinear, and slow because the meta-models for learning don't yet exist; once the operations internalize, learning accelerates noticeably.

This corresponds to a predictable arc — the "complexity valley":

beginner → illusion of understanding → complexity shock
→ structured mental models → expert intuition

The disorienting middle, where a domain reveals its real depth, is not a failure of the method; it is a necessary stage of it.

The architecture of a maximally understanding system

Push the question further — what design maximizes the rate at which understanding improves? — and the same structural properties recur across cognitive science, AI, and scientific practice:

The convergence is striking: predictive brains, deep generative models, hypothesis–experiment science, and hierarchical concept systems are independent discoveries of the same architecture, which suggests common structural requirements for efficient understanding. The frontier question this raises — what representations achieve the largest compression of reality while preserving predictive power? — is essentially the search for fundamental explanatory frameworks in science.

The limits of understanding

No system reaches perfect understanding. The ceilings are structural, not merely practical:

| Limit | Source |
| --- | --- |
| Information | Finite, incomplete observations admit multiple explanations; understanding is irreducibly probabilistic (Shannon) |
| Computational | The right model may exist but cost exponential search to find (complexity theory) |
| Representation | A system can only model what its internal language can express (irrationals before the reals; quantum phenomena before quantum theory) |
| Approximation | All models simplify: "all models are wrong, but some are useful" (Box) |
| Observational | Some states (internal biology, hidden social variables) can't be observed directly |
| Self-reference | Any consistent, effectively axiomatized formal system expressive enough for arithmetic leaves some true statements unprovable within it (Gödel); by analogy, a ceiling on self-understanding |
| Environmental complexity | Sensitivity to initial conditions limits long-range prediction (chaos) |
| Cognitive resources | Finite memory, attention, and speed force reliance on heuristics that sometimes misfire |
| Generality vs. precision | Broad theories lose precision; precise ones lose scope (Newtonian vs. quantum field theory) |
| The expanding unknown | Better models reveal new anomalies, so the frontier of ignorance keeps growing |
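The environmental-complexity row can be illustrated with a standard toy system. The sketch assumes the logistic map at r = 4, a textbook chaotic regime, and two starting points that differ by 1e-10; these specifics are illustrative and do not come from the table.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), returning every visited state."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1 - orbit[-1]))
    return orbit


# Two trajectories under a perfectly known rule, with nearly identical starting states.
a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)

# Step-by-step distance between the two trajectories.
gaps = [abs(x - y) for x, y in zip(a, b)]
```

The model here is exact and deterministic, yet the gap grows from about 1e-10 to the same order as the states themselves within a few dozen steps. The limit on long-range prediction comes from measurement precision, not from a flaw in the model.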

Understanding is therefore best seen as an asymptotic process: systems approach ever-more-accurate models without ever reaching a final, complete description. Scientific progress is the steady pushing-outward of these limits: better measurement expands available data, better mathematics expands the representation language, better computation expands the searchable model space. This is also why inquiry stays open-ended across centuries, and why the opening observation holds: the more you understand, the more you see that you don't.

The open frontier: cognitive primitives

A recurring hypothesis is that a small set of primitive operations — composition, abstraction, analogy, decomposition, transformation, generalization — generates most conceptual knowledge through recursion, the way few axioms generate a vast theory space, or simple cellular-automaton rules generate complex patterns. The search for the minimal such set runs through several traditions:

| Tradition | Primitive view | Key figures or field |
| --- | --- | --- |
| Symbolic | symbols + manipulation rules → intelligence (Physical Symbol System Hypothesis) | Newell & Simon |
| Probabilistic | priors + likelihoods + update rules | Bayesian inference |
| Information-theoretic | pattern detection, compression, prediction, error correction | Shannon |
| Neural | units, weighted connections, learning rules (gradient descent) | deep learning |
| Compositional | objects, relations, causal rules composed in a language-like way | Tenenbaum |

Despite decades of work there is no consensus on the exact primitives, but many researchers suspect the truly fundamental set is surprisingly small — a kind of generative grammar of thought. If so, the implication for mastery is direct: understanding is less about storing knowledge than about mastering the transformations that generate it.
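The cellular-automaton comparison made earlier in this section can be run directly. The sketch below uses an elementary one-dimensional automaton, with Rule 30 chosen as an illustrative example of a simple rule that yields an irregular pattern; the grid width and step count are arbitrary.

```python
def step(cells, rule=30):
    """Apply one update of an elementary cellular automaton with wraparound edges."""
    n = len(cells)
    # Each cell's next state is the bit of `rule` indexed by its 3-cell neighborhood.
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)]


width = 31
row = [0] * width
row[width // 2] = 1                      # start from a single live cell

history = [row]
for _ in range(12):
    history.append(step(history[-1]))

# One text line per generation: '#' for live cells, '.' for dead ones.
picture = "\n".join("".join("#" if c else "." for c in r) for r in history)
```

Printing `picture` shows the pattern. The whole rule is an eight-entry lookup table packed into the integer 30, which is the sense in which a very small generative core can produce structure that looks far richer than its description.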

The recursive turn

Studied far enough, understanding becomes an object of its own study, and the process turns recursive — one builds models of concepts, then of reasoning, then of model-construction itself:

understanding concepts → understanding systems → understanding how understanding works

Each layer improves the next. Deep conceptual mastery tends to move through three phases, with the second the hardest:

| Phase | Description |
| --- | --- |
| Accumulation | gathering concepts and tools |
| Structural integration | connecting them into networks |
| Generative insight | producing new models and frameworks |

This connects naturally to the rest of this site — to Sand to Band comprehension across abstraction levels, and to the feedback-driven, Go Slow To Go Fast view of action and learning as a closed-loop system. It also runs up against the genuinely open questions: Can understanding be measured objectively? What does embodied, neuroscience-grounded experience add that text alone cannot? Can artificial systems truly understand, or only simulate the operations of understanding?


"The more I understand about understanding, the more I realize how little I understand about understanding."