AI scales like a child learns language, says Wits study

Ronald Ralinala

May 28, 2026

The University of the Witwatersrand has unveiled new research that sheds light on a question echoing through boardrooms and classrooms alike: why AI gets smarter as it scales. By marrying concepts from linguistics with deep‑learning theory, the study suggests that large language models acquire the same kind of structured, compositional behaviour that children develop when learning to speak. The implications reach far beyond academic curiosity, offering a fresh lens through which to view the rapid progress of generative AI tools that dominate South Africa’s tech landscape today.

Lead author Devon Jarvis, a lecturer in the School of Computer Science and Applied Mathematics, explains that the team built a “computer brain” mirroring the developmental constraints of a child. Feeding it data resembling human language, they observed how successive generations of the model refined its internal representations. The result? A dataset that becomes progressively more organised, echoing the way human languages evolve to become easier for new learners to grasp.

In linguistic terms, the phenomenon is known as iterated learning – the idea that language handed down through generations gradually sheds irregularities because each learner favours patterns that are easiest to acquire. Jarvis and his colleagues paired this with deep neural networks, which have long served as computational analogues of the brain’s information processing. Their hybrid approach uncovers a mechanism that could explain the emergent capabilities of today’s massive AI systems.
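To make the idea of iterated learning concrete, here is a minimal sketch in Python. It is not the study's model: the verb list, the "add -ed" fallback rule and the function names are all invented for illustration. Each generation hears only a random sample of the previous generation's language, memorises what it heard, and falls back on the easiest rule for everything else.

```python
import random

# Toy "language": verb -> past tense. The vocabulary is invented for illustration.
SEED_LANGUAGE = {
    "walk": "walked", "jump": "jumped", "talk": "talked", "play": "played",
    "go": "went", "sing": "sang", "eat": "ate", "run": "ran",
}


def count_irregular(language):
    """Count forms that do not follow the easy 'add -ed' rule."""
    return sum(past != verb + "ed" for verb, past in language.items())


def learn(teacher, sample_size, rng):
    """One generation: memorise the forms that were heard, regularise the rest."""
    heard = set(rng.sample(sorted(teacher), sample_size))
    return {
        verb: (teacher[verb] if verb in heard else verb + "ed")
        for verb in teacher
    }


def iterate(language, generations, sample_size, seed=0):
    """Pass the language down a chain of learners and track its irregularity."""
    rng = random.Random(seed)
    history = [count_irregular(language)]
    for _ in range(generations):
        language = learn(language, sample_size, rng)
        history.append(count_irregular(language))
    return language, history


if __name__ == "__main__":
    final, history = iterate(SEED_LANGUAGE, generations=10, sample_size=5)
    print("irregular forms per generation:", history)
    print("final language:", final)
```

Because a learner in this toy can only keep an irregular form it actually heard, the count of irregular forms can never rise from one generation to the next. Real languages are messier (frequently used irregulars tend to survive, for instance), so this should be read as a cartoon of the mechanism rather than a simulation of it.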

How depth drives emergent structure in AI models

The researchers ran parallel experiments with shallow and deep linear networks, deliberately opting for simplified mathematical models to keep the dynamics tractable. The results were stark: only networks with enough layers – enough “depth” – managed to internalise the regularities that make language learnable. Shallow architectures failed to capture the compositional nuances, producing flat, unstructured outputs regardless of how much data they consumed.

| Network type | Depth (layers) | Ability to capture structure | Typical performance |
|---|---|---|---|
| Shallow | 1‑2 | Minimal – fails to develop compositional rules | Low accuracy on grammar‑heavy tasks |
| Moderate | 3‑5 | Partial – some pattern recognition, but inconsistent | Moderate accuracy, struggles with novel combinations |
| Deep | 6+ | High – robust emergence of structured, compositional behaviour | High accuracy, generalises to unseen constructions |

The table makes clear that depth is the decisive factor; without sufficient processing layers, even abundant data cannot coax a model into learning the underlying grammar of language. This mirrors observations in commercial LLMs, where scaling up parameters and layers often unlocks capabilities that were absent in earlier, smaller versions.

Jarvis likens the process to a child learning that most birds can fly, then encountering a penguin. The initial over‑generalisation is corrected, forcing the mind to sharpen its concepts. In the same way, each “generation” of the computer brain refines its internal map, discarding hard‑to‑learn artefacts and reinforcing the patterns that ease future learning.

It is essential to stress that the study employed deep linear networks, not the massive transformer‑based models behind ChatGPT or Claude. While the findings are theoretical, they point to a fundamental principle that could underlie the empirical success of scaling in contemporary AI. By stripping down the architecture to its bare bones, the researchers were able to analyse the dynamics mathematically – a feat far harder with today’s opaque, billion‑parameter behemoths.
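For readers wondering what a "deep linear network" looks like in practice, the sketch below is about the smallest possible version. It is an illustration only, not the paper's experiment: the single-number "networks", the learning rate, the starting weights and the function names are all assumptions. The shallow model is one weight; the deep model is two weights multiplied together, with no non-linearity in between. Both can represent exactly the same functions, yet gradient descent drives them along different paths.

```python
def train_shallow(target=1.0, w=0.01, lr=0.1, steps=200):
    """One-layer linear model y = w * x, trained on the single example x = 1."""
    history = [w]
    for _ in range(steps):
        w += lr * (target - w)  # gradient step on 0.5 * (target - w) ** 2
        history.append(w)
    return history


def train_deep(target=1.0, w1=0.01, w2=0.01, lr=0.1, steps=200):
    """Two-layer linear model y = w2 * w1 * x, trained on the same example."""
    history = [w1 * w2]
    for _ in range(steps):
        error = target - w1 * w2
        # Each layer's gradient is scaled by the other layer's weight.
        w1, w2 = w1 + lr * error * w2, w2 + lr * error * w1
        history.append(w1 * w2)
    return history


if __name__ == "__main__":
    shallow = train_shallow()
    deep = train_deep()
    for step in (0, 10, 30, 50, 70, 200):
        print(f"step {step:3d}  shallow={shallow[step]:.4f}  deep={deep[step]:.4f}")
```

With these assumed settings the shallow model closes most of the gap to the target within the first few steps, while the deep one barely moves at first, then rises sharply before settling. That stage‑like behaviour comes from each layer's update depending on the other layer's current weight, and it is the kind of dynamic that can be worked out on paper for linear networks. The sketch says nothing by itself about compositional structure, which is the study's actual subject.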

“The fact that this was shown in a very simple version of the technology underpinning the modern boom in AI tools is also encouraging,” Jarvis tells SA Report. “It suggests that the intersection of multiple fields – linguistics, neuroscience and machine learning – holds the key to the fundamental principles of cognition.”

South Africa’s burgeoning AI ecosystem stands to benefit from these insights. Local start‑ups developing domain‑specific language models can now better gauge how much depth they need to embed genuine compositional understanding, rather than relying solely on data volume. Universities and research institutes may also use iterated‑learning simulations as teaching tools, giving students a tangible grasp of why scale matters.

The paper’s co‑authors – Richard Klein (head of the School of Computer Science and Applied Mathematics), Benjamin Rosman (director of the Wits Mind Institute) and Andrew Saxe (University College London) – all concur that the study opens a pathway for future work. By extending the framework to non‑linear networks and more realistic linguistic corpora, the team hopes to bridge the gap between elegant theory and the messy reality of commercial AI.

In a nation where digital transformation is accelerating, understanding the roots of AI’s scaling laws is more than an academic indulgence. It equips policy‑makers, investors and technologists with a clearer picture of what drives performance, helping to allocate resources more wisely. As the research community continues to peel back the layers of deep learning, one thing becomes evident: the depth of a model isn’t just a technical detail – it’s the engine that powers the leap from “big data” to genuinely intelligent behaviour.