Animals and humans get very smart very quickly with vastly smaller amounts of training data than current AI systems. Current LLMs are trained on text data that would take a human roughly 20,000 years to read. And still, they haven't learned that if A is the same as B, then B is the same as A. Humans get a lot smarter than that with comparatively little training data. Even corvids, parrots, dogs, and octopuses get smarter than that very, very quickly, with only about 2 billion neurons and a few trillion "parameters."

My money is on new architectures that can learn as efficiently as animals and humans. Using more text data (synthetic or not) is a temporary stopgap made necessary by the limitations of our current approaches. The salvation is in using sensory data, e.g. video, which has higher bandwidth and more internal structure.

The total amount of visual data seen by a 2-year-old is larger than the amount of data used to train LLMs, but still pretty reasonable. Two years of waking life is about 2 × 365 × 12 × 3600, or roughly 32 million seconds (assuming about 12 waking hours per day). We have about 2 million optic nerve fibers, each carrying roughly 10 bytes per second. That's a total of about 6E14 bytes. The volume of data used for LLM training is typically 1E13 tokens, which is about 2E13 bytes. So the 2-year-old has seen roughly 30 times more data. Importantly, there is more to learn from video than from text precisely because it is more redundant: it tells you a lot about the structure of the world.
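
For concreteness, here is the back-of-the-envelope arithmetic above as a short Python sketch. The inputs (12 waking hours per day, 2 million optic nerve fibers at ~10 bytes per second, 1E13 training tokens at ~2 bytes per token) are the rough estimates quoted in the text, not measured values.

```python
# Back-of-the-envelope comparison: visual data seen by a 2-year-old
# vs. the text data used to train a typical LLM.
# All figures are the rough estimates from the text above.

SECONDS_AWAKE = 2 * 365 * 12 * 3600   # ~32 million seconds (12 waking hours/day)
OPTIC_NERVE_FIBERS = 2_000_000        # optic nerve fibers, both eyes
BYTES_PER_FIBER_PER_SEC = 10          # rough per-fiber data rate

visual_bytes = SECONDS_AWAKE * OPTIC_NERVE_FIBERS * BYTES_PER_FIBER_PER_SEC

LLM_TRAINING_TOKENS = 1e13
BYTES_PER_TOKEN = 2                   # ~2 bytes per token
text_bytes = LLM_TRAINING_TOKENS * BYTES_PER_TOKEN

print(f"Visual data by age 2: {visual_bytes:.1e} bytes")          # ~6.3e+14
print(f"LLM training text:    {text_bytes:.1e} bytes")            # 2.0e+13
print(f"Ratio:                {visual_bytes / text_bytes:.0f}x")   # ~32x
```

Running it reproduces the ~6E14 versus ~2E13 figures and the factor of roughly 30 quoted above.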