The Hidden Complexity of Thought
Thinking is harder than people think. And the hardest kind of thinking is the kind people don't even register as "thinking".
If you've read Thinking, Fast and Slow, you'll be familiar with the concepts of a "system 1" that does fast, unconscious processing, and a "system 2" that does slow, methodical processing. The important insight here is that these systems don't sit side by side. System 2 is built on top of system 1. Conscious thought is an emergent property of all the low-level unconscious thinking that's going on.
(Every part of your conscious thinking process originally comes from your unconscious mind. Think about how you explicitly work through a problem step-by-step. How do you determine what step comes next? How do you perform one of those steps? The details are all performed unconsciously, and only a "summary" is brought to conscious awareness.)
If you had to use your system 2 to catch a ball, you couldn't do it.
(Just watch people try to program a robot to catch a ball or assemble a puzzle.)
System 1 evolved first, and it's found in the minds of all animals. System 2 comes into existence once system 1 gets complicated enough to support a second layer of processing on top of it. (Think about someone building a computer in Minecraft. Their physical computer is simulating a universe, and then a computer is implemented inside the physics of that universe. The computer inside Minecraft is vastly slower and more limited than the actual computer it's built on top of.)
There's a conception of fields of thought as being "hard" or "soft", such as in the hard sciences/soft sciences and hard skills/soft skills dichotomies. And the hard skills/sciences are generally thought of as being more difficult. This is generally true, for humans. Soft skills are the sorts of things that we evolved to be good at, so they feel natural and effortless. Hard skills are those that didn't matter all that much in our ancestral environment, so we have no natural affinity for them.
But in a fundamental sense, hard skills are vastly simpler and easier than soft skills. Hard skills are those that can be formalized. Knowing how to perform long division is challenging for humans, but it's trivial to program into a computer. Knowing how to hold a polite conversation with a coworker? Trivial for most humans, but almost impossible for an algorithmically-programmed computer.
(The technical term for this is Kolmogorov complexity: the length of the shortest computer program that can do what you want. The shortest program that can perform long division is much shorter than the shortest program that can competently navigate human social interaction.)
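To make the contrast concrete, here is school-style long division in a few lines of Python. This is a sketch of the point, not a claim about the true Kolmogorov complexity of either task, but it shows how little program is needed for the "hard" skill:

```python
def long_division(dividend: int, divisor: int) -> tuple[int, int]:
    """Digit-by-digit long division, the way it's taught in school.

    Returns (quotient, remainder).
    """
    quotient = 0
    remainder = 0
    for digit in str(dividend):
        # "Bring down" the next digit, just as on paper.
        remainder = remainder * 10 + int(digit)
        quotient = quotient * 10 + remainder // divisor
        remainder = remainder % divisor
    return quotient, remainder

print(long_division(9876, 7))  # (1410, 6)
```

Ten lines, and it handles numbers of any size. No one has ever written the equivalent ten lines for "hold a polite conversation with a coworker".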
This is why experts in "hard skills" tend to be good at explaining them to others: the skill itself can be broken down into explicit, teachable steps.
Soft skills experts, on the other hand, tend to function through intuition and tacit knowledge. When someone is asked to explain why they're so charismatic, they'll often stumble and say things that boil down to "just say nice things instead of rude things". They don't actually understand why they behave the way they do; they just behave in the way that feels right to them, and it turns out that their unconscious mind is good at what it does.
This is the explanation behind Moravec's paradox: the observation that computers tend to be good at the sorts of things humans are bad at, and vice versa.
Computers formally implement an algorithm for a task. This makes them only capable of performing tasks that are simple enough for humans to design an algorithm for.
Life evolved to do things that were necessary for survival. These things require a massive amount of low-level processing, which can be optimized specifically to do those things and nothing else. You are in some sense doing "calculus" any time you catch a ball mid-flight, but the mental processes doing that calculus have been optimized specifically for catching thrown objects, and cannot be retasked to do other types of calculus.
The end result is that computers are good at things with low Kolmogorov complexity, while humans are good at things that are useful for survival on the surface of a planet. There's no particular reason to expect these two things to be the same.
Neural networks are the computer scientists' attempt to break this trend. Rather than being explicitly programmed with an algorithm that a human designed to return the desired result, they learn rough heuristics from large quantities of data. This is very similar to how humans learn, though humans also come with a bunch of "pre-programmed" instincts, while neural networks have to start from scratch.
As a result, GPT-4 can carry on a conversation about as well as a human can, but ask it to multiply two large numbers and it will give you an answer that merely "looks right": it has about the right number of digits, and perhaps starts with the correct few, but is actually wrong.
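The asymmetry is easy to demonstrate from the other side: exact multiplication of large numbers, which trips up a heuristic learner, is essentially free for conventional code. (The specific numbers below are arbitrary; Python integers have built-in arbitrary precision.)

```python
# Exact arbitrary-precision multiplication is built into Python's integers.
a = 748_293_651_902_847
b = 391_284_756_120_993

product = a * b  # exact down to the last digit, computed instantly
print(f"{a} x {b} = {product}")
print(f"{len(str(product))} digits, every one of them correct")
```

The computer isn't approximating what a plausible answer looks like; it's running the algorithm, so every digit is guaranteed.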
Modern scaling approaches to LLMs are trying to make them complex enough to recreate this second level of explicit reasoning on top of their heuristics, just as happened with humans once our brains became complicated enough.