
Hanoi Turned Upside Down



Prompted partly by Apple's paper about the limits of large language models ("The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity"), I spent some time playing with Towers of Hanoi. It's a problem I solved some 50 years ago when I was in college, and I haven't felt the desire or the need to revisit it since. Now, of course, "We Can Haz AI," and all that means. Of course, I didn't want to write the code myself. I confess, I don't like recursive solutions. But there was Qwen3:30B, a "reasoning model" with 30 billion parameters that I can run on my laptop. I had little doubt that Qwen could generate a Towers program, but I thought it would be fun to see what happened.

First, I asked Qwen if it was familiar with the Towers of Hanoi problem. Of course it was. After it explained the game, I asked it to write a Python program to solve it, with the number of disks taken from the command line. Fine; the result looks a lot like the program I remember writing in college (except that was way, way before Python; I think I used a dialect of PL/I). I ran it, and it worked perfectly.
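
For reference, here's a minimal sketch of the kind of program involved: a recursive solver that takes the disk count from the command line and prints the moves. This is my own reconstruction, not Qwen's actual output.

```python
# A minimal sketch of the kind of program described, not Qwen's actual
# output: a recursive Towers of Hanoi solver that takes the number of
# disks from the command line and prints the list of moves.
import sys

def hanoi(n, source, target, auxiliary, moves):
    """Append the moves needed to shift n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, auxiliary, target, moves)   # clear the way
    moves.append((source, target))                   # move the largest disk
    hanoi(n - 1, auxiliary, target, source, moves)   # restack on top of it

if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 3
    moves = []
    hanoi(n, "A", "C", "B", moves)
    for i, (src, dst) in enumerate(moves, 1):
        print(f"Move {i}: {src} -> {dst}")
    # 2**n - 1 moves in all: 7 for 3 disks, 32,767 for 15.
```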

The output was a bit awkward (just a list of moves), so I asked it to animate it on the terminal. The terminal animation wasn't really satisfactory, so after a couple of tries, I asked it to try a graphical animation. I didn't give it any more information than that. It generated another program, using Python's tkinter library. And again, this worked perfectly. It generated a nice visualization, except that when I watched the animation, I saw that it had solved the problem upside down! Large disks were on top of smaller disks, not vice versa. I want to be clear: the solution was perfectly correct; in addition to inverting the towers, it inverted the rule about moving disks, so that it was never putting a smaller disk on top of a larger one. If you stacked the disks in a pyramid (the "normal" way) and made the same moves, you'd get the correct result. Symmetry FTW.
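
To see why the same move sequence works either way up, here's a small sketch (again mine, not Qwen's code) that replays the standard move sequence on an upside-down starting stack and checks the inverted rule after every move.

```python
# Replay the standard Hanoi move sequence on an inverted starting stack
# and confirm that a smaller disk is never placed on top of a larger one.

def hanoi(n, source, target, auxiliary, moves):
    """Standard recursive solution: append (source, target) peg moves."""
    if n == 0:
        return
    hanoi(n - 1, source, auxiliary, target, moves)
    moves.append((source, target))
    hanoi(n - 1, auxiliary, target, source, moves)

def replay_inverted(n):
    # Pegs as bottom-to-top lists; the inverted start puts the smallest
    # disk (1) at the bottom and the largest (n) on top of peg A.
    pegs = {"A": list(range(1, n + 1)), "B": [], "C": []}
    moves = []
    hanoi(n, "A", "C", "B", moves)
    for src, dst in moves:
        disk = pegs[src].pop()
        # Inverted rule: the disk being placed must be larger than the
        # disk currently on top of the destination peg.
        assert not pegs[dst] or disk > pegs[dst][-1]
        pegs[dst].append(disk)
    return pegs

print(replay_inverted(5))  # every move is legal; all disks end on peg C, still inverted
```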

So I told Qwen that the solution was upside down and asked it to fix it. It thought for a long time and eventually told me that I must be looking at the visualization the wrong way. Perhaps it thought I should stand on my head? Proving, if nothing else, that LLMs can be assholes, too. Just like 10x programmers. Maybe that's an argument for AGI?

Seriously, there's a point here. It's certainly important to research the limits of artificial intelligence. It's definitely interesting that reasoning LLMs tended to abandon problems that required too much reasoning and were most successful at problems that only required a moderate reasoning budget. Interesting, but is it surprising? Very hard problems are very hard problems for a reason: they're very hard. And most humans behave the same way: we give up (or look up the answer) when faced with a problem too hard for us to solve.

But we also have to think about what we mean by "reasoning." I had little doubt that Qwen could solve Towers. After all, solutions must be in hundreds of GitHub repos, Stack Overflow questions, and online tutorials. Do I, as a user, care in the least whether Qwen looks up the solution in an external source? No, I don't, as long as the output is correct. Do I think that means Qwen isn't "reasoning"? Ignoring all the anthropomorphism that we're stuck with, no. If a reasonable, reasoning human is asked to solve a difficult problem, what do we do? We try to find a process for solving the problem. We verify that the process is correct. And we use that process in our solution. If computers are relevant, we'll use them, rather than solving on pencil and paper. Why should we expect anything different from LLMs? If someone told me I had to solve Hanoi with 15 disks (32,767 moves), I'm sure I'd get lost somewhere between the beginning and the end, even though I know the algorithm. But I wouldn't even think of listing the moves by hand; I'd write a program (like the one Qwen generated) and have it dump out the moves. Laziness is a virtue; that's something Larry Wall (creator of Perl) taught us. That's reasoning: it's as much about looking for the easy solution as it is about doing the hard work.

A blog post I read recently reported something similar. Someone asked OpenAI's o3 to solve a classic chess problem by Paul Morphy (probably the greatest chess player of the 19th century). The AI realized that its attempts to solve the problem were incorrect, so it looked up the answer online, used that as its answer, and gave an explanation of why the answer was correct. That's a perfectly reasonable way to solve the problem. The LLM experiences no joy, no validation, in solving a difficult chess problem; it doesn't feel a sense of accomplishment. It's just supplying an answer. While it's not the kind of reasoning that AI researchers want to see, looking up the answer online and explaining why the answer is correct is a good demonstration of human-like reasoning. Maybe this isn't "reasoning" from a researcher's perspective, but it's certainly problem solving. It represents a chain of thought in which the model decides that it can't solve the problem on its own, so it looks up the answer online. And when I'm using AI, problem solving is what I'm after.

I want to make it clear that I'm not a convert to the cult of AGI. I don't consider myself a skeptic, either; I'm a nonbeliever, and that's different. We can't talk about general intelligence meaningfully if we can't define what "intelligence" means. The hegemony of the technorati has us chasing after problem-solving metrics, as if "intelligence" could be represented by a number. It's all Asimov until you have to run benchmarks; then it's reduced to numbers. If we know anything about intelligence, we know it isn't represented by a vector of benchmark results testing the ability to solve hard problems.

But if AI isn't the embodiment of some kind of undefinable intelligence, it's still the greatest engineering project of the 21st century. The ability to synthesize human language correctly is a major achievement, as is the ability to emulate human reasoning, and "emulation" is a fair description of what it's doing. AI's detractors ignore, bizarrely in my opinion, its tremendous utility, as if citing examples where AI generates incorrect or grossly inappropriate output means that it's useless. That isn't the case, but it does require thinking carefully about AI's limitations. Programming with AI assistance will certainly require more attention to debugging, testing, and software design: all themes that we've been watching carefully over the past few years, and that we're talking about at our AI Codecon conferences. Applications like detecting fraud in welfare applications may have to be scrapped or put on hold, as the city of Amsterdam discovered, until we can build AI systems that are free from bias. Building bias-free systems is likely to be much harder than solving difficult problems in mathematics. It's a problem that may not be solvable; we humans certainly haven't solved it. Either worrying about or breathlessly anticipating AGI achieves little, other than diverting attention away from both useful applications of AI and real harms caused by AI.

