Can GPT-4 *Truly* Write Code?
Since ChatGPT came out I’ve seen quite a lot of people posting about its ability to write code. People have posted about how they had it design and implement a number puzzle game (without realizing that the game it “invented” already exists), how they’ve had it clone Pong, and hell, I’ve even used it to write a few simple Python utility scripts. It’s very capable, and a pretty useful tool.
But there’s a commonality in all of these examples people have been posting. They’re all problems that have been solved before, or extremely minor modifications to those problems. And while, to be fair, a *lot* of programming is really just that (gluing together existing solutions and fitting existing code into your specific use case), the *hard* part of programming is solving problems that haven’t been solved before.
So I decided to test it on a particularly difficult “algorithmic” problem I solved a couple of years ago. Something small and isolated enough that I could fit it into a ChatGPT prompt, but with enough subtleties that I feel like it would have trouble getting it right.
Let’s start with a description of the actual real-use-case problem. In Mewgenics, movement abilities use pathfinding to get the cat from his origin to his destination.
Cats have a maximum movement range stat (in this case it’s 6) and tiles have a cost (in this case it’s 1 for basic tiles and 9999 for blocking obstacles). We also have water tiles that cost 2 to path through.
So far, nothing I’ve described is out of the ordinary for grid-based pathfinding. Both Dijkstra’s algorithm and A* can handle tiles with different costs just fine, and it’s trivial to just cut off the path after a certain distance to deal with the maximum movement range.
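For reference, here’s a minimal sketch of that ordinary baseline: plain Dijkstra on a 4-connected grid with per-tile costs and a movement cutoff. The grid encoding (`.`, `~`, `#`), the `COST` table, and the `shortest_path` name are assumptions made for illustration, not code from the game; a step is assumed to cost whatever the tile being entered costs.

```python
import heapq

# Assumed encoding: basic tile, water tile, blocking obstacle (costs from the text).
COST = {".": 1, "~": 2, "#": 9999}


def shortest_path(grid, start, goal, max_range):
    """Plain Dijkstra with a range cutoff. Returns (cost, path) or None."""
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}      # cheapest known cost to reach each tile
    came_from = {}         # standard back-pointers for path reconstruction
    queue = [(0, start)]
    while queue:
        cost, cell = heapq.heappop(queue)
        if cost > best.get(cell, float("inf")):
            continue  # stale entry: a cheaper route to this tile was already found
        if cell == goal:
            path = [cell]
            while path[-1] != start:
                path.append(came_from[path[-1]])
            return cost, path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            new_cost = cost + COST[grid[nr][nc]]
            if new_cost > max_range:
                continue  # beyond the movement range: cut the path off here
            if new_cost < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = new_cost
                came_from[(nr, nc)] = cell
                heapq.heappush(queue, (new_cost, (nr, nc)))
    return None
```

Because an obstacle costs 9999, it never fits inside a range of 6, so this sketch needs no separate “blocked” check.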
The complication comes in when we add fire (and other hazard-type tiles) into the mix. Fire tiles don’t cost extra to pathfind through like water tiles do, however a cat really wants to avoid pathing through the fire tile if it can. Shown here is the working solution to that. There are a lot of more complicated cases as well: if there are a lot of fire tiles it should go through the fewest of them it can, and such, and that’s all working well.
Now, it’s not completely obvious why this is a hard problem. Or at least, why the solution is more complicated than simply A* with a modified heuristic that considers “desire costs” like fire. It’s a problem that is very subtly different from standard A* pathfinding, and the solution, while very very close to A*, has a few non-intuitive changes that mattered a lot. This was a “hard” problem, and back in 2020 when I was first adding fire tiles into the game, this pathfinding problem took a couple of days to actually properly solve. The main complication is the limiting factor of “maximum movement range” and how it interacts with both the cost to path through tiles and the “desire cost” used to avoid hazard tiles. You can see in that gif I posted how the path changes as you try to move farther and farther past the fire tile; eventually pathing around it would no longer be possible within 6 movement, so it has to path through it instead.
And there’s a lot of complication to it beyond the simple cases too:
(14 total movement vs 10 total movement on the same level)
There’s a kind of built-in assumption in A* and Dijkstra’s that, if the shortest path from A to C passes through tile B, then it also takes the shortest path from A to B. So if you’ve found the shortest path to B already, you can just start there and continue on to C. These algorithms rely on that to be efficient, because you can skip tiles that you’ve already found a shorter path to when pulling stuff off the priority queue, and when reconstructing the path at the end you can rely on this by having each tile store the tile it was reached from during the pathfinding algorithm.
This is what I mean: the situation above doesn’t happen with “standard” pathfinding! The best path from A to B doesn’t line up with the best path from A to C, despite the path from A to C containing tile B. This, uh, complicates things and breaks some assumptions that existing pathfinding relies on.
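To make that concrete, here’s a small brute-force sketch (not the game’s algorithm) that enumerates every simple path within range and ranks them by fewest fire tiles first, then by movement spent. The layout, the `F` marker, and the `best_path` helper are made up for illustration, and every tile is assumed to cost 1 to enter.

```python
def best_path(grid, start, goal, max_range):
    """Try every simple path within range; return the best (fires, cost, path)."""
    rows, cols = len(grid), len(grid[0])
    best = None

    def walk(cell, cost, fires, path):
        nonlocal best
        if cell == goal:
            # Fewer fire tiles wins; movement cost only breaks ties.
            if best is None or (fires, cost) < best[:2]:
                best = (fires, cost, list(path))
            return
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or nxt in path:
                continue
            if cost + 1 > max_range:
                continue  # out of movement
            path.append(nxt)
            walk(nxt, cost + 1, fires + (grid[nr][nc] == "F"), path)
            path.pop()

    walk(start, 0, 0, [start])
    return best


# Hypothetical layout: A at (1, 0), fire at (1, 1), B at (1, 2), C at (1, 5).
grid = ["......",
        ".F...."]
A, B, C = (1, 0), (1, 2), (1, 5)
to_b = best_path(grid, A, B, 6)
to_c = best_path(grid, A, C, 6)
```

With a range of 6, the best route to B detours around the fire (4 movement, no fire tiles). Every fire-free route to C needs at least 7 movement, so the best route to C runs straight through the fire and then through B, and its prefix up to B is not the best route to B.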
So, it’s A* or Dijkstra with modified costs and heuristics, right? Well… almost, but not quite. You can see the code for it below (along with some extra stuff for minimizing bends in the path as well). You’ll notice… it’s not exactly Dijkstra or A*, and there’s a lot of stuff in there where it isn’t obvious why it’s the way it is. In fact, I wrote this in 2020 and have since cleared it from my mind… it’s not even obvious to me anymore why stuff is the way it is. There was a lot of trial and error to get this to work, a lot of moving things around and making slight modifications to the algorithm. In preparation for this blog post I double-checked and tried to roll it back and make this just modified A* again, but every change to simplify it just added in a bunch of pathfinding bugs. So it’s the way it is. And the train of thought that led here has mostly been dumped from my mind, since it works already.
Why is the path stored inside the cells that get added to the priority queue, instead of the standard way A* does it where the nodes store which node they were reached from? Why does it check that the path is the best path when popping from the queue instead of when pushing to the queue? Both of these are decently large diversions from A*, but absolutely necessary for this to work, apparently.
Anyway. That’s all the background info for the problem. You’re welcome to try solving it yourself if you want to get a real feel for just how non-trivial it is (or if you think that it really could just be A* with a modified heuristic instead). (And if you are going to try to “suggest” to me ways to improve it, then please actually implement and try it in context first, and know I won’t use it because what I have works fine already, has been *thoroughly* tested on thousands of different cases, and doesn’t have any performance issues either.)
Here’s the meat of this article. The actual experiment.
Can GPT-4 (via ChatGPT Plus) actually solve this problem?
I asked it:
Write a function in code that solves the following problem:
An agent needs to find the best path on a 10×10 tile grid from their current location to a target location.
They have a limited movement range of 5 points
Regular tiles cost 1 point to move through, water tiles cost 2 points to move through.
Fire tiles cost 1 point to move through, but they should avoid pathing through them even if it means taking a longer path to their destination (provided the path is still within their limited movement range)
For the purposes of the experiment, I explicitly left out details of the algorithm, and instead gave it a description of the *problem* that needs to be solved. This is how you start with a problem anyway, and I *am* actually curious if there’s maybe a more obvious or trivial solution I missed, so I’d rather not pollute its train of thought with implementation details. I also didn’t say what language to code in, since I assumed it’ll work better if it picks Python rather than asking it to write C++. I only care about the actual algorithm after all.
ChatGPT responded:
You can use the A* pathfinding algorithm to find the optimal path considering the costs and constraints. Here’s a Python function that solves the problem:
Which, like, it *sorta* understood the problem, though instead of “avoiding” going through fire it just… never goes through fire ever.
So I asked:
this code refuses to path through fire, it should still path through fire if it has to in order to reach its destination within its movement range
And it apologized and then fixed the a_star_search function like so:
Again, not quite. It made no distinction between the penalty cost for pathing through fire and the physical cost for pathing through tiles, so it still wouldn’t path through fire (since the +1000 penalty puts fire tiles way out of your movement range).
Anyway, I went back and forth with it a few more times, had it generate a test case, and fixed a runtime error that occurred when I actually tried to run it. Landed on this, and I customized the test case to something simple that would fail pretty obviously. It mixed up x and y in its test case stuff, but I didn’t want to argue with it over that so I just dealt with it (it’s pretty slow to re-generate).
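A minimal sketch of the distinction that version missed, using hypothetical tile names and helper functions (none of this is ChatGPT’s or the game’s code): the fire penalty is a preference, so it should be tracked separately from the movement points that get compared against the range.

```python
FIRE_PENALTY = 1000  # the penalty value mentioned above


def path_totals(tiles):
    """Return (movement, desire) for the sequence of tiles a path enters."""
    movement = sum(2 if tile == "water" else 1 for tile in tiles)
    desire = sum(FIRE_PENALTY for tile in tiles if tile == "fire")
    return movement, desire


def in_range_merged(tiles, max_range):
    """The mistake: the penalty is counted against the movement budget."""
    movement, desire = path_totals(tiles)
    return movement + desire <= max_range


def in_range_separate(tiles, max_range):
    """Only physical movement is checked against the range."""
    movement, _ = path_totals(tiles)
    return movement <= max_range
```

With a range of 5, a three-tile route containing one fire tile spends only 3 movement, yet `in_range_merged` rejects it because 3 + 1000 is far more than 5. Keeping the two numbers apart lets the range check use movement alone, while the desire total is only used to choose between routes that are already in range.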
On the test case given, because it ends on a fire tile it just ignores ever trying to avoid the fire tiles.
I asked it to fix it, and it gave me a solution that went back to just failing to pathfind through fire tiles at all.
I asked it to fix that, and it went in a circle back to the one it had before.
After going in circles a few more times, I decided that was it. It got *close*. It seemed to understand the problem, but it couldn’t actually properly solve it.
Would this have helped me back in 2020? Probably not. I tried to take its solution and use my mushy human brain to modify it into something that actually worked, but the path it was going down was not quite correct, so there was no salvaging it. Again, the modifications from basic A* to the actual solution are not obvious, so starting from a base of “it’s just A*” isn’t really that much of a help. Furthermore, GPT doesn’t even really recognize that this problem is harder than just “modifying A*”, an insight that would have saved me time when I originally wrote this algorithm.
Anyway, perhaps this isn’t a problem suited to GPT. After all, A* is an extremely common algorithm, one that it certainly has thousands or millions of examples of in its training data, so it may be unable to deviate too far from that, no matter how much you try to push it and help it.
I tried this again with a couple of other “difficult” algorithms I’ve written, and it’s the same thing pretty much every time. It will often just propose solutions to similar problems and miss the subtleties that make your problem different, and after a few revisions it will often just fall apart.
In the case of this one (this is how Mew resolves knockback) it kind of just missed *all* the details of the problem, like that it needs to not let 2 moving objects move onto the same tile. This one was particularly bad, in that when I accidentally asked GPT-3.5 instead of GPT-4 it got *much* closer to a real solution than GPT-4 did.
This… isn’t even close.
When I asked GPT-3.5 by accident it got much, much closer. This is actually a “working solution, but with some bugs and edge cases”. It can’t handle a cycle of objects moving onto each other in a chain, but yeah, this is much better than the absolutely nothing GPT-4 gave… odd…
It’s possible that similar problems to that have shown up in its training set.
It’s hard to think of actually unique problems that it definitely has not seen anything similar to before, at least ones that can be described in a paragraph or two. Let’s try a weird constructed example. Let’s have it create a collision detection algorithm between… crescent moon shapes. (I couldn’t find an algorithm for this from a quick google, and it seems somewhat non-trivial, so yeah, let’s try this.)
And then it coded that algorithm. To ChatGPT’s credit, it’s not immediately obvious why it’s wrong if you try to think about it. But it ain’t too hard to find a counterexample.
I asked it to try again.
Again, it’s actually really hard to find counterexamples, but here’s one that fails 2.a (outer circles collide, each inner circle collides with the other’s outer circle but not with each other, yet the crescents are not colliding):
I think ChatGPT is just kind of bullshitting at this point. It doesn’t have an answer, and cannot think of one, so it’s just making shit up. And like… it’s really good at making shit up though! Like that algorithm… the cases where it fails are subtle! You could easily publish that algorithm in a book and confuse people into thinking they somehow messed up the implementation when there are bugs in the collision detection, because it certainly *sounds* like the type of algorithm that would solve this.
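One way to hunt for subtle failures like these is to compare a candidate algorithm against a slow, approximate oracle. The sketch below assumes a crescent is an outer disc with an inner disc cut out of it, represented as two hypothetical `(x, y, radius)` tuples, and simply samples a grid of points. It can miss very thin overlaps, so it’s a testing aid rather than a real collision algorithm.

```python
import math


def in_crescent(x, y, crescent):
    """True if (x, y) is inside the outer disc and outside the inner disc."""
    (ox, oy, outer_r), (ix, iy, inner_r) = crescent
    inside_outer = math.hypot(x - ox, y - oy) <= outer_r
    inside_inner = math.hypot(x - ix, y - iy) <= inner_r
    return inside_outer and not inside_inner


def crescents_overlap(a, b, steps=200):
    """Approximate test: sample a grid across the bounding box of a's outer disc."""
    (ox, oy, outer_r), _ = a
    for i in range(steps + 1):
        x = ox - outer_r + 2 * outer_r * i / steps
        for j in range(steps + 1):
            y = oy - outer_r + 2 * outer_r * j / steps
            if in_crescent(x, y, a) and in_crescent(x, y, b):
                return True
    return False


# Assumed example shapes: outer discs overlap, but a's cutout covers the overlap.
left = ((0, 0, 2), (1, 0, 1.5))
right = ((3, 0, 2), (2, 0, 1.5))
```

Here the two outer circles overlap, but the whole overlap region lies inside `left`’s cutout, so the crescents themselves never touch. That is the kind of case an “outer circles collide” shortcut gets wrong.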
Given a description of an algorithm or a description of a well-known problem with plenty of existing examples on the web, yeah, GPT-4 can absolutely write code. It’s mostly just assembling and remixing stuff it’s seen, but TO BE FAIR… a lot of programming is just that.
However, it absolutely fumbles when trying to *solve actual problems*. The type of novel problems that haven’t been solved before that you may encounter while programming. Moreover, it loves to “guess”, and those guesses can waste a lot of time if it sends you down the wrong path towards solving a problem.
The crescent example is a bit damning here. ChatGPT doesn’t know the answer; there was no example for this in its training set and it can’t find that in its model. The useful thing to do would be to just say “I do not know of an algorithm that does this.” But instead it’s overconfident in its own capabilities, and just makes shit up. It’s the same problem it has in plenty of other fields, though its strange competence in writing simple code sorta hides that fact a bit.
Anyway, in its own words,