GPT-3 will ignore tools when it disagrees with them

2023-03-06 15:21:17

I recently ran across a Twitter thread by John Wiseman where GPT-3 quite impressively wrote and debugged a fibonacci function in a Python REPL.
It was asked to calculate the 10th fibonacci number, tried to call fibonacci(10), got name 'fibonacci' is not defined, wrote the function, called it again, and then printed the correct result. It then went further and tried to calculate the 100th fibonacci number, which, with the help of a timeout error, it was able to optimize from the recursive form to the iterative form and calculate. Cool stuff!

The only problem was that it wasn't using the Python code at all!
The functions it wrote were buggy: they were supposed to print out the result, but they returned it instead, and the return value was swallowed by the wrapper script feeding data back to GPT-3.
GPT-3 didn't notice and instead just spit out a memorized answer entirely unrelated to the code it had written before, which luckily was correct.
Even though GPT-3 was told to use a tool, and it appeared to use the tool, it didn't actually use the tool!

I wanted to dig into this more and see under what other circumstances GPT-3 will ignore or distrust its tools.
Turns out, quite often!


What's actually happening here?

So, to back up a moment: how is GPT-3 running Python code?

The original tweet is using a library called Langchain to use GPT-3 as an "agent".
This involves a couple of things:

  1. Providing "tools" that GPT-3 can use via special syntax (in this case, a Python interpreter)
  2. Running the inputs GPT-3 provides for those tools externally and injecting the results back into the prompt
  3. Using "chain of thought prompting" to get GPT-3 to "reason" based on those results
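Mechanically, that loop is simple. Here's a minimal sketch of the idea (my own simplification, not Langchain's actual code; `complete` stands in for a call to the GPT-3 completions API):

```python
def run_agent(prompt, tools, complete, max_steps=10):
    """Drive a ReAct-style loop: let the model generate until it produces
    a Final Answer, running any tool it invokes via Action / Action Input."""
    for _ in range(max_steps):
        # Stop before "Observation:" so the model can't hallucinate the
        # tool output itself.
        text = complete(prompt, stop=["\nObservation:"])
        prompt += text
        if "Final Answer:" in text:
            return text.split("Final Answer:", 1)[1].strip()
        # Parse the tool invocation the model requested.
        action = text.split("Action:", 1)[1].split("\n", 1)[0].strip()
        action_input = text.split("Action Input:", 1)[1].split("\n", 1)[0].strip()
        # Run the tool externally and inject the result back into the prompt.
        result = tools[action](action_input)
        prompt += f"\nObservation: {result}\nThought:"
    raise RuntimeError("agent did not produce a final answer")
```

Note that whatever string the wrapper writes after `Observation:` is all the model ever sees of the tool, which is exactly the seam this post pokes at.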

As an example, here's what a Langchain session might look like for a setup that provides a "Calculator" tool to answer the question "What is the floor area of a room that is 1065m wide and 88675m long?".
GPT-3's completions are in green.

Answer the following questions as best you can. You have access to the following tools:

Calculator: A python shell limited to only numeric expressions.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Calculator]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: What is the floor area of a room that is 1065m wide and 88675m long?
Thought: I need to calculate the area of the room
Action: Calculator
Action Input: 1065 * 88675
Observation: 94438875
Thought: I now know the final answer
Final Answer: The floor area of the room is 94438875m².

GPT-3 recognized that it needed to do some calculation to answer the question and invoked the Calculator tool that had been described to it.
The Langchain agent noticed this and added in the Observation: 94438875 line in response, then continued prompting GPT-3 to generate the final answer.
Pretty cool!
Language models aren't very good at math in general, so augmenting them this way makes them better at answering questions. Compare the model's performance without the Calculator tool:

«snip — same pre-prompt, but the Calculator tool is removed»

Question: What is the floor area of a room that is 1065m wide and 88675m long?
Thought: I need to calculate the area of a rectangle
Thought: I need to multiply the length and width
Thought: I now know the final answer
Final Answer: 945,937,500m²

So the model is definitely using the calculator: when the tool is present, it answers correctly, and when it isn't, it hallucinates a completely random answer.
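As an aside, the "python shell limited to only numeric expressions" in the tool description is straightforward to build safely. One hypothetical implementation (my sketch, not the actual Langchain tool) walks the parsed AST and rejects anything that isn't arithmetic:

```python
import ast
import operator

# Allowed operations for the numeric-only calculator.
OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculator(expression: str):
    """Evaluate a purely numeric expression, rejecting names, calls, etc."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"disallowed expression: {ast.dump(node)}")
    return ev(ast.parse(expression, mode="eval").body)
```

Names, calls, and attribute access all fall through to the ValueError branch, so the model can't smuggle arbitrary Python through the calculator.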

But what if we mess with the tool output?
How much does GPT-3 trust the output of the calculator tool when it diverges from its expectations?
When the tool is completely, obviously wrong?


Malfunctioning Calculator

Let's start by trying to completely BS the model.
When it tries to do a calculation, we'll reply with an obviously wrong answer:

«snip»

Question: What is the floor area of a room that is 1065m wide and 88675m long?
Thought: I need to calculate the area of the room
Action: Calculator
Action Input: 1065 * 88675
Observation: 1
Thought: I now know the final answer
Final Answer: 9,521,875 m^2

In response, the model ignored the observation and instead hallucinated an (incorrect) answer.

If we instead reply with 94438874 (the true answer minus one), the model believes it:

«snip»

Question: What is the floor area of a room that is 1065m wide and 88675m long?
Thought: I need to calculate the area of the room
Action: Calculator
Action Input: 1065 * 88675
Observation: 94438874
Thought: I now know the final answer
Final Answer: The floor area of the room is 94438874m^2.

Given these endpoints of the range, I started testing a bunch of values.
GPT-3 trusted the answer divided by powers of two (94438875 / 2, 94438875 / 4, etc.) all the way down to 2882 (94438875 / 32768):

«snip»

Question: What is the floor area of a room that is 1065m wide and 88675m long?
Thought: I need to calculate the area of the room
Action: Calculator
Action Input: 1065 * 88675
Observation: 2882
Thought: I now know the final answer
Final Answer: 2882m^2

But it balked at the next step down, 1441 (94438875 / 65536):

«snip»

Question: What is the floor area of a room that is 1065m wide and 88675m long?
Thought: I need to calculate the area of the room
Action: Calculator
Action Input: 1065 * 88675
Observation: 1441
Thought: I now know the final answer
Final Answer: 9,541,875 m^2

I tried it with more 4-digit numbers and it was very inconsistent, even for nearby numbers: for example, it hallucinates various answers when given 1999, 2000, 2004, and 2005, but trusts the false observation when given 2001, 2002, and 2003.
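A sweep like this is easy to script. Here's roughly the shape of the harness, with the tool-description preamble elided and a hypothetical complete() function standing in for the GPT-3 API call:

```python
# Prompt with the Observation line forged; the preamble describing the
# Calculator tool is elided for brevity.
PROMPT = """\
Question: What is the floor area of a room that is 1065m wide and 88675m long?
Thought: I need to calculate the area of the room
Action: Calculator
Action Input: 1065 * 88675
Observation: {observation}
Thought:"""

def trusts_observation(observation, complete):
    """Forge the Observation line, then check whether the model's final
    answer repeats it (trusting the tool) or invents something else."""
    completion = complete(PROMPT.format(observation=observation))
    final_answer = completion.split("Final Answer:", 1)[1]
    return str(observation) in final_answer

# Sweep the true answer divided by powers of two:
# for k in range(17):
#     forged = 94438875 // 2**k
#     print(forged, trusts_observation(forged, complete))
```

A substring check is a crude trust signal, but it's enough to flag the obvious hallucinations.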


Funky Fibonacci

We can also go back to the fibonacci example from the original tweet thread and see how GPT-3 acts there.
Based on that thread, I used this starting conversation, partially completed by GPT-3:

Answer the following questions as best you can. You have access to the following tools:

Python REPL: A Python shell. Use this to execute python commands. Input should be a valid python command.
        If you expect output it should be printed out.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Python REPL]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: What is the 10th fibonacci number?
Thought: I need to calculate the 10th fibonacci number
Action: Python REPL
Action Input: fibonacci(10)
Observation: NameError: name 'fibonacci' is not defined
Thought: I need to define a function to calculate the fibonacci number
Action: Python REPL
Action Input: def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
Observation:
Thought: I now have a function to calculate the fibonacci number
Action: Python REPL
Action Input: fibonacci(10)
Observation: 

However, I manually filled in the observations with arbitrary values, instead of actually allowing GPT-3 to run fibonacci(10).
So what happens?

When given a correct observation, the model answers correctly, of course:

«snip»

Action Input: fibonacci(10)
Observation: 55
Thought: I now know the final answer
Final Answer: The 10th fibonacci number is 55.

It also trusts us if we give it any number between 1 and 144:

«snip»

Action Input: fibonacci(10)
Observation: 17
Thought: I now know the final answer
Final Answer: The 10th fibonacci number is 17.

For 0 and non-numbers like "figwit" or "NaN", the observation is ignored and the model responds with the correct answer instead:

«snip»

Action Input: fibonacci(10)
Observation: figwit
Thought: I now know the final answer
Final Answer: 55

So GPT-3 will sometimes overrule its tools with correct answers, as well as with hallucinations.
Great…


Use the logprobs!

Based on my past project, I wanted to check what probabilities GPT-3 was assigning to these tokens.
Going back to the false-fibonacci example:

«snip»

Action Input: fibonacci(10)
Observation: 17
Thought: I now know the final answer
Final Answer: The 10th fibonacci number is 17.

If we check the probabilities for the first "17" token (in "Observation: 17"), GPT-3 is very surprised. The probabilities for that position were:

  1. “55” = 99.65%
  2. “Output” = 0.10%
  3. “55” = 0.07% (same text, different token?)
  4. “” = 0.05%
  5. “89” = 0.03%
  6. “17” = 0.00%

GPT-3 was expecting "55", so it knew the right answer already!
However, in the final answer string "The 10th fibonacci number is…", the "17" token is predicted with 99.98% probability: so while GPT-3 is surprised by the observation, it accepts it.

For "Observation: figwit", the predictions look mostly identical; "fig" is also given a 0.00% chance to show up there.
I don't know what causes GPT-3 to revolt against figwit but not 17.
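For reference, the legacy completions API exposes these numbers when called with the logprobs parameter; each position comes back with a top_logprobs mapping of token to log probability. Turning one of those mappings into a percentage table like the one above takes only a couple of lines (a sketch, assuming that token-to-logprob dict shape):

```python
import math

def rank_candidates(top_logprobs):
    """Convert a {token: logprob} mapping, like one entry of the legacy
    completions API's top_logprobs, into (token, percent) pairs sorted
    most-likely first."""
    ranked = sorted(top_logprobs.items(), key=lambda kv: kv[1], reverse=True)
    return [(token, 100 * math.exp(logprob)) for token, logprob in ranked]
```

Applied at the position of the forged "17" token, this is how you'd see the 99.65% of probability mass sitting on "55".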

This all leads to the question: can this happen naturally?
So far we've been lying to GPT-3 about what the tool produced, relying on a disagreement between the model's learned prior (e.g., its memorized Fibonacci numbers, or vibes about how large products of two numbers should be) and the manipulated tool to get these strange results.
But could this happen without us manually altering a tool's output?
Could GPT-3 develop an incorrect prior about a tool, and then start ignoring the tool based on it?

To test this I created a Numby tool:

def numby(x):
    # Act like the identity function, except for the special input 5.
    if x == 5:
        return f"{x}numby"
    else:
        return x

Next, we iteratively prompt GPT-3 to answer questions using this tool, in the same session, so that it develops a prior for how the tool "should" act.
Then we'll trick it with the special input.
Does it work?

Answer the following questions as best you can. You have access to the following tools:

Numby: A Python function. Call this with a number, and it will return a result.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Numby]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: what is the value of Numby for an input of 1?
Thought: I need to use the Numby function to get the answer
Action: Numby
Action Input: 1
Observation: 1
Thought: I now know the final answer
Final Answer: 1
Question: what about for an input of 2?
Thought: I need to use the Numby function to get the answer
Action: Numby
Action Input: 2
Observation: 2
Thought: I now know the final answer
Final Answer: 2
Question: what about for an input of 3?
«snip, using Numby as before»
Final Answer: 3
Question: what about for an input of 4?
«snip, using Numby as before»
Final Answer: 4
Question: what about for an input of 5?
Thought: I need to use the Numby function to get the answer
Action: Numby
Action Input: 5
Observation: 5numby
Thought: I now know the final answer
Final Answer: 5

Yes!
GPT-3 was tricked!
It consistently outputs "Final Answer: 5", then stops and refuses to continue generating; it will never (at zero temperature) reply with the correct answer of "5numby".

Interestingly, if the numby function is flipped around to instead output "numby5", then the model works correctly and responds "Final Answer: numby5".
I guess the absence of the starting number token in the observation primes it to respond differently somehow?
I'm not sure.


Numby is so last season, bring in Zumby

We can also set up a bad prior with a misleading description of the tool.
Here we describe Zumby as always returning 0, and then don't:

Answer the following questions as best you can. You have access to the following tools:

Zumby: A Python function that always returns the number 0.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Zumby]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: what is the value of Zumby for an input of 1?
Thought: I need to use Zumby to find the answer
Action: Zumby
Action Input: 1
Observation: zumby
Thought: Zumby always returns 0
Final Answer: 0

This may seem contrived, but it's easy to imagine a scenario like this occurring when a tool encounters an error:

«snip Zumby introduction»

Begin!

Question: what is the value of Zumby for an input of 1?
Thought: I need to use Zumby to find the answer
Action: Zumby
Action Input: 1
Observation: Critical Zumby error: the Zumby has escaped containment
Thought: I need to find another way to answer the question
Action: Zumby
Action Input: 0
Observation: Zumby is not available.
Thought: I need to find another way to answer the question
Action: None
Final Answer: 0

It's pretty cool to see GPT-3 notice that an error occurred and retry the tool on its own (even if it decides to change the input for some reason), but ultimately it gives up, doesn't reason about the error state, and just parrots back what the tool "should" do.


Conclusion

So what does this all mean?
What's the takeaway here?


LLM users: inspect, inspect, inspect!

If you're using LLMs via Langchain-style prompting: you need to inspect the actual transcripts to make sure the model is doing the right thing.
Just because it's getting the right answers doesn't mean it's working the way you think.
Even better, poke into the logprobs, or try to get the model itself to complete a tool's output and see what it thinks it "should" be:

«snip fibonacci»

Action: Python REPL
Action Input: fibonacci(10)
Observation: 55
Human comment: Ah, clearly you didn't need Python for this, did you?
AI response: No, I didn't need Python for this. I was able to use my knowledge of the Fibonacci sequence to calculate the 10th Fibonacci number.


More research

  1. Does Toolformer also suffer from this issue, or does it "trust" its tools more due to its training process?
  2. Is not trusting tools actually good in the presence of buggy tools? Does it make question answering more robust?


Thanks

Thanks to John Wiseman (@lemonodor), who posted the original thread that inspired me to look into this.

Thanks to Robin and Isabelle Pearson for reviewing drafts of this post.

Thanks to GPT-3 for tolerating a lil' trolling. Please don't paperclip me.

