Bard is much worse at puzzle solving than ChatGPT

2023-03-21 23:33:14

Google launched Bard today, the company’s competitor to ChatGPT. Earlier this week, we wrote about ChatGPT’s (impressive) ability to solve Twofer Goofers.

TLDR

  • GPT-4 solves Twofer Goofers at a 96% rate
  • Humans solve at an 82% rate
  • Bard solves at, basically, a 0% rate

Wait, what’s a Twofer Goofer?

Twofer Goofers are daily pairs of rhyming words described by a roundabout prompt. Players use the prompt and a series of clues to solve the puzzle and are rewarded with a piece of custom art. At this point, more than 12,000 human users have solved the 240 distinct puzzles more than 100,000 times in total.

Here is an example of a solved Twofer Goofer:
[Image: example]

Back to the chatbots

In fancy terms: we have a proprietary dataset of human puzzle-solving data against which we can test these AI tools.

In normal terms: it’s fun to see if the robots can manage the creative, non-linear thinking required to solve rhyme-based riddles. It’s particularly fun because the robots don’t actually understand what rhyming is.

The results from last week’s test (full blog post here):

  • Human users solve about 82% of Twofer Goofers, using an average of 1.6 clues
  • GPT-4 is much better than humans, solving 96% of the puzzles and needing only 0.9 clues
  • GPT-3.5 is impressive, but worse than humans at a 72% solve rate with 2.0 clues per puzzle
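
For readers curious how numbers like these could be tallied, here is a minimal sketch. The `Attempt` record, its field names, and the sample data are all hypothetical, not the actual Twofer Goofer dataset, and it assumes clue counts are averaged over every attempt (solved or not), which the post does not specify.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One solver's attempt at one puzzle (hypothetical record layout)."""
    puzzle_id: int
    solved: bool
    clues_used: int  # clues revealed before the attempt ended


def solve_rate(attempts: list[Attempt]) -> float:
    """Fraction of attempts that ended in a correct answer."""
    if not attempts:
        return 0.0
    return sum(a.solved for a in attempts) / len(attempts)


def mean_clues(attempts: list[Attempt]) -> float:
    """Average clues used per attempt (assumption: all attempts count)."""
    if not attempts:
        return 0.0
    return sum(a.clues_used for a in attempts) / len(attempts)


# Made-up sample data, for illustration only.
sample = [
    Attempt(puzzle_id=1, solved=True, clues_used=0),
    Attempt(puzzle_id=2, solved=True, clues_used=1),
    Attempt(puzzle_id=3, solved=True, clues_used=2),
    Attempt(puzzle_id=4, solved=False, clues_used=4),
]

print(f"solve rate: {solve_rate(sample):.0%}")
print(f"clues per puzzle: {mean_clues(sample):.2f}")
```

Running the same two functions over each solver's attempts (humans, GPT-4, GPT-3.5, Bard) would give a side-by-side comparison of the kind listed above.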

[Image: results chart]

I was thrilled to toss Bard into the fray after getting access to the open beta today. However, the results were shockingly disappointing.

Bard was not able to solve a single Twofer Goofer when given the prompt. It was close in a couple of instances, but ultimately unsuccessful.

Here is Bard’s attempt at the first 20 Twofer Goofers:
[Image: bard1]

Even without seeing the prompts, you can tell these are incorrect guesses because they aren’t pairs of rhyming words. Ultimately, Bard’s first attempt at all 100 puzzles was a failure.
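
That "not even a rhyming pair" sanity check can be roughly approximated in code. The sketch below is a crude spelling heuristic of our own invention, not how Twofer Goofer validates answers: the function name and the two-letter suffix rule are assumptions, and real rhyme detection would need pronunciations rather than spellings.

```python
def looks_like_rhyming_pair(guess: str, suffix_len: int = 2) -> bool:
    """Crude check: exactly two different words that end in the same letters.

    Spelling-based only, so it misses true rhymes with different spellings
    and accepts non-rhymes that merely look alike on the page.
    """
    words = [w.strip(".,!?\"'") for w in guess.lower().split()]
    if len(words) != 2:
        return False  # a Twofer Goofer answer is a pair of words
    first, second = words
    if first == second:
        return False  # repeating one word is not a rhyming pair
    if len(first) < suffix_len or len(second) < suffix_len:
        return False
    return first[-suffix_len:] == second[-suffix_len:]


# Hypothetical guesses, not real puzzle answers.
for guess in ["fat cat", "prickly plant", "one two three", "blue shoe"]:
    print(guess, "->", looks_like_rhyming_pair(guess))
```

Note the limits: "blue shoe" is a perfectly good rhyme that this heuristic rejects, and "cough dough" would pass despite not rhyming. It is only good enough to flag guesses that plainly are not two-word rhymes.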

On a handful of Twofers, I ran through the full gauntlet of clues, but Bard still failed. Here is one of the easiest puzzles (as evidenced by a user solve rate of 97%):
[Image: easy Twofer]

GPT-4 and GPT-3.5 solved this puzzle immediately. Here is Bard’s attempt(s):
[Image: bardcactus]

But these are hard puzzles to solve, even for humans!

Indeed, but the concept of rhyming isn’t too difficult for humans. (Though Twofer Goofer HQ’s adherence to strict “perfect” rhyme can be tough for those inclined toward slant rhyme.) Regardless, Bard’s understanding of rhyming is meaningfully behind ChatGPT’s, as evidenced by this hastily conceived test.

[Image: rhyme]

What does this all mean?

Not much! But we’ve seen many “empirical” tests quoted to prove the quality of a given AI model (LSAT scores, etc.). However, we all know that humans thrive at creativity, non-linear thinking, and conceptual understanding (like what rhymes are!). Tests on data like Twofer Goofer solve rates are a valuable way to assess the true progress of these tools.

More to come! Send any feedback or complaints or musings to [email protected]
