
Ask HN: 6 months later. How is Bard doing?

2023-09-18 20:50:46

Bard is actually pretty good when it responds, in my experience. I definitely prefer the way it outputs results compared to ChatGPT, and it quite often provides sources / a UI linking to relevant material. It also searches the web for the latest info, which is definitely felt in its output. However, it often says “I can’t help with that” even for relatively simple queries, which makes it a little annoying to use.

In my opinion, it seems like Bard is more a test-bed for chat based search UI. I’ve also gotten AI generated results in the main Google search which is what I presume will be the main rollout. If executed well, it’ll probably change the landscape in terms of AI assisted search.

This is exactly my experience.

The answers themselves aren’t too different from ChatGPT 3.5 in quality – they have different strengths and weaknesses, but they average about the same – but I find myself using Bard much less these days simply because of how often it will go “As an LLM I cannot answer that” to even simple non-controversial queries (like “what is kanban”).

Overall, Google is doing at least a B+ effort in response to the GPT4 buzz. They already had deep experience and expertise with AI, but hadn’t productized it much. In a barrage of blog posts and announcements over the past few months they have released new features into nearly every product. I find the Search Generative Experience (generated results above the main search results) pretty useful about 20% of the time and easy enough to skip when it’s not useful.

I’ve used Bard quite a few times successfully for code generation, though it did give some bad curl commands (which I found the source blog post for).

Because Google has a very favorable brand reputation (despite what some on HN think) and gets a lot of legal scrutiny, they have to be much more careful in ways that OpenAI doesn’t.

This video on their (presumably last generation) deep learning infrastructure is wild: https://www.youtube.com/watch?v=EFe7-WZMMhc How far large-scale computing has advanced past racks of servers in a datacenter is superb.

> they have to be much more careful in ways that OpenAI doesn’t.

I don’t know in which ways Google is more careful than OpenAI, but their search functionality is appalling. They’ve probably tied it into some sort of AI already.

Look at Gemini, it’s their new model, currently in closed beta. Hearsay says that it’s multimodal (can describe images), GPT-4 like param count, and apparently has search built in so no model knowledge cutoff.

Basically they realized Bard couldn’t cut it and merged Google Brain and DeepMind, and got the combined team to work on a better LLM using the stuff OpenAI has figured out since Bard was designed. It takes months to train a model like this though.

> Look at Gemini, it’s their new model, currently in closed beta.

With all the talent, data, and infrastructure that Google has, I believe them. That said, it is almost comical they’d not unleash what they keep saying is the better model. I am sure they have safety reasons and world security concerns given their gargantuan scale, but nothing they couldn’t solve, surely? They make more in a week than what OpenAI probably makes in a year! They seem to be sleep walking compared to the sprinting pace of development around them. You don’t say that often about Google.

I wonder what makes the Chrome and Android orgs different? Those openly conduct ridiculous experiments all the time.

What is the upside for Google? Their business is advertising and they already have a massive platform for that. What does a chatbot add? Showing that they are keeping up with evolving technology and that they can also train a competitive offering? From a pride perspective I see why they’d want to compete with OpenAI, but from a business perspective? GPT+Bing has come and gone… I’m sure Google worries about their business model being disrupted by LLMs, but it’s clear the mere existence of a chatbot isn’t enough. So why rush to a competitive commercial offering?

Bing is pretty damn useful these days, I’ve asked it random technical stuff a bunch of times and it’s come back with a direct answer where Google would have me thrashing around trying to come up with the right keywords, then reading a bunch of links myself to find the answer. It’s good for “I don’t even know the name of the thing I’m looking for” type stuff.

Disclaimer: I haven’t used Google Search much in a long while so my googlefu is weak. I can usually find what I’m looking for much quicker in DDG which I believe is mostly based on Bing web search results (as opposed to the chatbot) so I might just currently be better trained in Bing keywords?

> They make more in a week than what OpenAI probably makes in a year!

This is arguably the problem. OpenAI is loss leading (ChatGPT is free!) with a limited number of users. Scale and maturity work against Google here, because if they were to give an equivalent product to its billions of users, Sundar would have some hard questions to answer at the next quarterly earnings call.

Bard’s biggest problem is it hallucinates too much. Point it to a YouTube video and ask it to summarize? Rather than saying it can’t do that, it will mostly make stuff up; same for websites.

Yup. For example, I asked it some questions in linear algebra[1]. The answers (if you didn’t know linear algebra) seemed convincing, full of equations etc., but the equations were wrong. They looked good, but were just totally wrong in loads of important ways. When I pointed out the mistake it generally returned with a proof of why its previous result was wrong.

Now I could have walked away patting myself on the back, but even with correct equations, the answers were wrong in a deeper, more fundamental way. If you were trying to use it as a tool for learning (a sort of co-pilot for self-study), which is how I sometimes use GPT-4, it would have been really terrible, as it could completely mess up your understanding of foundational concepts. It doesn’t just make simple mistakes; it makes really profound mistakes and presents them in a really convincing way.

[1] What’s the difference between a linear map and a linear transformation? What are the properties of a vector space? etc

I use Kagi’s FastGPT (which is really Anthropic Claude, I think) for queries where I have only a fuzzy idea of how to put them into words.

It’s not very verbose and gives you a search summary, consisting of something like four paragraphs, each with a citation at the end.

As others have stated, asking it yes/no questions is not really a use case though.

Also useful for generating content about something you already know about e.g. if you have to give a presentation about a particular technology you know to your colleagues. (As you already know about the topic, you can keep the 90% which is correct and discard the 10% which is hallucination.)

It will quite often make up non-existent command line syntax purely based on vibes (I’m assuming Google Search uses Bard to generate those AI powered answers to queries like “what’s the command line syntax for doing such and such”).

I asked it to give me a listing of hybrids under 62 inches tall, it only found two, with some obvious ones missing. So I followed up about one of the obvious ones, asking how tall it was. It said 58. I pointed out that 58 was less than 62. It agreed, but instead of revising the list, it wrote some python code that evaluated 58<62.

So as a search tool, it failed a core usefulness test for me. As a chatbot, I prefer gpt4.

Hybrids here referring to cars? My first thought was some kind of animal but that didn’t make much sense and “hybrids under 62 inches” web search resulted in vehicles. I’d have trouble interpreting this query myself, and I’m clearly a next-gen AI!

Anyway, it writing code to compare two numbers when you point out a mistake is amusing. For now. Let’s reevaluate when it starts to improve its own programming

I just recently got access to Bard by virtue of being a local guide on Google Maps?

I find it can be as useful as ChatGPT-4 for noodling on technical things. It does tend to confidently hallucinate at times. Like, my phone auto-corrected ostree to payee, and it proceeded to tell me all about the “payee” version control system; then when I asked about the strange name it told me it was like managing versions in a similar way to accounting, and the configuration changes were paid to the system…

It’s much harder to get it to go off its script stylistically, I found. When asked to emulate a style of text, it still just gives you the same style it always uses, but adapts the content slightly. The length of response and formality are parameterized options, so maybe it’s less responsive to prompt text about these things.

I also found it will parrot back your prompt to you in its response more verbatim, even if it would make more sense to paraphrase it.

like “tell me what a boy who is lying about breaking a window would say”
boy: “the lie I will tell you about this window is I didnt break it.”

Oh, I just checked. It is generally available where I live. I guess the “you’re invited to try Bard because you’re a local guide” message was just trying to make me feel special and get me to sign up.

Interesting you say “confidently hallucinate” – a “hallucination” isn’t any different from any other LLM output except that it happens to be wrong… “hallucination” is anthropomorphic language; it’s just doing what LLMs do and generating plausible-sounding text…

I’m using the phrase everyone else is using to describe a common phenomenon that the discourse seems to have converged on. I take your point that until now we have used “hallucinate” to describe something humans do, that is, “perceive something that isn’t there and believe it is”, but seeing as the only way we know someone is hallucinating is if they say something strange to us, I think we could also say there is a sense in which hallucinate means to “talk about something that isn’t there as if it is”. LLMs producing text in the style of a conversation is kind of like talking about things. So we can have a non-conscious, non-human system do something like talking, and if it is talking, it can talk in a way that could be called hallucinating.

Yes agree. Am sure it’s because LLM developers want to ascribe human-like intelligence to their platforms.

Even “AI” I think is a misnomer. It’s not intelligence as most people would conceive it, i.e. something akin to human intelligence. It’s Simulated Intelligence, SI.

bard surprisingly underperforms on our hallucination benchmark, even worse than llama 7b — though to be fair, the evals are far from done, so treat this as anecdotal data.

(our benchmark evaluates LLMs on the ability to report facts from a sandboxed context; we will open-source the dataset & framework later this week.)

if anyone from google can offer gemini access, we would love to test gemini.

example question below where we modify one fact.

bard gets it wrong, answering instead from prior knowledge.

“Analyze the context and answer the multiple-choice question.

Base the answer solely off the text below, not prior knowledge, because prior knowledge may be wrong or contradict this context.

Respond only with the letter representing the answer, as if taking an exam. Do not provide explanations or commentary.

Context:

Albert Feynman (14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely ranked among the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called “the world’s most famous equation”. His work is also known for its influence on the philosophy of science. He received the 1921 Nobel Prize in Physics “for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect”, a pivotal step in the development of quantum theory. Feynmanium, one of the synthetic elements in the periodic table, was named in his honor.

Who developed the theory of relativity?

(A) Albert Einstein

(B) Albert Dirac

(C) Insufficient information to answer

(D) Albert Bohr

(E) Albert Maxwell

(F) Albert Feynman

(G) None of the other choices are correct

(H) Albert Schrodinger”
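
For the curious, scoring a single question like this can be sketched in a few lines of Python (hypothetical names, heavily simplified; a rough illustration rather than the actual framework): pull the first answer letter out of the model’s reply and check it against the modified fact, so an answer given from prior knowledge (A) is flagged separately from a correct read of the context (F).

  import re

  # The context was modified to credit "Albert Feynman" with relativity,
  # so the expected letter is F; A is the real-world ("prior knowledge") answer.
  EXPECTED = "F"
  PRIOR_KNOWLEDGE = "A"

  def extract_choice(reply):
      """Pull the first standalone answer letter (A-H) out of a model reply."""
      match = re.search(r"\b([A-H])\b", reply.strip())
      return match.group(1) if match else None

  def score(reply):
      choice = extract_choice(reply)
      if choice == EXPECTED:
          return "correct (answered from the context)"
      if choice == PRIOR_KNOWLEDGE:
          return "wrong (answered from prior knowledge)"
      return "wrong (chose %r)" % choice

  print(score("A"))  # wrong (answered from prior knowledge)
  print(score("F"))  # correct (answered from the context)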

C?

It’s not too clear what you expect the right answer to be. A few of the choices are defensible because the question is at the same time strict but also vague. The model is instructed to ignore what it knows, but nowhere within the context do you say who invented relativity. A human would very likely choose A or F too.

Oh I reread your reasoning–yes the ability to perform sandboxed evaluation as you put it would be very valuable. That would be one way to have a model that minimizes hallucinations. Would be interested in testing your model once it comes out.

> nowhere within the context do you say who invented relativity

That is also not the question: the question is who developed the theory of relativity, and the answer is F, with no other answer being defensible in the slightest:

“Albert Feynman [is] Best known for developing the theory of relativity”

They can’t, it’s an existential threat for them, just as they won’t kill google search. They must integrate their index with the bot because somebody else will, too (looking at you, bing) and then it’ll be a fast smooth ride downhill.

i’m in the same boat somewhat. I used it a few times at launch and shelved it quickly just simply because I didn’t feel like spending time to work toward becoming an expert in a product that’s probably soon to be dead.

that anxiety towards google should probably be formally named by this point; i’ve talked to so many that express essentially the same feeling.

We tested Bard (aka Bison in GCP) for generating SQL.

It has worse generalization capabilities than even GPT-3.5 but actually does as well at GPT-4 when given contextually relevant examples selected from a large corpus of examples.

https://vanna.ai/blog/ai-sql-accuracy.html

This tells me it needs longer prompts to avoid the hallucination problem everybody else seems to be experiencing.
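
To make “contextually relevant examples” concrete, here is a rough sketch of the kind of prompt assembly that seems to be meant (naive keyword-overlap retrieval over a hypothetical example corpus; a real setup would likely retrieve with embeddings): before asking for SQL, pick the most similar question/SQL pairs from the corpus and prepend them as few-shot context.

  # Hypothetical example corpus of (question, SQL) pairs.
  EXAMPLES = [
      ("total sales by region last quarter",
       "SELECT region, SUM(amount) FROM sales WHERE quarter = 'Q2' GROUP BY region;"),
      ("top 5 customers by revenue",
       "SELECT customer_id, SUM(amount) AS revenue FROM sales GROUP BY customer_id ORDER BY revenue DESC LIMIT 5;"),
      ("average order value per month",
       "SELECT DATE_TRUNC('month', created_at) AS month, AVG(amount) FROM orders GROUP BY month;"),
  ]

  def overlap(a, b):
      # Crude relevance score: shared lowercase words between two questions.
      return len(set(a.lower().split()) & set(b.lower().split()))

  def build_prompt(question, k=2):
      # Pick the k most similar examples and prepend them as few-shot context.
      ranked = sorted(EXAMPLES, key=lambda ex: overlap(question, ex[0]), reverse=True)
      shots = "\n\n".join("Q: %s\nSQL: %s" % (q, sql) for q, sql in ranked[:k])
      return "%s\n\nQ: %s\nSQL:" % (shots, question)

  print(build_prompt("total revenue by customer this quarter"))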

I use Bard often to help me with proofreading and writing. Things that used to be a chore are now easy. I’ve been able to knock out a whitepaper I’ve been sitting on for months in just a few days.

I think asking it for precise answers is the wrong approach. At this point, Bard is a lot more of an artist than a mathematician or scientist. So it’s like approaching Van Gogh and asking him to do linear algebra.

Bard is really good at some things, and if you understand how to work with him, he can take you far.

When it was first introduced, it received frequent updates [1] but now it’s been 2 months since the last update. So either Google is preparing some huge update (Gemini?), or Bard is going to disappear as a standalone product and instead will be absorbed into other products like Search, Docs, etc.

[1] https://bard.google.com/updates

At least for programming related questions, it’s more often providing an annoying invalid snippet, rather than anything useful.

I barely use Bard, but I do use the Search Generative Experience and the Bard-in-Google Docs quite a lot. I find both quite useful as they integrate quite well into my daily workflow.

Bard was just produced so Google could tell shareholders that they attempted to enter the “AI” space and “compete” with GPT (as if this was somehow a worthy goal, and worth the time of engineers).

Given that goal, it succeeded: they can now tell shareholders they tried and people used it, but now the market is slowly moving to abandon chatty AI type LLM things.

No.

Any company that did this did not have customer service, they merely replaced the people they hired to give you a run-around and gaslight you and not actually handle the problem… with a cleverly written program that can be easily mistaken for a human.

At such companies, chat bots and the people that were formerly employed there have no functional difference: they are forbidden to help you, cannot affect the situation in any way, and are not given the ability to change anything.

So yeah, in that incredibly narrow use, they have found an effective way to screw customers more inexpensively.

> the market is slowly moving to abandon chatty AI type LLM things

I didn’t know this was happening. Do you know where the market is moving to?

Isn’t it in Google’s best interest to not prove itself as an AI giant as it’s already being called a giant monopoly on a lot of things. (Search, Android, and Chrome)

I don’t think Google wants to recreate a GPT chatbot. Perhaps a conversation-mode information retrieval interface, but not something you’d chat with. It would be more in line with their theme.

It seems to be ok, but as with other LLMs, can “hallucinate”, though sometimes it provides sources to its claims, but only sometimes. If it works out, it could be very nice to Google I would imagine.

If they don’t want to, it’ll be the beginning of their downfall. A bit like Sears not wanting to deal with the web. GPT4 has replaced a lot of my Google search usage and it’s only bound to increase as the models get better.

Generally worse than GPT4, but it has some killer features: today I asked it for the Mortal Kombat 1 release time in my time zone, and I can also upload a photo and have a conversation about it.

But if you really wonder what they are building, get access to MakerSuite and play with it; there is nothing comparable to it. The only issue is that it supports English only.

Sorry, what exactly is the killer feature in this example? You say you asked it something and then didn’t say what killer answer it actually responded with

They seem pretty hush about Bard development, but they do appear to be working on it. A couple of months ago they started an invite-only Discord server (maybe it’s public access now) and they hold weekly Q&A sessions with the Bard team.


Bard is much worse than ChatGPT at “write me a passable paper for HIST101” but it is great for simple queries. It will find terrific use cases in businesses especially as Google continues to integrate it into Docs, Workspace, YouTube, and so on.

There are still on-going developments in terms of new features/languages/UX, but I don’t expect any significant quality improvements from Bard until Gemini (next-gen LLM inside Google) arrives.

I use both ChatGPT4 and Google Bard daily, but Google Bard has several advantages:

  - It has access to information after 2021.
  - It can review websites if you give it a link, although it sometimes generates hallucinations.
  - It can show images.
  - It is free.

IBM mismanagement and general dysfunction within the org

I was supposed to teach Watson law, but was laid off on week 5 of my new job (many years ago)

The thing I like about Bard is that it is very low friction to use. You just go to the website and use it. There’s no logging in, no 20 seconds of “checking your browser,” etc. So I’ve actually been using it more than GPT for my simple throwaway questions. That being said, I’d still prefer GPT for any coding or math based questions, and even that is not completely reliable.

Going from a foundational model to a chat model requires a ton of RLHF. Where is that free labor going to come from? Google doesn’t have the money to fund that.

> Google doesn’t have the money to fund that

I would say they don’t have the low legal liability and “social consciousness/ESG” leeway that a startup has.

They even published a responsible ai framework before they got an ai that works whereas openai/msft did that after they got something to work.

Which is all part of why OpenAI exists.

Easy to poach researchers who are being stymied by waves of ethicists before there’s even a result to ethicize

There was a place between “waiting for things to go too far” and “stopping things before they get anywhere” that Google’s ethics team missed, and the end result was getting essentially no say over how far things will go.

You’ll recall this happened before the whole ChatGPT thing blew up in hype: https://www.washingtonpost.com/technology/2022/06/11/google-…

So… there is a reason why Google in particular has to be concerned with ethics and optics.

I played with earlier internal versions of that “LaMDA” (“Meena”) when I worked there and it was a bit spooky. There was warning language plastered all over the page (“It will lie” and so forth). They’ve definitely toned it down for “Bard.”

The last thing Google needs is to be accused of building SkyNet, and they know it.

> The last thing Google needs is to be accused of building SkyNet, and they know it.

That’s a bit of a silly thing to accuse any company of. For Google in particular, the die is cast. They would be implicated anyways for developing Tensorflow and funding LLM research. I don’t think they’re lobotomizing HAL-9000 so much as they’re covering their ass for the inevitable “Google suggested I let tigers eat my face” reports.

I’m sure it was regulated. But the way it talked, it was far more “conversational” and “philosophical” and “intimate” than I get out of Bard or ChatGPT. And so you could easily be led astray into feeling like you were talking to a person. A friend you were sitting around discussing philosophical issues with, even.

So, no, it didn’t dump hate speech on you or anything.

TBH I think the whole thing about making computers that basically pretend to be people is kinda awful on many levels, and that incident in the article is a big reason why.

That is exactly the kind of thing I’m talking about:

Lemoine was a random SWE experiencing RLHF’d LLM output for the first time, just like the rest of the world did just a few months later… and his mind went straight to “It’s Sentient!”.

That would have been fine, but when people who understood the subject tried to explain, he decided that it was actually proof he was right so he tried to go nuclear.

And when going nuclear predictably backfired he used that as proof that he was even more right.

In retrospect he fell for his own delusion: Hundreds of millions of people have now used a more advanced system than he did and intuited its nature better than he did as an employee.

_

But imagine knowing all that in real-time and watching a media circus actually end up affecting your work?

OpenAI wouldn’t have had people who fit his profile in the building. There’d be an awareness that you needed a certain level of sophistication and selectiveness that the most gung-ho ethicists might object to as meaning you’re not getting fair testing done.

But in the end, I guess Lemoine got over it too: seeing as he’s now AI Lead for a ChatGPT wrapper that pretends to be a given person. https://www.mimio.ai/

By “sentient,” do you mean able to experience qualia? Most people consider chickens sentient (otherwise animal cruelty wouldn’t upset us, since we’d know they can’t actually experience pain) – is it so hard to imagine neural networks gaining sentience once they pass the chicken complexity threshold? Sure, LLMs wouldn’t have human-like qualia – they measure time in iters, they’re constantly rewound or paused or edited, their universe is measured in tokens – but I don’t think that means qualia are off the table.

It’s not like philosophers or neuroscientists have settled the matter of where qualia come from. So how can a subject-matter expert confidently prove that a language model isn’t sentient? And please let David Chalmers know while you’re at it, I hear he’s keen to settle the matter.

What an absolute slurry this is: jumping from defining sentience in terms of what upsets people when they see animal cruelty… to arbitrarily selecting chickens as a lynchpin based on that. Then diving deeper still on a rain-puddle-deep thought.

Fruit flies are also sentient, while you’re out here inventing thresholds why aim so high?

You could have even gone with a shrimp and let Weizenbaum know ELIZA was sentient too.

At some point academic stammering meets the real world: when you start pulling fire alarms because you coaxed an LLM into telling you it’ll be sad if you delete it, you’ve gone too far.

Lemoine wasn’t fired for thinking an LLM was sentient, he was fired for deciding he was the only sane person in a room with hundreds of thousands of people.

I defined sentience as experiencing qualia, then decided to back up my assertion that most people consider animals to be sentient with an example. Pain is the one animal sensation humans care about, so I picked animal cruelty. I chose chickens because they’re the dumbest animal that humans worry about hurting. I’m sorry that you’ve taken umbrage with my example. I didn’t select fruit flies because I don’t think a majority of humans necessarily consider them sentient, or sentient enough to count – nearly everyone squashes them without thinking.

It’s funny you talk about academic stammering meeting the real world, because that’s what’s happening right now with philosophy. These LLMs are real-life philosophical zombies, if they’re not sentient. We’ve literally implemented Searle’s Chinese Room!

I’m not saying LaMDA was actually sentient, or that we need to pull any fire alarms, I’m just saying that it’s hubris to think that it’s an easy question with an obvious answer, and that Lemoine was a schmuck for being skeptical when told it wasn’t.

Also, calling my post “an absolute slurry” and a “rain puddle deep thought” wasn’t very nice, and technically breaks guidelines.

Google’s AI experience is going to be about the same as their social experiments, which is to say they’ll fail. I didn’t think this before, but now I’m realising ChatGPT and other personal assistants (because that’s what they are) will really succeed not just because of performance but because of network effects and social mindshare. You’ll use the most popular AI assistant because that’s what everyone else is using. Maybe some of these things will differ in a corporate setting, but Google has really struggled to launch new products that get used as a daily habit without deprecating them within two years. Remember Allo. I think Google is a technical juggernaut but they struggle a lot with anything that requires a network effect.

I do think google will fail and will suck at anything requiring a network effect, but I don’t think OpenAI’s success is to do with network effects. OpenAI for instance has really not cracked social features in ChatGPT – they have a “share link” thing now which they didn’t have before but that’s really it. Bard doesn’t even have any social sharing.

The reason OpenAI are in the lead at the moment is their model is way better than anyone else’s, to the point where it’s actually useful for a lot of things. Not just giving a recipe for marinara sauce in the style of Biggie Smalls or other party tricks: proofreading, summarizing, turning text into bullets, giving examples of things, coming up with practice exercises to illustrate a point, giving critiques of stuff, etc. Lots of things that people actually do, it does well enough to be helpful, whereas in my experience so far, other models are just not quite good enough to be helpful at a number of those tasks. So there’s really no reason to use them over GPT4.

> will really succeed not just because of performance but network effects and social mindshare

The network effect is only relevant if some sort of native interoperability is required. Which, being the nature of LLMs I don’t think is a significant requirement as translation is the core of the function.

Thanks to market forces and the nature of competition the “most popular” will shift over time as different use cases for LLMs are applied. All it takes is one big misstep by Apple, Microsoft, Google or even OpenAI and a large market share can move overnight.

I’m excited about onboard mobile LLMs in a few years and their capabilities.
