I realize that by using them for entertainment I'm not exactly a normal user, but Mistral has been such a breath of fresh air. With OpenAI it was always a fight against the model, making it usable despite all the "I can't do that, Dave". Mistral just works.
if by entertainment you mean roleplay, that's actually a huge market which goes unserved, for reasons. A lot of women seem to be interested in it too, maybe even more than men.
In my case, it’s actually using it as a bot one can talk to in two Twitch chats I mod. One is a food truck, which works well enough with OpenAI; the other is a blue plushie penis called Scomo, and that’s the one where OpenAI is difficult ;)
Oh that's cool! I actually was playing around with shadowmodding a bit. When you have to parse some of the worst messages you've ever (potentially) seen it's helpful to have an AI not reject your request.
This is interesting in the context of Meta blowing billions of dollars just now on compute in order to buy their way into competitiveness. It seems so far like raw $ is only very loosely correlated with the ability to come out with leading models. See also UAE and the Falcons. I wonder if there's any sign of that changing and big money making more of a difference, or if we'll see compute revert back to purely a commodity and "fecundity" becoming the ruling factor.
Is Google actually a frontrunner in this race? I remember Bard was pretty unimpressive when it came out, is it improving fast enough to be competitive?
Chat's cool and all, but Google is out here discovering useful new biomolecules and still winning Go and StarCraft matches against the best players in the world.
well, that's DeepMind, who are geniuses but still just part of a massive corporation..
Also, shouldn't StarCraft have been solved ages ago? OpenAI beat the top teams in Dota a long time ago, and I assumed the coordination problem would be the bigger issue.
(I think their bots started winning 1v1 SF mid a long time before that...)
I believe this is going to be a long race and I think while OpenAI has solved important problems and gone to market first, they don't have anything to really be that "ahead" on. I think any one of these companies has the tools to pick a later moment to come to market without any serious risk of being too late.
I would never use Claude for personal use. However, my employer wants to make sure nothing reputation-harming happens. That's a lot more important than model quality; everything is light years ahead of where we were four years ago. For a lot of applications which face customers / citizens / students / ..., NEVER screwing up is a lot more important than quality.
For my own use, I prefer interacting with soul, humanity, attitude, and edge.
For my employer's use, it's different.
I think there's space for both. As an investor, I'd be bullish on both Anthropic and something with no safety built in. I'd be a lot less bullish on what's in between.
I totally understand the need for an "on-rails" model.
However, I've found that it suffers when used as a personal assistant and for some automation tasks.
The fact that it has only gotten worse as judged by humans doesn't inspire confidence in me. I could totally understand offering two options, but they're going so far that it's actually unusable for many tasks.
The famous case: it refused to "kill a Python process".
Here's the thing about business models in this domain: You don't win by generality or working for as many tasks as possible. You win by being the best at something.
I will pick the best tool for the job I'm doing, be that writing product descriptions, conversational agent, or tech support. Runner-up has a chance -- for example, by lowering margins, or simply by being subsidized by investors in hopes of moving into #1.
However, the difference between "unusable" and "fourth-best" is negligible in terms of business returns. I won't pick your product.
There isn't a snowball's chance of "safe AI" being #1, or even #5, for what you want to use it for. It might as well be unusable. It needs to be #1 in the niche it's targeting.
(The above isn't universal; there are places where bundling many types of functionality has synergy; this just isn't one of them).
Their internal models are allegedly competitive even if their public offerings are not.
But given the amount of BS in this space these days, take with an accordingly sized grain of salt. I'll believe Google is competitive with the SotA when I see it myself.
1. Doesn't have to pay the obscene alignment tax that OpenAI et al. have to (somewhere between 20-50% from what I understand), nor worry nearly as much about """the brand""" and the baggage which comes with that.
2. Likely will have/has the EU behind them out of pragmatic protectionist realpolitik versus the US (which basically started with GDPR) - though it might end up being a duo with Aleph Alpha.
3. Generally has the widespread support of the hobbyist/open-source community (whatever those aims may be), both out of ethical/moral considerations and for performance/quality reasons
4. Seems already *quite* competitive with GPT-3. I rarely find I need to invoke GPT-4, and when I have to, I'm annoyed by the latency and milquetoastian nature of the damn thing.
Wish I could purchase stock in these guys honestly
Mistral fine-tunes are competitive with GPT-3.5 and 4 on many, many tasks; people are just enamored with the (admittedly impressive) ability of very large general models. In terms of business applications, the range of things that need GPT-4 rather than Mistral 7B is vanishingly small once you factor in Mixtral. I’m all for doing advanced research to create the most interesting and powerful thing possible, but in more practical terms, OpenAI has zero moat. Once the infrastructure around open source stabilizes and we get a few more rounds of efficiency improvements, I think this will be plainly obvious.
I have a Linux box where I run it locally for batch processing; otherwise I just run it through Poe (which I find serves as a good website endpoint for comparing different models.)
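For what it's worth, the batch side is nothing fancy -- here's a minimal sketch of the kind of loop I mean, with a stub in place of the real model call (the stub and its interface are illustrative, not any particular library's API; swap in llama.cpp, transformers, or whatever backend you run):

```python
# Sketch of a local batch-processing loop. StubModel is a placeholder for
# whatever actually serves the weights locally.

class StubModel:
    """Stands in for a real local model; same call shape, canned output."""

    def __call__(self, prompt: str, max_tokens: int = 128) -> str:
        return f"(response to: {prompt})"


def run_batch(model, prompts):
    """Run every prompt through the model sequentially and collect results."""
    return [model(p) for p in prompts]


if __name__ == "__main__":
    model = StubModel()
    for result in run_batch(model, ["summarize log A", "summarize log B"]):
        print(result)
```

A real version would add retries and logging, but the structure is just "iterate prompts, collect outputs".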
sorry, I should have given some context for the term milquetoast ;)
it just kind of means "overly apologetic, meek, timid, unwilling to ever voice anything that might be even mildly controversial", etc. It apparently originates with Caspar Milquetoast, a character in H.T. Webster's comic strip "The Timid Soul".
If you want an AI to actually do things like answer your question instead of just completing it, you have to nudge the model in that direction, which makes it stray from the "best" (highest-likelihood) answer. There's a balance to strike between aligning it enough that it gives the answers humans actually want and not aligning it so much that you degrade the model's performance.
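Concretely, the "nudge" at inference time is the chat template the instruct tuning was trained on. With Mistral's instruct models, for example, the wrapper looks roughly like this (a sketch -- in practice you'd let the tokenizer's chat template build it rather than hard-coding the string):

```python
def format_instruct(user_message: str) -> str:
    """Wrap a message in the [INST] markers Mistral's instruct models expect.

    A base model given the bare question tends to continue it rather than
    answer it; the template is what cues "answer mode".
    """
    return f"<s>[INST] {user_message} [/INST]"


print(format_instruct("Why is the sky blue?"))
```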
People have an axe to grind with OpenAI because they purposely align their model to be boring (the intended audience is companies embedding it in their own products) and because they try to fight the bias inherent in a model trained on the writings of the average chronically online Redditor. The frustration isn't unfounded: if you want to use the model for programming, having to pay the 'alignment tax' because someone else is using the model for a customer service bot that mustn't match energy with an angry customer sucks.
The other replies are spot on, but in general -- alignment induces a cost, no matter WHAT type of alignment, even 'neutral' versions like instruct tuning for chatbots.
The 'common' example people reference is, for instance, that making the model non-offensive means it has to spend.. something... on that directive, if you will. (Which also leads to absurdities such as refusing to help people with bash commands that involve having to `kill` a process...)
But even for the purpose of an instruct model (this is what peeves me), making it answer questions and take instructions makes it *worse* at many creative tasks, because you're constraining its behaviour to Q&A -- though this is a long tangent...
Yes, but you want alignment even if you don't want censorship. A very intelligent but unaligned model will be prone to doing useless things like "auto-complete" your question into a more elaborate version or responding with "just google it dumbass" and other forms of internet vitriol.
> Yes, but you want alignment even if you don't want censorship.
Instruction tuning helps a lot with this, but what a lot of people mean by "alignment" is the refusal to do things. You get to choose how "aligned" it is: for some use cases, like talking to customers, you definitely want something very "safe" (it won't start using slurs or something terrible). But for direct usage you generally never want it to refuse anything.
Check out Anthropic on the extreme side - every iteration of Claude has gotten worse on the Chatbot Arena (an Elo rating based on humans blindly comparing responses).