liuliu · 2026-05-07T18:04:29 1778177069

DSv4 generates much faster on NVIDIA class hardware. It is just a very efficient model.

liuliu · 2026-05-07T17:30:18 1778175018

I am not sure where this comment is coming from (possibly written without looking at this project?). This project runs a quasi-frontier model at reasonable decode speed (~30 tps) with reasonable prefill performance (~500 tps) on a high-end laptop. People simply project from what they see in this project to what you can optimistically expect.

You can argue whether the projection is too optimistic or not, but this project definitely made me a little bit optimistic on that end.

liuliu · 2026-05-01T16:18:08 1777652288

I am actually getting interested in QAT these days, especially the LSQ+ type, but it doesn't seem like people have done enough of that, in the open-source world at least: basically 2-bit / 3-bit OPD with LSQ+.
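For readers unfamiliar with the idea, here is a minimal sketch of the forward pass of an LSQ+-style fake quantizer, assuming the usual formulation with a step size and an offset that would both be learned during QAT. The function name and the example values are hypothetical, and the training side (straight-through gradients for the rounding, updates to `scale` and `offset`) is deliberately left out.

```python
def fake_quantize(x, scale, offset, bits):
    """Quantize-dequantize one value with a step size and an offset.

    In QAT both `scale` and `offset` would be trainable parameters;
    here they are plain numbers because only the forward pass is shown.
    """
    qmin = -(2 ** (bits - 1))        # e.g. -2 for 2-bit
    qmax = 2 ** (bits - 1) - 1       # e.g. +1 for 2-bit
    q = round((x - offset) / scale)  # map to the integer grid
    q = max(qmin, min(qmax, q))      # clip to the representable range
    return q * scale + offset        # map back to the real domain


# Hypothetical 2-bit example: only four levels survive.
weights = [-0.5, -0.12, 0.04, 0.26]
levels = [fake_quantize(w, scale=0.1, offset=0.0, bits=2) for w in weights]
```

With 2 bits there are only four representable levels, which is why the choice of step size and offset matters so much in this regime.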

yinksta · 2026-05-01T16:39:47 1777653587

The industry has largely moved away from QAT because the hardware required to run a quantized model is an order of magnitude less than what is needed to train (or QAT) the fp model.

That's why things like AutoRound, GPTQ, and AWQ have been so popular: you don't even need enough hardware to run the original model on GPU. CPU alone is enough because of their data efficiency.

liuliu · 2026-05-01T18:57:47 1777661867

Thanks. I think it is a good explanation, but it also suggests a gap. QAT, to me, if done right, is the only way to recover performance in the extreme quantization regime. The only thing that matters, of course, is whether it can work. My confidence in QAT comes from the fact that LoRA can recover most of the quality lost in quantization, but that is still different from QAT for extreme quantization, so I could be very wrong. I need to try it anyway.

liuliu · 2026-04-29T17:30:19 1777483819

The competition is on DeepSeek v4 Flash for similar size / deployment target.

simjnd · 2026-04-29T17:55:40 1777485340

DeepSeek v4 Flash is still over 100GB at Q4 IIRC, and Q4 has generally been the sweet spot. Although it's an MoE, so it might run a lot faster than this dense Mistral model if you have the RAM.

pbgcp2026 · 2026-04-30T05:58:01 1777528681

"Q4 has generally been the sweet spot" for self-hosting, yes. For any real meaningful work it's dumb AF. The only way to get reasonable intelligence from mid-size Gemma or Qwen is to run full precision BF16. Anything else is just an emulation of AI.

EntityDeletr · 2026-04-30T16:33:47 1777566827

I would disagree. I have 8 GB of VRAM and 32 GB of RAM. I can either run a 4B BF16 dense model fully on GPU at around 30 t/s or Qwen3.6 35B A3B Q5_K_M at 20 t/s with GPU offload. Which one would I choose?

liuliu · 2026-04-28T17:55:32 1777398932

> but transformers are not AGI, and they will never be AGI

Like the claim "transformers are AGI", this needs proof; otherwise it should be prefixed with "I think". And honestly, positive proof is easier than negative proof (you just need to make one transformer model that is an AGI, whereas the "never" claim requires you to enumerate all possibilities).

gslepak · 2026-04-28T17:59:40 1777399180

That's like saying we should wait for positive proof of AGI from combustion engines. That'll never happen, no matter how much you tweak the engine. It's just not possible.

The negative proof is there in the definition itself. Transformers are not AGI, they're frozen human intelligence of the autocomplete variety. That can never be AGI and anyone who says otherwise doesn't understand transformers or AGI.

xscott · 2026-04-28T20:30:54 1777408254

This kind of proof isn't really as watertight as you claim. It's a lot like saying state machines are limited to recognizing regular languages, and then completely ignoring how easy it is to add a stack or linear memory to a state machine to make it a PDA or Turing machine.

So yes, the LLMs can be trivialized as just randomized autocomplete, but if you add a database or memory to the side very basic MLPs can become a Turing machine. It's going to take a lot more proof to say a Turing machine could never be intelligent. And you can do more than just give the LLM side memory - you can have them invoked recursively, use message passing as coroutines, and so on...

You might be technically correct if you ignore anything other than the very restrictive definitions you're using, but even there I'm not certain. If you had a LLM with a trillion token window, is that good enough to act as a memory? Human brains aren't infinite either.
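To illustrate the state-machine point, here is a minimal sketch of a two-state controller with a stack bolted on. It recognizes strings of the form a^n b^n, a textbook language that no finite-state machine alone can recognize. The function name and the state labels are made up for this example.

```python
def accepts_anbn(text):
    """Return True if text is n 'a's followed by exactly n 'b's (n >= 0)."""
    state = "A"   # finite control: "A" = still reading a's, "B" = reading b's
    stack = []    # the extra memory that turns the state machine into a PDA
    for ch in text:
        if state == "A" and ch == "a":
            stack.append(ch)   # remember one pending 'a'
        elif ch == "b" and stack:
            state = "B"        # after the first 'b', no more 'a's are allowed
            stack.pop()        # match this 'b' against a pending 'a'
        else:
            return False       # out-of-order symbol or unmatched 'b'
    return not stack           # accept only if every 'a' was matched
```

The controller itself never grows: all of the unbounded counting lives in the stack, which is the same division of labor as a fixed model plus external memory.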

liuliu · 2026-04-28T23:01:12 1777417272

Agreed. It is nonsensical to argue that a 3B transformer hard-capped to decode 100 tokens is "intelligent". Of course, when we evaluate whether "transformers" are intelligent or not, we are talking about taking transformers as a core part of a system and enhancing them by other means. As you said, it is pretty trivial to make a transformer into a Turing machine, which can then carry out any computation, including intelligence, if you are in the camp that intelligence is computable. (I don't think it makes sense to argue with anyone who believes intelligence is not computable.)

xscott · 2026-04-29T00:11:35 1777421495

Lol, I totally agree about anyone using the non-computable angle.

However, I've got a 20GB GGUF file on my disk that can write code better than 99% of the people I ever worked with in the last 25 years, and ravens seem pretty clever with about 2 billion neurons... I have no idea what the lower bound is.

Fun to think about though :-)

scoopdewoop · 2026-04-28T20:43:50 1777409030

You are super positive that transformers can't become AGI, wow. Care to explain how atoms _can_ become AGI?

refulgentis · 2026-04-28T20:30:11 1777408211

Oh! would you mind explaining that out a bit? :)

gslepak · 2026-04-28T20:58:57 1777409937

See the adjacent thread with @altruios.

liuliu · 2026-04-22T18:01:21 1776880881

I think the parent comment is specifically addressing why the black-box (or stochastic?) optimizer he used is not working.

liuliu · 2026-04-20T21:45:20 1776721520

I honestly don't know. tim@apple.com has been unavailable for quite some time now (since I tried a few years ago), while lisasu@amd.com still worked around that time frame.

ladberg · 2026-04-20T23:37:28 1776728248

It's always been tcook@ - and it will get looked at by someone at least

rattus_rattus · 2026-04-21T00:48:23 1776732503

Yes! Can confirm. I emailed him in March 2020 after my 16-day old MacBook Pro had a logic board failure resulting in endless kernel panics. It was just past the return date so I couldn’t just return it and get a new one, so my local Apple Store had sent it in for repair. Then covid hit and everything shut down, so they couldn’t get it fixed and sent back either.

I had emailed with an explanation of what had occurred, and asked if I could get a refund so that I could just purchase a replacement. Within two hours of sending my email, an assistant from his office called me to arrange sending me a replacement. I was really impressed. I honestly figured I would just have to wait until the repair depot opened again, because I didn’t think I would hear back about my email.

Then a month or so later I got a call from the repair depot asking what address I’d like my repaired laptop sent to, since it was supposed to be sent back to the store for pickup (but stores were closed.) So I guess the right hand knoweth not what the left hand doeth in that case, because the person on the phone from repairs was pretty confused when I said no thanks.

djyde · 2026-04-21T00:32:49 1776731569

I'm really curious how he manages to read through so many emails every day.

lowdude · 2026-04-21T07:10:45 1776755445

I would have assumed that some assistant goes through the inbox and only a (random or filtered) subsample of those mails actually gets read by Tim Cook.

djyde · 2026-04-21T08:37:07 1776760627

I suspect they've implemented some kind of intelligent email filtering system

butlike · 2026-04-22T13:52:36 1776865956

Like hiring an assistant haha

liuliu · 2026-04-20T21:22:48 1776720168

He also donated to the Kamala Harris campaign. He would also donate to the next Democratic president's inauguration if they still choose to do this corrupt thing. And your point is?

liuliu · 2026-04-20T21:20:53 1776720053

Trump is the president. People voted him into office. Tim Cook didn't give him the golden statue before he was in office.

By your logic, everyone in the United States is complicit in the horrible things done by the Trump administration. I partially agree, but I also think burning Apple to the ground will not be Tim Cook's legacy, and he is in no position to go against the executive branch.

It is not about Trump, it is about the corrupt executive branch. Tim didn't commit any crime against humanity with his act.

throwaway173738 · 2026-04-20T22:04:51 1776722691

No, before Trump 2 nobody would’ve taken bribes and gifts so openly like this. It’s not even in the same league and it’s some really self-serving argumentation to pretend otherwise.

Every complicity is another nail in the coffin of our democracy.

phist_mcgee · 2026-04-20T21:52:26 1776721946

Nor does the cop who demands $100 for letting you go without arresting you.

But they're still responsible for their own personal piece of the rot in the system.

rescripting · 2026-04-20T22:08:22 1776722902

Is Tim the cop or the motorist in this example?

If a cop says your problems go away for $100, you pay it, because the downside is huge by comparison. The problem is the cop getting away with it, not that you paid the bribe.

2muchcoffeeman · 2026-04-20T22:21:00 1776723660

I hope you’re not comparing a gold trophy to a straight-up bribe. It’s like giving Trump your Nobel Peace Prize.

Having the prize doesn’t make you the winner. But it feeds Trump’s ego sooooooo muuuuuch, it’s probably the “best” thing you can do to get on his good side without actually giving him anything.

tastyface · 2026-04-20T22:42:38 1776724958

Cook stood up to the FBI. He could have stood up to Trump -- he just didn't want to.

liuliu · 2026-04-21T16:39:57 1776789597

That was a lawful FBI. This is a lawless executive branch. As we all know by now, the executive branch has a lot of power that cannot be limited by Congress or the courts, and erasing a few zeros from a 4T market valuation is a piece of cake (as we witness daily in how they move billions around the market to their favorite inside traders).

tastyface · 2026-04-21T17:06:24 1776791184

Tanking Apple would tank the economy -- the one thing Repubs are afraid of. Cook could have used that.

Other, much smaller organizations have stood up to Trump and forced him to back down. So much for "courage."

liuliu · 2026-04-21T17:17:46 1776791866

Like you said, yes, it is about courage. I just feel that I wouldn't have that courage if I were in his shoes. We can just be different.

mcmcmc · 2026-04-20T21:54:21 1776722061

> Everyone in the United States is complicit to the horrible things done by the Trump administration by your logic.

This is a ridiculous strawman. I’ll give you the benefit of the doubt and assume ignorance instead of malice.

I wrote that going above and beyond to curry favor with an autocrat in order to protect your profits is collaboration.

And you read, what? Existing under a government means you necessarily support it because there was an election? You do understand an election means some people voted the other way, right?

pb7 · 2026-04-21T01:33:09 1776735189

Not an ounce of self reflection in this comment.

He's not an autocrat precisely because there was an election. He won the election because he got the most votes. He has since failed to do most things he campaigned on because his power is very limited by virtue of our government's structure.

mcmcmc · 2026-04-21T02:51:44 1776739904

Sorry for dropping the implied “wannabe” in autocrat, I figured HN commenters would be smart enough to infer that based on context. He is pushing and breaking boundaries on every front. No, he never accomplished any of the outlandish promises he made about the economy because he was lying and his team is incompetent, same reason the Iran war is a disaster. Project 2025 has been going pretty damn well though.

liuliu · 2026-04-20T19:52:35 1776714755

I32 is eight 4-bit values packed into one int32.
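A minimal sketch of that kind of packing, assuming unsigned 4-bit values with element 0 stored in the lowest nibble. The nibble order and the function names are assumptions for illustration; the actual tensor format may order or sign the values differently.

```python
def pack_int4x8(values):
    """Pack eight unsigned 4-bit values (0..15) into one 32-bit word."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError(f"value {v} does not fit in 4 bits")
        word |= v << (4 * i)  # assumed order: element i goes in nibble i
    return word


def unpack_int4x8(word):
    """Recover the eight 4-bit values from a 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


packed = pack_int4x8([1, 2, 3, 4, 5, 6, 7, 8])
print(hex(packed))  # → 0x87654321
```

Python integers are unbounded, so `packed` here is the unsigned bit pattern. Stored in a signed int32, the same bits would read as a negative number whenever the top nibble is 8 or more.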