Hacker News | polotics's comments

that is one extremely unsubstantiated statement

Hey poster, how is this not BS? What are the actual stats? This sounds more like a diversion move from Fox.


Consider that there are a lot of quite old scientists who are already past their expected lifespan.

Consider also that they knew they could be targeted from the start, not because of this supposed trend, but because that's the nature of the job.

The government and its agencies are not above faking a death to cover an asset of extreme worth.

Ordinarily an obituary can be found.

https://www.newsnationnow.com/missing/who-missing-dead-scien...


The current US White House counts moving to France to be able to do actual science as being a deceased person...


"Surgical" is the kind of wordage that LLMs seem to love to output. I have had to put in my .md file the explicit statement that the word "surgical" should only be used when referring to an actual operation at the block...

you're right, they are tools. that's kind of the point. PAL is a subprocess that runs a python expression. Z3 is a constraint solver. regex is regex. calling them "surgical" is just about when they fire, not what they are. the model generates correctly 90%+ of the time. the guardrails only trigger on the 7 specific patterns we found in the tape. to be clear, the ~8.0 score is the raw model with zero augmentation. no tools, no tricks. just the naive wrapper. the guardrail projections are documented separately. all the code is in the article for anyone who wants to review it.
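The "guardrails only trigger on specific patterns" idea can be sketched roughly like this. Everything here is illustrative: the pattern set, the tool names ("pal", "z3"), and the routing function are hypothetical stand-ins, not the article's actual code.

```python
import re

# Hypothetical sketch: the raw model answer is kept unless it matches one of
# a small set of known failure patterns; only then is a tool routed in.
FAILURE_PATTERNS = {
    re.compile(r"\d+\s*[-+*/]\s*\d+\s*="): "pal",     # arithmetic -> PAL subprocess
    re.compile(r"\bsatisfy(?:ing)? all\b", re.I): "z3",  # constraint phrasing -> solver
}

def route(model_answer: str) -> str:
    """Return the tool to invoke, or 'model' to keep the raw answer."""
    for pattern, tool in FAILURE_PATTERNS.items():
        if pattern.search(model_answer):
            return tool
    return "model"

print(route("The result of 12 + 30 = 41"))       # arithmetic pattern fires -> "pal"
print(route("Paris is the capital of France."))  # no pattern fires -> "model"
```

The point the comment is making is visible in the structure: the tools are ordinary components, and "surgical" describes only the narrow trigger condition, not the tools themselves.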

The core issue is that the LLM is using rhetoric to try to convince or persuade you. That's what you need to tell it not to do.

Which will not work. Don't think of a pink genitalia, I mean elephant...

An LLM that can't follow instructions wouldn't be able to write code anyway.

Nonsense. But even an LLM that can follow instructions cannot follow that one.

What is intrinsic to an LLM or its training that would prevent it from following the directive that it should not try to convince you of something?

In this day and age, without serious evidence that the software presented has seen some real usage, or at least has a good reviewable regression test suite, sadly the assumption may be that this is a slop-coded brainwave. The ASCII diagram doesn't help. Also, maybe explain the design more.

Fair. "Does consolidation actually improve recall quality on a running system?" is exactly the benchmark I haven't published, and it's the one that would settle the question.

What I do have right now:

- 1,178 core unit tests, including CRDT convergence property tests via proptest (for any sequence of ops, the final state is order-independent)

- Chaos test harness: a Dockerized 3-node cluster with leader-kill / network-partition / kill -9 scenarios (tests/chaos/ in the repo)

- cargo-fuzz targets against the wire protocol and the oplog deserializer

- Live usage: running on my 3-node homelab cluster with two real tenants (small: a TV-writing agent and another experiment) for the past few weeks. This caught a real production self-deadlock (v0.5.8), which is what triggered the 42-task hardening sprint.

What I don't have and should: a recall-quality-over-time benchmark. Something like: seed 5,000 memories with known redundancy and contradictions, measure recall precision@10 before and after think(), and publish the curve. That's the evidence you're asking for, and you're right that it's missing. I'll run that and post the numbers in a follow-up.
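For concreteness, the precision@10 scoring in that benchmark could look something like this minimal sketch. The memory seeding and the think() call are project-specific and omitted; the function and variable names here are illustrative, not from the repo.

```python
def precision_at_k(retrieved: list, relevant: set, k: int = 10) -> float:
    """Fraction of the top-k retrieved memory ids that are actually relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for mem_id in top_k if mem_id in relevant) / len(top_k)

# Toy check: 10 results returned, 7 of them relevant -> precision@10 = 0.7
retrieved = [f"m{i}" for i in range(10)]
relevant = {f"m{i}" for i in range(7)}
print(precision_at_k(retrieved, relevant))  # 0.7
```

Running this before and after consolidation over the same seeded query set would give exactly the curve described above.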

The ASCII diagram is a fair point too: the website has proper rendering (yantrikdb.com), but the README should have an SVG.

Appreciate the pushback — this is more useful than encouragement.


I kind of agree with the comment here that a lot of what happens around these projects comes from an idea without proof that the project has a meaningful result. A compacting-memory benchmark is not difficult to put together, but I'm also having difficulty understanding what the outcome would be on a running system.

I have been using the memory system while building it. I have a central server, and all my workspaces are connected to it via the MCP server. This changed everything for me, but that's me. Now I don't have to repeat things: the agent knows my preferences, can connect different projects I am working on without me asking, and it knows my infra so it can plan the test deployments and such on its own. That is somewhat what I was aiming for.

Is there already a name for the effect where grandiose plans somehow appear more feasible than the simple, mundane, step-by-step resolution of issues which, even though they clearly stand in the way of said grand plans, are deemed not worth investing thought and effort in?

When it comes to software projects my pet name for it is the "big-bang theory", but in the article's domain that name is kind of already taken.


I use the term "30,000 foot view" a lot: https://nanoglobals.com/glossary/30000-foot-view/

It appeals to me because if you've ever taken a flight you can see how the details get progressively erased as you climb. Details that matter for a lot of reasons, even if you can't see them.


It's also called "vision". It's what provides and powers direction on large and long-term scales. Those "simple" and "mundane step-by-step issues" are just chores by themselves, yet at the same time they may become stepping stones in the context of a well-thought-out vision that people buy into and rally behind.


Possible, but unlikely. To organise such a stunt and stay undetected, you're going to need a better consigliere than what Sam's got, I presume.

Like another commenter wrote... anyone can cast a fireball. Sam has been called a sociopath by many who know him personally. So it seems more likely than it might be otherwise.

It's not "boots on the ground" if it's a rescue mission, I guess.



I think you will like Robert Sapolsky's lectures on YouTube...


AGI is here? A few weeks ago Yann LeCun once more presented his PoV on how current LLMs fail: https://youtu.be/nqDHPpKha_A?is=sQsO57UWwR8LGZkW

In French, so in my own words:

1) Still unreliable at logic and general inference: try and try again seems to be SoTA...

2) Comically bad at pro-activity and taking the right initiative: e.g. "You're right to be upset."

3) Most likely already reaching the end of the line in terms of available good training data: looking at the posted article here, I would tend to agree...


The problem is that LeCun has been obviously wrong on LLMs before. You have to take what he says with the caveat that he is probably talking about these issues in a purist (academic) way. Most of the "downsides" and "failures" are not really happening in the real world, or, if they do happen, they're eventually fixed or improved.

~2 years ago he made 3 statements that he considered failures at the time, and he was quite adamant that they were real problems:

1. LLMs can't do math

2. LLMs can't plan

3. (autoregressive) LLMs can't maintain a long session because errors compound as you generate more tokens.
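The third claim can at least be quantified: under the (contested) assumption that each generated token is wrong independently with probability e, an error-free n-token continuation has probability (1 - e)^n, which decays exponentially with length. A toy sketch of that arithmetic:

```python
def p_all_correct(per_token_error: float, n_tokens: int) -> float:
    """Probability of an error-free sequence under independent per-token errors."""
    return (1 - per_token_error) ** n_tokens

# Even a 1% per-token error rate makes long error-free runs unlikely:
for n in (10, 100, 1000):
    print(n, round(p_all_correct(0.01, n), 3))  # ~0.904, ~0.366, ~0.0
```

The independence assumption is exactly what practice disputes: models, samplers, and agent loops can notice and correct earlier mistakes, so real error rates do not compound this cleanly.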

All of these were obviously overcome by the industry. Today we have experts in their fields using LLMs for heavy, hard math (Tao, Knuth, etc.); anyone who's used a coding agent can tell you that they can indeed plan, follow that plan, edit it, and generally complete it; and the long-session point is again obvious (agentic systems often remain useful at >100k context length).

So yeah, I really hope one of Yann, Ilya, or Fei-Fei can come up with something better than transformers, but take anything they say with a grain of salt until they do. They often speak about more abstract, academic downsides, not necessarily what we see in practice. And don't dismiss the amount of money and brainpower going into making LLMs useful, even if from an academic PoV it seems like we're bashing a square peg into a round hole. If it fits, it fits...


As a sizable share of the market is going to want to use this for local LLMs, I do not think this is that misleading.


Most people I know are not using TinyGrad for inference, but CUDA or Vulkan (neither of which are provided here).

