More

lz400 · 2026-04-15T23:17:51 1776295071

I use claude code everyday. Most of my friend circle have a CC max subscription and we talk about and use AI all the time. Not a single one has installed openclaw yet.

For me personally I don't see that it can do a lot of things that CC/codex doesn't do and that _I_ want to do. Also I'm concerned about security.

For a while I wanted some agent I could tell what to do in my PC at home from my phone, so I just vibe coded a web site that can start CC and I used tailscale to secure it.

lz400 · 2026-04-15T23:12:36 1776294756

I just learnt that em dash in a mac is option+shift+hyphen. I hadn't realized it was so difficult and inconvenient, and in the end it looks so similar to the other one: — -. Thin value. It's no surprise humans barely use them. Then why did it get picked up so much by AIs? I'd have imagined it's not in a lot of training data. Print media practices I guess?

dragonwriter · 2026-04-15T23:41:43 1776296503

> and in the end it looks so similar to the other one:

Maybe if you are looking at it in a monospaced environment like the HN edit window; rendered in a proportional font, hyphens, en-dashes, and em-dashes are quite distinct from eachother.

> It's no surprise humans barely use them. Then why did it get picked up so much by AIs?

It got picked up by AIs because their training corpus includes plenty of professionally published work, not just informal, off-the-cuff communication, and professionally published work uses typographic dashes (em-dashes, en-dashes, and even 2-em- and 3-em-dashes) extensively. (3-em less so in newer works, it having, e.g., dropped out of the recommendations of the Chicago Manual of Style as of 2024.)

dr_dshiv · 2026-04-15T23:16:53 1776295013

I love em dashes. They are so much less pretentious than colons or semicolons — and they help with flow of speech. I learned that key command a couple years ago and it made me feel so smart. I’ve had my comeuppance but I’m not stopping — just a better way to write

marssaxman · 2026-04-15T23:18:06 1776295086

Difficult and inconvenient compared to what, I wonder? I've always really liked the Mac OS option-key system, which I found convenient and easy to understand; I sometimes wish I could type that way in linux instead of using compose keys.

mananaysiempre · 2026-04-15T23:25:05 1776295505

What is it that you like about it specifically? If you’re not picky about the choice of modifier key, you can configure the so-called “level 3 shift key” and have the em dash on the hyphen key at level four (both L3 shift and L2 aka normal shift pressed). For instance, on GNOME Wayland I have “Input Source” = “English (Western European AltGr dead keys)”, “Alternate Characters Key” (GNOME lingo for the L3 shift) = “Right Alt”, so the em dash is RAlt-Shift-hyphen.

marssaxman · 2026-04-16T00:43:20 1776300200

The option-key layout system was easier to memorize than the compose-key patterns, which I struggle to recall. I couldn't tell you why, I just felt like I got the hang of it easily, while using the compose key system has always been slow and clunky.

I've never heard of a "level 3 shift key"; I'll have to look that up.

lamasery · 2026-04-16T14:17:00 1776349020

I grew up on Windows, Linux, and other non-Mac machines, only shifting to Macs around age 30.

Within months I was convinced that every default English keyboard I'd ever seen except the Mac one is strictly worse. It bothers me now how hard it is to get a consistent Mac-style keymap on Linux. This is one thing others should for-sure just rip off entirely. It's so much better.

BeetleB · 2026-04-15T23:16:15 1776294975

It's used a lot in LaTeX and Word. It's not as rare as people make them out to be. It's just that we haven't had a convenient way to enter it in a browser form that some of us (younger folks!) find the em-dash weird.

hyperhello · 2026-04-15T23:15:21 1776294921

Why is that inconvenient? It’s a hyphen with modifier keys.

wwalexander · 2026-04-15T23:20:13 1776295213

Apple’s text inputs usually autocorrect double hyphens to em dashes.

UqWBcuFx6NV4r · 2026-04-15T23:19:20 1776295160

It’s neither difficult nor inconvenient, it’s just new to you.

yojo · 2026-04-15T23:16:30 1776294990

option + hyphen gives you an en-dash (–), which is easier to type and I am guilty of way overusing/misusing.

dragonwriter · 2026-04-15T23:43:50 1776296630

The main use of an em-dash can also be done with an en-dash set open, and different style guides have different preferences for which should be used.

lz400 · 2026-02-07T01:17:07 1770427027

The best thing about this is that AI bots will read, train on and digest the million "how to write with AI" posts that are being written right now by some of the smartest coders in the world and the next gen AI will incorporate all of this, making them ironically unnecessary.

kimixa · 2026-02-07T12:36:10 1770467770

None of this is new, it was pretty much all "best practice" for decades and so already in the training data for the first generation.

If the issue is SNR and the ratio of "good" vs "bad" practices in the input training corpus, I don't know if that's getting better.

klysm · 2026-02-07T01:36:25 1770428185

They will also be reading all of the slop generated by the current and previous generations of LLMs

coldtea · 2026-02-07T07:36:49 1770449809

Each extra generation of AI produced crap AI consumes as training, the worse it gets. This has been mathematically proven.

jatora · 2026-02-07T15:16:26 1770477386

Strange since, in practice, coding models have steadily improved without any backward movement every 3-4 months for 2 years now. It's as if there are rigorous methods of filtering and curation applied when building your training data.

coldtea · 2026-02-07T21:53:28 1770501208

>Strange since, in practice, coding models have steadily improved without any backward movement every 3-4 months for 2 years now. It's as if there are rigorous methods of filtering and curation applied when building your training data.

It's as if what I wrote implies "all other things being equal", just like any technical claim.

All other things were not equal: the architectures were tweaked, the human data set is still not exhausted, and more money and energy was thrown into their performance since it's a pre-IPO game with huge VC stakes.

We've already seen a plateau non-the-less compared to the earlier release-over-release performance improvements. Even the "without any backward movement every 3-4 months for 2 years now" is hardly arguable. Many saw a backward movement with GPT 4.1 vs 4.0, and similar issues with 4.5, for example. Even if those are isolated, they're hardly the 2 to 3.5 to 4.0 gains.

And no, there are absolutely no "rigorous methods of filtering and curation" that can separate the avalance of AI slop from useful human output - at least not without diminishing the possible training data. The problem after all is not just to tell AI from human with automated curation (that's already impossible), the problem is to have enough valuable new human output, which becomes near a losing game as all aspects of "human" domains previously useful as training input (from code to papers) are tarnished by AI output.

jatora · 2026-02-08T02:57:35 1770519455

1. No, you dont get to fall back on the technical claim approach. Your bias in your phrasing was clear. Maybe that works for you but I won't just ignore obvious subtext and let you weasel out of this. And that's for the benefit of other readers, not you.

2. A plateau in coding performance? I don't think you even use these models for coding then if you make that claim. It is very clear models have continually improved. You can trust benchmarks to make that clear, or real world use, or better yet: both. You seem to not have the data from either.

3. No rigorous methods of filtering and curation that can separate AI slop from useful human output? Here you go:

a. Curation already works at scale. Modern training pipelines don’t rely on “AI vs human” detection. They filter by utility signals: correctness, novelty, coherence, task success, citation integrity, and cross-source consistency. These measurable properties do correlate with downstream model performance. Models trained on smaller, higher-quality corpora consistently outperform those trained on larger, noisier ones.

b. Human-generated “valuable” data is not shrinking. The claim assumes a fixed pool. In reality, high-value human data is expanding in areas that matter most: expert-labeled datasets, preference comparisons, multimodal demonstrations, tool-use traces, verified code with tests, and domain-expert feedback. These are explicitly created for training and are not polluted by passive AI spam.

c. Synthetic data is not a dead end—when constrained. Empirically, filtered and goal-conditioned synthetic data (self-play, distillation, adversarial generation) improves reasoning, math, coding, and tool use. The failure mode is unfiltered synthetic recursion—not synthetic data per se. This distinction is already operationalized in production systems.

d. Training value ≠ raw text volume. Scaling laws shifted: performance now tracks effective compute × data quality, not sheer token count. A smaller dataset with higher signal density produces better generalization than a massive, contaminated corpus. This is observed repeatedly in ablation studies.

----

Again, the above is not for you, as I believe you don't see beyond your cope (yet). It's for other readers who are intellectually curious.

chrisjj · 2026-02-07T12:54:14 1770468854

> AI bots will read, train on and digest the million "how to write with AI" posts that are being written right now

Yes!

> by some of the smartest coders in the world

Hmm... How will it filter out those by the dumbest coders in the world?

Including those by parrots?

lz400 · 2026-02-08T08:59:03 1770541143

>Hmm... How will it filter out those by the dumbest coders in the world?

if you know, and I know, and the guys at openai and anthropic know... not a big leap that the models will know too? many datasets are curated and labeled by humans

chrisjj · 2026-02-08T09:55:56 1770544556

> if you know, and I know,

We don't know.

> and the guys at openai and anthropic know... not a big leap that the models will know too?

The models don't "know" anything. They just regurgitate what they are fed.

"Child abuse images found in AI training data"

https://www.axios.com/2023/12/20/ai-training-data-child-abus...

> many datasets are curated and labeled by humans

Including these ones: "AI industry insiders launch site to poison the data that feeds them"

https://www.theregister.com/2026/01/11/industry_insiders_see...

chrisjj · 2026-02-08T12:37:10 1770554230

> having a curated dataset of the works and posts of the top 200 coders in the world

I can't imagine many of the top 200 coders in the world giving their work to the parrots.

But show me the list of the top 200 coders in the world, and I might change my mind! :)

lz400 · 2026-02-08T23:11:19 1770592279

Top 200 that work partially in public. A good example is Mitchell Hashimoto. Works open source, uses AI a lot and writes about it. Next gen AI will learn from the lessons people like him share

chrisjj · 2026-02-09T09:29:45 1770629385

> uses AI a lot

https://en.wikipedia.org/wiki/Model_collapse

lz400 · 2026-02-08T11:06:29 1770548789

I mean, having a curated dataset of the works and posts of the top 200 coders in the world (at least the public ones) is not very difficult. I’m sure these articles like the one in OP will be very easy to mark as “high value training data”. I think you’re letting your bias blind you

lz400 · 2026-01-29T11:31:10 1769686270

Makes me think of the concept of involution in Chinese business and how they understand all of this very differently, and how difficult it is to compete because of that.

RobotToaster · 2026-01-29T12:34:51 1769690091

For anyone else wondering https://en.wikipedia.org/wiki/Neijuan

lz400 · 2026-01-17T01:30:22 1768613422

I do a lot of data processing and my tool of choice is polars. It's blazing fast and has (like pandas) a lot of very useful functions that aren't in SQL or are awkward to emulate in SQL. I can also just do Python functions if I want something that's not offered.

Please sell DuckDB to me. I don't know it very well but my (possibly wrong) intuition is that even giving equal performance, it's going to drop me to the awkwardness of SQL for data processing.

bovinejoni · 2026-01-17T03:20:13 1768620013

I could anecdotally tell you it’s significantly faster and more concise for my workloads, but it’s a standalone executable so just try it out and benchmark for your use case. It doesn’t require fine tuning or even a learning curve

lz400 · 2025-12-14T01:20:23 1765675223

This is not to me but a friend of mine was climbing Mt Fuji in _winter_ (this is a serious thing you need to be prepared for, alpine climbing with lots of snow and ice) when he slipped and started sliding down the mountain out of control.

When he was about to fall to his death a father and son that happened to be there in a struck of luck managed to grab him and save his life. My friend had banged a few rocks in the way down so his leg was fractured and they had to help him down for hours.

They saved his life and risk theirs to ensure he had the best chance. They visited my friend in the hospital where he was grateful and teary eyed. And then the father and son asked him for money, straight up. My friend of course agreed on an amount to them, all in all, he didn't know how to repay them anyway and this was oddly simple. I found everything heroic and strange at the same time but a good story.

lz400 · 2025-12-12T06:16:34 1765520194

In the meanwhile...

Google should demand another $1bn from Disney to crush the lawsuit

https://techcrunch.com/2025/12/11/disney-hits-google-with-ce...

lz400 · 2025-12-09T10:26:38 1765275998

I have similar stories, I showed the Confluent consultants a projection of their Kafka quote vs Kinesis and it was like 10x, even they were confused. The ingress/egress costs are insane. I think they just do very deep discounts to certain customers. The product is good but if you pay full ticket it probably doesn't make sense.

lz400 · 2025-11-27T01:32:43 1764207163

If you see the bitcoin charts, price has gone up a lot in the last few years but volume has tanked. Do I read this right that now crypto is basically a smaller market where a bunch of whales scam a ever dwindling but never completely disappearing flow of fools and marks?

pants2 · 2025-11-27T01:40:36 1764207636

I think a lot of the volume on spot BTC has gone to ETFs, DATs, perps, WBTC, and other derivatives, when a few years ago spot was really the only option. Hard to track total volumes now.

Yizahi · 2025-11-27T12:13:39 1764245619

I suspect that most of the trades are off-chain now, due to blockchain being being complete and utter crap for fast transactions by design. So people are entrusting their tokens to the centralized entity and receive some IOUs from it, with which they trade and that centralized platform. Basically unlicensed banks recreated with all negatives of the bank and no benefits.

christophilus · 2025-11-27T01:39:42 1764207582

Volume appears pretty normal to me. Where do you get the idea that it’s tanked?

bparsons · 2025-11-27T01:52:46 1764208366

Volume has been on a steady downward trend since 2018. https://data.bitcoinity.org/markets/volume/5y?c=e&r=week&t=b

lostmsu · 2025-11-27T12:03:00 1764244980

In BTC. In USD it is +-same.

jameslk · 2025-11-27T01:57:15 1764208635

Volume != value. Less trading can be a sign of more holding

D13Fd · 2025-11-27T04:45:56 1764218756

But people are “holding” nothing more than a line on a spreadsheet. The only true value a cryptocurrency has is in enabling transactions.

If that’s not what it’s doing, then it will eventually have no value.

louthy · 2025-11-27T08:52:38 1764233558

Less trading is the sign of an illiquid market, which can mean pricing becomes volatile and eventually meaningless.

lz400 · 2025-11-27T01:29:59 1764206999

What would it take to reconnect there? evidence that FSD and robots are vaporware?

thatguy0900 · 2025-11-27T01:37:54 1764207474

Evidence that he's going to stop getting boatloads of government money probably