
I've heard that OpenAI and many AI labs put watermarks [0] in their LLM outputs to detect AI-generated content and filter it out.

[0] Like statistics of words, etc.
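To make "statistics of words" concrete, here is a toy sketch of one way a statistical watermark could be detected. It is not any lab's actual scheme. It assumes a hypothetical setup in which a secret key and the previous word pseudo-randomly split the vocabulary into a "green" half and a "red" half, and the generator quietly favors green words. The names `is_green`, `green_z_score`, and `demo-key` are made up for illustration.

```python
import hashlib
import math


def is_green(prev_word, word, key="demo-key"):
    """Assign `word` to the green half, seeded by the previous word and a key.

    Hypothetical rule: hash (key, previous word, word) and look at one bit.
    """
    digest = hashlib.sha256(f"{key}|{prev_word}|{word}".encode("utf-8")).digest()
    return digest[0] % 2 == 0


def green_z_score(text, key="demo-key"):
    """Return how many standard deviations the green count sits above chance."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    green = sum(is_green(prev, cur, key) for prev, cur in pairs)
    # Without a watermark, each word is green with probability ~0.5,
    # so the green count is roughly Binomial(n, 0.5).
    return (green - 0.5 * n) / math.sqrt(0.25 * n)
```

Ordinary text should score near zero on average, while text where every word was picked from the green half scores sqrt(n) for n word pairs. Only someone holding the key can run the check, and paraphrasing the text washes the signal out.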



Maybe they do use watermarks, and the vendors that only offer hosted models can just log everything they've ever generated, but there are enough players working on this stuff independently of each other that filtering out their own noise would only get them so far.

I noticed that a big chunk of the default Llama 4 system prompt is devoted to suppressing various GPT-isms, which to me implies they weren't able to keep their newer training set from being contaminated by competing models.

> You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these.


I could have sworn they all gave up on watermarking 12 or 18 months ago when they realized it wasn't possible to do reliably.


Yeah, it's known as the em dash!


Y'know, I've been writing double dashes and having them converted into em dashes about 50% of the time on whatever platform I'm using for decades. It's bizarre that this is suddenly supposed to be a shibboleth.


Have you ever considered you might be an LLM?


Apparently the new ageist insult beyond "boomer" is "double-spacer" -- people who were taught in school to always follow the period at the end of a sentence with two spaces when composing the next sentence. If you went to elementary school after the internet became widespread, you are not likely to have been taught that. So double-spacing has now also become a shibboleth, albeit indicating the typist's age, distinguishing early millennials and Xers, who are now entering middle/old age, from the younger generations.


> Apparently the new ageist insult beyond "boomer" is "double-spacer"

Says who? I've seen "boomer" everywhere, but this is the first time I've heard of that one.


Right? I've never associated "double-spacer" with boomer. Maybe anally retentive? Someone who is trying too hard? The only thing I associate with boomers is ALL-CAPS writing. Which I assume is a holdover from typewriter days. But I kind of like ALL CAPS. It conveys some level of importance to the message.


It's not about trying. It's about people who learned double spacing when it made sense (monospace environments) and never unlearned it once it stopped mattering (variable-width typesetting). It's very age-specific and a bit culture-specific.


Interesting. That could certainly come in handy if it’s something they can’t avoid. We, too, might be able to better detect and filter their output.


This was a proposal by Scott Aaronson but I wasn't aware it got implemented.


Do they also watermark the code?


Wouldn’t be hard to do. Just alternate tabs and spaces and no one would ever know or care to check.


Most coders would have code cleaning tools in their IDEs that would take care of that automatically.


What about invisible Unicode characters?


Too obvious. Someone would have found that already.


Yeah, my IDE highlights uncommon characters automatically.


They are very visible to machines. Code linters would scream (and the alternating spaces and tabs would likely break generated Python code).
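For what it's worth, here is a rough sketch of how trivially a script can spot both tricks. It flags Unicode format characters (category `Cf`, which covers zero-width spaces and joiners) and reports which indentation style each line uses. The function names are hypothetical, and real linters do far more than this.

```python
import unicodedata


def find_invisible_chars(text):
    """Return (line, column, name) for every Unicode format character (Cf)."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                hits.append((line_no, col, name))
    return hits


def indentation_styles(text):
    """Group line numbers by the kind of leading whitespace they use."""
    styles = {"tabs": [], "spaces": [], "mixed": []}
    for line_no, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank or whitespace-only lines
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue
        if " " in indent and "\t" in indent:
            styles["mixed"].append(line_no)
        elif "\t" in indent:
            styles["tabs"].append(line_no)
        else:
            styles["spaces"].append(line_no)
    return styles
```

A file where both `tabs` and `spaces` come back non-empty, or where `find_invisible_chars` returns anything at all, would stand out immediately in CI.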


Hopefully that's converted to one or the other when saved in an editor, or caught in CI.



