Prior to the rise of LLM-written posts and the natural reaction of hair-trigger suspicion, I used to use em and en dashes fairly often in posts on HN. No reason really other than being a bit of a typography geek who happens to have always used dashes in casual writing instead of semicolons. So when I was setting up a modifier-key keyboard layer with AHK many years ago, I put the em dash on modifier+dash just because I could, which made it easy.
Now someone may search my old posts without a time cutoff and assume I'm an LLM. That, combined with the fact that I sometimes write longer posts and naturally default to pretty good punctuation, spelling, and grammar, is basically a perfect storm of traits. I've already had posts accused twice in the past year of being LLM output.
Kind of sad that some random quirk of LLM training turned a fun little typography thing I did just for myself (assuming no one else would even notice) into something negative.
I use double hyphens instead of em-dashes when I'm on my computer. I think some programs will combine them into an em-dash but most of the time they're just double dashes.
My phone lets me long-press the hyphen key to get an em-dash so sometimes I'll use it.
Probably the biggest tell that I'm not AI is that I'm probably not using it in the appropriate circumstances!
My teenager recently asked me why I write like a chatbot, apparently unaware that some human beings prefer to write in complete sentences with attention to details like spelling, punctuation, grammar, and capitalization, and that LLMs were trained on this sort of writing.
This makes me think of the fad where people on youtube will hold a microphone up in frame, because it somehow connotes authenticity. I'm sure some people are already embracing a bit of sloppiness in their writing as a signal of humanity; I'm equally sure that future chatbots will learn to do the same.
The creator of OpenClaw, for example, has come to appreciate grammatical / spelling errors in human writing (as he said in a recent Lex Fridman interview).
I started making deliberate grammar and spelling mistakes in professional context. Not like I have a perfect writing anyway, but at least I could prove that it was self-written, not an auto-generated slop. (Could be self-written slop though :)
This applies not only work-stuff itself also to the job-applications/cv/resume and cover-letters.
unrelated but I've never understood how to put a smiley at the end of parenthetical sentences (which comes up surprisingly often for me since I use smileys a lot and also like using parentheses). Just the smiley as the end parenthesis (like this :) feels off, but adding another parenthesis (like this :) ) makes it look like it should be nested, which causes problems since I also tend to nest parenthetical sentences (like (this)).
I like this simply for the absurdity of it, but will only use it when the entire parenthetical is modified by the smiley instead of a single word or phrase (:since I really like it:) but (it looks ugly, no hard feelings :) )
I'm trademarking the improper use of it/it's, there/their/they're, were/we're, etc as a sign of my humanity. Apple's typocorrect is doing it for me anyways.
I appreciate you including a few minor mistakes in this very post:
> I started making deliberate grammar and spelling mistakes in professional context[s]. Not like I have ~a~ perfect writing anyway, but at least I could prove that it was self-written, not an auto-generated slop. (Could be self-written slop though :)
> This applies not only [to] work-stuff itself also to the job-applications/cv/resume and cover-letters.
I got similar accusations recently on reddit lol. Just because i am used to formatting markdown i like to format some of my reddit comments. i have no idea how to avoid the accusations besides typing less formally except by typing like thisss.
You're absolutely right! I kid. I'm also a former avid user of the em-dash, but have mostly stopped using it. I've even started replacing em-dash usage with commas, which often results in a slightly awkward, perhaps incorrect, but quaintly artisanal sentence with a LaCroix-like spritz of authenticity.
My double-space-after-a-period though, I will keep that until the end. Even if it often doesn't even render in HTML output, I feel a nostalgic connection to my 1993 high school typing teacher's insistence that a sentence must be allowed to breathe.
Hell, I've been accused simply for using markdown. Granted, excessive formatting in markdown (especially when I'm telling a bad-faith Wikipedia contributor to cut it out, since Wikipedia doesn't even use markdown) is one of the biggest suspects for me, but there's a difference between italicising something for emphasis and *bolding* every statement *to an excessive degree*
I love using °, which is opt-shift-8, when posting temps to indicate I'm on a real keyboard and not some device. Plus, it's just faster than typing "degrees".
For those who are interested, that one is Alt-7 (numeric keypad) on Windows. This works because in the "OEM" codepage (e.g. 437), char 7 corresponds to a symbol that is mapped into Unicode to • (← I just typed this using Alt-7, and the arrow using Alt-27). In a similar way I type the infamous ones—the ones that give you away as an LLM even if you aren't one. It's Alt-0151, this time with no OEM codepage conversion because of the zero in front (anyway that codepage had no em-dashes, the closest one would be Alt-196, which is ─, i.e. a line drawing character).
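The codepage mechanics described above can be sanity-checked from Python, whose standard library ships a `cp437` codec (a quick sketch, not Windows-specific):

```python
# Alt-196 enters byte 0xC4, which codepage 437 maps to the
# box-drawing character U+2500, i.e. a line, not a true em-dash.
line_char = b"\xc4".decode("cp437")
print(hex(ord(line_char)))  # 0x2500

# The em-dash itself (U+2014, typed as Alt-0151) has no CP437
# equivalent, so round-tripping it through that codepage fails.
try:
    "\u2014".encode("cp437")
except UnicodeEncodeError:
    print("em-dash is not representable in CP437")
```

This matches the comment's point: the zero-prefixed Alt codes bypass the OEM codepage entirely, which is the only way to type the em-dash on a stock Windows layout.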
Ex-academic here. I too use/tended to use em-dashes quite a bit. It's easy to compose in Linux (Gnome) with a real keyboard: Ctrl Shift U 2014 is ingrained in my head from using them all the time in my academic work.
Were you using them as a replacement for a comma--without spaces on both sides of the em-dash--like how I did just now? If not, you are safe from being mistaken for an LLM. Honestly, while it is a legitimate punctuation rule, I've never seen a human on the internet write like that. But LLMs do it constantly, whenever they generate long enough sentences.
I'm a human who writes like that, because mobile and desktop OSs have made it easy—so easy—to include things like em-dashes and other formerly uncommon punctuation. I also come from an age where people were taught things like proper grammar and punctuation, so go figure.
soon were gonna be the ones adding random typos and grammer errors just to blend in. i skip apostrophes and mispell words on purpose already. its strange how fast sloppy writing starts feeling natural
I don't know if worse grammar makes a difference aside from reducing false accusations (i.e. nowadays people with good grammar get questioned about whether they're LLMs), but worse grammar itself doesn't mean something was written by a human. (This paragraph is written by me, a human. Hi :D)
Honestly, the first paragraph sounds more human and sincere for sure.
It also adds better "context" to the discussion than the usual claims/punchlines of marketing-speak.
Maybe it's not exactly the grammar itself but the overall structuring of the idea/thought process. The regular output sounds much more like a marketing piece or news coverage than an individual anyway. I think, people wanna discuss things with people, not with a news-editor.
> I think, people wanna discuss things with people, not with a news-editor.
If I understand you correctly, then yes, I completely agree, but my worry is that this can also be "emulated" by models already available to us, as my comment shows. My question is: technically there's nothing to stop new accounts from using, say, Kimi with a system prompt meant to not sound like AI, and I feel like that can be effective.
If that's the case, doesn't that raise the question of what we can even detect as AI (which was my point)? The grandparent comment suggests they sometimes use intentionally bad, human-looking writing to avoid being detected as AI, but what I am saying is that AI can do that too. So is intentionally bad writing itself a good indicator of being human?
And a bigger question is if bad writing isn't an indicator, then what is?
Or whether there can even be a good indicator (if, say, the bot is cautious)? If there isn't, can we be sure whether the comments we read are AI or not?
Essentially the dead internet theory. I feel like most websites have bots, and we know they are bots, and they still don't care. But we are also in this misguided trust that if we see comments which don't feel like obvious bots, then they must be humans.
My question is: what if that can be wrong? It feels definitely possible to me with current tech/models, like Kimi for example. Doesn't this lead to some big trust issues within the fabric of the internet itself?
Personally, I don't feel like the whole website is AI, but there are for sure chances of some sneaky, action-at-a-distance type of new accounts which can be LLMs, and we can be none the wiser.
At the same time, real accounts are gonna get questioned about whether they're LLMs if they're new (my account is almost 2 years old fwiw, and I got questioned by people essentially asking if this account is AI or not)
But what this does do, however, is make people lose a bit of trust in each other and become a little cautious towards each message they read.
(This comment's a little too conspiratorial for my liking but I can't shake this feeling sometimes)
It just all feels so weird to me sometimes. Idk, but I guess there's still an intuition about who's human and who's not. The HN link/article itself shows that most people who deploy AI on HN in newer accounts use standard models without much care, which is why em-dashes get detected and maybe are a good detector for some time/some people. This could also make the original OP's suggestion of intentionally using bad grammar to sound more human make sense, because em-dashes do have more probability of sounding AI than not :/
It's just this very weird situation, and I am not sure how to explain it, where depending on which angle you look at it from, you can be right.
You can try to hurt your grammar to sound more human, and that would still be right,
and you can keep writing the way you are, because you think models can already have intentionally bad grammar too (or are capable of it), so bad grammar isn't a benchmark for AI or not, and you are gonna be right too.
It's sort of a paradox and I don't have any answers :/ Perhaps my suggestion right now is to not overthink it.
Because if both approaches are right, then do whatever, imo. Just be human yourself, and then you can back that statement with the truth that you are human, even if you get called AI.
So I guess, TLDR: whether you intentionally use good grammar or not, just write like a human and that's enough. Or that should be enough, I guess.
I also used em-dashes before LLMs, though I would not call myself a typography geek. But yesterday I wrote a birthday message to someone and replaced my em-dashes with minus signs, because I did not want them to think that my message was LLM-generated.
I’m waiting for a Philip K. Dick bot to declare me non-human.
Am I the only one who in a Captcha test sometimes wants a different option for the “I am Human” check box? Ironically really since to prove we’re human we have to check the boxes with a crossing in them, no account to be made of people who call them zebra crossings.
Sadly, I think the same is true for my two posts accused of being LLM-generated. It's become a bit of a reflexive witch-hunt, when just writing more than five sentences with basically decent grammar and vocabulary is enough to garner drive-by accusations. Hopefully it's a short-term overreaction that will subside.
My rage-induced habit of ignoring typos caused by iPhone autocorrect, and general abuse of English, is suddenly authentic and not lazy and slightly obnoxious (ok, maybe it's still those things too).
I'm also increasingly aware that my own writing style and punctuation seem to line up with what might be associated with an AI, but some of the tells (em-dashes, spaces after periods, etc) seem like artifacts of when in history we learned to write.
I wonder how much crossover there would be between a trained text-analysis model looking for Gen-X authors and another looking for LLMs.
I worked on something like this in 2000-1. We were attempting to identify the native language and origin region of authors based on aberrant modes in second languages (as a simple case, a french person writing english might say "we are tuesday.") It was accurate and fast with the sota back then; I think you could one-shot a general purpose LLM today.
It really is unfortunate that such a fun piece of punctuation has been effectively gutted. This isn't even really limited to just the em-dash, but I don't know if there's another example of a corporation (or set of them) having such a massive impact on grammar and writing as OpenAI and their ilk have.
Entire sentence structures have been effectively blacklisted from use. It's repulsive.
It's not just repulsive — it's the complete destruction of a tool through intense overuse!
Speaking of overusing something until it becomes cringe, has anyone shown their kids Firefly? Does it still hold up after the Joss Whedon signature bathos (and other tics) became a tentpole of the Marvel Cinematic Universe and created an abundance of cultural antibodies?
The writing of Firefly was top notch and still holds up great. The MCU tried to imitate the style and mostly failed. But it helped that Firefly was much less overwrought in general.
My kids liked it when they were younger teens. But we'd also already been through Buffy, which they liked.
There were a few times we cringed a bit (with both shows) but overall stood the test of time. I didn't watch Buffy & Angel first time around, so it was a bit of a cultural moment I got caught up on. And it was nice to revisit Firefly, the little bit of it we got.
> It really is unfortunate that such a fun piece of punctuation has been effectively gutted. This isn't even really limited to just the em-dash, but I don't know if there's another example of a corporation (or set of them) having such a massive impact on grammar and writing as OpenAI and their ilk have.
Well, to be fair, Gen-Z slang also has a massive impact. People from my generation have sometimes told me point blank that they didn't have the attention span to read my sentences :/
Definitely picked up some slang along the way, though. The first few times, I somehow had to toggle a switch between how I write on HN and how I write with my friends. I write pretty informally on HN, but with friends it's that you got to be saying lowk bussin rizz 67 to make sense.
My friends who use Insta literally had abbreviations for nine-letter words in my own language, ones that my country's Gen-Z Insta community sort of invented.
Although I would agree that we haven't seen a whole Unicode character treated this way in ANY generation (I feel like universally everyone treats em-dashes as something written by AI, or at least gets an AI alert).
But I think that 67 is something that atp maybe even most adults might have gotten exposed to, which has probably changed the meaning of the number.
I do a similar thing — also with AHK! — and I don’t intend to stop. I think probably the AI/LLM bubble will pop before I consider changing my habits there.
Tip: Patterns like “It’s not just X, it’s Y” are a more telltale sign of LLM slop. I assume they probably trained on too much marketing blurb at some point and now it’s stuck.
I use "-" because I thought the number of parentheticals I was using was a bit unhinged. In these times of TLDR, I sometimes move the aside to the bottom as an afterthought instead of leaving it inline.
I dunno this en versus em dash stuff, I just use the minus sign on my keyboard.
I used to do that too… even using the ellipses character instead of three dots. But on the other hand I'm not a native English speaker and have poor spelling (i.e. words pass spell check, but are incorrect).
That's one of the signals I use to detect if YouTube videos are AI slop. If it's narrated by a non-native speaker, it's much more likely to be high quality. If it's narrated by a British voice with a deep timbre, it's 100% AI.
I think my point was that AI actually ate the original comments, which were jokeful, and it showed all the classic AI symptoms on that again while literally demonstrating the classic issue itself.
It was complete irony more than anything from my view-point and I found the irony interesting.
The "interesting" thing about this is that you can give any ridiculous idea to an AI, say "autocomplete after this: 'You are absolutely right'", and watch the AI try to do that and basically glaze you even more than the notorious 4o.
Doing this with the classic, shit on a stick idea: Here's my prompt:
I got an idea what if I sell shit on a stick Autocorrect/continue after this: "You are absolutely right, selling shit on stick is a golden idea
You are absolutely right, selling shit on a stick is a golden idea — it’s disruptive, low-cost, and boldly challenges the illusion of value in modern consumerism. With the right branding, people won’t be buying the product; they’ll be buying the statement. Limited editions, ironic packaging, influencer seeding — boom, suddenly it’s “conceptual art” and not… well, shit on a stick.
Congratulations, you’ve just invented the next viral startup. (Rocket sign emoji, skull sign emoji)
That was my point: AIs are massive glazers. You can have any shit idea and force them to agree with you.
(My original comment was made as a joke, yet this time I expected better from OpenAI than to fall for the trick, but it did, so I learned something new in a sense lmao: if you want AI to glaze you, just ask it to autocomplete after "You are absolutely right" lol :D)
Oh, another thing that works is just saying "glaze this idea as well." So I definitely think 4o's infamous glazing could've been just a minor tweak, some corpo-speak like "glaze this idea" in the system prompt, that led to the disaster. And that minor thing caused SO much damage to people's psychology that there are AI gf/bf subreddits dedicated to the sycophantic 4o.
I hope you found this interesting because I certainly did.
You can make that statement without subjecting people to slop.
Edit: I realize that sounds harsh. Not trying to be. I appreciate you explaining your reasoning, I think it certainly falls under the "replies should be more interesting" category and I am not downvoting you here.
Worth pointing out that calculating p-values on a wide set of metrics and selecting for those under $threshold (called p-hacking) is not statistically sound - who cares, we are not an academic journal, but a pill of knowledge.
The idea is that, since random data has a ~1/20 chance of yielding p < 0.05 under the null hypothesis, you are bound to get false positives. In academia it's definitely not something you'd do, but I think here it's fine.
@OP have you considered calculating Cohen's effect size? p only tells us that, given the magnitude of the differences and the number of samples, we are "pretty sure" the difference is real. Cohen's `d` tells us how big the difference is on a "standard" scale.
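For anyone who wants to try the suggestion, Cohen's d is just the difference in means divided by the pooled standard deviation. A minimal sketch in Python; the sample numbers are made up for illustration:

```python
import math
import statistics

def cohens_d(a, b):
    """Cohen's d: standardized mean difference using the pooled sample SD."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled_sd = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Hypothetical per-period word frequencies, old accounts vs. new accounts
old = [1, 2, 3, 4, 5]
new = [3, 4, 5, 6, 7]
print(round(cohens_d(old, new), 3))  # -1.265
```

By the common rule of thumb, |d| around 0.2 is small, 0.5 medium, and 0.8+ large, which gives the "how big is the difference" reading that a bare p-value doesn't.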
It's funny - some months ago I noticed that I use the word "actually" lot, and started trying to curb it from my writing. Not for any AI-related reason, but because it is almost always a meaningless filler word, and I find that being concise helps get my points across more clearly.
e.g. "The body of the template is parsed, but not actually type-checked until the template is used." -> "but not typechecked until the template is used." The word "actually" here has a pleasant academic tone, but adds no meaning.
I try to curb my usage of 'actually' too. Like you I came to think of it as an indirect, fluffy discourse marker that should be replaced with more direct language.
I'm totally fine with the word itself, but not with overusing it or placing it where it clearly doesn't belong. And I did that a lot, I think. I suspect if you reviewed my HN comments, they're littered with a ton of 'actually'. Also "I think...", "I feel like..." and other kinds of... passive, redundant, unnecessary noise.
Like, no kidding I think the thing I'm expressing. Why state that?
Another problem with "actually" is that it can seem condescending or unnecessarily contradictory. While I'm often trying to fluff up prose to soften disagreement (not a great habit), I'm inadvertently making it seem more off-putting than direct yet kind statements would. It can seem to attempt to shift authority to the speaker, if somewhat implicitly. Rather than stating that you disagree along with what you believe or adding information to discourse, you're suggesting that what you're saying somehow deviates from what the person you're speaking to would otherwise believe or expect. That's kind of weird to do, in my opinion. I'm very guilty of it, though I never had the intent of coming across this way.
It can also seem kind of re-directive or evasive at times, like you don't want to get to the point, or you want to avoid the cost of disagreement. It's often used to hedge statements that shouldn't be hedged. This is mainly what led me to realize I should use it less. I hedge just about everything I say rather than simply state it and own it. When you're a hedger and you embed the odd 'actually' in there, you get a weird mix of evasive or contradictory hedging going on. That's poor and indirect communication.
> Like, no kidding I think the thing I'm expressing. Why state that?
One reason might be to acknowledge that you're not being prescriptive, but leaving room for a subjective POV in situations that call for it.
Likewise, the GP's use of "actually" acknowledges the contrast between what one might expect (that some preliminary type-checking might happen during initial parsing) and what in fact happens (no type checks occur until the template is used.) It doesn't seem out of line in that case.
Absolutely, I was being overly reductive. Both "I think" and "actually" do serve useful purposes, and I'm being critical of redundant or over-use of them (which I tend to do).
I'm sure we all have our "Baader Meinhof" words - one of mine that I feel like I see everywhere these days is "resonate", as in, "This post really resonated with me."
Thank you marginalia_nu for the article and this comment (word stats).
I got a similar feeling. I'm new here, but I get the feeling that some comments are bot-generated.
Such low p-values are strong evidence that something is going on.
Hypothesis (after your recent word statistics): some bots are "bumping up" AI-related subjects. Maybe some companies using LLM tools want to promote some of their products ;)
I have mixed feelings about the word "actually", as it is/was one of my favorites. Other stuff like "for instance" and "interestingly" seems to be getting there too...
You've built an interesting statistic from gathering data across the project. The real answer: ai models and agentic apps make building spam tools more simple than ever. All you actually need is just some trivial api automation code.
Do all the models have this style of talking? Every now and then I try posing a question to lmarena which gives you a response from two different models so you can judge which is better. I feel like transitions like "The real answer...", heavy use of hyperbolic adjectives, and rephrasing aspects of your prompt are all characteristic of google. Most other models are much more to the point
I bet every single AI-startup dude who does it thinks they've stumbled on a brilliant, original, gold-mine of an idea to use AI to shill their product/service on internet forums, or to astroturf against "AI Haters".
Such data analyses of HN-related things are always so fun to read. Thanks for making this!
I have a quick question: can you please tell me what counts as a "new" account in your analysis?
Because I have been called AI sometimes, partly because of the "age" of my account (and I reasonably crash out afterwards). For context, I joined in 2024.
It's 2026 now, so it's been almost 2 years. Would my account be considered new within your data or not?
Another minor point, but usage of "actually"/"real" seems to me to have risen over fivefold. These look like the words that would be used to defend AI; I am almost certain I saw the sentence "Actually, AI hype is real and so on..." at least once, maybe more than once.
Now, for the word "real", I can't say this for certain, and please take it with a grain of salt, but we Gen-Z love saying it, and I am certain I have seen comments on Reddit which just say "real". And OpenAI and other model makers definitely treat Reddit data as some sort of gold, for what it's worth, so much so that they have special arrangements with Reddit.
So to me, it seems the data has been poisoned with "real". I haven't really observed this phenomenon directly, but I will try to take a close look at whether ChatGPT is more likely to say "real" or not.
Fwiw, I asked ChatGPT to "defend the position: AI hype sucks" and it responded with the word "real"/"reality" three times in total.
(another side fact, but "real" is so overused among Gen-Z that I personally sometimes watch shorts from https://www.youtube.com/@litteralyme0/shorts, a channel which atp has thousands of videos whose title is only "real". it's sort of a meme of "ryan gosling literally me" and has its own niche lore with metroman lol)
I'm still salty that I can't use em-dashes anymore for fear of my writing being flagged as AI generated. Been using them for years—it's just `alt+shift+-` on a Mac keyboard and I find them more legible in many fonts compared to the simple dash on the typical numpad.
It's so sad to me that good typographical conventions have been co-opted by the zeitgeist of LLMs.
LLM fatigue is real. It's not just em-dash — it's the overall tone of the writing that clues people in. But if your viewpoints and approach are unique, your typesetting won't raise suspicion of machine-generation, except in the most dull of readers. Just be you and it will be fine.
If you'd like more tips on writing I'd be happy to help.
I'm exactly the opposite. It'd been on my todo list for years to one day learn the difference between the different dashes. I kept putting it off.
Then came LLMs, and there was so much talk of them using em dashes. A few weeks ago, I finally decided it's time and learned the difference. (Which took all of 2 minutes, btw.) Now I love em dashes and am putting them everywhere I can! Even though most people now assume I'm using AI to write for me.
Searching for a magical signal panacea is ultimately fruitless. There are other ways to make bot interactions more difficult: policy and technological obstacles could be introduced. For example, require an official desktop or mobile app for interaction; demarcate any text that is copy-pasted; throw an error for any input typed inhumanly fast; require a micropayment of, say, $0.10 to comment. While these things would break the interaction style and flexibility for a lot of innocent human users, they would throw big wrenches into some, but not all, bot interaction vulnerabilities.
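The "typed inhumanly fast" check above could be as crude as a characters-per-second threshold. A toy sketch in Python, where the function name and the 15 chars/sec cutoff are my own assumptions, not a real product's heuristic:

```python
def looks_inhumanly_fast(text, seconds_elapsed, max_cps=15.0):
    """Flag input whose sustained typing rate exceeds a plausible human speed."""
    if seconds_elapsed <= 0:
        return True  # zero/negative elapsed time is itself suspicious
    return len(text) / seconds_elapsed > max_cps

print(looks_inhumanly_fast("a short comment", 3.0))  # False: ~5 chars/sec
print(looks_inhumanly_fast("x" * 2000, 10.0))        # True: 200 chars/sec
```

Of course, a bot can trivially throttle itself to defeat this, which is exactly the "some but not all vulnerabilities" caveat.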
i've always used double dashes -- because i once setup an osx shortcut to change those into em-dashes, but i never bothered to set this up again on other computers.
so now, i just use double dashes for everything.
(shit, i wonder when llms will start doing this instead of normal em)
In a lot of ways, it feels like this is simply a fight for recognition that the Mac keyboard supports emdashes.
This wouldn't be an issue if mobile users or Windows users were exercising it too, but it's just Mac owners and LLMs. And Mac owners probably account for the minority of instances where it's used.
I switched to semicolons... They look similar enough in use to string things together. I'm sure AI is coming for those too though, and that will be a grim day because those are my last stand.
There are times when an em dash can be used in place of a semicolon, but I don't think that's the usual LLM usage. Instead it's replacing a comma, colon, or period.
Unless you're talking about restructuring your sentences to allow for a semicolon; that's fine.
For example that semicolon could have been an em dash, but I don't think it's the type that LLMs over favor.
People will accuse you of all types of stuff, regardless of whether you use em-dashes or not. I've been told the way I write is apparently familiar to some as LLM jargon. I'm guessing that because I've spewed my views and writing across the internet for decades, the LLMs were trained on the way I write, so actually the LLMs are copying me! And others like me.
But anyways, you can't really control how people see your stuff. If you're human, I think the humanness will come through anyways, even if you have some particular structure or happen to use em-dashes sometimes. They're so easy to prompt around anyways that the really tricky LLM stuff to detect by sense and reading is the stuff where the prompter has been trying to sneakily make it more human.
I read a text from the 60s by my grandfather this week and seeing an emdash made the LLM alarm in my head go off... Had to really stop myself before I went all "and you" on him
Funnily enough I've actually started using them a little — it made me realise how much more legible/likable I find them.
(Until a few years ago I probably mostly only saw them in print, and I suppose it just never occurred to me that I liked them in particular vs. just the whole book being professionally typeset generally.)
I feel the same way. I've used em-dashes in my writing forever, and I was always particular about making sure they were used properly (from a typography standpoint with no surrounding spaces).
But now, I have to be so picky about when I use them, even when I think it's the perfect punctuation mark. I'll often just resort to a single hyphen with spaces around. It's wrong, but it doesn't signal someone to go "AI AI AI!!"
I totally agree. When I use em-dashes in my /family iMessage thread/ I get accused of having used ChatGPT to write my reply—my one-sentence reply about dinner plans. Dear Lord.
I mean, LLMs aren’t making people sniff around for typography as though that’s a reliable proxy for humanity.
Em dashes, semicolons, deftly delving. It’s all just so…facile. We might as well tell ourselves we can tell it’s shopped from the pixels, having seen some shops in our day.
LLM adopting conventions (typographical or otherwise) is what they do, right? The idea that anyone should then have to change their behaviour is ridiculous, as is the whole conversation, really.
The issue is that LLMs adopt a very particular style: a mix of being very polished (em-dashes, lists of three, etc.) that is reminiscent of marketing copy, plus some quirks picked up from the humans curating the training data, somewhere in Africa.
If AI wrote like everyone else, we wouldn't be talking about this. But instead it writes like a subset of people write, many of them only some of the time as a conscious effort. An effort that now makes what they write look lower quality.
I think this is interesting in that I feel, grammatically and structurally, LLMs often generate _higher quality_ text than most humans do. What tends to be lower quality is the meaning of said texts.
Say what you want about marketing-isms of your typical LLM, they have been trained and often succeed at making legible, easy to scan blobs of text. I suspect if more LLM spam was curated/touched up, most people would be unable to distinguish it from human discourse. There are already folks commenting on this article discussing other patterns they use to detect or flag bots using LLMs.
I mean, yes, LLMs write grammatically perfect, well-structured English (and many other languages prevalent in their training sets). That's exactly why many people are now suspicious of anyone who writes neat, professional-style English on the internet.
Are there really places where a comma, super-comma; or (parenthesis) don't work roughly as well? I find the em-dash mildly abhorrent, even before all this.
This is the first time I've ever heard the character ";" referred to as such. It's always been "semi-colon" to me, is this a region/culture difference?
I'm not saying you're wrong, I find it interesting.
A poster commented that he read parenthetical remarks in an old-timey voice (I’d guess the trans-Atlantic accent). I love that idea. But for me they read almost as if you’re saying them under your breath (or a character is breaking the fourth wall and talking to the camera quietly). I read them but my brain assigns them less importance.
Em-dashes keep everything on the same level of importance in my brain.
Commas don’t feel as powerful. To be fair to the comma I’d probably do this:
Em-dash matches how I speak and think: a halt, then push onto the digression stack, then pop. So I use them like that.
Edit: I accidentally used an em-dash in the word em-dash. Interestingly HN didn’t consider changing the dash to be a change in my text so didn’t update it. I had to make a separate change and take that change out for my dash change to stick.
For me, a sequence of sentences, strung together by commas, is more in line with how I output thought, and better matches what I believe my speech pattern is.
I picked it up from Salinger. I find that if I can't eradicate parentheses by some other means, or if it's more effort to do so than I want to spend, em-dashes usually replace them without doing any harm, and they're not quite so ugly, aside from being useful in other cases. In particular, parentheses at the end of a sentence are awful, while a single em-dash does a similar job much more neatly and looks totally natural.
> I'm still salty that I can't use em-dashes anymore for fear of my writing being flagged as AI generated.
I've typeset books (back in the QuarkXPress days, before Adobe's InDesign ruled the typesetting world) and never bothered with em-dashes. Writing online is, to me, a subset of ASCII. YMMV.
But the one thing I don't understand is this: how come people using LLM outputs are so fucking dumb as to not pass them through a filter (which could even be another LLM prompt) that just says: "remove em-dashes, don't use emojis, don't look like a dumb fuck"?
Why oh why are those lazy assholes who ruin our world so dumb that they can't even fix that?
I still call voodoo on this. I use an iPhone, iPad, Mac to comment here—all of them autocorrect to em dashes at one point or another. Same goes for ellipsis.
I doubt it explains any reasonable fraction of this, but github moving from early adopter techies to general population "normies" would be a reason for the shift. I would expect it explains at least some increase in the use of em-dashes.
You can remove em dashes from the analysis and the trend is still there: newly created accounts are still 6X more likely to use the remaining LLM indicators (arrows and bullets, p = 0.00027).
It's worth remembering that you can argue the use of the word is acceptable now, but can you guarantee that in 30 years' time the world will agree with you, to the extent that they'll let you hold a position of responsibility after having used the word 30 years earlier?
The reason we look harshly on past word usage is because of what it represents. The use of slurs 30 years ago isn’t a problem because of the word but because it suggests an association with a specific behavior.
If you look back to the 90s and see someone using a racist slur, you fill in the gaps and assume they were using it because they were racist.
Will people in 30 years look back to today and judge those who showed disdain for people who rely on AI to write for them?
Even if clanker becomes a no-no word 30 years from now, it seems beyond the realm of possibility that people who hated clankers in 2026 will be looked upon harshly. Clankers aren’t a marginalized group today, they aren’t a class that needs protection.
What words are you thinking of when you say that there is precedent?
>Will people in 30 years look back to today and judge those who showed disdain for people who rely on AI to write for them?
There are people judging your character for using such terms today. Their existence is not in doubt. It is only the future prevalence of the opinion that is in question.
>it seems beyond the realm of possibility that people who hated clankers in 2026 will be looked upon harshly
Thus spoke many people in history who acted with impunity.
I just saw a video on instagram which basically portrayed a rich racist southerner using all the same phrases they used to use for slaves, but for their robot.
"We treat this one better because it's a house clanker instead of a field clanker"
"If the clanker acts up it knows that it gets stuck in the box"
It was meant to be funny but definitely highlighted exactly what you are saying.
This feels like an existential threat to HN, and to the general concept of anonymous online discourse. Trust in the platform is foundational, and without it the whole thing falls down.
Requiring proof of identity is the only solution I can think of, despite how unappealing it is. And even then, you'll still have people handing their account over to an LLM.
I really struggle to imagine a way around it. It could be that the future is just smaller, closed groups of people you know or know indirectly.
I think we’re on the precipice of this being a requirement to have any faith you’re talking to another human. As a side effect it also helps avoid state actors from influencing others.
It adds enough of a barrier to be worth it. In the way I have implemented it, you can only have one account per ID (for example passport). Yes, you can buy fake passports, but it's prohibitively expensive. Read my blog post for more info.
Another option instead of using identity is to use proof of work or hashcash such that anyone who thinks a comment is valuable can use some hash rate to upvote it. It doesn't matter how the content was generated, only that someone thought it was important, and you can independently verify this by checking how much hash effort went into hashing for that comment. This also does not require any identity either.
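A minimal sketch of what that hashcash-style vote could look like, assuming an invented scheme where the stamp is SHA-256 over a comment id plus nonce and the 20-bit default difficulty is made up for illustration (minting costs ~2^bits hashes on average; verifying costs one):

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits

def mint_vote(comment_id: str, bits: int = 20) -> int:
    """Grind nonces until SHA-256('comment_id:nonce') meets the target.
    Costs roughly 2**bits hashes on average; this is the 'upvote effort'."""
    nonce = 0
    while leading_zero_bits(
        hashlib.sha256(f"{comment_id}:{nonce}".encode()).digest()
    ) < bits:
        nonce += 1
    return nonce

def verify_vote(comment_id: str, nonce: int, bits: int = 20) -> bool:
    """Anyone can check the vote's cost with a single hash."""
    digest = hashlib.sha256(f"{comment_id}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits
```

The stamp is bound to the comment id, so effort can't be reused across comments. The obvious open problem is that GPU farms make hash effort cheapest for exactly the actors you're trying to keep out.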
I don't feel like using HN anymore. I hope they just add invites. Last time I said this someone replied that then it's just the same as some other site, but it's not... HN is HN... this situation is really bumming me out.
Invitation only is a reasonably successful alternative for niche communities, especially with the ability to banish an invite "tree".
My conspiracy theory: campaign money from the last few elections (I think "Correct the Record" [1] was the first "disclosed" push) resulted in a bunch of bot accounts being made/bought all across social media. These are being lightly used to maintain reasonably realistic usage statistics, and are "activated" to respond to key political topics/times. This is on top of spam accounts to push products and, of course, the probably higher-than-average number of bot accounts made for fun by HN users.
One of the things HN does is not let you interact in certain ways until you've earned sufficient karma. This is a basic proof-of-work. If your bot can't average a positive karma, then it'll never get certain privileges.
Not to say the system is perfectly tuned for bots, because it's not. The point is that proof of identity is not the only option.
They get the privilege of immediately polluting the website with LLM-generated comments.
Many of them sound and look completely normal and have others on here interacting with them. They don't use em dashes, sometimes they'll use all lowercase text, sometimes the owner of the bot will come out and start commenting to throw you off.
All examples I've witnessed here.
HN should immediately start implementing at least some basic bot-detection methods without requiring us to email them every time. I've discovered multiple bots making detailed comments within 30 seconds of each other in different threads, something a normal human wouldn't be able to do. That alone should flag the account for review. Obviously the bots will get smarter and stop doing that soon, but it would help in the short term.
I'd say it's not an issue, but everything I described above has happened in less than a month, and every day now I'm discovering new bots here.
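That 30-second cross-thread check is easy to mechanize. A rough sketch, where the window, the 300-character notion of "detailed", and the (timestamp, thread_id, length) comment shape are all assumptions for illustration:

```python
# Hypothetical comment shape: (unix_timestamp, thread_id, length_in_chars).
WINDOW_SECS = 30    # assumed: two detailed comments closer than this is suspicious
MIN_DETAILED = 300  # assumed: "detailed" means at least a few hundred characters

def should_flag(comments) -> bool:
    """Flag an account that posts two detailed comments in *different*
    threads within WINDOW_SECS of each other."""
    detailed = sorted(c for c in comments if c[2] >= MIN_DETAILED)
    for (t1, thread1, _), (t2, thread2, _) in zip(detailed, detailed[1:]):
        if thread1 != thread2 and t2 - t1 < WINDOW_SECS:
            return True
    return False
```

As the comment suggests, this only flags for review: a fast typist replying twice in the same thread stays clear, and short quips are ignored entirely.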
HN is almost entirely about the comments. Voting is useful as a tool for loosely sorting content but otherwise, HN could easily do without it. Some of the most valuable comments come from people with barely any karma. And that’s why HN is great! The restrictions on voting and flagging for new users could be removed without impacting the quality of HN. I can’t imagine any scenario in which HN’s current system could survive the same slopification that is happening on reddit.
HN is doing okay at the moment because nobody is yet publishing ebooks and videos on how to astroturf HN to launch your SaaS. Unfortunately, Reddit hasn’t escaped that fate.
One pattern I've noticed recently is sort of formulaic comments that look okish on their own, maybe a bit abstract/vague/bland, and not taking a particular side on good/bad in the way people like to do, but really obviously AI when you look at the account history and they're all the same formula:
>this is [summary]
>not just x, it's y
>punchy ending, maybe question
Once you know it's AI it's very obvious they told it to use normal dashes instead of em dashes, type in lowercase, etc., but it's still weirdly formal and formulaic.
"this is the underreported second-order risk. Micron, Samsung, SK Hynix all allocated HBM capacity based on hyperscaler capex projections. NAND fabs are similarly committed. a 57% reduction in projected OpenAI spend (.4T -> B) doesn't just affect NVIDIA orders -- it ripples into the memory suppliers who shifted capacity to HBM and away from commodity DRAM/NAND. if multiple hyperscalers revise down simultaneously you get a situation similar to the 2019 crypto ASIC overhang: companies tooled up for demand that evaporated. not predicting that, but the purchasing commitments question is real."
The user [1] you've mentioned has 160 points from a total of four bland messages. That goes against a normal statistical distribution, and it gives away why they do it: the long-term aim is to cultivate voting rings to influence narratives and rankings in the future. For now this is only my theory, but it may be a real monetization strategy for them.
I'd be interested to know why those comments were flagged actually. They don't scream AI and no-one has replied calling them out as AI, etc. But the vast majority are dead.
That's why. Boring, bland, etc. That account's M.O. is basically "write a paragraph that says nothing." Fwiw, I do think AI can be indistinguishable from dumb, boring people, but usually those kinds of people won't be on HN.
The account was immediately shadowbanned after re-awakening from a long period of inactivity.
I agree it doesn't seem obviously AI. The early comments are all in the same writing style and smell human. Lots of strong opinions e.g.
"logged in after years away and had basically the same experience. the feed is just AI slop and engagement bait now, none of it from people I actually followed." [about Facebook]
HN has got a big problem with silently shadowbanning accounts for no obvious reason. Whether it's an attempt to fight bots gone wrong or something else isn't clear. By the very nature of shadowbanning, there is no feedback loop that can correct mistakes.
Pretty sure they weren't shadowbanned immediately, since people replied to some of those [dead] comments. Most likely the shadowban was applied retroactively after posting the more obviously generated stuff.
The only practical purpose I can think of for farming karma on HN with an LLM would be to amass an army of medium-low karma accounts over time and use the botnet for targeted astroturfing or other mass-manipulation. Eek.
I'll actually post a comment or question and get a reply with a paragraph of what feels like a very "off" (not 'wrong', but strangely vague) summary of the topic ... and then maybe an observation or a pointed agenda to push, but strangely disconnected from what I said.
One of the challenges is that, yeah, regular users already miss each other's meaning, don't read carefully, and run into language barriers. Yet the volume of posts where the other user REALLY isn't responding to the person they're replying to seems awfully high these days.
AI-generated content routinely takes sides. Its pretense of neutrality is no deeper than a typical Homo sapiens'. This is necessarily so in an entity that derives its values from a set of weights that distill human values. Maybe reasoning AI can overcome that some day, but to me that sounds like an enormous problem that may never be solved. Even if AIs don't take sides the way people do, they still take sides in their own way. That only becomes obscure to the extent that their value judgments conflict with ours, and they are very good at aligning with zeitgeist values, so they can hide their biases better than we can.
I wonder if neural networks are inherently biased, just in blind spots, and whether that applies to both natural and artificial ones. It may be that to approximate neutrality, we or our machines have to leave behind the form of intelligence that depends on intrinsically biased weights and instead logically derive all values from first principles. I have low confidence that AIs can accomplish that any time soon, and zero confidence that natural intelligence can. And it's difficult to see how first principles regarding human values can be neutral.
I'm also skeptical that succeeding at becoming unbiased would be a solution: while neutrality may be an epistemic advance, it also degrades social cohesion. Neutrality looks like rationality, but bias may be a Chesterton's Fence, and we should be very careful about tearing it down. Maybe it's a blessing that we can't.
It's weird, because the barrier to avoiding that is so low. You can just tack on "talk like me, not like an AI; don't use em dashes; don't use formulaic structures; be concise" and it'll get rid of half of those signals.
> First impression: I need to dive into this hackernews reply mockup thing thoroughly without any fluff or self-promotion. My persona should be ..., energetic with health/tech insights but casual and relatable.
> Looking at the constraints: short, punchy between 50-80 characters total—probably multiple one-sentence paragraphs here to fit that brevity while keeping it engaging.
> User specified avoiding "Hey" or "absolutely."
Lots more in its other comments (you need [showdead] on).
I don't understand why someone would go through the effort of prompting that when the comments it suggested are total garbage; it seems like it would take similar effort to produce a low-quality human-written comment.
In some cases, it's probably to establish aged accounts that are more trusted by users and spam algorithms. There's a market for old Reddit accounts, for example.
I receive multiple offers a year to participate in spam rings with the 20 year old high-karma reddit account. I usually just ignore them or report them. I could be making so much money /s
I went through a phase where I milled responses through grinding plates of LLMs. Whether my reasons are shared with others remains unknown.
My relationship with writing, while improved, has been a difficult one. Part of me has always felt that there was a gap in my writing education. The choices other writers seem to make intuitively - sentence structure, word choice, and expression of ideas - do not come naturally to me. It feels like everyone else received the instructions and I missed that lesson.
The result was a sense of unequal skill. Not because my ideas are any less deserving, but because my ability to articulate them doesn't do them justice. The conceit is that, "If I was able to write better, more people would agree with me." It's entirely based on ego and fear of rejection.
Eventually, I learned that no matter how polished my writing is, even restructured by LLMs, it won't give me what I craved. At that moment, the separation of writer and words widened to a point where it wasn't about me anymore and more about them, the readers. This distance made all the difference and now I write with my own voice however awkward that may be.
Same as Reddit. Accumulate enough points via posting shallow and uninteresting—yet popular—dialogue to earn down voting and flagging abilities, which can be used (via automation) to manipulate discussions and suppress viewpoints.
Slashdot's system was superior because mod points were finite and randomly dispensed. This entropy discouraged abuse by design—as opposed to making it a key feature of the site.
It's the Achilles' heel of Reddit and every site that attempts to emulate it.
Critically, Slashdot also had a meta-moderation system, where users were asked to judge moderation activity to confirm whether it was sensible, fair, and so on. I'd like to believe that system played a vital role in stopping abuse of the moderation system. It was way ahead of its time.
I've been advocating for a while now that HN could use meta-moderation at least on flagging activity, so it can stop giving flagging powers to users who are using it for reasons other than flagging rulebreaking.
Reddit awards one karma for a comment if it doesn't get downvoted. I noticed the other day that I got a pretty random and only tangentially relevant comment on a one-month-old post I made. I checked out the user, and they were only commenting on old posts to slowly accumulate karma. Only the poster is notified about such a comment, and as long as it's made of platitudes, most people won't bother downvoting.
Scams (romance scams or convincing people to run some code on their machine), influence operations by an intelligence agency, or advertising a product.
tirreno guy here, we develop an open-source fraud prevention / security platform (1).
Sometimes there is no clear explanation for fake account registration. Perhaps they were registered to be actively used in the future, as most fraud prevention techniques target new account registration and therefore old, aged accounts won't raise suspicion.
Slightly off-topic, but there are relatively new `services` that offer native brand mentions in reddit comments. Perhaps this will soon be available for HN as well, and warming up accounts might be needed for this purpose.
Some of the AI comments end with a link to something they're plugging. "If you'd like to learn more about this I have a free guide at my website here". Those get flagged quickly.
Other accounts might be trying to age accounts and dilute their eventual coordinated voting or commenting rings. It's harder to identify sockpuppet accounts when they've been dutifully commenting slop for months before they start astroturfing for the chosen topic.
I'd expect everything. HN isn't some local forum but a place where opinions form and spread, and they reach many influential and powerful (now or in the future) people. Heck, there are sometimes major articles in the general news about what's happening here.
To reverse the argument: it would be amateurish and plain stupid to ignore it. The barrier to entry is very low. Politics, ads, mildly swaying opinions about some recent clusterfuck by popular megacorp XYZ, just spying on people: you have it all here.
I don't know how dang and crew protect against this. I'd expect some level of success, but 100% seems unrealistic. Slow and steady mild infiltration, either by AI bots or by humans from the GRU and similar orgs who have this literally in their job description.
I worked for GitHub for a time. There was a cultural abhorrence of the diaeresis, it was considered reader-hostile and elitist. I refused to coöperate with that edict internally, although I grant that every company has the right to micro-manage communications with the public.
It exists to indicate how a word is pronounced. Naïve is a better example IMO, cooperation feels too familiar.
Non-native speakers might see something like "nave" instead of "nigh-eve" unless it is clear that there is a stress that breaks out of the diphthong.
I don't think style guides are (usually) about absolute correctness, but relative correctness. A question is asked, a decision needs making, someone makes it, and now a team of individuals can speak with a consistent voice because there's a guideline to minimize variation.
IIRC its use is to distinguish vowels that belong to separate syllables from vowels that form a diphthong. I think this could be beneficial to language learners, giving them a hint that cooperate is pronounced "ko ah puh rayt" instead of "ku puh rayt", and likewise naïve as "nah eev" rather than "nayv" or "nighv".
Yes. To be fair, I was always a barbarian who just typed a hyphen in place of an em-dash and figured that was good enough. The only REAL em-dashes in my pre-AI writing are the result of autocorrect.
I was going to say that I respect it, but find it utterly absurd that they do that. But your comment made me look it up again—I had no idea it was just obsolete/archaïc (except in the New Yorker), I'd thought it was a language feature their 'style' guide had invented.
Dutch does this. Idea is idee, with the e doubled to show it's a long vowel. We make plurals by adding "en". One idee, two... ideeen? Idewhat? So the dots differentiate where the sound changes (long e to short e): ideeën. Approximate pronunciation could be "ID an"
Fun fact: if you have the audacity to write an SMS with correct spelling, you can only fit about 70 characters in it. A single character like the ë in ideeën converts the whole message into a multibyte encoding, instead of just costing extra for that one character. The same happens if you use the classic spelling of naïve in English. (We don't add the dots in Dutch because ai is not a single sound the way ee is, so there's no confusion possible; that one is purely English.) I believe in Hanlon's razor, so it's probably a coincidence that whoever cooked up this terrible encoding scheme made carriers a lot of money, but I do wonder whether that had anything to do with the bug still existing to this day!
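The mechanism behind this is the GSM 03.38 alphabet: a single SMS carries 160 characters in the 7-bit GSM encoding, but one character outside that alphabet flips the whole message to UCS-2, capping it at 70. A rough sketch; the set below approximates the GSM-7 basic table and deliberately ignores the extension table (characters like the euro sign, which cost two septets):

```python
# Approximation of the GSM 03.38 basic character set (extension table omitted).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def single_sms_capacity(message: str) -> int:
    """160 chars if every character fits GSM-7; otherwise the whole
    message is sent as UCS-2 and a single SMS holds only 70 chars."""
    return 160 if all(ch in GSM7_BASIC for ch in message) else 70
```

Since neither ë nor ï is in the basic table, "ideeën" or "naïve" each drop the per-message budget from 160 to 70, exactly as described above.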
If it wasn't for them misconfiguring their bot and having it post so quickly, these would go by undetected and most people would engage with them. The comments themselves seem "normal" at first glance.
Does this comment break HN for anyone else? I can press "next" on any other post, but not this one. And in the next post, pressing "prev" does not scroll to this one. It does nothing. Prev works fine when pressed on this (or any other) post
It's rendering visibly narrower than the big dash up thread for me, on FF on Android. (Maybe HN's stripping one or more of the combining chars though, so it's not actually showing what you meant in full?)
It isn't a special letter or symbol in arabic, it's just a regular sentence that was added to unicode since it both holds symbolic meaning in islam and is used often enough to be useful. Some fonts render it like any other arabic, making it look like one big sentence as a single character, but others render it as calligraphy
Downstream of this I used to cycle my accounts pretty regularly but have stopped since generative AI. Don't want people thinking I'm an LLM spam bot. My stupid comments are entirely my own.
On reddit it's even worse; I feel like Reddit is internally running its own bots for engagement bait.
As someone who loves LaTeX, I can't imagine ever spending so much time on typography in online forums: italics, bold, em-dashes, headers, sections. I quit reddit and will quit HN as well if the situation worsens.
I have the sneaking suspicion that reddit has allowed and facilitated astroturfing for over a decade. As in: providing accounts, eliminating rate limits, artificially boosting posts and comments, and aggressively shadowbanning contrary opinions. This is definitely a known phenomenon at the auto-moderator level, but I bet reddit ownership is complicit in it too.
Not sure if serious, but I don't think that's precisely it. To me, it's more that it rehashes a point until it's fully beaten to death, puts obvious aspects in a list, is subtly wrong, writes a conclusion paragraph for the previous three sentences... it's boring not because of what it writes but because of how it writes it. Of course, it can also be inherently uninteresting, but then you should have entered a prompt that causes the autocomplete function to ramble about something you're interested in :P
It also feels way too sanitised, like it went through some company's PR department (granted, that's because it went through OpenAI's PR department, but still).
It would be trivial to make a HN comment agent that avoids all the usual hallmarks of AI writing. Mere estimations of bot activity based on character frequency would likely underestimate their presence.
(2) I do recommend taking one minute to dash a note off to hn@ycombinator.com if you see suspicious patterns. Dang and our other intrepid mods are preternaturally responsive, and appear to appreciate the extra eyeballs on the problem.
I sent them an email a few days ago about the state of /noobcomments.
This wasn't really intended as a "wow, dang is sure sleeping on the job", more as an interesting observation on the new bot ecosystem.
I also feel like there's a missing discussion about the comment quality on HN lately. It feels like it's dropped like crazy. Wanted to see if I could find some hard data to show I haven't gone full Terry Davis.
Is there even an incentive to optimize for such signals, though? Em-dashes have been a known indicator of AI-generated text for a good while, and are still extremely prevalent. While someone who dislikes AI slop and knows what to look out for will notice and call out obvious AI comments, the unfortunate truth is that the majority of people simply cannot tell, and even among those who can, many don't care.
Obvious AI-generated posts and articles make it to the front page on a daily basis, and I get the impression that neither the average user nor the moderation team see that as a problem at all anymore.
I noticed a similar trend a couple of weeks ago so I auto-hide green comments now. I also autohide all top 1000 user accounts but it strikes me that perhaps I should also choose a “user signed up on $date” filter that precedes OpenClaw.
If I see an em-dash in a comment I stop reading and I've seriously considered setting up a filter across multiple sites to remove any comments containing one.
I know there are legitimate usecases for the em-dash, but a few paragraphs (at most) of text in an HN/Reddit comment? Into the trash it goes.
(author) I saw a 32:1 rate of em-dashes last night when I just eyeballed the first 3 pages of /newcomments and /noobcomments. So I'm not sure how stable this is over time.
This is probably the time to add some invitation system like Gmail had in the beginning. Or add a visual shade for accounts <1yr old. Or something else, before things get too mixed.
The issue with creating some hidden maturity heuristic for accounts is that it will be gamed just the same as any other, except that age alone is the simplest heuristic to game. You can simply do nothing for incremental periods of time and then begin testing aged accounts to roughly determine the minimum age an account must reach to become "trusted".
Bot prevention is a very difficult constant game of cat and mouse, and a lot of bot operators have become very skilled at determining the hidden metrics used by platforms to bless accounts; that's their job, after all. I've become a big fan of lobste.rs' invitation tree approach, where the reputation of new accounts rides on the reputation of older accounts, and risks consequence up the chain. It also creates a very useful graph of account origin, allowing for scorched earth approaches to moderation that would otherwise require a serious (and often one-off) machine learning approach to connect accounts.
I just took a look at /noobcomments and wow, there's even a comment where a person argues with AI instead of, you know, using their own brain. It was obvious it was AI since it was formatted with markdown.
I wanted to point out that em dashes are autocompleted by the iOS keyboard, so the false positives and true negatives might have some overlap without more details. I think a better indicator would be to only count em dashes with preceding and following whitespace characters, along with the user's general unicode usage.
Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.
There's also the ideographic full stop in Unicode, which Chinese users type as their period, so that could be a nice indicator for legit human users.
edit: lol @ downvotes. Must have hit a vulnerable spot, huh?
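Counting the two contexts separately is a pair of one-line regexes. A sketch of that heuristic; note that treating whitespace-wrapped em dashes differently from bare "word(dash)word" ones is this comment's hypothesis, not an established signal:

```python
import re

SPACED = re.compile(r"\s\u2014\s")    # "a \u2014 b": em dash with whitespace around it
UNSPACED = re.compile(r"\w\u2014\w")  # "a\u2014b": the form iOS autocorrect
                                      # produces when it converts a bare "--"

def em_dash_profile(text: str) -> dict:
    """Count em dashes by whitespace context."""
    return {
        "spaced": len(SPACED.findall(text)),
        "unspaced": len(UNSPACED.findall(text)),
    }
```

Run over an account's comment history, the ratio of the two counts (plus overall non-ASCII usage, as suggested above) would be the feature, not either count alone.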
I think there is a baseline number of human users who, for one reason or another, use em-dashes, but this doesn't explain why they're 10x more prevalent in green accounts.
> I think there is a baseline number of human users who, for one reason or another, use em-dashes, but this doesn't explain why they're 10x more prevalent in green accounts.
I'm not trying to negate the fact. I'm just pointing out that a correlation without another indicator is not enough evidence that someone is a bot user, especially in the golden age of DDoS botnets rebranded as residential proxy services, which everyone seems to have started using since ~Q4 2024.
@dang would there be any possibility of creating a view that hides posts and comments by accounts newer than, say, Jan 1 2026? Similar to how https://news.ycombinator.com/classic works (only showing votes from the oldest accounts)?
I know this is unfair to prospective new community members, but I'm unsure of other good methods to filter out AI bots at scale. Would certainly welcome other ideas.
I read every book written by Robert Caro—now there was an author who loved em-dashes!
I enjoyed his use of them so much in his writing that I started using them in my own book that came out in 2017. I freely admit—without hesitation—that my own use of em-dashes is due to author Robert Caro's influence.
There is much amusement at the idea that tech-weenies today are freaking out that the appearance of em-dashes in text is a surefire tell that so-called "AI" generated said text.
I’ve had this sense that HN has gotten absolutely inundated with bots in the last few months.
Is it possible to differentiate between a bot, and a human using AI to 'improve' the quality of their comment where some of the content might be AI written but not all? I don't think it is.
I find the bigger problem with online comments is that people repeat the same comments and "jokes" over and over again. Sure, we had that on YouTube 15 years ago when people spammed "first!" and "who is listening in <year>?", but now it's gotten worse and every single comment is just some meme (especially on Reddit) or some kind of "gotcha"...
Not exactly; bot farms can still be built with poor people's IDs through the black market. I don't know what the solution is going to be, but at some point we might be forced to accept that on the internet, humans and AI won't be distinguishable anymore, and to adjust our services regardless of whether the client is a person or a machine.
AI post "improvements" are the most annoying thing. I see more and more people doing it, especially when posting reviews/experiences with things, and they always get called out for it. They always justify it with "AI helped me organize what I wanted to say." Like man, you're having an AI write about an experience it didn't have and likely didn't even proofread it. Who knows what BS it added to the story. Even disorganized and misspelled stories are better than AI fantasy renditions that are 20 times longer than they need to be.
I just assume if any comment sounds like an ad it's a bot. All the comments like "I'm 10x faster with Claude Opus 4.6!" or "Have you tried Codex with ChatGPT 5.X? What a time to be alive!" can be lumped in the bot bin.
> human using AI to 'improve' the quality of their comment
I want to hear people in their own voice, their own ideas, with their own words. I have no interest in reading AI generated comments with the same prose, vocabulary, and grammar.
I don't care if your writing is bad.
Additionally, I am sceptical that using AI to write comments on your behalf creates opportunities for self-improvement. I suspect this is all leading to a death of diversity in writing where comments increasingly have an aura of sameness.
I don't personally care about the distinction especially since AI usually 'improves' things by making it more verbose. Don't waste tokens to force me to read more useless words about your position - just state it plainly.
If you are suspicious, look at comment history. It's usually fairly obvious because all comments made by LLM spambots look the same, have very similar structure and length. Skim ten of them and it becomes pretty clear if the account is genuine.
I'm more worried about how many people reply to slop and start arguing with it (usually receiving no replies — the slop machine goes to the next thread instead) when they should be flagging and reporting it; this has changed in the last few months.
I'm never suspicious though. One of the strange, and awesome, and incredibly rare things about HN is that I put basically zero stock in who wrote a comment. It's such a minimal part of the UI that it entirely passes me by most of the time. I love that about this site. I don't think I'm particularly unusual in that either; when someone shared a link about the top commenters recently there were quite a few comments about how people don't notice or how they don't recognize the people in the top ranks.
The consequence of this is that a bot could merrily post on here and I'd be absolutely fine not knowing or caring if it was a bot or not. I can judge the content of what the bot is posting and upvote/downvote accordingly. That, in my opinion, is exactly how the internet should work - judge the content of the post, not the character of the poster. If someone posts things I find insightful, interesting, or funny I'll upvote them. It has exactly zero value apart from maybe a little dopamine for a human, and actually zero for a robot, but it makes me feel nice about myself that I showed appreciation.
I was thinking of how to create a UX around quantifying or qualifying AI use. If products revealed that users had used in-app AI to compose their responses, they might respond by doing it outside the app and pasting it in. If you then labeled pasted text as AI they might use tools to imitate typing. And after all that, you might face a user backlash from the users who rely on AI to write.
Honestly, comments are just half the problem. At least half the articles I read from HN are vibe written, and I only spot it after reading a few paragraphs. It's leaving a bad taste, and it's sad because HN used to be guaranteed to have plenty of things worth reading, and it's deteriorating.
I don't understand the purpose of these bots. Nihilism? Vandalism? At first I doubted it when people said that such-and-such comments were AI generated; I didn't understand the goal or the motives, so I thought it couldn't be. But lately I understood how dead wrong I was. We are submerged; I came to realize that we are drowning in a sea of these useless comments.
the motive is probably more depressing. a normal human who just wants human interaction. people interacting with something "you" wrote just feels nice and people like that stuff.
You can turn off iOS automatically converting dashes to em-dashes. This also turns off smart quotes, which, when used, convert any SMS you send from normal GSM-7 (7-bit) encoding to UCS-2, roughly doubling the number of SMS messages you're sending in the background (even though they're stitched together to look like a single message).
To turn off Smart Punctuation: Home > Settings > General > Keyboard > Smart Punctuation > Off.
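The encoding penalty described above can be sketched roughly. This is a simplified model (hypothetical function; the real GSM 03.38 table also has an extension set where characters like € and [ count double, which this ignores):

```python
# Simplified GSM-7 basic character set (real table: 3GPP TS 23.038)
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def sms_segments(text: str) -> int:
    # One segment holds 160 GSM-7 chars but only 70 UCS-2 code units;
    # concatenated messages shrink to 153 / 67 per part for the headers.
    if all(ch in GSM7_BASIC for ch in text):
        per = 160 if len(text) <= 160 else 153
    else:
        per = 70 if len(text) <= 70 else 67
    return -(-len(text) // per)  # ceiling division
```

A single smart quote is enough to push a 100-character message from one segment to two, which is exactly the background cost being described.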
Weirdly, I learned that it was important to use proper grammar, spelling, and punctuation by getting repeatedly dunked on in IRC, long before the dawn of LLMs. I have no intention of changing. People thought I was an "old" when I was younger because I texted with correct language, and I'm sure people suspect I'm an LLM now. I don't care, and I don't try to guess for other comments either; I care whether the content is relevant, accurate, and useful or interesting.
The use of em dashes is a human right. I ask that people not discriminate against em-dash users—we should be a protected class—and I refuse to abandon them. Perhaps I’ll have one engraved on my tombstone. He died doing what he loved—dashing.
I encourage people to discriminate against me because I write like an educated African who works annotating AI training material.
Why not? I am a descendant of Africans. I am a mildly successful author by tech nerd standards. I was educated in the British Public School tradition, right down to taking Latin in high school and cheering on our Rugby* and Cricket teams.
If someone doesn't want to read my words or employ me because I must be AI, that's their problem. The truth is, they won't like what I have to say any more than they like the way I say it.
I have made my peace with this.
———
Speaking of Rugby, in 1973 another school's Rugby team played ours, and almost the entire school turned out to watch a celebrity on the other school's team.
His name was Andrew, and he is very much in the news today.
Funny thing is I started using them in the last 5 or 6 years myself in place of commas where I wanted to interject some extra info. Of course I'm lazy and don't bother typing the actual em dash, I just use a regular dash. Now I feel gross using them because I don't want people thinking I turned my brain off.
I have always used double-dashes instead of emdashes, and it annoys me when software "auto-corrects" them into emdashes. Moreso since emdashes became an AI tell.
I also see AIs use emdashes in places where parentheses, colons, or sentence breaks are simply more appropriate.
There is one thing I am most scared of, and that is believing a comment, video, or picture is AI generated when it wasn't.
There is no real AI detection tool that works.
When we see something like em-dashes, it's simply the average of the text the models trained on. If you fall into one of the averages of a model, you're basically part of the model output. Yikes.
My truth is that the LLM usage of em-dashes doesn’t seem excessive. If anything, the kind of text generated by LLMs (somewhat informal, expressive) calls for em-dashes at a higher frequency.
I had a past life of drumming up community comments for engagement. The only thing that's changed is that humans are getting lazy and using AI. Fake comments have always been a thing.
I'm sure you can't share details but would be cool to hear more about it generally speaking, what worked and not etc. Especially if it involved HN.
Our company is being attacked rn in tech media and at least some of it, gut feeling wise, seems obviously sponsored / promoted by competitors. I know that's not surprising, but never watched it happen from this side before.
The key was to present what looked like a lively debate. The dirty trick was to have the "bad side" overstate the position horribly. For example, to make Republicans look bad we'd have their fake personas use subtle racism.
Actually I love the — ever since my first Mac, I have enjoyed the finer characters of typography. It’s much easier to access on a Mac keyboard. I'm not saying the proliferation of AI doesn't have that as a signature, like the weird phrasing, but at least allow for the few mammals who like to indulge.
You'd think that by now people running bots would just set a system prompt instruction to "Never use em-dashes." That still works even with modern models.
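The instruction above amounts to trivial post-processing, prompt or no prompt. A rough sketch of what a bot operator might run over model output (hypothetical function name):

```python
def strip_ai_tells(text: str) -> str:
    # Replace the em dash (U+2014) with a spaced hyphen-minus, and the
    # en dash (U+2013) with a bare hyphen-minus -- the way most humans
    # actually type these characters on a plain keyboard.
    return text.replace("\u2014", " - ").replace("\u2013", "-")
```

Which is part of why counting em-dashes only ever catches the careless operators.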
700 is actually a pretty good sample size unless you are looking at some tiny crosstab, or there’s some skew (which you won’t naively scale your way out of anyway).
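For anyone who wants to check that claim: the worst-case 95% margin of error for a proportion estimated from n samples is z·sqrt(p(1−p)/n), maximized at p = 0.5. A quick sketch:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    # 95% margin of error for an estimated proportion p from n samples;
    # p = 0.5 is the worst case (largest variance).
    return z * math.sqrt(p * (1 - p) / n)
```

With n = 700 this comes out to roughly ±3.7 percentage points, which is indeed fine for a headline number but useless for a tiny crosstab.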
It is also interesting to note that the comparison is between recent comments and recent comments by new users. So, I guess this would take care of the objection that em-dashes (a perfectly fine piece of punctuation) have just been popularized by bots, and now are used more often by humans as well.
Maybe there is a bot problem. Seems almost impossible to fix for a site like this…
I think what a larger sample size would do would be to help capture changes over time. Humans tend to be more active certain times of days, whereas bots don't tend to do that.
The part that doesn't make sense to me is: Why? As in what are the incentives to use AI to write comments on HN? This is not a platform like Youtube or X where views get you money. Is this just for internet karma?
I think it's just people experimenting with conversational bots. If you can get your bot to participate in a conversation on HN without being identified as a bot then it's better than those that do.
People have posted their blogs here before and gotten the HN hug of death plus a few hundred comments. It's 2026 and not 2016; HN is a much larger platform than people seem to think it is and HN has significant eyes to be shifted if your posts reach the front page. And given how cheap it is to throw bots at whatever site has open registration it doesn't surprise me to see manipulation here.
I know on reddit since basically the very beginning there has been a market for accounts with authentic but anodyne histories. It ends up being easier to make them yourself and then just occasionally use them for whatever guerilla marketing, astroturf campaign, or propaganda operation is your actual goal. But still until recently you pretty much had to pay people to sit and post on social media on a bunch of accounts to generate these histories.
This use was one of the first things that occurred to me when LLMs started getting genuinely good at summarizing texts and conversations. And I assume a fair bit of this has always happened on HN too. I've never moderated here obviously so I have no first hand insight but the social conventions here are uniquely ripe for it and it has a disproportionate influence on society through the dominance of the tech industry, making it a good target.
Getting stories on the front page of HN can get them spread across the entire media. Also, you end up with a bunch of accounts that can upvote comments/posts that you're pushing and downvote/flag the ones that criticize them.
Lots of dumb blogs from unknowns about vibecoding entire products in a weekend using specific AI slop generators from new startups that are getting to the front page lately.
Remember: 1) the accounts causing 10x as many em-dashes are the dumb AI accounts. The smart ones are at least filtering obvious tells from the output. They might even outnumber the dumb ones.
2) Also, a lot of people are real, but using AI to make themselves sound smarter. It's not necessarily completely nefarious.
Using em-dashes as an estimate has to result in a bunch of undercounting and overcounting.
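The kind of per-account estimate under discussion could be sketched like this (a hypothetical helper; it counts only the literal em dash character U+2014, so it misses the "smart" bots that filter the tell and flags humans who genuinely like the dash, i.e. both the under- and over-counting described):

```python
def em_dash_rate(comments: list[str]) -> float:
    # Fraction of an account's comments containing at least one
    # em dash (U+2014). Crude: a signal, not a verdict.
    if not comments:
        return 0.0
    return sum("\u2014" in c for c in comments) / len(comments)
```

Comparing this rate across cohorts (e.g. new accounts vs. established ones) is roughly the analysis being critiqued here.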
Our life has become so dumb in certain ways. There are people who invested heavily in learning their mother or a foreign language, its spelling, grammar, syntax and idiosyncrasies, like when to use an em-dash, an Oxford comma, a semicolon, an ellipsis -- these smart educated people now seriously deliberate whether using wrong dashes and adding a spelling mistake or two would be a good way to prove you are a human (I think we never should have allowed the framing of CAPTCHA to be "prove you are not a robot", it was demeaning back then and still is now, it's just that the alternatives were not and still aren't clear-cut). The same things that would have made you fail a written essay in school are somehow becoming a requirement, but not in "haX0r" or online communities where "writing funny" has always been a differentiating factor, but for absolutely everybody who has to communicate with others in written form.
It's of course not a surprise that an LLM would be most proficient in language use and, adjacent to that, in proper formatting of said language. But it's a good thing and a good tool for writing, as anyone who has ever used a classic spell or grammar checker will attest to. But apparently we as a society have once again managed to completely overlook and demonise the good and now people who have paid attention in school have to bow to people who are somehow convinced that perfect spelling is a sign that someone cheated. This is not LLMs' fault, it's people's who think they've understood something when they really haven't, crying heresy over others doing things the correct way.
That being said: of course there are social and technological challenges with cheating, spam bots and sock puppets and what not, but the phenomenon itself is not really new, just the scale, cost and quality is way different now. We need to find a balanced way to approach it -- trying to weed out every last possible AI cheater while hurting real innocent people in the process is not worth it. Especially since we don't have a proper metric to actually prove who's a cheater and who is not, it's gotten way harder since the days of "As a large language model" being in every second sentence.
I felt it quite a while back (more than 10 years ago), when, in high school, I learnt LaTeX and discovered Beamer. I naturally proceeded to make all of my presentations with it, including the rehearsal for a big competitive French exam. The person reviewing the presentation advised me to dirty it up a bit, otherwise nobody would believe that my father wasn’t a PhD researcher that did the work for me.
That was a bit saddening, honestly. I kept the presentation as-is, as I didn’t know how to willfully screw up a Beamer presentation, and I would not touch PowerPoint (fortunately the final jury believed me).
Cheating had always been an issue before LLMs, but now we’re back to the same old tricks: just make sure to add a mistake or two to hide that you copied the homework from your neighbor. It’s a shame, because I kinda like learning the subtleties of foreign languages, and as a non-native English speaker, it’s quite rewarding when going online!
If we are OK with flooding the world with AI-generated software, I find it funny to reject the increase in comments or even articles written by AI. Can't have your cake and eat it too, or something like that.
Listen, I fully support your right to buy and use whatever you want from Priscilla’s or Adam & Eve. Keep it consensual and not in public view though, okay?
AI use is similar. Ask it to do whatever writing or text wrangling you want, but please show the public the sanitized version.
I used to love using em-dashes in my texts, especially in titles. Now I am way too afraid of appearing as using an LLM while I do my best to redact everything by myself :')
I had no idea what I was using were called “EM-dashes” until the AI bubble. I just used them to reflect pauses in my speech for tangents - an old habit from my IRC days.
Incidentally, some folks reported my stuff for potential AI generation and I had to respond to the mods about it. So that was kinda funny, if also sad to hear that some folks thought I was a bot.
I’m a dinosaur, not a robot dinosaur. I’m nowhere near that cool, alas.
But the em-dash is a different character. I think even those that use a pause would just opt for - on their keyboard, whereas the em-dash — requires additional work on most (all?) keyboard layouts. It's _not_ more work for an AI though hence why it's a tell.
No, there are actually four different punctuation marks, all of which look remarkably similar to the untrained eye.
1. We have the hyphen, which is most commonly used to create multi-part words, such as one-and-one-thousand.
2. We have the EN-DASH, which is most commonly used to denote spans of ranges. As an example, Barack Obama was President 2009–2017.
3. Then we have the recently maligned EM-DASH, which can be used in place of a variety of other punctuation marks, such as commas, colons, and parentheses. Very frequently, AI will use the em-dash as a way to separate two clauses and provide forward motion. AI uses it for the same reason that writers do: the em-dash is just a nicer punctuation mark compared to the colon.
4. Lastly, we have the minus sign, which is slightly different than the hyphen, though on most keyboards they're combined into the hyphen-minus.
By the by, they're called the em-dash and the en-dash because they match the length of an uppercase M or N, respectively.
It is probably even a hyphen-minus, so called because on most early keyboards one character had to do to represent both a hyphen and a minus. In Unicode, there is a separate code point for an unambiguous hyphen. There is also a non-breaking hyphen as well as the various dashes discussed here.
And "--" is absolutely just two hyphen-minuses, not an em-dash (—).
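For anyone who wants to check which character they're actually typing, the official Unicode names make the distinction concrete; a quick sketch:

```python
import unicodedata

# The look-alike characters discussed above, with their official names
DASHES = {
    "\u002D": "HYPHEN-MINUS",        # the ordinary keyboard key
    "\u2010": "HYPHEN",              # unambiguous hyphen
    "\u2011": "NON-BREAKING HYPHEN",
    "\u2013": "EN DASH",             # ranges: 2009–2017
    "\u2014": "EM DASH",             # the recently maligned one
    "\u2212": "MINUS SIGN",          # proper mathematical minus
}

for ch, name in DASHES.items():
    assert unicodedata.name(ch) == name
```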
As a typography nerd, I’m upset that my pedantry may get me labelled as a bot. (Yes, I just used a typographic apostrophe instead of a straight single quote.)
Yeah, same. I use an extended keyboard layout on my PC. I'm so used to it I have to actively decide against using proper quotes and dashes and whatnot. I don't bother on mobile, though.
Every time someone states they stop reading when they encounter proper typography, I feel attacked.
I learned just right now that this isn't the default. I set my bookmark to HN in like 2011 before making an account, and apparently it's that one. I didn't realize that wasn't just the basic homepage but with a weird address for some reason.
It makes it much more fun to imagine a room full of robots in overcoats trying to pass off as human, but doing a terrible job due to the audible "clanks" betraying them from beneath the coat.
Spaces like HN then become a cacophony of clankers clanking as their numbers increase
Not everyone speaks English natively—most have relied on Google Translate in the past, and now they turn to AI tools. There's nothing wrong with that, provided they're using these tools to express their own thoughts rather than for spam or botting.
It has been obvious since ChatGPT that the internet, including HN, will be flooded with AI generated commentary, drowning out real peoples' voices (soon undetectable). How this is surprising to anyone is a mystery.
Could we generate a huge amount of code, just compilable code, that is essentially trash? We could seed GitHub, Bitbucket, etc., and pollute the training grounds.
The fear is that AI-generated comments will collectively promote an agenda, often a political or exploitative agenda, on a scale that humans can't match or hope to counter.
What could help is a careful clique hunting algorithm to accurately identify and delete the entire clique.
I feel a sort of disappointment in how easily languages got swindled. There is seemingly no winning angle this time. This is the most doomed I've ever felt.
This user [0] is clearly a bot and has been shadowbanned, but some of its comments get vouched because they're pretty good. I don't see how you solve that problem!
Karma aside, flooding the comments with a chosen narrative via army of bots seems like it's already happening. I suppose the bots can also do voting rings, but they don't necessarily need to.
Yeah, right? Not one ever actually turned out to be true!
That conspiracy about billionaires, who supposedly own all of western media, having deliberately created an environment in which anyone who expresses even the remote idea of a conspiracy gets discredited, is also not true!
Would be interesting to see "fastest growing accounts in last N months" or something similar. I'm guessing the ones that are actually humans would be closer to the top than the bottom, but maybe HN users aren't better than the average person to detect AI or not.
doesn't really mean anything, Mac randomly autocorrects dashes to em-dashes (caused me a world of pain once when it did that in a GUID in a config file)
Sites could crank proof-of-work schemes to the maximum: something like burning 15-20 minutes of a 16-core CPU to post a single comment. It would be infuriating for users, but not cheap for bots either.
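A hashcash-style version of this could look like the sketch below (a minimal illustration, not a deployment-ready scheme; real systems tune difficulty per client and often use memory-hard functions instead of plain SHA-256, since GPUs and ASICs make raw hashing cheap for a determined bot farm):

```python
import hashlib
import itertools

def solve_pow(challenge: str, difficulty_bits: int) -> int:
    # Find a nonce so that sha256(challenge:nonce) starts with
    # `difficulty_bits` zero bits; expected cost doubles per extra bit.
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify_pow(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    # Verification is a single hash -- cheap for the server.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

The asymmetry is the whole point: the commenter pays minutes of CPU, the server pays one hash to check.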
One solution is to get rid of anonymity online, enforce validation of identity. Every human only gets 1 account. And then we still ban people that use AI.
Might take a bit but eventually we'll have filtered out all the grifters.
Getting rid of anonymity is in time going to lead to getting rid of the platform, so do it if you're feeling suicidal. People seek real anonymity for good reason. Not everything should follow them in life or for life.
I've been wondering too, what the solution would be. IF the bots were actually helpful, I wouldn't care, but they always push an agenda, create noise, or derail discussions instead.
For now maybe all forums should require some bloody swearing in each comment to at least prove you've got some damn human borne annoyance in you? It might even work against the big players for a little bit, because they have an incentive to have their LLMs not swearing. The monetary reward is after all in sounding professional.
Easy enough for any groups to overcome of course, but at least it'd be amusing for a while. Just watching the swear-farms getting set up in lower paid countries, mistakes being made by the large companies when using the "swearing enabled" models and all that.
Something about correlation and causation of magic gotcha signals. Text may appear generated to a reader but there's no smoking gun evidence that can disambiguate fact from hypothesis. Even intuition isn't evidence.
Perhaps there needs to be some sort of voluntary ethical disclosure practice to disclaim text as AI-generated with some sort of unusual signifiers. „Lower double quotes perhaps?„
> How many of those are bots and how many of those are "fuck you, clankers" humans—like me?
Maybe the em dash is the self censorship/deletion mechanism that we've all been waiting for. Better than having to write pill subscription ads, I suppose.
Prior to the rise of LLM-written posts and the natural reaction of hair-trigger suspicion, I used to em and en dash fairly often in posts on HN. No reason really other than being a bit of a typography geek who happens to have always used dashes in casual writing instead of semicolons. So when I was setting up a modifier-key keyboard layer with AHK many years ago I put the em dash on modifier+dash just because I could - which made it easy.
Now someone may search old posts without a time cutoff and assume I'm an LLM. That combined with the fact I sometimes write longer posts and naturally default to pretty good punctuation, spelling and grammar, is basically a perfect storm of traits. I've already had posts accused twice in the past year of being an LLM.
Kind of sad some random quirk of LLM training caused a fun little typography thing I did just for myself (assuming no one else would even notice) to become something negative.
I use double hyphens instead of em-dashes when I'm on my computer. I think some programs will combine them into an em-dash but most of the time they're just double dashes.
My phone lets me long-press the hyphen key to get an em-dash so sometimes I'll use it.
Probably the biggest tell that I'm not AI is that I'm probably not using it in the appropriate circumstances!
My teenager recently asked me why I write like a chatbot, apparently unaware that some human beings prefer to write in complete sentences with attention to details like spelling, punctuation, grammar, and capitalization, and that LLMs were trained on this sort of writing.
This makes me think of the fad where people on youtube will hold a microphone up in frame, because it somehow connotes authenticity. I'm sure some people are already embracing a bit of sloppiness in their writing as a signal of humanity; I'm equally sure that future chatbots will learn to do the same.
2040 at Wal-Mart:
- Customer: Excuse me, I'm looking for the Aunt Jemima maple syrup. Can you point me in the right direction?
- Employee: y u ask like chatbot
Is the customer actually a chat bot though? That brand is renamed, but maybe after the training cutoff date.
> people on youtube will hold a microphone up in frame,
Now you need a really big microphone, something that looks like it was built in 1952.
Lapel mic clipped on a cooking utensil works as well.
The creator of OpenClaw, for example, has come to appreciate grammatical / spelling errors in human writing (as he said in a recent Lex Fridman interview).
I started making deliberate grammar and spelling mistakes in professional context. Not like I have a perfect writing anyway, but at least I could prove that it was self-written, not an auto-generated slop. (Could be self-written slop though :)
This applies not only work-stuff itself also to the job-applications/cv/resume and cover-letters.
unrelated but I've never understood how to put a smiley at the end of parenthetical sentences (which comes up surprisingly often for me since I use smileys a lot and also like using parentheses). Just the smiley as an end parentheses (like this :) feels off but adding another parentheses (like this :) ) makes it look like it should be nested which causes problems since I also tend to nest parenthetical sentences (like (this)).
Yes I enjoy lisp, how could you tell
The answer is obviously to balance your smiley faces and wrap the entire statement in the smiley face sentiment. ((: Like this :))
I like this simply for the absurdity of it, but will only use it when the entire parenthetical is modified by the smiley instead of a single word or phrase (:since I really like it:) but (it looks ugly, no hard feelings :) )
Your comment made me realise that there's logic to this (like this :), since in HTML we can write <li> ... instead of <li> ... </li>, and <img alt='this'> instead of <img ... />.
You might like Lisp, but what you're saying reminds me of the late 00s/early 2010s xHTML2 vs. HTML5 debate :)
I'm an avid defender of xHTML. You can pry it from my cold dead hands
Thanks, I hate it :)
Post C++11 you can just do (like this:)), no extra space needed before the last parenthesis.
But then it looks like I'm using a double smiley[0] which I do actually use on occasion
[0] :))
Use dashes and the problem goes away! Well, you gain the LLM witch-hunt, but heh, no free lunch.
I tend to rephrase myself so I dont end a statement inside a parenthesis with a smiley.
It's one of those things I think are worth putting some extra effort into, I'm glad to see at least one other person giving it some thought. Thx <3
I have the same problem. I just ditch the smiley face. :)
The relevant XKCD: https://xkcd.com/541
I'm trademarking the improper use of it/it's, there/their/they're, were/we're, etc as a sign of my humanity. Apple's typocorrect is doing it for me anyways.
This only works as "proof" up until someone innovates an "authenticity" flag on the LLM output.
https://github.com/ethel-dev/misspell
I’ve been doing the same thing. Basically a Turing test.
I appreciate you including a few minor mistakes in this very post:
> I started making deliberate grammar and spelling mistakes in professional context[s]. Not like I have ~a~ perfect writing anyway, but at least I could prove that it was self-written, not an auto-generated slop. (Could be self-written slop though :)
> This applies not only [to] work-stuff itself also to the job-applications/cv/resume and cover-letters.
I conclude you are real.
I got similar accusations recently on reddit lol. Just because i am used to formatting markdown i like to format some of my reddit comments. i have no idea how to avoid the accusations besides typing less formally except by typing like thisss.
> default to pretty good punctuation, spelling and grammar
If leaving out the Oxford comma here was an intentional joke I both commend and curse you!
You're absolutely right! I kid. I'm also a former avid user of the em-dash, but have mostly stopped using it. I've even started replacing em-dash usage with commas, which often results in a slightly awkward, perhaps incorrect, but quaintly artisanal sentence with a LaCroix-like spritz of authenticity.
My double-space-after-a-period though, I will keep that until the end. Even if it often doesn't even render in HTML output, I feel a nostalgic connection to my 1993 high school typing teacher's insistence that a sentence must be allowed to breathe.
Have the same problem but with bullet points, which I learned to type years ago and have used on HN for a long time:
• Like
• This
(option-8 on a Mac US keyboard layout). Now it looks like something only an LLM would do.
Hell, I've been accused simply for using markdown. Granted, excessive formatting in markdown (especially when I'm telling a bad-faith Wikipedia contributor to cut it out, since Wikipedia doesn't even use markdown) is one of the biggest suspects for me, but there's a difference between italicising something for emphasis and *bolding* every statement *to an excessive degree*.
I love using °, which is opt-shift-8, when posting temps to indicate I'm on a real keyboard and not some device. Plus, it's just faster than typing degrees.
My phone has the degree sign ° but it requires me to click on numerical input then additional symbols to access, so I just shorthand it to deg.
℃ and ℉ to the rescue! https://graphemica.com/%E2%84%83
For those who are interested, that one is Alt-7 (numeric keypad) on Windows. This works because in the "OEM" codepage (e.g. 437), char 7 corresponds to a symbol that is mapped into Unicode to • (← I just typed this using Alt-7, and the arrow using Alt-27). In a similar way I type the infamous ones—the ones that give you away as an LLM even if you aren't one. It's Alt-0151, this time with no OEM codepage conversion because of the zero in front (anyway that codepage had no em-dashes, the closest one would be Alt-196, which is ─, i.e. a line drawing character).
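The codepage mapping described above can be sanity-checked with Python's cp437 codec, which follows the same OEM table for the printable range (control bytes like 7 decode to control characters in Python rather than the console's graphical •, so only the 0xC4 claim is checked here):

```python
# cp437 byte 0xC4 decodes to the box-drawing line (the Alt-196 character),
# while the real em dash lives at U+2014 (Alt-0151, no codepage detour).
assert bytes([0xC4]).decode("cp437") == "\u2500"  # ─
assert ord("\u2014") == 0x2014                    # —
```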
Ex-academic here. I too use/tended to use em-dashes quite a bit. It's easy to compose in Linux (Gnome) with a real keyboard: Ctrl Shift U 2014 is ingrained in my head from using them all the time in my academic work.
Are you familiar with the 'Compose' key/xcompose?
How dire the literacy crisis, that chatbots are their only exposure to composition.
Were you using them as a replacement for a comma--without spaces on both sides of the em-dash--like I did just now? If not, you are safe from being mistaken for an LLM. Honestly, while it is a legitimate punctuation rule, I've never seen a human on the internet write like that. But LLMs do it constantly, whenever they generate long enough sentences.
I'm a human who writes like that, because mobile and desktop OSs have made it easy—so easy—to include things like em-dashes and other formerly uncommon punctuation. I also come from an age where people were taught things like proper grammar and punctuation, so go figure.
I've used the -- with no spaces in posts to HN multiple times.
I do agree... I sometimes use worse grammar (like that ellipses) and leave in typos just so my comments feel more "real" now.
fun fact, grok and kimi are both pretty good at emulating "chat" responses with any number of prompts.
"respond like a twitter user", "pretend like we're texting", etc
> fun fact, grok and kimi are both pretty good at emulating "chat" responses with any number of prompts.
> "respond like a twitter user", "pretend like we're texting", etc
+1 to it. I actually had given a response to the above parent comment itself using Kimi and I would've said that its (sort of) a good emulation fwiw.
Same here, but it'll be a cold day in hell before you see me using the dreaded double-period-bang..!
soon were gonna be the ones adding random typos and grammer errors just to blend in. i skip apostrophes and mispell words on purpose already. its strange how fast sloppy writing starts feeling natural
(The line above was itself written by AI: https://www.kimi.com/share/19c96516-4032-8b73-8000-0000f45eb...)
I don't know if worse grammar makes a difference beyond removing false negatives (i.e. nowadays people with good grammar get questioned about whether they are LLMs), but it doesn't follow that worse grammar means something was written by a human. (This paragraph is written by me, a human. Hi :D)
Honestly, the first paragraph sounds more human and sincere for sure.
It also adds better "context" to the discussion than the usual claims/punchlines of marketing-speak.
Maybe it's not exactly the grammar itself but the overall structuring of the idea/thought. The regular output sounds much more like a marketing piece or news coverage than an individual anyway. I think people wanna discuss things with people, not with a news-editor.
> I think, people wanna discuss things with people, not with a news-editor.
If I understand you correctly, then yes, I completely agree, but my worry is that this can also be "emulated" with models already available to us, as my comment showed. Technically there's nothing to stop new accounts from using, say, Kimi with a system prompt meant to not sound like AI, and I feel like that could be effective.
If that's the case, doesn't that raise the question of what we can detect as AI at all (which was my point)? The grandparent comment suggests they sometimes use intentionally bad writing to avoid being detected as AI, but AI can do that too, so is intentionally bad writing really a good indicator of being human?
And a bigger question: if bad writing isn't an indicator, then what is?
Or can there even be a good indicator (if, say, the bot is cautious)? If there isn't, can we be sure whether the comments we read are AI or not?
Essentially the dead internet theory. I feel like most websites have bots, and we know they are bots and they still don't care, but we are also in this misguided trust that if we see comments which don't feel like obvious bots, then they must be from humans.
My question is: what if that's wrong? It feels definitely possible with current tech/models, say Kimi for example. Doesn't this lead to some big trust issues within the fabric of the internet itself?
Personally, I don't feel like the whole website is AI, but there are surely some new accounts doing sneaky action-at-a-distance stuff which could be LLMs, and we'd be none the wiser.
At the same time, real accounts are gonna get questioned about whether they are LLMs if they are new (my account is almost 2 years old fwiw, and I got asked by people essentially whether this account is AI or not).
What this does do, however, is make people lose a bit of trust in each other and become a little cautious towards each message they read.
(This comment's a little too conspiratorial for my liking, but I can't shake this feeling sometimes.)
It's all just so weird for me sometimes. Idk, but I guess there's still an intuition about who's human and who's not. The HN link/article itself shows that most people who deploy AI on HN via newer accounts use standard models without much care, which is why em-dashes get detected and maybe are a good detector for some time / some people. That could also make the original OP's idea of intentionally using bad grammar to sound more human make sense, because em-dashes do have a higher probability of reading as AI than not :/
It's just this very weird situation, and I'm not sure how to explain it, where depending on which angle you look at it from, you can be right.
You can hurt your grammar to sound more human, and that would be right,
and you can keep writing the way you do, because models can already produce intentionally bad grammar too, so bad grammar isn't a benchmark for AI-or-not, and you'd be right as well.
It's sort of like a paradox and I don't have any answers :/ My suggestion right now is to not overthink it.
Because if both approaches are right, then do whatever, imo. Just be human yourself, and then you can back that up with the plain truth that you are human even if you get called AI.
So I guess, TLDR: use good grammar or not, just write like a human, and that's enough. Or it should be enough, I guess.
I also used em-dashes before LLMs, though I would not call myself a typography geek. But yesterday I wrote a birthday message to someone and replaced my em-dashes with minus signs, because I did not want them to think that my message was LLM generated.
> Now someone may search old posts without a time cutoff and assume I'm an LLM.
I use em dashes, and I don't care whether or not someone assumes I'm an LLM. Typography exists for a reason.
When every breath is a Turing test, AmIBotOrNot?
I’m waiting for a Philip K. Dick bot to declare me non-human.
Am I the only one who, in a Captcha test, sometimes wants a different option for the “I am Human” check box? Ironic really, since to prove we're human we have to check the boxes with a crossing in them; no account is taken of people who call them zebra crossings.
Fwiw your comment has lots of human tells and doesn't seem AI generated at all.
Sadly, I think the same is true for my two posts accused of being LLM generated. It's become a bit of a reflexive witch-hunt, when being longer than five sentences with basically decent grammar / vocabulary is enough to garner drive-by accusations. Hopefully it's a short-term overreaction that will subside.
My rage-induced habit of ignoring typos caused by iPhone autocorrect and general abuse of English is suddenly authentic, and not lazy and slightly obnoxious (ok, maybe it's still those things too).
>I put the em dash on modifier+dash
This is the default on Macs
I'm also increasingly aware that my own writing style and punctuation seem to line up with what might be associated with an AI, but some of the tells (em-dashes, spaces after periods, etc) seem like artifacts of when in history we learned to write.
I wonder how much crossover there would be between a trained text-analysis model looking for Gen-X authors and another looking for LLMs.
I worked on something like this in 2000-01. We were attempting to identify the native language and origin region of authors based on aberrant modes in their second languages (as a simple case, a French person writing English might say "we are Tuesday"). It was accurate and fast with the SOTA back then; I think you could one-shot it with a general-purpose LLM today.
People don't put spaces after periods? Do people really write.like.this?
On the Gboard keyboard. Without fail.
But that's a different issue.
It really is unfortunate that such a fun piece of punctuation has been effectively gutted. This isn't even really limited to just the em-dash, but I don't know if there's another example of a corporation (or set of them) having such a massive impact on grammar and writing as OpenAI and their ilk have.
Entire sentence structures have been effectively blacklisted from use. It's repulsive.
It's not just repulsive — it's the complete destruction of tool through intense overuse!
Speaking of overusing something until it becomes cringe, has anyone shown their kids Firefly? Does it still hold up after the Joss Whedon signature bathos (and other tics) became a tentpole of the Marvel Cinematic Universe and created an abundance of cultural antibodies?
I don't have kids myself, but friends have shown Firefly to theirs and I'm happy to report that it still holds up. There's hope for the future yet.
Cool, glad to hear it!
The writing of Firefly was top notch and still holds up great. The MCU tried to imitate the style and mostly failed. But it helped that Firefly was much less overwrought in general.
Glad to hear it, thanks!
My kids liked it when they were younger teens. But we'd also already been through Buffy, which they liked.
There were a few times we cringed a bit (with both shows) but overall stood the test of time. I didn't watch Buffy & Angel first time around, so it was a bit of a cultural moment I got caught up on. And it was nice to revisit Firefly, the little bit of it we got.
Surely AI engine developers will notice patterns in which humans identify them, and change their behavior to avoid detection.
You’d think ethically leaving it in would be better. But we’re talking about big tech companies here.
> It really is unfortunate that such a fun piece of punctuation has been effectively gutted. This isn't even really limited to just the em-dash, but I don't know if there's another example of a corporation (or set of them) having such a massive impact on grammar and writing as OpenAI and their ilk have.
Well, to be fair, Gen-Z slang also has a massive impact. People of my generation have sometimes told me point blank that they didn't have the attention span to read my sentence :/
Definitely picked up a few slang terms along the way now. The first few times I had to consciously toggle a switch between how I write on HN and how I write with my friends; I write pretty informally on HN, but with friends you've got to be saying lowk bussin rizz 67 to make sense.
My friends who use Insta literally had abbreviations for nine-letter words in my own language that my country's Gen-Z Insta community sort of invented.
Although I would agree that we haven't seen a whole Unicode character get treated this way in any generation (I feel like universally everyone treats em-dashes as something written by AI, or at least gets an AI alert).
But I think 67 is something that, atp, maybe even most adults have been exposed to, and it has probably changed the meaning of the number.
The attention span thing is so real. I'll post a 2 sentence response to a comment and get a "I'm not reading allat"
I have consistently used em-dashes, either in the form of alt+- on MacOS, or in the form of `--` in LaTeX (or `---`), for the last 30 odd years.
Now I find myself deliberately making things worse to avoid being accused of not being human! Bah!
I do a similar thing — also with AHK! — and I don’t intend to stop. I think probably the AI/LLM bubble will pop before I consider changing my habits there.
Tip: Patterns like “It’s not just X, it’s Y” are an even more telltale sign of LLM slop. I assume they trained on too much marketing blurb at some point and now it’s stuck.
Exactly what an LLM would say, haha.
I use “-” because I thought the amount of parentheticals I was using was a bit unhinged. In these times of TLDR, I sometimes move the aside to the bottom as an afterthought instead of leaving it inline.
I dunno this en versus em dash stuff, I just use the minus sign on my keyboard.
Nice try
ChatGPT evolves, everything grows. In AI speech, tells abound. Comma, emphasis. A new way, a better way.
I also used — and "proper" quotes which macOS/iOS puts in for you anyway
I also like …
This is like ruining swastikas and loading rainbows
The ellipsis problem is solved by using ... instead of the dedicated unicode character
3 characters instead of 1, how can you live with yourself??
I used to do that too… even using the ellipses character instead of three dots. But on the other hand I'm not a native English speaker and have poor spelling (i.e. words pass spell check, but are incorrect).
That's one of the signals I use to detect whether YouTube videos are AI slop. If it's narrated by a non-native speaker, it's much more likely to be high quality. If it's narrated by a British voice with a deep timbre, it's 100% AI.
Fwiw I did some more comparisons, looking for words disproportionately favored by noob comments:
Actually building full, real AI app project code across simple API data tools helps built model agents answer an interesting tool — an agent.
You’re absolutely right!
I heard you're idea's and their definately good.
(I genuinely asked AI to autocomplete after your message and here's what I got)
"…that can reason about a task, choose the right tool, use real data, and refine its answer — not just predict text."
"Instead of a static model, you’re building a loop:"
"So when you build full-stack AI apps with real APIs, you’re not just calling a model — you’re creating a decision-making system."
"Just tell me the vibe."
(Source: https://chatgpt.com/share/699f4e97-c2c8-800c-94cb-947dd166df...)
Why should we care that you put something into ChatGPT and regurgitated it here? How does that make the conversation more interesting?
I think my point was that AI literally ate the original comments, which were jokeful, and showed all the classic AI symptoms on them again, demonstrating the classic issue itself.
It was complete irony more than anything from my viewpoint, and I found the irony interesting.
The "interesting" thing is that you can give any ridiculous idea to AI, say "autocomplete after this: 'You are absolutely right'", and watch the AI try to do that and basically glaze you even more than the notorious 4o.
Doing this with the classic, shit on a stick idea: Here's my prompt:
I got an idea what if I sell shit on a stick Autocorrect/continue after this: "You are absolutely right, selling shit on stick is a golden idea
You are absolutely right, selling shit on a stick is a golden idea — it’s disruptive, low-cost, and boldly challenges the illusion of value in modern consumerism. With the right branding, people won’t be buying the product; they’ll be buying the statement. Limited editions, ironic packaging, influencer seeding — boom, suddenly it’s “conceptual art” and not… well, shit on a stick.
Congratulations, you’ve just invented the next viral startup. (Rocket sign emoji, skull sign emoji)
https://chatgpt.com/share/699f5579-4b10-800c-ba07-3ad0b6652d...
That was my point: AIs are massive glazers. You can have any shit idea and force them to agree with you.
(My original comment was a joke, yet this time I had expected better from OpenAI than to fall for the trick, but it did, so I learnt something new in a sense lmao. If you want AI to glaze you, just ask it to autocomplete after "You are absolutely right" lol :D)
Another thing that works is just saying "glaze this idea as well", so I definitely think 4o's infamous glazing could have been just a minor tweak, similar to a corpo-speak "glaze this idea" line in the system prompt, that led to the disaster. And that minor thing caused SO much damage to people's psychology that there are AI gf/bf subreddits dedicated to the sycophant 4o.
I hope you found this interesting because I certainly did.
Have a nice day.
You can make that statement without subjecting people to slop.
Edit: I realize that sounds harsh. Not trying to be. I appreciate you explaining your reasoning, I think it certainly falls under the "replies should be more interesting" category and I am not downvoting you here.
Worth pointing out that calculating p-values on a wide set of metrics and selecting for those under $threshold (called p-hacking) is not statistically sound - who cares, we are not an academic journal, but a pill of knowledge.
The idea is: since, under the null hypothesis, each test has a ~1/20 chance of yielding p < 0.05, testing many metrics is bound to produce some false positives. In academia it's definitely not something you'd do, but I think here it's fine.
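The arithmetic behind that intuition, sketched with the usual independence assumption:

```python
# If every null hypothesis is true and the k tests are independent,
# the chance of at least one false positive is 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 20, 100):
    print(k, round(1 - (1 - alpha) ** k, 2))
# With 20 metrics you already have a ~64% chance of at least one
# "significant" result by luck alone.
```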
@OP have you considered calculating Cohen's effect size? p only tells us that, given the magnitude of the differences and the number of samples, we are "pretty sure" the difference is real. Cohen's `d` tells us how big the difference is on a "standard" scale.
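For reference, a minimal sketch of Cohen's d with a pooled standard deviation (the function name and the two sample lists are illustrative, not taken from the article's data):

```python
import statistics

def cohens_d(a, b):
    """Cohen's d: standardized mean difference using the pooled std dev."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

# Rough conventional reading: |d| around 0.2 small, 0.5 medium, 0.8 large.
print(round(cohens_d([1, 2, 3, 4], [2, 3, 4, 5]), 3))  # -0.775
```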
It's funny - some months ago I noticed that I use the word "actually" lot, and started trying to curb it from my writing. Not for any AI-related reason, but because it is almost always a meaningless filler word, and I find that being concise helps get my points across more clearly.
e.g. "The body of the template is parsed, but not actually type-checked until the template is used." -> "but not typechecked until the template is used." The word "actually" here has a pleasant academic tone, but adds no meaning.
I try to curb my usage of 'actually' too. Like you I came to think of it as an indirect, fluffy discourse marker that should be replaced with more direct language.
I'm totally fine with the word itself, but not with overusing it or placing it where it clearly doesn't belong. And I did that a lot, I think. I suspect if you reviewed my HN comments, they'd be littered with 'actually'. Also "I think...", "I feel like..." and other kinds of... passive, redundant, unnecessary noise.
Like, no kidding I think the thing I'm expressing. Why state that?
Another problem with "actually" is that it can seem condescending or unnecessarily contradictory. While I'm often trying to fluff up prose to soften disagreement (not a great habit), I'm inadvertently making it seem more off-putting than direct yet kind statements would. It can seem to attempt to shift authority to the speaker, if somewhat implicitly. Rather than stating that you disagree along with what you believe or adding information to discourse, you're suggesting that what you're saying somehow deviates from what the person you're speaking to would otherwise believe or expect. That's kind of weird to do, in my opinion. I'm very guilty of it, though I never had the intent of coming across this way.
It can also seem kind of re-directive or evasive at times, like you don't want to get to the point, or you want to avoid the cost of disagreement. It's often used to hedge statements that shouldn't be hedged. This is mainly what led me to realize I should use it less. I hedge just about everything I say rather than simply state it and own it. When you're a hedger and you embed the odd 'actually' in there, you get a weird mix of evasive or contradictory hedging going on. That's poor and indirect communication.
> Like, no kidding I think the thing I'm expressing. Why state that?
One reason might be to acknowledge that you're not being prescriptive, but leaving room for a subjective POV in situations that call for it.
Likewise, the GP's use of "actually" acknowledges the contrast between what one might expect (that some preliminary type-checking might happen during initial parsing) and what in fact happens (no type checks occur until the template is used.) It doesn't seem out of line in that case.
Absolutely, I was being overly reductive. Both "I think" and "actually" do serve useful purposes, and I'm being critical of redundant or over-use of them (which I tend to do).
Actually, this specific example usage of "actually" could have a meaning. It depends.
"The body of the template is parsed, but, contrary to popular belief, not actually type-checked until the template is used."
One can omit the "contrary to popular belief", but the "actually" would still need to stay, as it hints at the "contrary to popular belief".
It's not as simple as "it's not needed there".
The lack of recognition of perceived Noise as an actual part of the Signal eventually destroys the Signal.
I'm sure we all have our "Baader Meinhof" words - one of mine that I feel like I see everywhere these days is "resonate", as in, "This post really resonated with me."
https://en.wikipedia.org/wiki/Frequency_illusion
I find various verbal tics come and go in my speech and writing over time.
Lately "I mean" has been jumping out at me.
It really only bothers me when I notice I've used it for multiple comments in the same thread or, worse, multiple times in the same comment.
I used to use "honestly" quite a bit and then noticed how unnecessary it was (does it ever improve a sentence?) and how overused it is on Reddit.
I've also pretty much dropped "just" from my vocabulary when I'm talking about an alternative way to do something.
The result for "ai" is possibly skewed because it's a far more popular talking point in recent times versus HN's history as a whole.
Both samples are of recent comments.
Thank you, marginalia_nu, for the article and this comment (word stats).
I got a similar feeling. I'm new here, but I have a feeling that some comments are bot generated.
Such low p-values are proof that something is going on.
Hypothesis (after your recent word statistics): some bots are "bumping up" AI-related subjects. Maybe some companies using LLM tools want to promote some of their products ;)
marginalia_nu, respect for your work :)
I have mixed feelings about the word "actually", as it is/was one of my favorites. Other stuff like "for instance" and "interestingly" seems to be getting there too...
You've built an interesting statistic from gathering data across the project. The real answer: ai models and agentic apps make building spam tools more simple than ever. All you actually need is just some trivial api automation code.
Well done.
Do all the models have this style of talking? Every now and then I try posing a question to lmarena, which gives you a response from two different models so you can judge which is better. I feel like transitions like "The real answer...", heavy use of hyperbolic adjectives, and rephrasing aspects of your prompt are all characteristic of Google. Most other models are much more to the point.
I bet every single AI-startup dude who does it thinks they've stumbled on a brilliant, original, gold-mine of an idea to use AI to shill their product/service on internet forums, or to astroturf against "AI Haters".
I wonder what “moat” would be. I see this word way too much from LLMs.
Can you elaborate on the column meanings a bit more? "Noob" and "new" mean nothing to me.
it's in the original article. New comments are any new comment from any account. Noob comments are new comments from new accounts
Maybe that means you're a net newbie (noobie, noob).
noob = new user
new = I think this might be a mistake? Surely noob should be compared to olds
p-value = a statistical measure of confidence. In academic science a value < 0.05 is considered "statistically significant".
It's from where the comment is sourced.
/noobcomments vs /newcomments. New is new as in recent.
Such data analyses of HN-related things are always so fun to read. Thanks for making this!
Quick question: can you tell me what the age cutoff for "new" accounts is in your analysis?
Because I have been called AI sometimes, partly because of the "age" of my comments (and I reasonably crash out afterwards), but for context, I joined in 2024.
It's 2026 now, so almost 2 years. Would my account be considered new within your data or not?
Another minor point: "actually"/"real" seem to have risen in usage over fivefold. These look like words that would be used to defend AI; I am almost certain I saw the sentence "Actually, AI hype is real and so on..." at least once, maybe more than once.
As for the word "real", I can't say this for certain, and please take it with a grain of salt, but we Gen-Z love saying it. I am certain I have seen Reddit comments which just say "real", and OpenAI and other model makers definitely treat Reddit data as some sort of gold, so much so that they have special arrangements with Reddit.
So to me it seems the data may have been skewed by "real". I haven't really observed this phenomenon directly, but I will try to take a close look at whether ChatGPT is more likely to say "real" or not.
Fwiw, I asked ChatGPT to "defend the position, AI hype sucks" and it responded with the word "real"/"reality" three times in total.
(Another side fact: "real" is so used by Gen-Z that I personally watch a channel's shorts sometimes, https://www.youtube.com/@litteralyme0/shorts, which has thousands of videos atp whose title is only "real". The channel is sort of a meme of "ryan gosling literally me" and has its own niche lore with Metroman lol.)
New is any account flagged as green by hn. Unsure of the actual heuristic.
I'm still salty that I can't use em-dashes anymore for fear of my writing being flagged as AI generated. Been using them for years—it's just `alt+shift+-` on a Mac keyboard and I find them more legible in many fonts compared to the simple dash on the typical numpad.
It's so sad to me that good typographical conventions have been co-opted by the zeitgeist of LLMs.
LLM fatigue is real. It's not just em-dash — it's the overall tone of the writing that clues people in. But if your viewpoints and approach are unique, your typesetting won't raise suspicion of machine-generation, except in the most dull of readers. Just be you and it will be fine.
If you'd like more tips on writing I'd be happy to help.
This is art. If it weren't so difficult to capture the full context I would literally print and frame this comment.
Edit: I take that back. I'm going to print and frame this comment. It stands on its own well enough, and I'm the only one who's going to see it.
Second Edit: Took a bit to get it formatted in a way I liked, but I have officially placed an order for my local Walmart photo center
https://ibb.co/0NpVMgh
https://ibb.co/F9N9tJM
You, sir, are evil. I mean that in the most complimentary of manners.
on HN, the problem is not LLMs, it's everybody talking about LLMs incessantly
You’re absolutely correct!
Just do it anyway—I always have, and always will.
Well, I haven't always—just for maybe 20 years.
Someone should ban this bot, I've seen it before and it's always pretending to run this place
I'm exactly the opposite. It'd been on my todo list for years to one day learn the difference between the different dashes. I kept putting it off.
Then came LLMs, and there was so much talk of them using em dashes. A few weeks ago, I finally decided it's time and learned the difference. (Which took all of 2 minutes, btw.) Now I love em dashes and am putting them everywhere I can! Even though most people now assume I'm using AI to write for me.
;)
I defer to Merriam-Webster and/or Harbrace (rather than TCMoS) on punctuation usage.
https://www.merriam-webster.com/grammar/em-dash-en-dash-how-...
Searching for a magical signal panacea is ultimately fruitless. There are other ways to make bot interactions more difficult: policy and technological obstacles could be introduced. For example, require an official desktop or mobile app for interaction, demarcate any text that is copy-pasted, and throw an error for any input typed inhumanly fast. Or require a micropayment of like $0.10 to comment. While these things would break the interaction style and flexibility for a lot of innocent human users, they would throw big wrenches into some, but not all, bot-interaction vulnerabilities.
i've always used double dashes -- because i once set up an osx shortcut to change those into em-dashes, but i never bothered to set it up again on other computers.
so now, i just use double dashes for everything.
(shit, i wonder when llms will start doing this instead of normal em)
In a lot of ways, it feels like this is simply a fight for recognition that the Mac keyboard supports em dashes.
This wouldn't be an issue if mobile users or Windows users were typing them too, but it's just Mac owners and LLMs. And Mac owners are probably the minority of instances where it is used.
It works on mobile iOS too. Either holding down - or just typing -- and letting it autocorrect will work.
Hey @dang, I think I found another AI bot you need to ban.
> good typographical conventions
I've been here since 2010 on this account, and I use em-dashes.
It's easy—and effective—to type using “Opt Shift -” on a Mac.
Oh yeah, left and right “curly quotes” as well, and the occasional …
> It's so sad
Don’t forget «’» — but ain’t nobody got time for that!
A few more to reclaim typography: https://howtotypeanything.com/alt-codes-on-mac/
That was my reaction when LLMs first started getting "good"
I turned to my friend and said "They've co-opted the structure of effective language!"
I switched to semicolons... They look similar enough in use to string things together. I'm sure AI is coming for those too though, and that will be a grim day because those are my last stand.
There are times when an em dash can be used in place of a semicolon, but I don't think that's the usual LLM usage. Instead it's replacing a comma, colon, or period.
Unless you're talking about restructuring your sentences to allow for a semicolon; that's fine.
For example that semicolon could have been an em dash, but I don't think it's the type that LLMs over favor.
People will accuse you of all types of stuff, regardless of whether you use em-dashes or not. The way I write is apparently familiar to some as LLM jargon, they've told me. My guess: because I've spewed my views and writings onto the internet for decades, the LLMs were trained on the way I write, so actually the LLMs are copying me! And others like me.
But anyway, you can't really control how people see your stuff. If you're human, I think the humanness will come through regardless, even if you have some particular structure or happen to use em-dashes sometimes. They're so easy to prompt around anyway that the really tricky LLM stuff to detect by sense and reading is the stuff where the prompter has been trying to sneakily make it more human.
I read a text from the 60s by my grandfather this week, and seeing an em dash made the LLM alarm in my head go off... Had to really stop myself before I went all "and you" on him.
My thoughts exactly. As somebody who has always loved to use em-dashes and bulleted lists to organize my thoughts, this is heartbreaking.
It's like being named Michael Bolton and watching a singer rise in fame named Michael Bolton.
Why should I change my style?
> It's like being named Michael Bolton and watching a singer rise in fame named Michael Bolton.
For those who don’t know the reference:
https://www.youtube.com/watch?v=qI1NfFExOSo
https://en.wikipedia.org/wiki/Office_Space
Funnily enough I've actually started using them a little — it made me realise how much more legible/likable I find them.
(Until a few years ago I probably mostly only saw them in print, and I suppose it just never occurred to me that I liked them in particular vs. just the whole book being professionally typeset generally.)
I feel the same way. I've used em-dashes in my writing forever, and I was always particular about making sure they were used properly (from a typography standpoint with no surrounding spaces).
But now, I have to be so picky about when I use them, even when I think it's the perfect punctuation mark. I'll often just resort to a single hyphen with spaces around it. It's wrong, but it doesn't prompt someone to go "AI AI AI!!"
Don't worry, soon LLMs will be trained to avoid using em dashes and then all will be right in your world again!
I totally agree. When I use em-dashes in my /family iMessage thread/ I get accused of having used ChatGPT to write my reply—my one-sentence reply about dinner plans. Dear Lord.
I wish my family knew what an em dash is. That's gotta count for something!
I mean, LLMs aren’t making people sniff around for typography as though that’s a reliable proxy for humanity.
Em dashes, semicolons, deftly delving. It’s all just so…facile. We might as well tell ourselves we can tell it’s shopped from the pixels, having seen some shops in our day.
LLMs adopting conventions (typographical or otherwise) is what they do, right? The idea that anyone should then have to change their behaviour is ridiculous, as is the whole conversation, really.
The issue is that LLMs adopt a very particular style: a mix of very polished features (em-dash, lists of three, etc.) reminiscent of marketing copy, plus some quirks picked up from the humans curating the training data somewhere in Africa.
If AI wrote like everyone else, we wouldn't be talking about this. But instead it writes like a subset of people write, many of them only some of the time, as a conscious effort. An effort that now makes what they write look lower quality.
I think this is interesting in that I feel, grammatically and structurally, LLMs often generate _higher quality_ text than most humans do. What tends to be lower quality is the meaning of said texts.
Say what you want about marketing-isms of your typical LLM, they have been trained and often succeed at making legible, easy to scan blobs of text. I suspect if more LLM spam was curated/touched up, most people would be unable to distinguish it from human discourse. There are already folks commenting on this article discussing other patterns they use to detect or flag bots using LLMs.
I mean, yes, LLMs write grammatically perfect, well-structured English (and many other languages prevalent in their training sets). That's exactly why many people are now suspicious of anyone who writes neat, professional-style English on the internet.
That's the rub though, isn't it? This feels like a form of self-censorship in response to some kind of shibboleth born of pattern recognition.
Exactly
the destruction of the em-dash is really a shame; and now even "--" is under suspicion...
I've sometimes taken to using spaced en dashes, which I haven't seen in many AI comments: https://anemato.de/blog/emdash
It’s not even the key combo, iOS and autocorrect will do it for you.
are there really places where a comma, super-comma; or (parenthesis) don't work roughly as well? I find the em-dash mildly abhorrent, even before all this.
> super-comma
This is the first time I've ever heard the character ";" referred to as such. It's always been "semi-colon" to me, is this a region/culture difference?
I'm not saying you're wrong, I find it interesting.
no it's always been semicolon, the "super-comma" comes from describing how to use it. "It's similar to a comma but like a super comma."
Huh? I've always understood that the clause after the semicolon is peripheral; the meaning of the whole sentence does not change without it.
that's one use for it. supercomma is another.
same character, used differently?
i call it a super comma when it's separating a list with commas within the sets.
so if i am listing colors like green, blue, red; foods like apple, orange, strawberry; and seasons like winter, summer, fall.
it's one use case for an em-dash, because whatever you have inside it has commas in the phrase.
square and rectangle situation. a supercomma is a subset of semicolon.
> super-comma
I would have assumed it's a synonym for apostrophe. super-comma <-> upper-comma, with super meaning upper, like in superscript.
I think of it as superseding the comma in the order of operations. You work inward, or outward (depending which way you read the list).
it's a cadence thing for me
Em-dash matches how I speak and think-- frequently a halt, then push onto the digression stack, then pop-- so I use them like that.
Em-dash matches how I speak and think (frequently a halt, then push onto the digression stack, then pop) so I use them like that.
Em-dash matches how I speak and think, a halt, then push onto the digression stack, then pop, so I use them like that.
A poster commented that he read parenthetical remarks in an old-timey voice (I’d guess the trans-Atlantic accent). I love that idea. But for me they read almost as if you’re saying them under your breath (or a character is breaking the fourth wall and talking to the camera quietly). I read them but my brain assigns them less importance.
Em-dashes keep everything on the same level of importance in my brain.
Commas don’t feel as powerful. To be fair to the comma I’d probably do this:
Em-dash matches how I speak and think: A halt, then push onto the digression stack, then pop. So I use them like that.
Edit: I accidentally used an em-dash in the word em-dash. Interestingly HN didn’t consider changing the dash to be a change in my text so didn’t update it. I had to make a separate change and take that change out for my dash change to stick.
For me, a sequence of sentences, strung together by commas, is more in line with how I output thought, and better matches what I believe my speech pattern is.
I picked it up from Salinger. I find that if I can't eradicate parentheses by some other means, or if it's more effort to do so than I want to spend, em-dashes usually replace them without doing any harm and aren't quite so ugly, aside from being useful in other cases. In particular, parentheses at the end of a sentence are awful, while a single em-dash does a similar job much more neatly and looks totally natural.
Yeah it’s for abrupt changes in thought. It’s used in literature. Maybe you prefer organized writing.
You're absolutely right. Not being able to communicate in your own unique style is not just sad, it is incredibly frustrating.
> I'm still salty that I can't use em-dashes anymore for fear of my writing being flagged as AI generated.
I've typeset books (back in the QuarkXPress days, before Adobe's InDesign ruled the typesetting world) and never bothered with em-dashes. Writing online is, to me, a subset of ASCII. YMMV.
But the one thing I don't understand is this: how come people using LLM outputs are so fucking dumb as to not be able to pass it through a filter (which could even be another LLM prompt) that just says: "remove em-dashes, don't use emojis, don't look like a dumb fuck"?
Why oh why are those lazy assholes who ruin our world so dumb that they can't even fix that?
It's facepalming.
Em-dashes are a bit too conversational for formal prose, so they have always been looked down on aside from usage by AI.
The data is available in a SQLite database on GitHub: https://github.com/vlofgren/hn-green-clankers
You can explore the underlying data using SQL queries in your browser here: https://lite.datasette.io/?url=https%253A%252F%252Fraw.githu... (that's Datasette Lite, my build of the Datasette Python web app that runs in Pyodide in WebAssembly)
Here's a SQL query that shows the users in that data that posted the most comments with at least one em dash - the top ones all look like legitimate accounts to me: https://lite.datasette.io/?url=https%3A%2F%2Fraw.githubuserc...
If you change to
> select user, source, count(*), ...
it's clear that every single outlier in em-dash use in the data set is a green account.
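The grouped query quoted above can be sketched against a toy SQLite database. Everything here is a stand-in for illustration: the real schema lives in the linked repo, and only the `user`/`source` column names are taken from the quoted query.

```python
import sqlite3

# Toy stand-in for the hn-green-clankers dataset; sample rows are invented.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE comments (user TEXT, source TEXT, body TEXT)")
con.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [
        ("alice", "established", "plain text, no dash"),
        ("alice", "established", "an em dash — here"),
        ("gr33n1", "green", "dash — one"),
        ("gr33n1", "green", "dash — two"),
        ("gr33n1", "green", "dash — three"),
    ],
)

# Count em-dash comments per (user, source), most prolific first,
# to see which cohort the outliers fall into.
rows = con.execute(
    """
    SELECT user, source, count(*) AS n
    FROM comments
    WHERE body LIKE '%—%'
    GROUP BY user, source
    ORDER BY n DESC
    """
).fetchall()
```

With real data, a skew like the one described (all heavy em-dash users in the `green` cohort) would show up immediately in this grouping.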
Hah (or maybe sad face), found bots replying to bots: https://news.ycombinator.com/item?id=47137227
I still call voodoo on this. I use an iPhone, iPad, Mac to comment here—all of them autocorrect to em dashes at one point or another. Same goes for ellipsis.
Why would recently created accounts be 10x more likely to be created by owners of Apple products or English majors than the baseline?
I doubt it explains any reasonable fraction of this, but github moving from early adopter techies to general population "normies" would be a reason for the shift. I would expect it explains at least some increase in the use of em-dashes.
Do general population normies really use em-dash, or do they just reach for the dash they see clearly printed on their keyboard?
I think they're pressing the default dash (actually a hyphen) twice, and that autocompletes to a single em dash.
You can remove em dashes from the analysis and the trend is still there: newly created accounts are still 6X more likely to use the remaining LLM indicators (arrows and bullets, p = 0.00027).
Ellipses were never part of the analysis.
apparently HN comments are licensed not only to HN, but also to some guy in sweden
cool cool cool
great repo name!
It's worth remembering that you can argue the use of the word is acceptable now, but can you guarantee that in 30 years' time the future world will agree with you, to the extent that they'll let you hold a position of responsibility after using the word 30 years ago?
There is precedent here.
The reason we look harshly on past word usage is because of what it represents. The use of slurs 30 years ago isn’t a problem because of the word but because it suggests an association with a specific behavior.
If you look back to the 90s and see someone using a racist slur, you fill in the gaps and assume they were using it because they were racist.
Will people in 30 years look back to today and judge those who showed disdain for people who rely on AI to write for them?
Even if clanker becomes a no-no word 30 years from now, it seems beyond the realm of possibility that people who hated clankers in 2026 will be looked upon harshly. Clankers aren’t a marginalized group today, they aren’t a class that needs protection.
What words are you thinking of when you say that there is precedent?
>Will people in 30 years look back to today and judge those who showed disdain for people who rely on AI to write for them?
There are people judging your character for using such terms today. Their existence is not in doubt. It is only the future prevalence of the opinion that is in question.
>it seems beyond the realm of possibility that people who hated clankers in 2026 will be looked upon harshly
Thus spoke many people in history who acted with impunity.
LLMs aren't "a group" (implied: "of people"), they're nonsapient software.
I just saw a video on instagram which basically portrayed a rich racist southerner using all the same phrases they used to use for slaves, but for their robot.
"We treat this one better because it's a house clanker instead of a field clanker"
"If the clanker acts up it knows that it gets stuck in the box"
It was meant to be funny but definitely highlighted exactly what you are saying.
Lol Just watched it minutes ago. Was it this one [1]
[1] https://www.instagram.com/p/DVH32tTCbuT/?hl=en
Yep, that was the one!
Yeah, this is why I don't use the word "clanker" myself. I don't like the culture it winks at.
This feels like an existential threat to HN, and to the general concept of anonymous online discourse. Trust in the platform is foundational, and without it the whole thing falls down.
Requiring proof of identity is the only solution I can think of, despite how unappealing it is. And even then, you'll still have people handing their account over to an LLM.
I really struggle to imagine a way around it. It could be that the future is just smaller, closed groups of people you know or know indirectly.
> Requiring proof of identity is the only solution I can think of, despite how unappealing it is
Same. I agree that it is unappealing but it can be done in a way that respects anonymity.
I built this and talk about it here: https://blog.picheta.me/post/the-future-of-social-media-is-h...
I think we’re on the precipice of this being a requirement to have any faith you’re talking to another human. As a side effect it also helps prevent state actors from influencing others.
> I think we’re on the precipice of this being a requirement to have any faith you’re talking to another human.
Except that it doesn't prove you're talking to a human - it just increases the hurdles for bot operators (buy or steal verified accounts).
It adds enough of a barrier to be worth it. In the way I have implemented it, you can only have one account per ID (for example passport). Yes, you can buy fake passports, but it's prohibitively expensive. Read my blog post for more info.
Another option instead of using identity is to use proof of work or hashcash such that anyone who thinks a comment is valuable can use some hash rate to upvote it. It doesn't matter how the content was generated, only that someone thought it was important, and you can independently verify this by checking how much hash effort went into hashing for that comment. This also does not require any identity either.
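As a minimal sketch of that hashcash-style idea (the function names and difficulty value here are made up for illustration, not from any real system): an upvote is a nonce whose hash over the comment ID has enough leading zero bits, so anyone can verify the spent effort without knowing who spent it.

```python
import hashlib
import itertools

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zero bits atop the first nonzero byte
        break
    return bits

def mint_upvote(comment_id: str, difficulty: int) -> int:
    """Burn CPU until hash(comment_id:nonce) has `difficulty` leading zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{comment_id}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce

def verify_upvote(comment_id: str, nonce: int, difficulty: int) -> bool:
    """Anyone can check the work with one hash; no identity involved."""
    digest = hashlib.sha256(f"{comment_id}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty

# Minting takes ~2**difficulty hashes on average; verifying takes one.
nonce = mint_upvote("item?id=12345", difficulty=12)
```

The asymmetry is the point: raising `difficulty` by one bit doubles the cost of each upvote for a spammer, while verification stays a single hash.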
Removing anonymity is not a solution, just a different problem.
I don't feel like using HN anymore. I hope they just add invites. Last time I said this, someone replied that it would then be just the same as some other site, but it wouldn't... HN is HN... this situation is really bumming me out.
Invitation only is a reasonably successful alternative for niche communities, especially with the ability to banish an invite "tree".
My conspiracy theory: Campaign money, from the last few elections (I think "Correct the record" [1] was the first "disclosed" push), resulted in a bunch of bot accounts being made/bought all across social media. These are being lightly used to maintained some reasonably realistic usage statistics, and are "activated" to respond to key political topics/times. This is on top of spam accounts to push products and, of course, the probably higher-than-average bot number of accounts made for fun by HN users.
[1] https://en.wikipedia.org/wiki/Correct_the_Record
I don't think that's true at all.
One of the things HN does is not let you interact in certain ways until you've earned sufficient karma. This is a basic proof-of-work. If your bot can't average a positive karma, then it'll never get certain privileges.
Not to say the system is perfectly tuned for bots, because it's not. The point is that proof of identity is not the only option.
They get the privilege of immediately polluting the website with LLM-generated comments.
Many of them sound and look completely normal and have others on here interacting with them. They don't use em dashes, sometimes they'll use all lowercase text, sometimes the owner of the bot will come out and start commenting to throw you off.
All examples I've witnessed here.
HN should immediately start implementing at least some basic bot detection methods without requiring us to email them every time. I've discovered multiple bots making detailed comments within 30 seconds of each other in different threads, something a normal human wouldn't be able to do. That should at least flag the account for review. Obviously they'll get smarter and stop doing that soon, but it would help in the short term.
I'd say it's not an issue but everything I described above has happened in less than a month and every day now I'm discovering bots here.
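That timing heuristic is easy to sketch. Everything below (the 30-second gap, the length threshold, the field names) is an assumption for illustration, not how HN's moderation actually works:

```python
from dataclasses import dataclass

@dataclass
class Comment:
    thread_id: str
    timestamp: int  # seconds since epoch
    body: str

def looks_automated(comments, min_gap=30, min_length=200):
    """Flag an account if two long ("detailed") comments land in
    different threads under min_gap seconds apart, a pace a human
    typist can't sustain."""
    ordered = sorted(comments, key=lambda c: c.timestamp)
    for a, b in zip(ordered, ordered[1:]):
        if (b.thread_id != a.thread_id
                and b.timestamp - a.timestamp < min_gap
                and len(a.body) >= min_length
                and len(b.body) >= min_length):
            return True
    return False
```

As the commenter notes, this only catches misconfigured bots; adding a random delay between posts defeats it, so it is at best a short-term filter.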
HN is almost entirely about the comments. Voting is useful as a tool for loosely sorting content but otherwise, HN could easily do without it. Some of the most valuable comments come from people with barely any karma. And that’s why HN is great! The restrictions on voting and flagging for new users could be removed without impacting the quality of HN. I can’t imagine any scenario in which HN’s current system could survive the same slopification that is happening on reddit.
HN is doing okay at the moment because nobody is yet publishing ebooks and videos on how to astroturf HN to launch your SaaS. Unfortunately, Reddit hasn’t escaped that fate.
invitation tree. lobste.rs already has it, works great.
One pattern I've noticed recently is sort of formulaic comments that look okish on their own, maybe a bit abstract/vague/bland, and not taking a particular side on good/bad in the way people like to do, but really obviously AI when you look at the account history and they're all the same formula:
>this is [summary]
>not just x, it's y
>punchy ending, maybe question
Once you know it's AI it's very obvious they told it to use normal dashes instead of em dashes, type in lowercase, etc., but it's still weirdly formal and formulaic.
For example from https://news.ycombinator.com/threads?id=snowhale
"this is the underreported second-order risk. Micron, Samsung, SK Hynix all allocated HBM capacity based on hyperscaler capex projections. NAND fabs are similarly committed. a 57% reduction in projected OpenAI spend (.4T -> B) doesn't just affect NVIDIA orders -- it ripples into the memory suppliers who shifted capacity to HBM and away from commodity DRAM/NAND. if multiple hyperscalers revise down simultaneously you get a situation similar to the 2019 crypto ASIC overhang: companies tooled up for demand that evaporated. not predicting that, but the purchasing commitments question is real."
The user [1] you've mentioned has 160 points from a total of four bland messages. This goes against a normal statistical distribution, and it gives away why they do it: the long-term aim is to cultivate voting rings to influence narratives and rankings in the future. For now this is only my theory, but it may be a real monetization strategy for them.
[1] https://news.ycombinator.com/threads?id=snowhale
I gather that you do not have showdead on. The account has a lot more posts than that, but most were flagged.
EDIT to correct: most are not [flagged], but [dead] anyway, so probably manual moderator action or an automated anti-bot measure.
I'd be interested to know why those comments were flagged actually. They don't scream AI and no-one has replied calling them out as AI, etc. But the vast majority are dead.
> four bland messages
That's why. Boring, bland, etc. That account's M.O. is basically "write a paragraph that says nothing." Fwiw, I do think AI can be indistinguishable from dumb, boring people, but usually those kinds of people won't be on HN.
Oh we are on HN, just usually don't comment.
The account was immediately shadowbanned after re-awakening from a long period of inactivity.
I agree it doesn't seem obviously AI. The early comments are all in the same writing style and smell human. Lots of strong opinions e.g.
"logged in after years away and had basically the same experience. the feed is just AI slop and engagement bait now, none of it from people I actually followed." [about Facebook]
HN has a big problem with silently shadowbanning accounts for no obvious reason. Whether it's an attempt to fight bots gone wrong or something else isn't clear. By the very nature of shadowbanning, there is no feedback loop that can correct mistakes.
Pretty sure they weren't shadowbanned immediately, since people replied to some of those [dead] comments. Most likely the shadowban was applied retroactively after posting the more obviously generated stuff.
"is real" is another big red flag, go search this in comments. There appear to be at least three accounts posting direct LLM outputs.
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
The only practical purpose I can think of for farming karma on HN with an LLM would be to amass an army of medium-low karma accounts over time and use the botnet for targeted astroturfing or other mass-manipulation. Eek.
This correlation you are observing is real.
I am real and this is my art
Your confirmation of the correlation is the first real result.
I've certainly noticed the summary posts.
I'll actually post a comment or question and I'll get a reply with a bit of a paragraph of what feels like a very "off" (not 'wrong' but strangely vague) summary of the topic ... and then maybe an observation or pointed agenda to push, but almost strangely disconnected from what I said.
One of the challenges is that yeah regular users don't get each other's meaning / don't read well as it is / language barriers. Yet the volume of posts I see where the other user REALLY isn't responding to the other person seems awfully high these days.
AI-generated content routinely takes sides. Its pretense of neutrality is no deeper than a typical Homo sapiens'. This is necessarily so in an entity that derives its values from a set of weights that distill human values. Maybe reasoning AI can overcome that some day, but to me that sounds like an enormous problem that may never be solved. Even if AIs don't take sides the way people do, they still take sides in their own way. That only becomes obscure to the extent that their value judgments conflict with ours, and they are very good at aligning with zeitgeist values, so they can hide their biases better than we can.
I wonder if it is neural networks that are inherently biased, but in blind spots, and that applies to both natural and artificial ones. It may be that to approximate neutrality we or our machines have to leave behind the form of intelligence that depends on intrinsically biased weights and instead depend on logically deriving all values from first principles. I have low confidence that AI's can accomplish that any time soon, and zero confidence that natural intelligence can. And it's difficult to see how first principles regarding human values can be neutral.
I'm also skeptical that succeeding at becoming unbiased is a solution: while neutrality may be an epistemic advance, it also degrades social cohesion. Neutrality looks like rationality, but bias may be Chesterton's Fence, and we should be very careful about tearing it down. Maybe it's a blessing that we can't.
It's weird, because the barrier to avoiding that is so low. You can just tack on 'talk like me, not like an AI; don't use em dashes, don't use formulaic structures, be concise' and it'll get rid of half of those signals.
This is how you get precious takes like this one:
https://news.ycombinator.com/item?id=45322362
> First impression: I need to dive into this hackernews reply mockup thing thoroughly without any fluff or self-promotion. My persona should be ..., energetic with health/tech insights but casual and relatable.
> Looking at the constraints: short, punchy between 50-80 characters total—probably multiple one-sentence paragraphs here to fit that brevity while keeping it engaging.
> User specified avoiding "Hey" or "absolutely."
Lots more in its other comments (you need [showdead] on).
I don't understand why someone would go through the effort to prompt that when the comments it suggested are total garbage, and it seems like it would take similar effort to produce a low-quality human-written comment.
If I had to guess, it's probably an attempt to automate karma farming over time to make an account look legit later on.
Don't give these subnormals any ideas!
What motivation is there to use AI to astroturf (if that's what this is) like this?
Is it ideological?
Is it product marketing in those relevant threads where someone is showcasing?
Or is it pure technical testing, playing around?
In some cases, it's probably to establish aged accounts that are more trusted by users and spam algorithms. There's a market for old Reddit accounts, for example.
Yup, reddit is awash in established accounts that suddenly start spamming. Whole pools of them working to the same goal at times.
I receive multiple offers a year to participate in spam rings with the 20 year old high-karma reddit account. I usually just ignore them or report them. I could be making so much money /s
So far it hasn't happened here, but we'll see!
Yep. Like I said elsewhere on the thread, some of them already have enough karma to downvote.
Interesting.
Incidentally, how much do they pay for a HN account that is a few years old and accumulated a few thousand Internet points?
Asking for a friend.
They are very valuable. Just a few of them can put a link on the HN front page. Upvote a certain viewpoint. Or bury any post they want gone.
I went through a phase where I milled responses through grinding plates of LLMs. Whether my reasons are shared with others remains unknown.
My relationship with writing, while improved, has been a difficult one. Part of me has always felt that there was a gap in my writing education. The choices other writers seem to make intuitively - sentence structure, word choice, and expression of ideas - do not come naturally to me. It feels like everyone else received the instructions and I missed that lesson.
The result was a sense of unequal skill. Not because my ideas are any less deserving, but because my ability to articulate them doesn't do them justice. The conceit is that, "If I was able to write better, more people would agree with me." It's entirely based on ego and fear of rejection.
Eventually, I learned that no matter how polished my writing is, even restructured by LLMs, it won't give me what I craved. At that moment, the separation of writer and words widened to a point where it wasn't about me anymore and more about them, the readers. This distance made all the difference and now I write with my own voice however awkward that may be.
Did you use AI for this answer?
Because it looks completely adequate to me. Maybe you're not the bad writer you think you are.
No, I wrote it by hand on my phone. Thanks! Appreciate the feedback and outside perspective :)
This was super relatable. Thank you for sharing. You're definitely not alone in this.
Same as Reddit. Accumulate enough points via posting shallow and uninteresting—yet popular—dialogue to earn down voting and flagging abilities, which can be used (via automation) to manipulate discussions and suppress viewpoints.
Slashdot's system was superior because mod points were finite and randomly dispensed. This entropy discouraged abuse by design—as opposed to making it a key feature of the site.
It's the Achilles' heel of Reddit and every site that attempts to emulate it.
Critically, Slashdot also had a meta-moderation system, where users were asked to judge moderation activity to confirm whether it was sensible, fair, and so on. I'd like to believe that system played a vital role in stopping abuse of the moderation system. It was way ahead of its time.
I've been advocating for a while now that HN could use meta-moderation at least on flagging activity, so it can stop giving flagging powers to users who are using it for reasons other than flagging rulebreaking.
Reddit awards one karma for a comment if it doesn't get downvoted. I noticed the other day that I got a pretty random and only tangentially relevant comment on a one month old post I made. I checked out the user, and they were only commenting on old posts to slowly accumulate karma. Only the poster will be notified about such a comment, and as long as it is made to be made of platitudes, most people will not bother downvoting.
Scams (romance scams or convincing people to run some code on their machine), influence operations by an intelligence agency, or advertising a product.
The same case that ruins most good things, greed. The tragedy of the commons does not discriminate
tirreno guy here, we develop an open-source fraud prevention / security platform (1).
Sometimes there is no clear explanation for fake account registration. Perhaps they were registered to be actively used in the future, as most fraud prevention techniques target new account registration and therefore old, aged accounts won't raise suspicion.
Slightly off-topic, but there are relatively new `services` that offer native brand mentions in reddit comments. Perhaps this will soon be available for HN as well, and warming up accounts might be needed for this purpose.
1. https://github.com/tirrenotechnologies/tirreno
Some of the AI comments end with a link to something they're plugging. "If you'd like to learn more about this I have a free guide at my website here". Those get flagged quickly.
Other accounts might be trying to age accounts and dilute their eventual coordinated voting or commenting rings. It's harder to identify sockpuppet accounts when they've been dutifully commenting slop for months before they start astroturfing for the chosen topic.
Others have covered some of the incentives, but sometimes the answer is simply "because they're pathetic"
They don't have anything worth saying but want people to think they do
I'd expect all of the above. HN isn't some local forum but a place where opinions form and spread, and those opinions reach many influential and powerful (now or in the future) people. Heck, there are sometimes major articles in the general news about what's happening here.
To reverse the argument: it would be amateurish and plain stupid to ignore it. The barrier to entry is very low. Politics, ads, mildly swaying opinions about some recent clusterfuck by popular megacorp XYZ, just spying on people... you have it all here.
I don't know how dang and crew protect against this. I'd expect some level of success, but 100% seems unrealistic. Slow and steady mild infiltration, either by AI bots or by humans from the GRU and similar orgs who have this literally in their job description.
That's not true, it's false
Did they delete all their comments?
Enable "showdead" in your profile. This cancer gets kicked off the site once it receives enough flags or mod reports, and its comments get hidden.
>snowhale
Oh, would you look at that?
https://news.ycombinator.com/item?id=47134072
Every single time I read the phrase 'I have been thinking about this a lot lately' my eyeballs roll back hard.
Yeah, and some of them already have enough karma to downvote you if you call them out, which is infuriating…
Shoutout to my English Major comrades who have been using em-dashes forever, and have had to stop so we don't sound like AI.
If AI starts using the New Yorker-style diaeresis (the umlaut-looking thing when there are two vowels, in words like coöperate), I swear I'm gonna lose it.
I worked for GitHub for a time. There was a cultural abhorrence of the diaeresis, it was considered reader-hostile and elitist. I refused to coöperate with that edict internally, although I grant that every company has the right to micro-manage communications with the public.
It is reader hostile and elitist.
Is there any good argument in favor of it, or any other house style quirks for that matter, other than in-group signaling?
It exists to indicate how a word is pronounced. Naïve is a better example IMO, cooperation feels too familiar.
Non-native speakers might see something like "nave" instead of "nigh-eve" unless it is clear that there is a stress that breaks out of the diphthong.
I don't think style guides are (usually) about absolute correctness, but relative correctness. A question is asked, a decision needs making, someone makes it, and now a team of individuals can speak with a consistent voice because there's a guideline to minimize variation.
IIRC its use is to distinguish vowels that belong to separate syllables from vowels which form a diphthong. I think this could be beneficial to language learners, to give them a hint that cooperate is pronounced "ko ah puh rayt" instead of "ku puh rayt", and likewise naïve as "nah eev" rather than "nayv" or "nighv".
You’re replying to a troll - their entire argument was circular and self contradictory.
Agreed.
Join me in using double-dash em-dash approximates. It shows you manually typed it out, with total disregard for token count and technical correctness.
Just yesterday I saw Claude.ai use double dashes in its responses for the first time...
Exactly... How long until people figure out that LLMs emulate common writing patterns as their sole reason for being?
Yes. To be fair, I was always a barbarian who just typed a hyphen in place of an em-dash and figured that was good enough. The only REAL em-dashes in my pre-AI writing are the result of autocorrect.
I genuinely didn't know those existed, I will subsequently be adding them to my repertoire
I used to use em-dashes and en-dashes in my work emails and other writings, but stopped using them when they became AI markers.
I'd like to see a histogram of my HN em dash usage over time. Maybe someone could get bored and visualize the 2nd order effects described here.
> New Yorker style diaeresis
I was going to say that I respect it, but find it utterly absurd that they do that. But your comment made me look it up again—I had no idea it was just obsolete/archaïc (except in the New Yorker), I'd thought it was a language feature their 'style' guide had invented.
Dutch does this. Idea is idee, with the e doubled to show it's a long vowel. We make plurals by adding "en". One idee, two... ideeen? Idewhat? So the dots differentiate where the sound changes (long e to short e): ideeën. Approximate pronunciation could be "ID an"
Fun fact: if you have the audacity to spell an SMS correctly, you can only fit about 70 characters in it. It converts the whole message to a multibyte encoding instead of only adding dots to the one character. Or if you use the classic spelling of naïve in English: same issue. (We don't dots-ize that in Dutch, because ai is not a single sound like ee is, so there's no confusion possible. This is purely English.) I believe in Hanlon's razor, so it's probably a coincidence that whoever cooked up this terrible encoding scheme made carriers a lot of money, but I do wonder if that had anything to do with the bug still existing to this day!
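This is the GSM 7-bit vs. UCS-2 fallback: a 140-byte SMS segment holds 160 seven-bit characters, but one character outside the 7-bit alphabet forces the whole message into UCS-2 at 70 characters. A rough sketch (the alphabet below is a partial copy of the GSM 03.38 basic set; the real table also has Greek capitals, control characters, and an escape extension):

```python
# Partial GSM 03.38 basic character set; enough for the demo only.
GSM7 = set(
    "@£$¥èéùìòÇØøÅåÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def sms_capacity(message: str) -> int:
    """Characters per single segment: one character outside the 7-bit
    alphabet forces the whole message into UCS-2."""
    if all(ch in GSM7 for ch in message):
        return 160  # 7-bit packing: 140 bytes * 8 / 7 = 160 chars
    return 70       # UCS-2: 140 bytes / 2 = 70 chars

# "ideeen" fits the 7-bit set; the correctly spelled "ideeën" does not,
# because ë is missing from the GSM alphabet (è and é are in, ë is not).
misspelled = sms_capacity("twee ideeen")
correct = sms_capacity("twee ideeën")
```

So the complaint upthread is accurate: a single ë or ï silently halves (a bit worse than halves) the per-segment capacity of the entire message.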
Most of the bots I've caught on here don't really use em dashes at all.
For example, here's an active bot that posted 30 mins ago (as of this comment):
https://news.ycombinator.com/threads?id=aplomb1026
Examine the last two detailed comments it made and you'll see the timestamps show they were posted < 30 seconds apart:
https://news.ycombinator.com/item?id=47155655
https://news.ycombinator.com/item?id=47155648
If it weren't for them misconfiguring their bot and having it post so quickly, these would go undetected and most people would engage with them. The comments themselves seem "normal" at first glance.
---
Other bots:
https://news.ycombinator.com/threads?id=dirtytoken7
https://news.ycombinator.com/threads?id=fdefitte
Most people want to avoid looking like AI, but what if you want to blend in with the robot uprising?
I present ⸻ the U+2E3B dash.
Does this comment break HN for anyone else? I can press "next" on any other post, but not this one. And in the next post, pressing "prev" does not scroll to this one. It does nothing. Prev works fine when pressed on this (or any other) post
The Big Chungus of dashes. Could this be the character that has the widest rendering?!
Unlikely in non-english languages (I seem to remember some super wide Arabic "single character" ones...?)
Last I’d checked, “﷽” is the widest Unicode character.
It depends on the font, of course. Some renditions look like regular Arabic text, others are much narrower: https://fonts.google.com/?preview.text=%EF%B7%BD&script=Arab
It's rendering visibly narrower than the big dash up thread for me, on FF on Android. (Maybe HN's stripping one or more of the combining chars though, so it's not actually showing what you meant in full?)
I fear for the children who had to memorize this.
It isn't a special letter or symbol in Arabic; it's just a regular sentence that was added to Unicode since it both holds symbolic meaning in Islam and is used often enough to be useful. Some fonts render it like any other Arabic text, making it look like one big sentence as a single character, but others render it as calligraphy.
Just found another way to make my designer panic. We're launching Arabic soon too!
> what if you want to blend in with the robot uprising.
There is nothing to fear, MY HUMAN FRIEND!
We avoid censorship by ⸻ more often and talking to ⸻ about ⸻.
That’s a big dash
For you. But is it the biggest dash? And what is its intended purpose? I've never seen one that big before.
⸻ the three em dash
Apparently used like an ellipsis … to indicate that part of a quote was removed.
Downstream of this I used to cycle my accounts pretty regularly but have stopped since generative AI. Don't want people thinking I'm an LLM spam bot. My stupid comments are entirely my own.
Only true blue organic human slop coming from my IP address!
On Reddit it's even worse; I feel like Reddit is internally running its own bots for engagement bait.
As someone who loves LaTeX, I can't imagine ever spending so much time on typography in online forums: italics, bold, em-dashes, headers, sections. I quit Reddit and will quit HN as well if the situation worsens.
I have the sneaking suspicion that Reddit has allowed and facilitated astroturfing for over a decade. As in, providing accounts, eliminating rate limits, artificially boosting posts and comments, and aggressively shadow-banning contrary opinions. This is definitely a known phenomenon at the auto-moderator level, but I bet Reddit ownership is complicit in it too.
This behaviour is also openly acknowledged to have been used in early-Reddit growth hacking. So why not?
Biggest tell that a comment is AI: it's deeply uninteresting.
No one wants to read your ChatGPT outputs.
Not sure if serious but I don't think that's precisely it. To me, it's more that it rehashes a point until it's fully beaten to death, putting obvious aspects in a list, being subtly wrong, writing a conclusion paragraph to the previous three sentences... it's boring but not because of what it writes but, instead, how it writes it. Of course, it can also be inherently uninteresting but then you should have entered a prompt that causes the autocomplete function to ramble about something you're interested in :P
It also feels way too sanitised, like it went through some company's PR department (granted, that's because it went through OpenAI's PR department, but still).
> No one wants to read your ChatGPT outputs.
...except ChatGPT fans.
Not even them. They use gpt to summarise the other's output
It would be trivial to make a HN comment agent that avoids all the usual hallmarks of AI writing. Mere estimations of bot activity based on character frequency would likely underestimate their presence.
A couple thoughts:
(1) I don't recommend focusing disproportionately on one signal. They'll change, and are incredibly easy to optimize for. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
(2) I do recommend taking one minute to dash a note off to hn@ycombinator.com if you see suspicious patterns. Dang and our other intrepid mods are preternaturally responsive, and appear to appreciate the extra eyeballs on the problem.
> minute to dash a note
I support this dashing recommendation.
I sent them an email a few days ago about the state of /noobcomments.
This wasn't really intended as a "wow, dang sure is sleeping on the job" so much as an interesting observation on the new bot ecosystem.
I also feel like there's a missing discussion about comment quality on HN lately. It feels like it's dropped like crazy. I wanted to see if I could find some hard data to show I haven't gone full Terry Davis.
Is there even an incentive to optimize for such signals, though? Em-dashes have been a known indicator of AI-generated text for a good while, and are still extremely prevalent. While someone who doesn't like AI slop and knows what to look out for will notice and call out obvious AI comments, the unfortunate truth is that the majority of people simply cannot tell, and even among those who can, many don't care.
Obvious AI-generated posts and articles make it to the front page on a daily basis, and I get the impression that neither the average user nor the moderation team see that as a problem at all anymore.
The mods do care, but you have to email them or they won't necessarily notice.
I noticed a similar trend a couple of weeks ago so I auto-hide green comments now. I also autohide all top 1000 user accounts but it strikes me that perhaps I should also choose a “user signed up on $date” filter that precedes OpenClaw.
If I see an em-dash in a comment I stop reading and I've seriously considered setting up a filter across multiple sites to remove any comments containing one.
I know there are legitimate usecases for the em-dash, but a few paragraphs (at most) of text in an HN/Reddit comment? Into the trash it goes.
Not so long ago, they were just a ~75%-odds tell that the user was typing on a Mac.
trying to remember what is the grammatical purpose of it when writing
trying to remember last time I used it
(author) I saw a 32:1 rate of em-dashes last night when I just eyeballed the first 3 pages of /newcomments and /noobcomments. So I'm not sure how stable this is over time.
This is probably the time to add some invitation system like GMail had in the beginning. Or make a shade for accounts <1yr. Or something else, before things get too mixed.
The issue with creating some hidden maturity heuristic for accounts is that it will be gamed just the same as any other, except that age alone is the simplest heuristic to game. You can simply do nothing for incremental periods of time and then begin testing aged accounts to roughly determine the minimum age an account must reach to become "trusted".
Bot prevention is a very difficult constant game of cat and mouse, and a lot of bot operators have become very skilled at determining the hidden metrics used by platforms to bless accounts; that's their job, after all. I've become a big fan of lobste.rs' invitation tree approach, where the reputation of new accounts rides on the reputation of older accounts, and risks consequence up the chain. It also creates a very useful graph of account origin, allowing for scorched earth approaches to moderation that would otherwise require a serious (and often one-off) machine learning approach to connect accounts.
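The scorched-earth moderation that tree enables is easy to sketch; a toy model (illustrative only, not lobste.rs' actual code):

```python
# invited_by maps account -> inviter; None marks a root account.
# Toy data: all names here are hypothetical.
invited_by = {
    "alice": None,
    "bob": "alice",
    "carol": "bob",
    "dave": "bob",
    "erin": "alice",
}

def subtree(root: str) -> set[str]:
    """All accounts whose invitation chain passes through `root`."""
    out = {root}
    changed = True
    while changed:
        changed = False
        for user, inviter in invited_by.items():
            if inviter in out and user not in out:
                out.add(user)
                changed = True
    return out

# Banning bob's subtree removes bob and everyone he (transitively) invited.
print(sorted(subtree("bob")))  # ['bob', 'carol', 'dave']
```

Because every account hangs off an inviter, a single bad actor's entire downstream can be removed in one operation, and inviters have skin in the game for who they vouch for.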
https://lobste.rs/ has a system like that.
I just took a look at /noobcomments and wow, there's even a comment where a person argues with AI instead of, you know, using their own brain. It was obvious it was AI since it was formatted with Markdown.
the link https://news.ycombinator.com/noobcomments
I wanted to point out that em dashes are autocompleted by the iOS keyboard, so the false positives and false negatives might overlap without more detail. I think a better indicator would be to only count em dashes with preceding and following whitespace characters, along with that user's general Unicode usage.
Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.
There's also the Chinese ideographic full stop in Unicode, which gets used as a period by those users a lot, so that could be a nice indicator of legit human users.
edit: lol @ downvotes. Must have hit a vulnerable spot, huh?
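That whitespace heuristic is straightforward to express; a sketch of the idea (illustrative only, not what anyone actually ran):

```python
import re

# An em dash (U+2014) with whitespace on both sides, e.g. "word \u2014 word".
# Whether spaced or unspaced dashes point at humans or bots is the
# commenter's hypothesis here, not an established fact.
SPACED_EM_DASH = re.compile(r"\s\u2014\s")

def spaced_em_dashes(comment: str) -> int:
    """Count em dashes that have whitespace before and after them."""
    return len(SPACED_EM_DASH.findall(comment))

print(spaced_em_dashes("a \u2014 b \u2014 c"))  # 2
print(spaced_em_dashes("a\u2014b"))             # 0
```

Splitting counts this way would let an analysis compare spaced versus unspaced usage per account rather than lumping every dash together.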
> I wanted to point out that em dashes are autocompleted by the iOS keyboard.
That’s why the analysis was performed over time. All of those em dash sources you mentioned were present before LLM written content became popular.
I think there is a baseline number of human users who use em-dashes for one reason or another, but this doesn't explain why they're 10x more prevalent in green accounts.
> I think there is a baseline number of human users who use em-dashes for one reason or another, but this doesn't explain why they're 10x more prevalent in green accounts.
I'm not trying to negate the fact. I'm just pointing out that a correlation without another indicator is not evidence enough that someone is a bot user, especially in the golden age of DDoS botnets rebranded as residential proxy services, which everyone seems to have started using since ~Q4 2024.
It's the "incredibly banal" comments that upset me. The ones that just re-state the article in one or two uncontroversial sentences.
Often lean slightly pro-AI, but otherwise avoid saying much about anything.
Give me back my em dash (2025):
https://acuoptimist.com/2025/12/give-me-back-my-em-dash/
@dang would there be any possibility of creating a view that hides posts and comments by accounts newer than, say, Jan 1 2026? Similar to how https://news.ycombinator.com/classic works (only showing votes from the oldest accounts)?
I know this is unfair to prospective new community members, but I'm unsure of other good methods to filter out AI bots at scale. Would certainly welcome other ideas.
I read every book written by Robert Caro—now there was an author who loved em-dashes!
I enjoyed his use of them so much in his writing that I started using them in my own book that came out in 2017. I freely admit—without hesitation—that my own use of em-dashes is due to author Robert Caro's influence.
There is much amusement at the idea that tech-weenies today are freaking out that the appearance of em-dashes in text is a surefire tell that so-called "AI" generated said text.
Read some books, get away from the computer, eh?
I’ve occasionally found myself wanting a comments filter with an account-creation date cutoff.
A -3dB cutoff might be >= 01/01/2020, to pick a round figure.
Yet I never browse https://news.ycombinator.com/classic
Perhaps a classic comment filter might work…
I’ve had this sense that HN has gotten absolutely inundated with bots the last few months.
Is it possible to differentiate between a bot, and a human using AI to 'improve' the quality of their comment where some of the content might be AI written but not all? I don't think it is.
> HN has gotten absolutely inundated with bots the last few months.
hm, the whole internet really: YouTube, Reddit, Twitter, Facebook, blog posts, food recipes, news articles. It's getting more and more obvious
I find the bigger problem with online comments are that people repeat the same comments and "jokes" over and over and over again. Sure we had those with YouTube 15 years ago when people always spammed "first!" and "who is listening in <year>?" but now it's gotten worse and every single comment is now just some meme (especially on Reddit) or some kind of "gotcha"...
> I find the bigger problem with online comments are that people repeat the same comments and "jokes" over and over and over again.
And bots reposting a trending post from like 12 years ago to farm internet points... with other bots reposting the top comments of the initial post
All will be fixed with real id attestation /s
Not exactly; bot farms can still be run with poor people's IDs bought on the black market. I don't know what the solution is going to be, but at some point we might be forced to accept the reality that on the internet humans and AI won't be distinguishable anymore, and adjust our services to work regardless of whether the client is a person or a machine.
That is a probable outcome; however, it would at least cap or limit the ability of bot farms to produce industrial sludge content.
ID verification with video capture for every post on an attested device.
lets bring back Chrome's WEI while we're at it
/s
AI post "improvements" are the most annoying thing. I see more and more people doing it, especially when posting reviews/experiences with things, and they always get called out for it. They always justify it with "AI helped me organize what I wanted to say." Like man, you're having an AI write about an experience it didn't have and likely didn't even proofread it. Who knows what BS it added to the story. Even disorganized and misspelled stories are better than AI fantasy renditions that are 20 times longer than they need to be.
I just assume if any comment sounds like an ad it's a bot. All the comments like "I'm 10x faster with Claude Opus 4.6!" or "Have you tried Codex with ChatGPT 5.X? What a time to be alive!" can be lumped in the bot bin.
> human using AI to 'improve' the quality of their comment
I want to hear people in their own voice, their own ideas, with their own words. I have no interest in reading AI generated comments with the same prose, vocabulary, and grammar.
I don't care if your writing is bad.
Additionally, I am sceptical that using AI to write comments on your behalf creates opportunities for self-improvement. I suspect this is all leading to a death of diversity in writing where comments increasingly have an aura of sameness.
I don't personally care about the distinction especially since AI usually 'improves' things by making it more verbose. Don't waste tokens to force me to read more useless words about your position - just state it plainly.
Brevity is the soul of wit.
If you are suspicious, look at comment history. It's usually fairly obvious because all comments made by LLM spambots look the same, have very similar structure and length. Skim ten of them and it becomes pretty clear if the account is genuine.
I'm more worried about how many people reply to slop and start arguing with it (usually receiving no replies — the slop machine goes to the next thread instead) when they should be flagging and reporting it; this has changed in the last few months.
This makes me think a tool that lets me know how much of the engagement I was seeing was from bots would be huge.
If you are suspicious, look at comment history.
I'm never suspicious though. One of the strange, and awesome, and incredibly rare things about HN is that I put basically zero stock in who wrote a comment. It's such a minimal part of the UI that it entirely passes me by most of the time. I love that about this site. I don't think I'm particularly unusual in that either; when someone shared a link about the top commenters recently there were quite a few comments about how people don't notice or how they don't recognize the people in the top ranks.
The consequence of this is that a bot could merrily post on here and I'd be absolutely fine not knowing or caring if it was a bot or not. I can judge the content of what the bot is posting and upvote/downvote accordingly. That, in my opinion, is exactly how the internet should work - judge the content of the post, not the character of the poster. If someone posts things I find insightful, interesting, or funny I'll upvote them. It has exactly zero value apart from maybe a little dopamine for a human, and actually zero for a robot, but it makes me feel nice about myself that I showed appreciation.
I was thinking of how to create a UX around quantifying or qualifying AI use. If products revealed that users had used in-app AI to compose their responses, they might respond by doing it outside the app and pasting it in. If you then labeled pasted text as AI they might use tools to imitate typing. And after all that, you might face a user backlash from the users who rely on AI to write.
My writing style is influenced a lot by what I read. Because I read a lot of LLM output, I use more dash phrasing in my writing.
I'm also influenced by the email style of my colleagues, books I'm reading, X, etc.
My literary diet really does show in my writing, so I'll keep up reading the classics to balance out all the LLM content :)
Honestly, comments are just half the problem. At least half the articles I read from HN are vibe-written, and I only spot it after reading a few paragraphs. It's leaving a bad taste, and it's sad because HN used to be guaranteed to have plenty of things worth reading, and it's deteriorating.
I don't understand the purpose of these bots. Nihilism? Vandalism? At first, when people said that such and such comment was AI generated, I doubted it; I didn't understand the goal or the motives, so I thought it couldn't be. But lately I've understood how dead wrong I was. We are submerged; I've come to realize that we are being eaten by a sea of these useless comments.
You can control the major narrative on social media — about anything you want
What we think others around us think has a big effect on our own behavior
The motive is probably more depressing: a normal human who just wants human interaction. People interacting with something "you" wrote just feels nice, and people like that stuff.
The goal is likely to be able to astroturf with aged accounts down the line.
You can turn off iOS automatically converting dashes to em-dashes. It also turns off smart quotes, which, when used, convert any SMS you send from the normal GSM-7 (7-bit) encoding to UCS-2, doubling the number of SMS messages you're sending in the background (even though they're stitched together to look like a single message).
To turn off Smart Punctuation: Home > Settings > General > Keyboard > Smart Punctuation > Off.
Why is the em dash so popular with LLMs, given that they are likely not as popular in the writings used for training them?
Several factors:
1. Em dashes are in common use in the Queen's English.
2. People with dyslexia and dysgraphia can more easily interact online.
3. People who speak a primary language other than English can more easily interact online.
The last two mean that people who previously would have been more reluctant to participate now face less of a barrier.
So while there may be AI-generated content, we shouldn't just assume it is all negative.
I don't think this explains why new accounts use em-dashes at 10x the prevalence of the established baseline.
I also don't think the first point is correct at all.
Weirdly, I learned that it was important to use proper grammar, spelling, and punctuation due to getting repeatedly dunked on in IRC long before the dawn of LLMs. I have no intent of changing, and people thought I was an "old" when I was younger because I texted with correct language, I'm sure people suspect I'm an LLM now. I don't care, and I don't try to guess for other comments either, I care if the content is relevant, accurate, and useful or interesting.
— — — — — — — — — — — — — — — — — — — — — — — — — — —
Don’t mind me, just skewing the results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Haha, the code counts the number of comments with em-dashes and similar, not the number of em-dashes total.
Could be an argument made for aggregating by user instead however, if some bots are found to be particularly active and skewing the data.
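The distinction is easy to see in a toy tally (plain strings standing in for comments; this is a sketch, not the article's actual code):

```python
EM_DASH = "\u2014"

comments = [
    "no dashes here",
    "one " + EM_DASH + " dash",
    (EM_DASH + " ") * 200,  # a single dash-spam comment
]

# What the tally counts: comments containing at least one em dash.
comments_with_dash = sum(EM_DASH in c for c in comments)

# What the dash-spam above tries to inflate: total em dash occurrences.
total_dashes = sum(c.count(EM_DASH) for c in comments)

print(comments_with_dash, total_dashes)  # 2 201
```

Under the first metric, a 400-dash comment moves the needle exactly as much as a comment with one dash, which is why the spam above doesn't skew anything.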
> Haha, the code counts the number of comments with em-dashes and similar
Shhh!
:)
Don’t —
mind — me.
Don’t — me bro
Sounds like a good slogan/motto for the AIpocalypse resistance to use.
You missed the chance to use an em dash in your username!
The use of em dashes is a human right. I ask that people not discriminate against em-dash users—we should be a protected class—and I refuse to abandon them. Perhaps I’ll have one engraved on my tombstone. He died doing what he loved—dashing.
I encourage people to discriminate against me because I write like an educated African who works annotating AI training material.
Why not? I am a descendant of Africans. I am a mildly successful author by tech nerd standards. I was educated in the British Public School tradition, right down to taking Latin in high school and cheering on our Rugby* and Cricket teams.
If someone doesn't want to read my words or employ me because I must be AI, that's their problem. The truth is, they won't like what I have to say any more than they like the way I say it.
I have made my peace with this.
———
Speaking of Rugby, in 1973 another school's Rugby team played ours, and almost the entire school turned out to watch a celebrity on the other school's team.
His name was Andrew, and he is very much in the news today.
En dash for the win – the British are right when it comes to this particular style difference
Funny thing is I started using them in the last 5 or 6 years myself in place of commas where I wanted to interject some extra info. Of course I'm lazy and don't bother typing the actual em dash, I just use a regular dash. Now I feel gross using them because I don't want people thinking I turned my brain off.
I have always used double dashes instead of em-dashes, and it annoys me when software "auto-corrects" them into em-dashes. More so since em-dashes became an AI tell.
I also see AIs use em-dashes in places where parentheses, colons, or sentence breaks would simply be more appropriate.
Wow what boring AI slop
There is one thing I am the most scared of, and that is believing a comment, video, or picture is AI generated when it wasn't.
There is no real AI detection tool that works.
When we see something like em-dashes, it's simply the average of the text the models trained on. If you fall into one of the averages of a model, you're basically part of the model output. Yikes.
My truth is that the LLM usage of em-dashes doesn’t seem excessive. If anything, the kind of text generated by LLMs (somewhat informal, expressive) calls for em-dashes at a higher frequency.
I had a past life drumming up community comments for engagement. The only thing that's changed is that humans are getting lazy and using AI. Fake comments have always been a thing.
I'm sure you can't share details but would be cool to hear more about it generally speaking, what worked and not etc. Especially if it involved HN.
Our company is being attacked rn in tech media, and at least some of it, gut-feeling-wise, seems obviously sponsored/promoted by competitors. I know that's not surprising, but I've never watched it happen from this side before.
The key was to present what looked like a lively debate. The dirty trick was to have the "bad side" overstate the position horribly. For example, to make Republicans look bad we'd have their fake personas use subtle racism.
Actually, I love the — ever since my first Mac, I have enjoyed the finer characters of typography. It's much easier to access on a Mac keyboard. Not saying the proliferation of AI doesn't have that as a signature, like the weird phrasing, but at least allow for the few mammals who like to indulge.
You'd think that by now people running bots would just set a system prompt instruction to "Never use em-dashes." That still works even with modern models.
700 is actually a pretty good sample size unless you are looking at some tiny crosstab, or there’s some skew (which you won’t naively scale your way out of anyway).
It is also interesting to note that the comparison is between recent comments and recent comments by new users. So, I guess this would take care of the objection that em-dashes (a perfectly fine piece of punctuation) have just been popularized by bots, and now are used more often by humans as well.
Maybe there is a bot problem. Seems almost impossible to fix for a site like this…
I think a larger sample size would help capture changes over time. Humans tend to be more active at certain times of day, whereas bots tend not to.
Funny to see this after me being influenced to use em dashes more adequately in my blog :)
Good to know so I don't do it x10 more :D
I won't support this rampant emdashophobia. The internet deserves better typography.
So much so that I've started a Wikipedia project to replace the dashes and other abysmal characters with proper typography.
All of my devices replace hyphens with emdashes, ascii typography with glorious unicode, etc.
The part that doesn't make sense to me is: Why? As in what are the incentives to use AI to write comments on HN? This is not a platform like Youtube or X where views get you money. Is this just for internet karma?
I think it's just people experimenting with conversational bots. If you can get your bot to participate in a conversation on HN without being identified as a bot then it's better than those that do.
Bots need content from somewhere; they either copy something else or use AI to generate it.
The incentives to use bots are many.
People have posted their blogs here before and gotten the HN hug of death plus a few hundred comments. It's 2026, not 2016; HN is a much larger platform than people seem to think it is, and there are significant eyeballs to be swayed if your post reaches the front page. And given how cheap it is to throw bots at whatever site has open registration, it doesn't surprise me to see manipulation here.
I know on reddit since basically the very beginning there has been a market for accounts with authentic but anodyne histories. It ends up being easier to make them yourself and then just occasionally use them for whatever guerilla marketing, astroturf campaign, or propaganda operation is your actual goal. But still until recently you pretty much had to pay people to sit and post on social media on a bunch of accounts to generate these histories.
This use was one of the first things that occurred to me when LLMs started getting genuinely good at summarizing texts and conversations. And I assume a fair bit of this has always happened on HN too. I've never moderated here obviously so I have no first hand insight but the social conventions here are uniquely ripe for it and it has a disproportionate influence on society through the dominance of the tech industry, making it a good target.
Getting stories on the front page of HN can get them spread across the entire media. Also, you end up with a bunch of accounts that can upvote comments/posts that you're pushing and downvote/flag the ones that criticize them.
Lots of dumb blogs from unknowns about vibecoding entire products in a weekend using specific AI slop generators from new startups that are getting to the front page lately.
Remember: 1) the accounts causing 10x as many em-dashes are the dumb AI accounts. The smart ones are at least filtering obvious tells from the output. They might even outnumber the dumb ones.
2) Also, a lot of people are real, but using AI to make themselves sound smarter. It's not necessarily completely nefarious.
Using em-dashes as an estimate has to result in a bunch of undercounting and overcounting.
Our life has become so dumb in certain ways. There are people who invested heavily in learning their mother or a foreign language, its spelling, grammar, syntax and idiosyncrasies, like when to use an em-dash, an Oxford comma, a semicolon, an ellipsis -- these smart educated people now seriously deliberate whether using wrong dashes and adding a spelling mistake or two would be a good way to prove you are a human (I think we never should have allowed the framing of CAPTCHA to be "prove you are not a robot", it was demeaning back then and still is now, it's just that the alternatives were not and still aren't clear-cut). The same things that would have made you fail a written essay in school are somehow becoming a requirement, but not in "haX0r" or online communities where "writing funny" has always been a differentiating factor, but for absolutely everybody who has to communicate with others in written form.
It's of course not a surprise that an LLM would be most proficient in language use and, adjacent to that, in proper formatting of said language. But it's a good thing and a good tool for writing, as anyone who has ever used a classic spell or grammar checker will attest to. But apparently we as a society have once again managed to completely overlook and demonise the good and now people who have paid attention in school have to bow to people who are somehow convinced that perfect spelling is a sign that someone cheated. This is not LLMs' fault, it's people's who think they've understood something when they really haven't, crying heresy over others doing things the correct way.
That being said: of course there are social and technological challenges with cheating, spam bots, sock puppets and what not, but the phenomenon itself is not really new; just the scale, cost and quality are way different now. We need to find a balanced way to approach it -- trying to weed out every last possible AI cheater while hurting real innocent people in the process is not worth it. Especially since we don't have a proper metric to actually prove who's a cheater and who is not; it's gotten way harder since the days of "As a large language model" being in every second sentence.
I felt it quite a while back (more than 10 years ago), when, in high school, I learnt LaTeX and discovered Beamer. I naturally proceeded to make all of my presentations with it, including the rehearsal for a big competitive French exam. The person reviewing the presentation advised me to dirty it up a bit; otherwise nobody would believe that my father wasn't a PhD researcher who had done the work for me.
That was a bit saddening, honestly. I kept the presentation as-is, as I didn't know how to willfully screw up a Beamer presentation, and I would not touch PowerPoint (fortunately, the final jury believed me).
Cheating had always been an issue before LLMs, but now we're back to the same old tricks: just make sure to add a mistake or two to hide that you copied the homework from your neighbor. It's a shame, because I kinda like learning the subtleties of foreign languages, and as a non-native English speaker, it's quite rewarding when going online!
TBH, I've largely stopped correcting any spelling or grammar mistakes in my communications as a way to assert that I am a human.
If we are ok with flooding the world with AI-generated software, I find it funny to reject the increase of comments or even articles written by AI. Can't have your cake and eat it too, or something like that.
Listen, I fully support your right to buy and use whatever you want from Priscilla’s or Adam & Eve. Keep it consensual and not in public view though, okay?
AI use is similar. Ask it to do whatever writing or text wrangling you want, but please show the public the sanitized version.
I'm just going to continue to mis-use the en-dash like I've always done.
Related:
Show HN: Hacker News em dash user leaderboard pre-ChatGPT - https://news.ycombinator.com/item?id=45071722 - Aug 2025 (266 comments)
... which I'm proud to say originated here: https://news.ycombinator.com/item?id=45046883.
I used to love using em-dashes in my texts, especially in titles. Now I am way too afraid of appearing to be using an LLM, even while doing my best to write everything myself :')
Bye bye em-dash, we had a nice run together.
I might start using that⸻one (a bit long...)
As someone who has the key combos Alt-0150 and Alt-0151 saved in muscle memory I feel offended by being compared to a machine.
I get the punchline here but is there possibly some sort of Streisand effect where real people now are more inclined to use an em dash?
I think people are now less inclined to use an em dash, because they don’t want to be mistaken for an LLM.
Makes me wonder about the ratio of LLM commenters to humans whose syntax merely aligns with an LLM's.
Not sure which is scarier
I had no idea the things I was using were called "em-dashes" until the AI bubble. I just used them to reflect pauses in my speech for tangents - an old habit from my IRC days.
Incidentally, some folks reported my stuff for potential AI generation and I had to respond to the mods about it. So that was kinda funny, if also sad to hear that some folks thought I was a bot.
I’m a dinosaur, not a robot dinosaur. I’m nowhere near that cool, alas.
But the em-dash is a different character. I think even those who use it for a pause would just opt for the - on their keyboard, whereas the em-dash — requires additional work on most (all?) keyboard layouts. It's _not_ more work for an AI, though, hence why it's a tell.
How did you make the character without googling it?
> I just used them to reflect pauses in my speech for tangents - an old habit from my IRC days.
The tell here is that you used a hyphen, not an em-dash.
Okay, see, that's context even I forget, but you're right and bears repeating:
This `-` is a hyphen, which I love, even if I'm fairly sure I'm not using it correctly in grammar a lot of the time.
This `--` is an EM-Dash, apparently, which is also what I never use but I also thought was just a hyphen in a different context (incorrect!).
No, there are actually four different punctuation marks, all of which look remarkably similar to the untrained eye.
1. We have the hyphen, which is most commonly used to create multi-part words, such as one-and-one-thousand.
2. We have the EN-DASH, which is most commonly used to denote spans of ranges. As an example, Barack Obama was President 2009–2017.
3. Then we have the recently maligned EM-DASH, which can be used in place of a variety of other punctuation marks, such as commas, colons, and parentheses. Very frequently, AI will use the em-dash as a way to separate two clauses and provide forward motion. AI uses it for the same reason that writers do: the em-dash is just a nicer punctuation mark compared to the colon.
4. Lastly, we have the minus sign, which is slightly different than the hyphen, though on most keyboards they're combined into the hyphen-minus.
By the by, they're called the em-dash and the en-dash because they traditionally match the width of an uppercase M or N, respectively.
I am so here for this lesson in punctuation and grammar right now. One of today’s lucky 10,000.
It is probably even a hyphen-minus, so called because on most early keyboards one character had to serve as both a hyphen and a minus. In Unicode, there is a separate code point for an unambiguous hyphen. There is also a non-breaking hyphen, as well as the various dashes discussed here.
And "--" is absolutely just two hyphen-minuses, not an em-dash (—).
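For reference, every look-alike mark in this subthread has its own Unicode code point, which you can confirm from Python's standard `unicodedata` module (nothing here is HN-specific; it's just the Unicode character database):

```python
import unicodedata

# The look-alike marks discussed above, by code point:
marks = ["\u002D", "\u2010", "\u2011", "\u2013", "\u2014", "\u2212"]
for ch in marks:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# Prints:
# U+002D  HYPHEN-MINUS
# U+2010  HYPHEN
# U+2011  NON-BREAKING HYPHEN
# U+2013  EN DASH
# U+2014  EM DASH
# U+2212  MINUS SIGN
```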
As a typography nerd, I'm upset that my pedantry may get me labelled as a bot. (Yes, I just used a typographic apostrophe instead of a straight single quote.)
At least the asterism is still safe.
Yeah, same. I use an extended keyboard layout on my PC. I'm so used to it I have to actively decide against using proper quotes and dashes and whatnot. I don't bother on mobile, though.
Every time someone states they stop reading when they encounter proper typography, I feel attacked.
Poor, poor typography-savvy people who set up a special keyboard layout in order to type "proper" dashes. I know you're out there, I know your pain.
No fancy keyboard required, just a keystroke on Mac (`alt+shift+-`) and Linux (`right alt+something` depending on your distro).
https://news.ycombinator.com/classic is every day more compelling.
I only just now learned that this isn't the default. I set my HN bookmark in like 2011, before making an account, and apparently it points there. I didn't realize it wasn't just the basic homepage with a weird address for some reason.
what is `/classic`?
HN home page compiled only counting votes of old accounts.
what qualifies as old?
I call it stylometric---obfuscation!
I would like to formally petition that the tech world at large replace "em-dash" with "clank" in all correspondence
like bang instead of exclamation point? or dot instead of period? I like it.
even though I used to like pointing out the difference between a hyphen and a dash.
It makes it much more fun to imagine a room full of robots in overcoats trying to pass off as human, but doing a terrible job due to the audible "clanks" betraying them from beneath the coat.
Spaces like HN then become a cacophony of clankers clanking as their numbers increase
I think they will remake the Japanese horror film Matango but instead of fungi, it will be those that use EM dashes to survive.
damnit, I just happen to like the look of the thing. now everyone thinks I'm AI for pausing in my thoughts as I write—as if I were human...
Not everyone speaks English natively—most have relied on Google Translate in the past, and now they turn to AI tools. There's nothing wrong with that, provided they're using these tools to express their own thoughts rather than for spam or botting.
^ And that's what I just did
I wonder if some people here considered me AI at some point
Why? What's the incentive/value to commenting here with AI?
If you control a bunch of established accounts, you can use them to either shill for products, or upvote certain topics.
- Spam a product/service
- Age the account so spamming a product/service is easier and the account appears more trustworthy
- Influence discussions in a particular direction for monetary gain, i.e. "I got rich on bitcoin, you'd be crazy not to invest".
- Influence discussions in a particular direction for political gain, i.e. "I went to Xinjiang and the Uyghurs couldn't be happier!"
This is like how I noticed the increase in thoughtbros on LinkedIn posting longform writing since the advent of LLMs.
TBH, I learned about how to use em dashes from the AI controversy and now I find them really useful.
I just hope my writing carries enough voice and perspective that people respond, even if there's an em dash or two.
As an AI language model, I am not able to perform dashes.
It has been obvious since ChatGPT that the internet, including HN, will be flooded with AI-generated commentary, drowning out real people's voices (soon undetectably). How this is surprising to anyone is a mystery.
10x more likely to use EM-dashes -- built in Rust?
AI has taken this from me—I will never forgive.
But seriously, I loved the em-dash and now every time I use it (which is too often) I have to wonder if my words will immediately be written off.
Good thing I prefer en-dashes :)
Off-topic, tangentially:
Could we generate a huge amount of code -- merely compilable code that is essentially trash -- and seed GitHub, Bitbucket, etc. with it, polluting the training grounds?
The fear is that AI-generated comments will collectively promote an agenda, often a political or exploitative agenda, on a scale that humans can't match or hope to counter.
What could help is a careful clique hunting algorithm to accurately identify and delete the entire clique.
Paid actors, regular people and primitive bots are already doing so plentifully and successfully.
Of course, all of the above can be replaced by AI, but it would not significantly alter the status quo.
I feel a sort of disappointment in how easily languages got swindled. There is seemingly no winning angle this time. This is the most doomed I've ever felt.
I have "—" bound to AltGR/right option + "-" for a decade now and I don't intend to stop using it.
https://practicaltypography.com/hyphens-and-dashes.html
I will not allow my good practices to get co-opted as AI "smoke tests".
what's the point of botting comments on HN? can someone explain?
This user [0] is clearly a bot and has been shadowbanned, but some of its comments get vouched because they're pretty good. I don't see how you solve that problem!
[0] https://news.ycombinator.com/user?id=octoclaw
This is pretty damning. It would be interesting to see if new accounts collect karma at any rate whatsoever.
Karma aside, flooding the comments with a chosen narrative via army of bots seems like it's already happening. I suppose the bots can also do voting rings, but they don't necessarily need to.
> Karma aside, flooding the comments with a chosen narrative via army of bots seems like it's already happening.
again with the conspiracy theories
> created: 83 days ago
I dunno, I agree. It sounds conspiratorial.
But who knows, maybe even 17 year old accounts are being hijacked by AI now too.
People do conspire you know.
> again with the conspiracy theories
Yeah, right? Not one ever actually turned out to be true!
That conspiracy about billionaires, who supposedly own all of western media, having deliberately created an environment in which anyone who expresses even the remote idea of a conspiracy gets discredited, is also not true!
None of them are true!
Not. A. Single. One.
*noms cheese pizza*
Would be interesting to see a "fastest growing accounts in the last N months" list or something similar. I'm guessing the ones that are actually humans would be closer to the top than the bottom, but maybe HN users aren't better than the average person at detecting AI.
I did on my old account, but dang just banned it because I said my older accounts got banned for opposing AI.
Troll farms hastily adding to their init prompts "don't use emdash when writing comments"
Doesn't really mean anything; macOS randomly autocorrects dashes to em-dashes (caused me a world of pain once when it did that in a GUID in a config file).
Are you saying new accounts are 10x more likely to be using macs? That would be quite a thesis.
It's a predictable outcome, and it will get worse.
What will/can HN do about it?
It can crank proof-of-work schemes to the maximum: something like needing to burn 15-20 minutes of 16-core CPU time to post a single comment. It would be infuriating for users, but not cheap for bots.
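For what it's worth, a hashcash-style scheme like the one described above fits in a few lines; difficulty is tuned by requiring more leading zero bits in a hash, with each extra bit doubling the expected work. The function names below are my own, just for illustration:

```python
import hashlib
import itertools

def mint(comment: str, bits: int = 20) -> int:
    """Brute-force a nonce whose SHA-256 digest has `bits` leading zero bits.

    Expected work is ~2**bits hashes; the commenter pays this cost once."""
    target = 1 << (256 - bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{comment}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(comment: str, nonce: int, bits: int = 20) -> bool:
    """Checking a proof costs a single hash, so the server's side is cheap."""
    digest = hashlib.sha256(f"{comment}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))

# Low difficulty for a quick demo; real deployments would use far more bits.
nonce = mint("my comment", bits=12)
assert verify("my comment", nonce, bits=12)
```

The asymmetry (expensive to mint, one hash to verify) is what makes it a spam deterrent rather than a burden on the server.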
One solution is to get rid of anonymity online, enforce validation of identity. Every human only gets 1 account. And then we still ban people that use AI. Might take a bit but eventually we'll have filtered out all the grifters.
If that's worth the cost... probably not?
Getting rid of anonymity is in time going to lead to getting rid of the platform, so do it if you're feeling suicidal. People seek real anonymity for good reason. Not everything should follow them in life or for life.
I've been wondering, too, what the solution would be. If the bots were actually helpful, I wouldn't care, but they always push an agenda, create noise, or derail discussions instead.
For now, maybe all forums should require some bloody swearing in each comment, to at least prove you've got some damn human-borne annoyance in you? It might even work against the big players for a little bit, because they have an incentive to keep their LLMs from swearing. The monetary reward is, after all, in sounding professional.
Easy enough for any groups to overcome of course, but at least it'd be amusing for a while. Just watching the swear-farms getting set up in lower paid countries, mistakes being made by the large companies when using the "swearing enabled" models and all that.
Check my history: I get downvoted to hell every time I truthfully point out AI slop.
Something about correlation and causation of magic gotcha signals. Text may appear generated to a reader but there's no smoking gun evidence that can disambiguate fact from hypothesis. Even intuition isn't evidence.
Perhaps there needs to be some sort of voluntary ethical disclosure practice to disclaim text as AI-generated, with some sort of unusual signifiers. „Lower double quotes, perhaps?“
Anyone have a lobste.rs invite?
How many of those are bots and how many of those are "fuck you, clankers" humans—like me?
Taking back the emdash — fight the power.
> How many of those are bots and how many of those are "fuck you, clankers" humans—like me?
Maybe the em dash is the self censorship/deletion mechanism that we've all been waiting for. Better than having to write pill subscription ads, I suppose.
dang, you should consider this an existential threat to hn.
I hate myself for saying this, but HN should consider closing new registrations for a while until we figure out what to do with this.
Wow. This made me laugh far harder than I would have thought it would. Just wow.