I see a delightful unintended layer of irony in the fact that what the page really uses to display the emoji are image elements: JavaScript on that page replaces the Unicode text with them. So the main heading content
It’s Not Wrong that “[that formidable facepalm]”.length == 36
(sic syntactically wrong quotes) is in reality (for JS-capable and enabled clients) presented as
<h1 class="wp-block-post-title">It’s Not Wrong tha
t (for HN) “<img draggable="false" role="img"
class="emoji" alt="[that formidable facepalm]" sr
c="https://s.w.org/images/core/emoji/16.0.1/svg/1f
926-1f3fc-200d-2642-fe0f.svg">”.length == 36</h1>
(arbitrary line breaks added for convenience). Here the "true" `.length` of the (scare)quoted content is: 144.
---
This comment is brought to you thanks to the "View Selection Source" context menu entry in Firefox.
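For anyone who wants to check the numbers: the SVG filename in that markup spells out the emoji's code points, so the string can be rebuilt from it. A minimal sketch, assuming the `alt` attribute holds the actual emoji rather than the bracketed placeholder shown above; the variable names are mine, not anything from the page.

```javascript
// Code points as they appear in the SVG filename quoted in the comment above.
const fileStem = "1f926-1f3fc-200d-2642-fe0f";
const codePoints = fileStem.split("-").map((hex) => parseInt(hex, 16));
const facepalm = String.fromCodePoint(...codePoints);

// Three different answers to "how long is it?"
const utf16Units = facepalm.length;                           // UTF-16 code units
const scalarCount = [...facepalm].length;                     // Unicode code points
const utf8Bytes = new TextEncoder().encode(facepalm).length;  // UTF-8 bytes

// The <img> element from the heading, joined back into one line.
const imgTag =
  `<img draggable="false" role="img" class="emoji" alt="${facepalm}" ` +
  `src="https://s.w.org/images/core/emoji/16.0.1/svg/${fileStem}.svg">`;

console.log(utf16Units, scalarCount, utf8Bytes); // 7 5 17
console.log(imgTag.length);                      // 144
```

So the 144 holds when the replaced emoji is counted in UTF-16 code units, which is what `.length` reports in JavaScript.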
Example YouTube videos that have "no title":
- https://www.youtube.com/watch?v=K93zcgFsynk&ab_channel=Vsauc...
- https://www.youtube.com/watch?v=XrHTI04i9yk&ab_channel=%E2%8...
This is done by using invisible characters such as ZWNJ to get around the title filter.
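To illustrate why such a title can slip through, here is a small sketch. The two checks below are hypothetical and are not how YouTube's filter actually works; they only show that a whitespace trim does not treat ZWNJ as blank, while a check that also covers the Unicode "Format" category does.

```javascript
// Hypothetical checks, not any real site's title filter.
const invisibleTitle = "\u200C"; // ZERO WIDTH NON-JOINER (general category Cf)

// Naive: trim() only strips whitespace and line terminators, and ZWNJ is neither.
const naiveIsBlank = (title) => title.trim().length === 0;

// Stricter: treat a title as blank if it is made up solely of whitespace
// and format characters (\p{Cf} covers ZWNJ, ZWJ, ZWSP and similar).
const strictIsBlank = (title) => /^[\s\p{Cf}]*$/u.test(title);

console.log(naiveIsBlank(invisibleTitle));   // false
console.log(strictIsBlank(invisibleTitle));  // true
console.log(strictIsBlank("real title"));    // false
```

The stricter check only rejects titles that contain nothing else, so it leaves ZWJ emoji sequences and scripts that need ZWNJ alone.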
Great introduction, where's the rest of the article?
Thanks for fixing the title of this post! [U+1F642 slightly smiling face emoji] (I wrote the post)
Perhaps the title should be changed to:
It’s Not Wrong that (for HN) “[facepalm emoji]”.length == 36
Or some alternative if the above is too long, like:
“[facepalm emoji]”.length == 36
Both seem more accurate than the current:
It’s Not Wrong that (for HN) “ ”.length == 36
That's kind of the whole joke.
"[man with blue shirt facepalm emoji]".length == 36
Checks out.
Only trouble is, he may not be wearing a blue shirt. In my fonts, it’s more teal, and maybe not a shirt.
But I was definitely going to hunt for a description that made it 36 characters long.
It could be worse: in Windows 11 the hand does not seem to belong to the same person as the face, see https://em-content.zobj.net/source/microsoft/407/person-face...
Or use "[facepalm emoji]" instead of the emoji itself or something.
Grumpy old guy: Can we just stop with new unicode characters? We don't need to be able to capture every human thought or concept in a character. Feels like the Unicode Consortium is chaired by Funes the Memorious.
The main pain for me is that most tools now sexualize the emojis: they force me to choose between :man-facepalm:, :woman-facepalm: and just :facepalm:.
For. Every. Single. Emoji.
I don't remember a case when I really wanted a sexualized version, I always want to express just an emotion. Just remove all the prefixed versions, and leave the pure one.
Since we now have :man-pregnant: emoji, in six skin colors no less, I fully expect all the animal emojis to also be gender-ized first, and pregnant-ized later.
Thank you, I think I will use that the next time I have a really filling meal.
That one is vastly more frequently used to mean "I'm so full after eating", just like the peach emoji is more used to mean butt than the fruit.
In everyday speech "sexualize" means to make something sexy, so unless you are finding the emojis sexy you probably meant "genderize" or something similar.
Yep, I meant "genderize", thank you.
> “[facepalm emoji]”.length
Are there any even mildly-popular languages that use, or allow, curly quotes for strings? I’d kinda like there to be at least one.
Perl has quote operators which come close, but they start with a letter followed by your choice of delimiter:
Like as delimiters? Ruby kind of does, in that:
is one way to write a string literal.
Only if you're counting bytes
I don’t want emojis in the headlines please. It makes it difficult to read and becomes an arms race for attention.
Also I’m not even sure it was a good idea to put them in text. Emojis are a special case that breaks a lot. Now you have to worry about multiple colors, etc.
It pisses me off that Unicode retroactively changed the meaning of existing text by turning some symbols into emoji. So much for stability guarantees.
I emailed the site to see what the policy is.
Generally I agree with you, but in rare cases like the article that this one is meta-commentary to it might perhaps have been justifiable to allow it? I see the "slippery slope" risk though.
A huge and growing percent of all realtime communication is happening via text.
It is reasonable and worthwhile to encode some nonverbal information in it, and emojis have won the day.
It definitely wasn't a good idea. Emojis aren't text, they don't belong in Unicode at all. It would be one thing to encode, say, Egyptian hieroglyphics. That would be legitimate. But putting emojis into Unicode was against the purpose of the standard and a huge mistake.
The line is really fuzzy honestly. We have some universal pictograms that are known and reasonably well understood around the world and the way they are used is pretty much a writing system. An icon of a man or woman on a bathroom door? Well you may write it in one of a million different styles (fonts) but the general idea is used around the world as a common writing system. I'd say that belongs in unicode.
The real problem is that the alphabets of certain writing systems are unbounded. Emojis are completely unbounded. That's the only reason to have concern with it in unicode. Unicode is a limited set by definition and emojis are an unbounded set.
In my opinion the job of the Unicode Consortium would have been to encode what has significant and organic usage. Similarly to how Wikipedia only includes what has significant organic and externally validated coverage. If they'd stuck to that mission the line would have been a lot less fuzzy.
The problem with that is, of course, that "significant" is subjective.
The modern Western society is very occupied with the questions of racial and gender identity, and it is generally accepted in that society that this topic is "significant". And since it's that society that the Unicode Consortium is working within, this explains how you get six different colors of "man-pregnant" emoji in the world where there possibly haven't been six different-colored pregnant men.
Significant is only subjective in the heat of the moment and not much in retrospect. What I am arguing for is that the Unicode Consortium should only add characters with what Wikipedia would call notability.
I would like to stress that I am not arguing against the addition of U+1FAC3 PREGNANT MAN or U+1FAC4 PREGNANT PERSON, there are good reasons to add these, but do we need mundane arbitrary everyday items like U+1FAA9 MIRROR BALL? I'd say no.
Actually, I don't think there's a good argument to add either U+1FAC3 PREGNANT MAN or U+1FAC4 PREGNANT PERSON: expressing gender can already be done with a modifier like how skin/hair color and professions are also expressed, and we already have U+1F930 PREGNANT WOMAN.
For example, this is the unicode sequence for bearded lady:
U+1F9D4 person with beard
U+200D zero width joiner
U+2640 female sign
So the "pregnant man" could have simply been this expression:
U+1F930 pregnant person (woman is implied by lack of modifier)
U+200D zero width joiner
U+2642 male sign
But no, instead we must have this combinatorial explosion of compositions because Unicode can't decide if it wants to be a symbol library or an expression library. Now, we have U+1F40F ram and U+1F411 ewe, U+1F404 cow and U+1F402 bull, U+1F9D2 child and U+1F466 boy and U+1F467 girl (but a baby boy must be expressed as U+1F476 U+200D U+2642), U+1F468 man and U+1F469 woman and U+1F9D1 person, and U+1F385 Santa Claus and U+1F936 Mrs Claus but also U+1F9D1 U+200D U+1F384 non-gendered Claus.
> Emojis aren't text
Neither are digits, or control characters, strictly speaking. We really shouldn't have been able to have CR and LF explicitly embedded in the text files.
Neither are high and low surrogates - those are big ranges of code points that are illegal except for one specific (and not recommended) encoding (utf-16). Yet, there they will remain in Unicode.
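A quick sketch of what that looks like from JavaScript, whose strings expose UTF-16 code units directly. The base face palm character U+1F926 is used as the example and the variable names are mine. Paired surrogates stand for one code point; a lone one has no valid UTF-8 form, so `TextEncoder` substitutes U+FFFD.

```javascript
// U+1F926 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
const facePalmBase = "\u{1F926}";
const high = facePalmBase.charCodeAt(0); // 0xD83E, a high surrogate
const low = facePalmBase.charCodeAt(1);  // 0xDD26, a low surrogate

// A lone surrogate is representable in a JS string but not in UTF-8:
// TextEncoder replaces it with U+FFFD (bytes EF BF BD).
const loneHigh = "\uD83E";
const encodedLone = Array.from(new TextEncoder().encode(loneHigh));

console.log(high.toString(16), low.toString(16));       // d83e dd26
console.log(encodedLone.map((b) => b.toString(16)));    // [ 'ef', 'bf', 'bd' ]
```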
Digits definitely are a form of text though. Unicode is for writing systems, which definitely includes writing numbers
CR & LF are in there for backwards-compatibility with ASCII. Similarly, the first emoji were included in Unicode for compatibility with some encoding systems used for SMS on Japanese mobile carriers. I wish the Unicode folks had drawn a hard line that they weren't going to add any more. If people wanted dingbats, they could go use a dingbats font.
Where do you draw the line?
What about "fancy fonts" (foreign characters that look like latin letters)? Japanese / Chinese ideographs? Common pictograms like "stop sign?" Mathematical symbols?
People made emoticons out of ~100 printable ASCII characters. With thousands of "real" Unicode symbols available, they would have gone wild anyway.
As a person with accessibility needs, I'm honestly glad emojis exist. They at least carry semantic meanings (though some people do abuse them in ways inconsistent with those meanings), unlike random combinations of symbols that the internet community has agreed on.
"Where do you draw the line?"
Unicode's original self-declared mission was to encode all characters needed for written communication in the world.
Wikipedia once had a similar issue, where people used it to add all kinds of trivia and original research. There was a fight between the so called inclusionists and deletionists. The latter won and we now have strict guidelines that ensure everything in Wikipedia has to have strong relevant external validation.
In my opinion, the Unicode Consortium would have been well advised to follow Wikipedia's example. If they really had only added characters with significant organic usage, we'd have seen a much smaller number of emojis added, and in my opinion to nobody's disadvantage.
But this is easy for me to say. I'm curious how emojis help with your accessibility needs. Has it to do with the fact that they take up little screen space or is it something else?
Fancy fonts are not multi-colored graphics.
Honestly, those many duplicates of the Latin alphabet in Unicode tick me off way more than the emojis. Serif, sans serif (also with bold, italic, and bold italic variants as well), fraktur and bold fraktur, cursive and bold cursive, monospace, and double-struck (which could reasonably count as bold monospace TBF). And there are several more proposed but not yet accepted.
If you hate emojis blame the Japanese telecoms that created them. Unicode strives to digitally encode all human communication and the Japanese were using pictograms for communication before Unicode.
Pictograms are not multi-colored images.
Why would Egyptian hieroglyphics be legitimate, and emojis illegitimate?
I'll take a stab as an armchair linguist: Hieroglyphs still represent written language, in the sense that each glyph represents an abstract sound, and glyphs can be composed to form words. Emoji do not represent language, in the sense that they do not have a vocalization, and emoji do not combine to form words.
Take, for example, the various skin colors for faces and persons: if emoji were a real ideographic script, the written representation would be a logograph combined with a determinative, not a set of distinct glyphs. The irony of course is that is exactly how it is encoded within Unicode (an emoji codepoint with a skin color modifier). But doing it this way is exactly why emoji is an illegitimate script: it does not represent any non-digital form of writing, and the emoji modifiers do not have any representation of themselves, neither visual nor audible. Nor is the modifier composable in the way that a real language would be: it does not modify animal colors, for example.
Because they are one color. Like a font, their defining property is shape.
It's been a huge success, so hard to portray it as a mistake.
this is what they said about social media and fentanyl
Well, we wouldn't want to call fentanyl a mistake - it's a miracle of pain relief used in hospital and pre-hospital medicine all over the world.
Jury's still out on social media, but definitely not on emoji. Incontestable success story.
These articles are about emojis and how they work, I don't see how having one in the title is a problem.
> " ".length == 36
Heh. In my code, I always (idiosyncratically, I admit) spell ' ' as '\x20' (or even just as 0x20, if it's C), unless it's part of a multicharacter string, e.g. "Hello world!": it just feels wrong to have an empty space inside single quotes. Is it really just U+0020 in there? Is it supposed to be U+0020 there? Silly worries, I know, but I just don't like the way ' ' looks.
You should read the article...
Is the article about HN stripping emojis (and some other Unicode code points) from the title of another HN submission?
Yes, it is.
Amazing! So if I ever were to write a blog post e.g. "Why ' ' != ' '", I guess I better title it "Why '\u2744\ufe0f' != '\x20'" to avoid confusion on HN.
Edit: as demonstrated by this very comment.
If you are on Chrome, hover your mouse over the link to the article and look at the URL displayed in the bottom left corner (at least that's where Chrome puts it on my machine).
A good text editor will make all whitespace variants visually distinct.
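Absent such an editor, a few lines of script can do a rough version of the same thing. This is only a sketch: `revealInvisibles` is a made-up helper that escapes everything outside visible ASCII (the plain space included), which is cruder than what a real editor does.

```javascript
// Hypothetical helper: keep visible ASCII as-is, escape everything else
// (including the plain space) as \u{HEX} so lookalikes become distinguishable.
function revealInvisibles(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    const isVisibleAscii = cp > 0x20 && cp < 0x7f;
    return isVisibleAscii ? ch : `\\u{${cp.toString(16).toUpperCase()}}`;
  }).join("");
}

const plainSpace = revealInvisibles("' '");
const snowflake = revealInvisibles("'\u2744\uFE0F'");
const noBreakSpace = revealInvisibles("'\u00A0'");

console.log(plainSpace);   // '\u{20}'
console.log(snowflake);    // '\u{2744}\u{FE0F}'
console.log(noBreakSpace); // '\u{A0}'
```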