I think the idea of all these is to make the file not be recognised as text (which doesn't allow nulls), ASCII (which doesn't use the high bit), UTF-8 (which doesn't allow invalid UTF-8 sequences).
Basically so that no valid file in this binary format will be misidentified as a text file.
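A minimal sketch of the kind of sniffing heuristic this comment describes might look like the following. It assumes only the three checks named above (NUL bytes, high bit, UTF-8 validity); `classify` is a hypothetical name, and real file-type detectors use many more rules than this.

```python
def classify(data: bytes) -> str:
    """Hypothetical text-vs-binary sniffer using only the three checks above."""
    if b"\x00" in data:
        return "binary"       # text formats don't allow NUL bytes
    if all(b < 0x80 for b in data):
        return "ascii"        # no byte has the high bit set
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return "binary"
    return "utf-8"


if __name__ == "__main__":
    print(classify(b"hello\n"))
    print(classify("h\u00e9llo".encode("utf-8")))
    print(classify(b"\x89PNG\r\n\x1a\n"))
    print(classify(b"\x7fELF"))
```

A magic number that trips any one of these checks is enough to keep a file out of the "text" buckets. Note that a bare `b"\x7fELF"` with nothing after it passes all three and comes back as "ascii" here.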
The author explains their reasoning in the next post: https://hackers.town/@zwol/114155807716413069
Well, the 3rd point follows from the second: all sequences without the high bit set are valid ASCII, and all valid ASCII sequences are valid UTF-8.
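A quick spot check of that argument in Python (not a proof, just an exhaustive test over short inputs): every string of bytes below 0x80 decodes as UTF-8, while any lone byte at or above 0x80 does not. So an invalid UTF-8 sequence always contains at least one high-bit byte. `is_valid_utf8` is a helper defined here for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ascii_bytes = range(0x80)

# All 1- and 2-byte strings made only of ASCII bytes are valid UTF-8,
# so none of them could satisfy the "invalid UTF-8" rule.
assert all(is_valid_utf8(bytes([a])) for a in ascii_bytes)
assert all(is_valid_utf8(bytes([a, b])) for a in ascii_bytes for b in ascii_bytes)

# Every single byte with the high bit set is invalid UTF-8 on its own
# (a stray continuation byte, a truncated lead byte, or never-valid 0xF5..0xFF).
assert not any(is_valid_utf8(bytes([b])) for b in range(0x80, 0x100))
```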
Is anyone able to break down why those requirements are desirable?
Off the top of my head, most are there to make it as clear as possible that the file is binary and NOT text:
> MUST be the very first N bytes in the file
So that any program can identify the format by reading only the first few bytes, without loading the entire file
> MUST be at least four bytes long, eight is better
To reduce the risk of two different binary formats on the same system having the same magic number
> MUST include at least one byte with the high bit set
To avoid wrongful identification as an ASCII file (ASCII doesn't use the high bit)
> MUST include a byte sequence that is invalid UTF-8
To avoid wrongful identification as a UTF-8 text file
> SHOULD include a zero byte
To avoid wrongful identification as ANY text file
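As a rough sketch, the rules above can be turned into a small checker. `check_magic` and `has_magic` are hypothetical names; the "very first N bytes" rule is about placement, so it shows up as a read of only the first `len(magic)` bytes rather than as a property of the magic itself. The PNG signature is used purely as a familiar 8-byte example.

```python
import io
from typing import BinaryIO


def check_magic(magic: bytes) -> dict[str, bool]:
    """Evaluate a candidate magic number against the rules quoted above."""
    try:
        magic.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "at_least_4_bytes": len(magic) >= 4,
        "eight_bytes": len(magic) >= 8,
        "high_bit_set": any(b & 0x80 for b in magic),
        "invalid_utf8": not valid_utf8,
        "zero_byte": 0 in magic,
    }


def has_magic(stream: BinaryIO, magic: bytes) -> bool:
    """Identify the format by reading only the first len(magic) bytes."""
    return stream.read(len(magic)) == magic


if __name__ == "__main__":
    png = b"\x89PNG\r\n\x1a\n"
    print(check_magic(png))
    print(has_magic(io.BytesIO(png + b"rest of the file"), png))
```

With these definitions the PNG signature passes every check except the zero byte, and `has_magic` never needs more than eight bytes of the stream.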
Why is ELF a good example?
- MUST be the very first N bytes in the file -> check
- MUST be at least four bytes long, eight is better -> check, but only four
- MUST include at least one byte with the high bit set -> nope
- MUST include a byte sequence that is invalid UTF-8 -> nope
- SHOULD include a zero byte -> nope
So, just 1.5 out of 5. Not good.
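The tally can be reproduced with a short script. The scoring (one point per rule, half a point for a 4-to-7-byte magic) is my reading of the comment, and `score` is a hypothetical helper. `at_file_start` is passed in because placement can't be inferred from the magic bytes alone; for ELF the magic does sit at offset 0.

```python
ELF_MAGIC = b"\x7fELF"  # bytes 7F 45 4C 46


def score(magic: bytes, at_file_start: bool) -> float:
    """Score a magic number against the five rules, as tallied in the comment."""
    try:
        magic.decode("utf-8")
        invalid_utf8 = False
    except UnicodeDecodeError:
        invalid_utf8 = True
    points = 1.0 if at_file_start else 0.0
    # Full point for 8+ bytes, half a point for 4 to 7 bytes.
    points += 1.0 if len(magic) >= 8 else 0.5 if len(magic) >= 4 else 0.0
    points += 1.0 if any(b >= 0x80 for b in magic) else 0.0  # 0x7F is below 0x80
    points += 1.0 if invalid_utf8 else 0.0
    points += 1.0 if 0 in magic else 0.0
    return points


if __name__ == "__main__":
    print(score(ELF_MAGIC, at_file_start=True))  # 1.5
```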
By the way, does anyone know the reason it starts with DEL (7F) specifically?
It's (7F) ELF
Hmm, I would expect that to be 31F, if it stood for "ELF" in correct Hexspeak.