I think the idea of all these is to keep the file from being recognised as text (which doesn't allow nulls), ASCII (which doesn't use the high bit), or UTF-8 (which doesn't allow invalid UTF-8 sequences).
Basically, so that no valid file in this binary format will be misidentified as a text file.
The author explains their reasoning in the next post: https://hackers.town/@zwol/114155807716413069
Well, the 3rd point follows from the second: all sequences without the high bit set are valid ASCII, and all valid ASCII sequences are valid UTF-8.
Why is ELF a good example?
- MUST be the very first N bytes in the file -> check
- MUST be at least four bytes long, eight is better -> check, but only four
- MUST include at least one byte with the high bit set -> nope
- MUST include a byte sequence that is invalid UTF-8 -> nope
- SHOULD include a zero byte -> nope
So, just 1.5 out of 5. Not good.
By the way, does anyone know the reason it starts with DEL (7F) specifically?
I think what the author likes is the fact that the first 4 bytes are defined as 0x7F followed by the file extension "ELF" in ASCII, which makes it quite a robust identifier.
And to be fair, including the 4 bytes following the magic number makes the ELF format satisfy at least 3 out of the 4 'MUST' requirements:
- 0x00: 7F 45 4C 46 (magic number)
- 0x04: Either 01 or 02 (defines 32-bit or 64-bit)
- 0x05: Either 01 or 02 (defines little-endian or big-endian)
- 0x06: Set to 01 (ELF-version)
- 0x07: 00~12 (Target OS ABI)
Still not a shiny example though...
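To make the byte layout listed above concrete, here is a minimal Python sketch that decodes those first eight bytes. The function name `parse_elf_ident` and the sample bytes are made up for illustration; only the offsets and values mentioned in the list are used.

```python
def parse_elf_ident(header: bytes) -> dict:
    """Decode the first 8 bytes of an ELF file (hypothetical helper)."""
    if len(header) < 8 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    return {
        # 0x04: 01 = 32-bit, 02 = 64-bit
        "class": {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown"),
        # 0x05: 01 = little-endian, 02 = big-endian
        "data": {1: "little-endian", 2: "big-endian"}.get(header[5], "unknown"),
        # 0x06: ELF version, expected to be 01
        "version": header[6],
        # 0x07: target OS ABI
        "os_abi": header[7],
    }


# Made-up sample: a 64-bit, little-endian header with OS ABI 00.
sample = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00])
info = parse_elf_ident(sample)
print(info)
```

In this particular sample the byte at 0x07 is 00, so the eight-byte prefix does contain a zero byte.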
It's (7F) ELF
Hmm, I would expect that to be 31F, if it stood for "ELF" in correct Hexspeak.
Is anyone able to break down why those requirements are desirable?
Off the top of my head, most are there to make it as clear as possible that the file is binary and NOT text:
> MUST be the very first N bytes in the file
For every system to be able to parse it without loading the entire file
> MUST be at least four bytes long, eight is better
To reduce risk of two different binary files on the same system having the same magic number
> MUST include at least one byte with the high bit set
To avoid wrongful identification as an ASCII file (ASCII doesn't use the high bit)
> MUST include a byte sequence that is invalid UTF-8
To avoid wrongful identification as UTF-8 text file
> SHOULD include a zero byte
To avoid wrongful identification as ANY text file
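The checks above are mechanical enough to sketch in a few lines of Python. This is only an illustration: `check_magic` is a made-up name, the "very first N bytes" rule is about file layout and can't be tested from the magic bytes alone, and the PNG signature is brought in here as a comparison (it is not from the thread).

```python
def check_magic(magic: bytes) -> dict:
    """Test a candidate magic number against four of the listed rules."""
    try:
        magic.decode("utf-8")  # strict decoding raises on invalid sequences
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "at_least_four_bytes": len(magic) >= 4,
        "has_high_bit_byte": any(b & 0x80 for b in magic),
        "invalid_utf8": not valid_utf8,
        "has_zero_byte": 0 in magic,
    }


ELF_MAGIC = b"\x7fELF"            # 7F 45 4C 46
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 89 50 4E 47 0D 0A 1A 0A

for name, magic in (("ELF", ELF_MAGIC), ("PNG", PNG_MAGIC)):
    print(name, check_magic(magic))
```

The four ELF bytes pass only the length check, while the PNG signature passes everything except the zero-byte rule.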
>> MUST be the very first N bytes in the file
> For every system to be able to parse it without loading the entire file
It also solves the ambiguity problem: ZIP files have their magic numbers at the end, while most other formats, like PDF, have them at the beginning, so you can have a file that is both a PDF and a ZIP file.
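A rough Python sketch of that ambiguity, under a few assumptions: the PDF check only looks for the `%PDF-` prefix, the ZIP check only searches the tail of the file for the end-of-central-directory signature `PK\x05\x06`, and the sample data is a made-up stub rather than a real document. Neither function is a real validator.

```python
def looks_like_pdf(data: bytes) -> bool:
    # PDF is identified from the start of the file.
    return data.startswith(b"%PDF-")


def looks_like_zip(data: bytes) -> bool:
    # ZIP is identified from the end: the end-of-central-directory record
    # is 22 bytes plus an optional comment of up to 65535 bytes.
    tail = data[-(22 + 65535):]
    return b"PK\x05\x06" in tail


# Made-up stub: a PDF-looking prefix followed by an empty ZIP archive
# (signature plus 18 zero bytes).
pdf_stub = b"%PDF-1.7\n%%EOF\n"
empty_zip = b"PK\x05\x06" + bytes(18)
polyglot = pdf_stub + empty_zip

print(looks_like_pdf(polyglot), looks_like_zip(polyglot))
```

Both checks accept the same bytes because each one looks at a different end of the file, which is the ambiguity a "first N bytes only" rule would rule out.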