Some files in a computer system are written for humans and contain text.
% file /etc/hosts
/etc/hosts: ASCII text
But many other files are made for the computer to execute, and it isn’t possible to read them using a tool like cat:
% cat /bin/ls | head
����@�
��Z������
This is because they are binary files:
% file /bin/ls
/bin/ls: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/bin/ls (for architecture x86_64): Mach-O 64-bit executable x86_64
/bin/ls (for architecture arm64e): Mach-O 64-bit executable arm64e
Yes, the ls executable in macOS can run on both Apple silicon (M-series) and Intel chips 🤯. This technique is called a Universal Binary: a single file that bundles the executables for both architectures. Apple uses it to ease the transition between architectures, which is convenient, but it also makes the files larger.
Although cat can’t display binary files, it is possible to inspect them using a tool like hexdump:
% hexdump -C /bin/ls | head
00000000 ca fe ba be 00 00 00 02 01 00 00 07 00 00 00 03 |................|
00000010 00 00 40 00 00 01 1c c0 00 00 00 0e 01 00 00 0c |..@.............|
00000020 80 00 00 02 00 01 80 00 00 01 5a f0 00 00 00 0e |..........Z.....|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Each pair of hexadecimal digits is one byte. The left digit of each pair holds the high 4 bits and the right digit the low 4 bits. Not all bytes represent a visible character, so I’m going to take 40, which represents the @ symbol. When split, the hexadecimal 4 can be represented as 0100 in binary and 0 as 0000. Merged back together, they form the binary number 01000000, or 64 in decimal. We can validate this on an ASCII table like the one below.
DEC | HEX | BIN | ASCII Symbol |
---|---|---|---|
63 | 3F | 00111111 | ? |
64 | 40 | 01000000 | @ |
65 | 41 | 01000001 | A |
Table source: https://www.ascii-code.com/
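The same split-and-merge arithmetic can be checked in a few lines of Python. This is only a small sketch to illustrate the idea; the helper name split_nibbles is made up for this example.

```python
def split_nibbles(byte):
    """Return the (high, low) 4-bit halves of a single byte."""
    high = byte >> 4      # shift the upper 4 bits down
    low = byte & 0x0F     # mask off everything but the lower 4 bits
    return high, low


byte = 0x40
high, low = split_nibbles(byte)

print(f"{high:04b} {low:04b}")  # 0100 0000
merged = (high << 4) | low      # put the two halves back together
print(f"{merged:08b}")          # 01000000
print(merged)                   # 64
print(chr(merged))              # @
```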
Hexdumpje
To better understand how this works, I wrote a basic version of hexdump. The source code can be found at https://github.com/mauromorales/hexdumpje
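To give a rough idea of what such a tool does, here is a minimal Python sketch. It is not the hexdumpje code, just an illustration written for this post: the function name hexdump_lines and the single-space layout are assumptions chosen to mirror the output shown earlier.

```python
def hexdump_lines(data, width=16):
    """Yield one formatted line per `width` bytes of `data`."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Each byte becomes two hex digits: high 4 bits, then low 4 bits.
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so a short last line still aligns.
        yield f"{offset:08x} {hex_part:<{width * 3 - 1}} |{text}|"


# The first 16 bytes of /bin/ls from the dump above.
sample = bytes.fromhex("cafebabe000000020100000700000003")
for line in hexdump_lines(sample):
    print(line)
# 00000000 ca fe ba be 00 00 00 02 01 00 00 07 00 00 00 03 |................|
```

To dump a real file, the bytes could be read with open(path, "rb").read() and passed to the same function.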