Reading Binary Files

·372 words·2 mins·
Low-Level Programming

Some files in a computer system are written for humans and contain text.

% file /etc/hosts
/etc/hosts: ASCII text

But many other files are meant for the computer to execute, and a tool like cat cannot show them in a readable form:

% cat /bin/ls | head
����@�
      ��Z������

This is because they are binary files:

% file /bin/ls
/bin/ls: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/bin/ls (for architecture x86_64):    Mach-O 64-bit executable x86_64
/bin/ls (for architecture arm64e):    Mach-O 64-bit executable arm64e

Yes, the ls executable in macOS can run on both Apple silicon (M-series) and Intel chips 🤯. This technique is called a universal binary: a single file bundles the executables for both architectures. Apple uses it to ease the transition between architectures, which is indeed convenient, but it also makes the files larger.

However, it is possible to inspect them byte by byte using a tool like hexdump:

% hexdump -C /bin/ls | head
00000000  ca fe ba be 00 00 00 02  01 00 00 07 00 00 00 03  |................|
00000010  00 00 40 00 00 01 1c c0  00 00 00 0e 01 00 00 0c  |..@.............|
00000020  80 00 00 02 00 01 80 00  00 01 5a f0 00 00 00 0e  |..........Z.....|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
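
The first 48 bytes of this dump are the universal binary header, and they can be decoded by hand. As a minimal sketch, assuming the layout of Apple's fat header (a big-endian magic number 0xcafebabe and an architecture count, followed by one 20-byte entry per architecture holding CPU type, CPU subtype, offset, size, and alignment), the Python below decodes the bytes shown above. The names parse_fat_header, CPU_NAMES, and HEADER are made up for this illustration.

```python
import struct

# First 48 bytes of /bin/ls, copied from the hexdump output above.
HEADER = bytes.fromhex(
    "ca fe ba be 00 00 00 02 01 00 00 07 00 00 00 03 "
    "00 00 40 00 00 01 1c c0 00 00 00 0e 01 00 00 0c "
    "80 00 00 02 00 01 80 00 00 01 5a f0 00 00 00 0e "
)

FAT_MAGIC = 0xCAFEBABE
# Only the two CPU types that appear in this dump.
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def parse_fat_header(data):
    """Return one dict per architecture listed in a fat header."""
    # All fields are big-endian unsigned 32-bit integers.
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal binary")
    archs = []
    for i in range(count):
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">5I", data, 8 + 20 * i
        )
        archs.append({
            "name": CPU_NAMES.get(cputype, hex(cputype)),
            "cpusubtype": cpusubtype,
            "offset": offset,  # where this architecture's executable starts
            "size": size,      # how many bytes it occupies
            "align": align,    # alignment as a power of two
        })
    return archs


for arch in parse_fat_header(HEADER):
    print(f"{arch['name']}: offset={arch['offset']} size={arch['size']}")
# x86_64: offset=16384 size=72896
# arm64: offset=98304 size=88816
```

The sketch leaves the CPU subtype undecoded, so the second entry is reported as arm64 rather than the arm64e that file prints.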

Each pair of hexadecimal digits is one byte: the first digit encodes the high 4 bits and the second digit the low 4 bits. Not all bytes represent a visible character, so I'm going to take 40, which represents the @ symbol. When split, the hexadecimal digit 4 is 0100 in binary and 0 is 0000. Joined back together, they form the binary number 01000000, or 64 in decimal. We can validate this on an ASCII table like the one below.

stateDiagram-v2
    40 --> 4
    40 --> 0
    4 --> 0100
    0 --> 0000
    0100 --> 01000000
    0000 --> 01000000
    01000000 --> 64

DEC  HEX  BIN       ASCII Symbol
63   3F   00111111  ?
64   40   01000000  @
65   41   01000001  A

Table source: https://www.ascii-code.com/
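
The split of 40 into two groups of 4 bits can also be checked with a few lines of Python. This is only a sketch of the arithmetic described above; split_nibbles is a hypothetical helper name.

```python
def split_nibbles(byte):
    """Split one byte into its high and low 4-bit halves."""
    high = byte >> 4    # shift the upper 4 bits down
    low = byte & 0x0F   # mask off everything but the lower 4 bits
    return high, low


high, low = split_nibbles(0x40)
print(f"{high:04b} {low:04b}")  # 0100 0000

# Shift the high half back up and merge it with the low half.
joined = (high << 4) | low
print(f"{joined:08b} {joined} {chr(joined)}")  # 01000000 64 @
```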

Hexdumpje

To better understand how this works, I wrote a basic version of hexdump. The source code is available at https://github.com/mauromorales/hexdumpje
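
To give an idea of what such a tool involves, here is a minimal Python sketch that mimics the layout of hexdump -C: an 8-digit offset, 16 bytes in hexadecimal split into two groups of 8, and the printable ASCII characters between pipes. It is not the code from the hexdumpje repository; the function name hexdump_lines is an assumption made for this example, and the sketch skips details of the real tool such as collapsing repeated lines and printing the final offset.

```python
def hexdump_lines(data, width=16):
    """Yield lines that resemble `hexdump -C` output for the given bytes."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = [f"{b:02x}" for b in chunk]
        # Two groups of 8 bytes, each padded to 23 characters
        # (8 pairs of digits plus 7 separating spaces).
        left = " ".join(hex_bytes[:8])
        right = " ".join(hex_bytes[8:])
        # Printable ASCII is shown as-is, everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        yield f"{offset:08x}  {left:<23}  {right:<23}  |{text}|"


# The first 16 bytes of /bin/ls from the dump shown earlier.
# To dump a real file, read it in binary mode instead: open(path, "rb").read()
sample = bytes.fromhex("cafebabe000000020100000700000003")
for line in hexdump_lines(sample):
    print(line)
# 00000000  ca fe ba be 00 00 00 02  01 00 00 07 00 00 00 03  |................|
```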
