reading binaries by hand PART 1
ELF PE Zig 0.16.0

A Reminder from the Previous Post:

The first 64 bytes of the file... They say everything, and that's it :)

Where the section table is, where the program header table is, how many entries there are, and how many bytes each one takes.

I used 64 here because this is a 64-bit ELF file; on 32-bit the header is 52 bytes. 52 looks a bit weird, right? This is why extern struct comes in: C ABI compliance. Now you know the reason. We are telling the compiler: keep the fields in exactly this order and lay them out the way C would, don't rearrange anything and don't add padding of your own. Padding is really dangerous here: if the compiler inserts even one byte between fields, every offset calculation is wrong and you're reading garbage. And now you know why I have been crying for 2 weeks... (alignment issues are also one of the reasons)
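To see where 52 and 64 come from, here is a minimal sketch in Python (chosen only for a quick check; it is not the Zig extern struct from this post). It assumes the field order from the ELF spec and uses the standard struct module, where "<" means little-endian with no padding between fields. The names ELF32_HEADER and ELF64_HEADER are made up for this example.

```python
import struct

# "<" = little-endian, standard sizes, no padding between fields.
# Field order: e_ident, e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize,
# e_shnum, e_shstrndx.
ELF32_HEADER = "<16sHHIIIIIHHHHHH"  # e_entry / e_phoff / e_shoff are 4 bytes each
ELF64_HEADER = "<16sHHIQQQIHHHHHH"  # e_entry / e_phoff / e_shoff are 8 bytes each

elf32_size = struct.calcsize(ELF32_HEADER)
elf64_size = struct.calcsize(ELF64_HEADER)

print("Elf32 header:", elf32_size, "bytes")
print("Elf64 header:", elf64_size, "bytes")
```

The only difference between the two layouts is the three address-sized fields, which is exactly the 3 x 4 = 12 byte gap between 52 and 64.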

How We MUST Read

The first 16 bytes are e_ident, so I will show them here as e_ident
As I said above, the first part of an ELF file is e_ident[0..4]; here we see \x7FELF. Okay, it is an ELF file
e_ident[4] is the class byte (EI_CLASS): 1 = ELF32, 2 = ELF64
e_ident[5] is the endianness (EI_DATA): 1 is Little Endian and 2 is Big Endian
e_ident[6] is EI_VERSION and it is always 1. I don't know why; the spec just always puts 1 here.
e_ident[7] is EI_OSABI; it is generally 0 = System V (most common)
The rest is e_ident[8] (EI_ABIVERSION) and padding bytes

Let's read it with: Format-Hex .\limon -Offset 0x00 -Count 16

Format-Hex .\limon -Offset 0x00 -Count 16
          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 7F 45 4C 46 02 01 01 00 00 00 00 00 00 00 00 00 ▪ELF············

As you can see, we read limon. The first 4 bytes are 7F 45 4C 46, in ASCII DEL E L F. Byte 4 is 02, so 64-bit. Byte 5 is 01, so little endian. Byte 6 is 01 (as always), and byte 7 is 00, so System V.
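The same checks can be sketched in a few lines of Python. This is only an illustration under the assumptions described above (class 1/2, data 1/2); parse_ident is a hypothetical helper name, and the sample bytes are copied from the first 16 bytes of limon shown in the dump.

```python
def parse_ident(ident: bytes) -> dict:
    """Decode the first 16 bytes (e_ident) of an ELF file."""
    if len(ident) < 16:
        raise ValueError("need at least 16 bytes")
    if ident[0:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    classes = {1: "ELF32", 2: "ELF64"}
    encodings = {1: "LittleEndian", 2: "BigEndian"}
    return {
        "class": classes.get(ident[4], "unknown"),
        "data": encodings.get(ident[5], "unknown"),
        "version": ident[6],
        "osabi": ident[7],
    }

# The 16 bytes at offset 0x00 of limon, taken from the Format-Hex output.
sample = bytes.fromhex("7F454C46020101000000000000000000")
info = parse_ident(sample)
print(info)
```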

Quick note: on PE files (Windows) this check has 2 steps. What do I mean? The first bytes are MZ. When you see MZ, go to 0x3C. Why? Because the offset of PE\0\0 is stored there. MZ is only there for backward compatibility, so normally we need to check PE\0\0 as well. An ELF file starts with ASCII DEL, and it is almost impossible for another file type to start with this byte, so one check is okay for ELF. But MZ can be at the start of other file types too, soooo 2 steps are needed for PE files. The explanation is a bit weird, but it will be clearer in the PE part, don't worry :)
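A rough sketch of that 2-step check in Python, assuming the 4-byte little-endian offset at 0x3C as described above. looks_like_pe is a hypothetical name, and the test buffer is synthetic (built by hand for the example, not a real executable).

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Step 1: check MZ. Step 2: follow the offset at 0x3C and check PE\\0\\0."""
    if len(data) < 0x40 or data[0:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset just yields a short slice, which fails the comparison.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"

# Synthetic buffer: MZ at 0, offset 0x80 stored at 0x3C, PE\0\0 at 0x80.
fake = bytearray(0x90)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"

print(looks_like_pe(bytes(fake)))            # MZ plus a valid PE signature
print(looks_like_pe(b"MZ" + bytes(0x40)))    # MZ only, no PE signature
```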

I already have the first 16 bytes. Let's continue with the other fields of Elf32Header or Elf64Header. Let's write: Format-Hex .\limon -Offset 16 -Count 16

Format-Hex .\limon -Offset 16 -Count 16
          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000010 02 00 3E 00 01 00 00 00 00 DC 01 01 00 00 00 00 ▪ > ▪    Ü▪▪············

At offset 0x10 we have 02 00: it is e_type, 2 = ET_EXEC
At offset 0x12 we have 3E 00: it is e_machine, 0x3E means x86-64 (AMD64)
At offset 0x14 we have 01 00 00 00: it is e_version, 1 means ELF version 1
At offset 0x18 we have 00 DC 01 01 00 00 00 00: it is e_entry, 0x0101DC00, the address where execution of the application starts
But be careful: it is not 0x00DC0101. We read little endian, so it is 0x0101DC00. And how many bytes do we need to read here? On 32-bit this field is u32, on 64-bit it is u64
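The little-endian point is easy to get wrong, so here is a small Python sketch that decodes the 16 bytes at offset 0x10 (copied from the dump) for the 64-bit case. The variable names are only for illustration.

```python
import struct

# Bytes 0x10..0x1F of limon, from the Format-Hex output.
row = bytes.fromhex("02003E000100000000DC010100000000")

# "<" = little-endian; H = u16, I = u32, Q = u64 (e_entry is u64 on ELF64).
e_type, e_machine, e_version, e_entry = struct.unpack("<HHIQ", row)

# The same 8 bytes read with the wrong byte order give a very different number.
wrong_entry = int.from_bytes(row[8:16], "big")

print(f"e_type={e_type} e_machine={e_machine:#x} e_version={e_version}")
print(f"e_entry (little endian) = {e_entry:#x}")
print(f"e_entry (wrong, big endian) = {wrong_entry:#x}")
```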

I will write the other fields this way too, but I will not go byte by byte through the section and program tables. The logic is the same: read the bytes, then check in the spec what they mean. That is all

Okay, let's continue with the remaining fields. Let's write:
Format-Hex .\limon -Offset 32 -Count 32 and here I get all the remaining part

Format-Hex .\limon -Offset 32 -Count 32
          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000020 40 00 00 00 00 00 00 00 58 7A 3E 00 00 00 00 00 @···········
0000000000000030 00 00 00 00 40 00 38 00 09 00 40 00 16 00 14 00 ··@········
ELF header — offset 0x20..0x3F (last 32 bytes)
Offset  Hex                       Field         Value
────────────────────────────────────────────────────────────────────
0x20    40 00 00 00 00 00 00 00   e_phoff       0x40 — program header table @ byte 64
0x28    58 7A 3E 00 00 00 00 00   e_shoff       0x3E7A58 — section header table offset
0x30    00 00 00 00               e_flags       0 (no arch flags)
0x34    40 00                     e_ehsize      64 bytes — ELF header size
0x36    38 00                     e_phentsize   56 bytes — per program header entry
0x38    09 00                     e_phnum       9 segments
0x3A    40 00                     e_shentsize   64 bytes — per section header entry
0x3C    16 00                     e_shnum       22 sections
0x3E    14 00                     e_shstrndx    20 — string table index

And here comes some explanation:
e_phoff / e_shoff: there are two different tables, the Program Header Table (for runtime, the kernel reads this) and the Section Header Table (for linking/debugging). These two fields give their byte offsets within the file. The PHT starts immediately after the ELF header (0x40 = 64, the exact end of the header), while the SHT is much farther away (0x3E7A58 ≈ 4 MB).
e_flags: x86-64 defines no architecture-specific flags, so it's 0. Architectures like ARM can have things like ABI version, hard/soft float, etc. here.
e_ehsize: How many bytes is the ELF header itself? 64. This is a constant, always 64 in 64-bit ELF.
e_phentsize / e_phnum: Each entry in the PHT is 56 bytes and there are 9 in total, so the PHT = 9 × 56 = 504 bytes. The kernel reads these to determine which segment to load where and which are executable/readable/writable.

Note 1:
Here the value of e_phentsize is 0x38 and the offset of e_phnum is also 0x38. The e_phentsize value is not an offset, it just came out this way by chance!!!

Note 2:
For e_phentsize / e_phnum: the 56 does not come from the offset table, it is the value of e_phentsize, so we multiply by 56. This value changes between 32-bit and 64-bit (a 32-bit program header entry is 32 bytes), which is why getting the class value from the file is important!!!

e_shentsize / e_shnum: Each entry in the SHT is 64 bytes, with a total of 22 sections. Sections like .text, .data, .bss, .symtab, .rodata are listed here. The kernel doesn't look at these; tools like ld / objdump / gdb do.
e_shstrndx: The names of the 22 sections (strings like .text, .data) are stored in a separate section. This field answers the question "Which section index is the string table in?": the section at index 20 is .shstrtab. So when I want to learn the name of a section, I will go to index 20. Why 20? 0x14 is 20 in decimal.
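The arithmetic for these fields can be checked with a short Python sketch. It decodes the 32 bytes at 0x20..0x3F copied from the dump, assuming the ELF64 layout; the derived names pht_size, sht_size and shstrtab_header_offset are made up for this example.

```python
import struct

tail = bytes.fromhex(
    "4000000000000000587A3E0000000000"  # row 0x20: e_phoff, e_shoff
    "00000000400038000900400016001400"  # row 0x30: e_flags .. e_shstrndx
)

(e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
 e_shentsize, e_shnum, e_shstrndx) = struct.unpack("<QQIHHHHHH", tail)

pht_size = e_phnum * e_phentsize   # total bytes of the program header table
sht_size = e_shnum * e_shentsize   # total bytes of the section header table
# File offset of the section header that describes .shstrtab.
shstrtab_header_offset = e_shoff + e_shstrndx * e_shentsize

print(f"PHT at {e_phoff:#x}, {pht_size} bytes")
print(f"SHT at {e_shoff:#x}, {sht_size} bytes")
print(f".shstrtab section header at {shstrtab_header_offset:#x}")
```

Note that the last value is the offset of the section header for .shstrtab, not of the strings themselves; the strings start at the sh_offset stored inside that header.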

Last note for this part :)
These offsets can be a bit difficult, so I wrote a small application that shows the table sizes and prints the field offsets, so you can check the offset values directly. The Windows app is here and the Linux one is here. When you run the app you will see all the sizes and offsets for these headers...
The output looks like this:

offset
elf.Elf32Header
SIZE  : 52 bytes
ALIGN : 4
-----------------------------
e_ident       offset=0x00  size=16
e_type        offset=0x10  size=2
e_machine     offset=0x12  size=2
e_version     offset=0x14  size=4
e_entry       offset=0x18  size=4
e_phoff       offset=0x1C  size=4
e_shoff       offset=0x20  size=4
e_flags       offset=0x24  size=4
e_ehsize      offset=0x28  size=2
e_phentsize   offset=0x2A  size=2
e_phnum       offset=0x2C  size=2
e_shentsize   offset=0x2E  size=2
e_shnum       offset=0x30  size=2
e_shstrndx    offset=0x32  size=2
elf.Elf64Header
SIZE  : 64 bytes
ALIGN : 8
-----------------------------
e_ident       offset=0x00  size=16
e_type        offset=0x10  size=2
e_machine     offset=0x12  size=2
e_version     offset=0x14  size=4
e_entry       offset=0x18  size=8
e_phoff       offset=0x20  size=8
e_shoff       offset=0x28  size=8
e_flags       offset=0x30  size=4
e_ehsize      offset=0x34  size=2
e_phentsize   offset=0x36  size=2
e_phnum       offset=0x38  size=2
e_shentsize   offset=0x3A  size=2
e_shnum       offset=0x3C  size=2
e_shstrndx    offset=0x3E  size=2
. . .

Sooo this way you know at which offset, how many bytes, and what you will get!
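The same table can also be rebuilt by hand, because each offset is just the running sum of the sizes before it. A minimal Python sketch, assuming the field sizes from the listing above (FIELDS and layout are hypothetical names, not part of the real tool):

```python
FIELDS = [
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
]

def layout(addr_size: int):
    """Return ([(name, offset, size)], total_size); addr_size is 4 or 8."""
    sizes = [16, 2, 2, 4, addr_size, addr_size, addr_size, 4, 2, 2, 2, 2, 2, 2]
    table, offset = [], 0
    for name, size in zip(FIELDS, sizes):
        table.append((name, offset, size))
        offset += size  # next field starts right after this one, no padding
    return table, offset

for label, addr_size in (("Elf32Header", 4), ("Elf64Header", 8)):
    table, total = layout(addr_size)
    print(f"{label}  SIZE: {total} bytes")
    for name, offset, size in table:
        print(f"{name:<13} offset=0x{offset:02X}  size={size}")
```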

And I am continuing to write Biber too. On my side it looks like this:

.\zig-out\bin\Biber .\limon
CLASS      ELF64
DATA       LittleEndian
TYPE       ET_EXEC  (executable)
MACHINE    x86_64
Entry      0x0101DC00
P_OFFSET   0x40      (9 entries x 56 bytes = 504 bytes)
S_OFFSET   0x3E7A58  (22 entries x 64 bytes = 1408 bytes)
FLAGS      0x0

Program Table

Here I will only explain what is what, a little faster ...

OKAY let's start: Format-Hex .\limon -Offset 0x40 -Count 504

Program Header Table — raw hex (0x40..0x237)
          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000040 06 00 00 00 04 00 00 00 40 00 00 00 00 00 00 00  ▸seg0 PT_PHDR R
0000000000000050 40 00 00 01 00 00 00 00 40 00 00 01 00 00 00 00
0000000000000060 F8 01 00 00 00 00 00 00 F8 01 00 00 00 00 00 00
0000000000000070 08 00 00 00 00 00 00 00 01 00 00 00 04 00 00 00  ▸seg1 PT_LOAD R
0000000000000080 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00
0000000000000090 00 00 00 01 00 00 00 00 FC CB 01 00 00 00 00 00
00000000000000A0 FC CB 01 00 00 00 00 00 00 10 00 00 00 00 00 00
00000000000000B0 01 00 00 00 05 00 00 00 00 CC 01 00 00 00 00 00  ▸seg2 PT_LOAD R+X  ← code/entry
00000000000000C0 00 DC 01 01 00 00 00 00 00 DC 01 01 00 00 00 00
00000000000000D0 39 94 06 00 00 00 00 00 39 94 06 00 00 00 00 00
00000000000000E0 00 10 00 00 00 00 00 00 01 00 00 00 06 00 00 00  ← seg2 p_align=0x1000 | ▸seg3 p_type=PT_LOAD p_flags=R+W
00000000000000F0 40 60 08 00 00 00 00 00 40 80 08 01 00 00 00 00
0000000000000100 40 80 08 01 00 00 00 00 08 00 00 00 00 00 00 00  segment 3 p_paddr | p_filesz=0x8 ← .bss
0000000000000110 C0 0F 00 00 00 00 00 00 00 10 00 00 00 00 00 00
0000000000000120 01 00 00 00 06 00 00 00 48 60 08 00 00 00 00 00  ▸seg4 PT_LOAD R+W  ← .bss
0000000000000130 48 90 08 01 00 00 00 00 48 90 08 01 00 00 00 00
0000000000000140 90 4D 00 00 00 00 00 00 B8 F0 00 00 00 00 00 00
0000000000000150 00 10 00 00 00 00 00 00 07 00 00 00 04 00 00 00  ▸seg5 PT_TLS R
0000000000000160 40 60 08 00 00 00 00 00 40 70 08 01 00 00 00 00
0000000000000170 40 70 08 01 00 00 00 00 00 00 00 00 00 00 00 00
0000000000000180 1C 00 04 00 00 00 00 00 08 00 00 00 00 00 00 00
0000000000000190 52 E5 74 64 04 00 00 00 40 60 08 00 00 00 00 00  ▸seg6 PT_GNU_RELRO R
00000000000001A0 40 80 08 01 00 00 00 00 40 80 08 01 00 00 00 00
00000000000001B0 08 00 00 00 00 00 00 00 C0 0F 00 00 00 00 00 00
00000000000001C0 01 00 00 00 00 00 00 00 50 E5 74 64 04 00 00 00  ▸seg7 PT_GNU_EH_FRAME R
00000000000001D0 C0 84 01 00 00 00 00 00 C0 84 01 01 00 00 00 00
00000000000001E0 C0 84 01 01 00 00 00 00 8C 0A 00 00 00 00 00 00
00000000000001F0 8C 0A 00 00 00 00 00 00 04 00 00 00 00 00 00 00
0000000000000200 51 E5 74 64 06 00 00 00 00 00 00 00 00 00 00 00  ▸seg8 PT_GNU_STACK R+W  ← NX stack
0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000000000000220 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00
0000000000000230 00 00 00 00 00 00 00 00
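To double check the segment labels in the dump, here is a small Python sketch that takes the first 4 bytes (p_type) of each of the 9 entries, copied from the hex above, and maps them to names. The PT_NAMES table holds only the values that appear in this file; entry i is assumed to start at 0x40 + i * 56.

```python
# p_type values seen in this file (the GNU ones are OS-specific extensions).
PT_NAMES = {
    1: "PT_LOAD",
    6: "PT_PHDR",
    7: "PT_TLS",
    0x6474E550: "PT_GNU_EH_FRAME",
    0x6474E551: "PT_GNU_STACK",
    0x6474E552: "PT_GNU_RELRO",
}

# First 4 bytes of each program header entry, in file order.
first_words = [
    "06000000", "01000000", "01000000", "01000000", "01000000",
    "07000000", "52E57464", "50E57464", "51E57464",
]

PHT_OFFSET, ENTRY_SIZE = 0x40, 56
offsets = [PHT_OFFSET + i * ENTRY_SIZE for i in range(len(first_words))]
names = [PT_NAMES[int.from_bytes(bytes.fromhex(w), "little")] for w in first_words]

for i, (off, name) in enumerate(zip(offsets, names)):
    print(f"seg{i} @ {off:#05x}  {name}")
```

Because 56 is not a multiple of 16, the entries do not line up with the 16-byte rows: seg1 starts at 0x78, in the middle of the 0x70 row, and seg7 starts at 0x1C8.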

Every program header has these fields (in ELF64 order): p_type, p_flags (1), p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align.

(1) I actually said this in part 1, but again: these permissions are not file permissions, they are permissions on the memory area!!

For example, the entry at 00000000000000B0 is segment 2: when this file runs, the kernel reads it as: take 0x69439 bytes from offset 0x1CC00 of the file and map them at memory address 0x0101DC00 with R+X permissions.
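Here is that one entry decoded in Python, as a sketch. The 56 bytes are copied from the dump (0xB0 up to the first half of the 0xE0 row), the field order is the ELF64 one, and flags_to_str is a hypothetical helper using the usual bit values X=1, W=2, R=4.

```python
import struct

# Program header entry for the R+X PT_LOAD segment, bytes 0xB0..0xE7.
entry = bytes.fromhex(
    "0100000005000000" "00CC010000000000"  # p_type, p_flags | p_offset
    "00DC010100000000" "00DC010100000000"  # p_vaddr         | p_paddr
    "3994060000000000" "3994060000000000"  # p_filesz        | p_memsz
    "0010000000000000"                     # p_align
)

(p_type, p_flags, p_offset, p_vaddr, p_paddr,
 p_filesz, p_memsz, p_align) = struct.unpack("<IIQQQQQQ", entry)

def flags_to_str(flags: int) -> str:
    """Render p_flags as R/W/X; these describe memory permissions."""
    return (("R" if flags & 4 else "-")
            + ("W" if flags & 2 else "-")
            + ("X" if flags & 1 else "-"))

print(f"type={p_type} flags={flags_to_str(p_flags)} offset={p_offset:#x}")
print(f"vaddr={p_vaddr:#x} filesz={p_filesz:#x} memsz={p_memsz:#x} align={p_align:#x}")
```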

And at this point I need to say that p_memsz >= p_filesz. What does it mean? It is the .bss issue. For example, you have a zero-initialized global array of [256]u8, that is 256 bytes, but in the binary this size is only a placeholder. At runtime the app will need this area, so the memory size must include it, and p_memsz ends up bigger than p_filesz. Look at segment 3: p_filesz is only 8 bytes but p_memsz is 4032 bytes; the difference comes from .bss
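The gap between the two sizes is simple subtraction. A tiny Python sketch with the p_filesz / p_memsz values of the two RW PT_LOAD segments from the dump (the dictionary and its keys are only for illustration):

```python
# (p_filesz, p_memsz) of the two writable PT_LOAD segments in limon.
segments = {
    "seg3": (0x8, 0xFC0),
    "seg4": (0x4D90, 0xF0B8),
}

# Bytes that exist only in memory: the loader zero-fills this part.
zero_fill = {name: memsz - filesz for name, (filesz, memsz) in segments.items()}

for name, extra in zero_fill.items():
    filesz, memsz = segments[name]
    print(f"{name}: filesz={filesz} memsz={memsz} zero-filled={extra} bytes")
```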

In Biber I read this part just like this:

limon — full summary
CLASS      ELF64
    DATA       LittleEndian
    TYPE       ET_EXEC  (executable)
    MACHINE    x86_64
    Entry      0x0101DC00
    P_OFFSET   0x40      (9 entries x 56 bytes)
    S_OFFSET   0x3E7A58  (22 entries x 64 bytes)
    FLAGS      0x0
    
    PROGRAM HEADER
      Type           Flg  VAddr               PAddr               FileSize      MemSize
      --------------------------------------------------------------------------------------
      PHDR           R--  0x0000000001000040  0x0000000001000040  0x00000001F8  0x00000001F8
      LOAD           R--  0x0000000001000000  0x0000000001000000  0x000001CBFC  0x000001CBFC
      LOAD           R-X  0x000000000101DC00  0x000000000101DC00  0x0000069439  0x0000069439  ← entry
      LOAD           RW-  0x0000000001088040  0x0000000001088040  0x0000000008  0x0000000FC0  ← .bss
      LOAD           RW-  0x0000000001089048  0x0000000001089048  0x0000004D90  0x000000F0B8  ← .bss
      TLS            R--  0x0000000001087040  0x0000000001087040  0x0000000000  0x000004001C  ← thread local
      GNU_RELRO      R--  0x0000000001088040  0x0000000001088040  0x0000000008  0x0000000FC0  ← ro after load
      GNU_EH_FRAME   R--  0x00000000010184C0  0x00000000010184C0  0x0000000A8C  0x0000000A8C  ← exception info
      GNU_STACK      RW-  0x0000000000000000  0x0000000000000000  0x0000000000  0x0001000000  ← NX stack

Now let's check the output:
P_OFFSET is 0x40, so the program header table starts from there
Entry is 0x0101DC00, so it is the starting point of our application; our code runs from here. What do I mean? This memory area must have execute permission, so I check: the second LOAD address is 0x000000000101DC00 and its permission is R-X, READ and EXECUTE, as expected.
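This sanity check can be automated. A minimal Python sketch, using the four LOAD rows from the Biber output above (flags, VAddr, MemSize); segment_containing is a hypothetical helper, not something Biber provides.

```python
ENTRY = 0x0101DC00

# (flags, vaddr, memsz) of the PT_LOAD segments, copied from the table above.
load_segments = [
    ("R--", 0x01000000, 0x1CBFC),
    ("R-X", 0x0101DC00, 0x69439),
    ("RW-", 0x01088040, 0xFC0),
    ("RW-", 0x01089048, 0xF0B8),
]

def segment_containing(addr: int):
    """Return the LOAD segment whose [vaddr, vaddr + memsz) range holds addr."""
    for seg in load_segments:
        flags, vaddr, memsz = seg
        if vaddr <= addr < vaddr + memsz:
            return seg
    return None

hit = segment_containing(ENTRY)
if hit is None:
    print("entry point is not inside any LOAD segment")
else:
    flags, vaddr, memsz = hit
    status = "ok" if "X" in flags else "NOT executable"
    print(f"entry {ENTRY:#x} is in LOAD {flags} at {vaddr:#x}: {status}")
```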

For this part this is probably enough to explain how you can check that the output is correct. And of course, read the spec all the time!

It is already a bit late. I hope I will add part 3 tomorrow. See you soon :)

References

Everything above can be verified against these primary sources.

When in doubt, go to the spec — not a blog post.

This is only my understanding, it can be wrong!

ELF

PE

General