x64 ASM Fundamentals Intro – What’s Assembly? What’s x64? What’s going on?


The next few posts will be the beginning of a series of posts detailing how to go about writing 64 bit Intel Assembly code, which will in turn give you the tools to go about reverse engineering 64 bit applications (and 32 bit applications by extension 😊 ).

Learning to reverse engineer applications opens up a wealth of possibilities for you. You’ll gain the ability to – 

  • Introspect the inner workings of modern compiled binaries
  • Identify vulnerabilities in compiled binaries
  • Solve crackme challenges
  • Write the most efficient programs possible
  • Write your own shellcode
  • Communicate with the “bare metal” of your CPU
  • Look super smart
  • Intuitively understand at a high level what a piece of ARM assembly code is doing, without fully learning ARM assembly

With just a few basic nuggets of information, you’ll be able to understand the inner workings of so many applications.

What does 64 bit mean? Why does it matter?

A 64 bit CPU works with registers and memory addresses that are 64 bits wide, and because of this is capable of working with extremely large numbers.

A 64 bit CPU can count from either -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (signed integer, from negative to positive) or from 0 to 18,446,744,073,709,551,615 (unsigned integer, from 0 to max value). These maximum values are important, as they map to 2^63−1 and 2^64−1 respectively.

Essentially, 64 bit means that the values the CPU works with natively, including memory addresses, are 64 bits (or 8 bytes) wide.

This huge increase in the addressable range is the reason why 64 bit CPUs can, in theory, address 16 exabytes of RAM, whereas 32 bit CPUs can only address 4 gigabytes of RAM maximum (the maximum unsigned 32 bit integer is 4,294,967,295, or 2^32−1).
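If you'd like to check these numbers for yourself, here's a minimal C sketch that leans on the fixed-width limits from <stdint.h>. The function names are made up purely for illustration, and addressable_bytes assumes you pass it fewer than 64 bits.

```c
#include <stdint.h>

/* Largest and smallest values a signed 64 bit integer can hold. */
int64_t signed64_max(void) { return INT64_MAX; }   /* 2^63 - 1 */
int64_t signed64_min(void) { return INT64_MIN; }   /* -(2^63)  */

/* Largest unsigned values for 64 bit and 32 bit integers. */
uint64_t unsigned64_max(void) { return UINT64_MAX; } /* 2^64 - 1 */
uint32_t unsigned32_max(void) { return UINT32_MAX; } /* 2^32 - 1 */

/* How many distinct byte addresses an address of the given width can name.
   Only valid for bits < 64, because 2^64 itself does not fit in 64 bits. */
uint64_t addressable_bytes(unsigned bits) { return (uint64_t)1 << bits; }
```

Calling addressable_bytes(32) gives 4,294,967,296, which is exactly 4 gigabytes' worth of one-byte addresses.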

Memory addresses in 64 bit computing are 8 bytes wide, whereas in 32 bit computing they are 4 bytes wide.

For example an address in 64 bit computing might look like 0x4142434445464748, whereas in 32 bit computing they look like 0x41424344. They are double the width of 32 bit addresses. This is an important distinction, and a dead giveaway when looking at a memory address.
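Here's a small sketch of that difference in C. Both function names are hypothetical helpers of mine, not something from a library: one reports how wide a pointer is on whatever machine compiles it, the other prints an address padded out to a chosen width.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Width of a memory address, in bytes, on the platform this is compiled for.
   Expect 8 on a 64 bit build and 4 on a 32 bit build. */
size_t pointer_width_bytes(void) { return sizeof(void *); }

/* Write addr as zero-padded hex, two hex digits per byte of width,
   e.g. width_bytes = 8 gives 16 digits. Returns what snprintf returns. */
int format_address(uint64_t addr, unsigned width_bytes, char *buf, size_t len) {
    return snprintf(buf, len, "0x%0*llx",
                    (int)(width_bytes * 2), (unsigned long long)addr);
}
```

Formatting 0x1000 with a width of 8 gives 0x0000000000001000, while a width of 4 gives 0x00001000. Counting the digits is the quickest way to tell which world you're in.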

What’s Assembly?

Basically, Assembly (henceforth abbreviated to ASM because I’m lazy), or more formally Intel’s ASM, is a programming language made up of a large set of mnemonics. These mnemonics are assembled into opcodes, which are the raw instructions that your computer’s processor (CPU) decodes and executes in order to run an application.

I realise that this is a confusing, scary statement but it’s actually very simple. We frequently discuss how computers talk in “ones and zeroes”, which is true at the very, very lowest level. One layer above that though, your CPU is capable of reading a sequence of opcodes (operation codes) from RAM which give the CPU instructions. These opcodes are just bytes, and we normally write them in hexadecimal because it’s far easier to read than long strings of ones and zeroes.

For example, telling the computer to move the value 0 into a register looks like the following in assembly –

mov eax, 0x0

That’s the mnemonic for this particular opcode –

b8 00 00 00 00

The assembly instructions are converted into the opcodes listed above by a program called an assembler.

A disassembler converts those opcodes back into human readable mnemonics again.
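To make that round trip concrete, here's a toy sketch in C that "assembles" and "disassembles" exactly one instruction, mov eax, imm32. It relies on two facts: the opcode byte for this form is b8, and x86 stores the 32 bit value least significant byte first. The function names are my own invention, and a real assembler handles vastly more than this.

```c
#include <stdint.h>

/* Hand-assemble "mov eax, imm32": the opcode byte 0xb8, followed by the
   32 bit value stored least significant byte first (little-endian). */
void assemble_mov_eax_imm32(uint32_t imm, uint8_t out[5]) {
    out[0] = 0xb8;
    out[1] = (uint8_t)(imm & 0xff);
    out[2] = (uint8_t)((imm >> 8) & 0xff);
    out[3] = (uint8_t)((imm >> 16) & 0xff);
    out[4] = (uint8_t)((imm >> 24) & 0xff);
}

/* Go the other way: if the 5 bytes encode "mov eax, imm32", store the
   value in *imm and return 1. Otherwise return 0 and leave *imm alone. */
int disassemble_mov_eax_imm32(const uint8_t in[5], uint32_t *imm) {
    if (in[0] != 0xb8) {
        return 0;
    }
    *imm = (uint32_t)in[1]
         | ((uint32_t)in[2] << 8)
         | ((uint32_t)in[3] << 16)
         | ((uint32_t)in[4] << 24);
    return 1;
}
```

Assembling the value 0 produces b8 00 00 00 00, matching the opcode above. Assembling 0x41424344 produces b8 44 43 42 41, which is the little-endian byte order showing itself.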

I realise that this is a little scary, but we’re going to see some nice basic examples soon which explain things even further.

How is ASM created?

This is a key question.

You typically write your programs in a high level language such as C, C++, Go, Rust etc.

Programs written using these languages are compiled into an executable binary file, such as firefox, google-chrome, apt-get, cat.

This compilation process takes high-level language code and essentially reduces it down to ASM / opcodes (low-level language) which the CPU can execute.

When you compile a C program with GCC, the following actions happen internally –

  • The code is preprocessed
  • The code is compiled, which turns the preprocessed source into ASM mnemonics
  • The code is then assembled, which converts the ASM mnemonics down to opcodes
  • The code is then linked, which is the process of mashing all of the opcodes together with any dependent library code in order to produce a binary file which can be executed by the operating system.

This link explains the above 4 stages in much more detail, I highly recommend taking a look.
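If you want to watch the first three stages happen one at a time, something like the following sketch should do it. The file name stages.c and everything in it are made up for illustration. -E, -S, -c and -masm=intel are standard GCC options, but treat the exact commands as a starting point rather than gospel.

```c
/* stages.c: a tiny file to push through GCC one stage at a time.
 *
 *   1. preprocess:  gcc -E stages.c -o stages.i
 *   2. compile:     gcc -S -masm=intel stages.i -o stages.s
 *   3. assemble:    gcc -c stages.s -o stages.o
 *
 * Stage 4 (linking) needs a main function somewhere, which this file
 * deliberately leaves out to keep the generated ASM as short as possible.
 */

/* After stage 1, ANSWER has disappeared and a literal 41 sits in its place. */
#define ANSWER 41

/* After stage 2, this function exists as a handful of ASM mnemonics. */
int answer_plus_one(void) {
    return ANSWER + 1;
}
```

Open stages.i and stages.s in a text editor after each step. Seeing the macro vanish and the mnemonics appear makes the list above feel a lot less abstract.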

This is the typical process for generating ASM. It’s also possible to manually write, assemble and link your own program using the ASM language. We’re going to get into this in a couple of articles’ time.

A practical example

Let’s generate some assembly. We’re not going to understand what it all means yet, but it will highlight how assembly is made.

Write a basic C program, like the following –

#include <stdio.h>

int main(int argc, char** argv){
    printf("ohexfortyone.com is the best!");
    return 0;
}

If you’re not too familiar with C programs, then just copy and paste the above. I won’t tell anyone, I promise 😉

Next up, run the following command (replace 001-helloworld.c with whatever you named your file, obviously) –

gcc -save-temps -masm=intel 001-helloworld.c -o 001-helloworld.elf

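This command should generate a handful of new files (including an intermediate .o object file). These are the three we care about –

The .elf file is the final assembled and linked binary.

The .i file is the pre-processed file. It’s essentially the same as the .c file, but with every #include and macro expanded ready for the compiler.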

The .s file is the compiled set of ASM mnemonics. The contents of this file are – 

	.file	"001-helloworld.c"
	.intel_syntax noprefix
	.section	.rodata
.LC0:
	.string	"ohexfortyone.com is the best!"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	push	rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	mov	rbp, rsp
	.cfi_def_cfa_register 6
	sub	rsp, 16
	mov	DWORD PTR -4[rbp], edi
	mov	QWORD PTR -16[rbp], rsi
	lea	rdi, .LC0[rip]
	mov	eax, 0
	call	printf@PLT
	mov	eax, 0
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
	.section	.note.GNU-stack,"",@progbits

THIS ^ is x64 ASM language, in all of its terrifying and verbose glory. The vast majority of this code is honestly irrelevant, and we’ll cover why shortly. Only this little section is relevant to our interests –

lea	rdi, .LC0[rip]
mov	eax, 0
call	printf@PLT
mov	eax, 0

These 4 lines tell the CPU to Load the Effective Address of a piece of data (our string) into a register (variable!), move the number 0 into another register (VARIABLE!), call the printf function, and then move 0 into eax again, which is where the function’s return value lives.

After calling printf, the code leaves the current function, with a return value of 0.

Simple, right? 😊

Just to prove that this isn’t magic, and I’m not lying, we’re going to disassemble the .elf file we just created to prove that the ASM above is what the application is comprised of.

On a Linux box, run “objdump -M intel-mnemonic -d 001-helloworld.elf” and scroll down until you see the main function. 


Notice that the assembly matches what we printed above? This stuff isn’t magic.

Also notice those weird characters to the left of our mnemonics? Those are the opcodes which the CPU executes. We can prove that by running xxd on the file and scrolling down to offset 0x0000064a (where the main function starts).

See, as mentioned above, at 0x64a (the beginning of the highlighting) we can see the opcodes listed above! 55, 48 89 e5, 48 83 ec 10, etc.
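If you'd like to poke at those bytes without a hex editor, here's a rough C sketch. It stores the opcodes quoted above alongside the mnemonics they came from, and prints them as space separated hex. The names main_prologue and hex_dump are my own, and the output format is just one I picked, not a copy of any tool's.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The first three instructions of main, as the raw opcodes quoted above. */
const uint8_t main_prologue[] = {
    0x55,                   /* push rbp     */
    0x48, 0x89, 0xe5,       /* mov rbp, rsp */
    0x48, 0x83, 0xec, 0x10  /* sub rsp, 16  */
};

/* Render n bytes as lowercase hex pairs separated by single spaces.
   Returns 0 on success, or -1 if out is too small to hold the result. */
int hex_dump(const uint8_t *bytes, size_t n, char *out, size_t out_len) {
    size_t used = 0;
    if (out_len == 0) {
        return -1;
    }
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int written = snprintf(out + used, out_len - used,
                               i ? " %02x" : "%02x", bytes[i]);
        if (written < 0 || (size_t)written >= out_len - used) {
            return -1;
        }
        used += (size_t)written;
    }
    return 0;
}
```

Dumping main_prologue gives 55 48 89 e5 48 83 ec 10, the same sequence the disassembler showed next to push rbp, mov rbp, rsp and sub rsp, 16.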

Again, this stuff isn’t magic by any means.

Closing thoughts

If you’ve stuck with it this far then good on you. If you’ve struggled to keep up then honestly, that’s perfectly OK.

This stuff is hard. There’s no getting around it. Some very smart people thought up the beautiful, elegant and efficient assembly language many years ago, and there’s definitely a learning curve involved when trying to learn to read AND write it.

Don’t panic if you didn’t understand this introduction. The next 5 or so blog posts will start at the very, very basics of the assembly language and slowly ramp up to reverse engineering more complicated programs. It’s OK to not understand everything yet.

This stuff is important and will make you a more valuable InfoSec professional going forward. Keep at it, comment on here if you’re confused or DM me @ohexfortyone on the Twitters. You’ve got this.

Thanks for reading.
