Est. 1985 Advanced

Assembler (ARM)

Assembly language for the ARM architecture, the most widely deployed instruction set in the world, powering billions of mobile devices, embedded systems, and increasingly desktop and server platforms.

Created by Sophie Wilson and Steve Furber at Acorn Computers

Paradigm Assembly, Imperative, Low-level
Typing None (untyped)
First Appeared 1985
Latest Version ARMv9 (2021, with ongoing annual extensions)

Assembler (ARM) is the assembly language for the ARM processor architecture, the most widely deployed instruction set in the world by unit volume. First designed at Acorn Computers in Cambridge, England, the ARM instruction set was born from a desire to bring the elegance of RISC design principles to a practical, low-power processor. Since the first ARM1 chip powered up successfully on April 26, 1985, the architecture has grown from a niche design for British home computers to the foundation of virtually all smartphones, most embedded systems, and an increasing share of desktop and server computing. ARM assembly programming remains essential for firmware development, performance-critical embedded code, and low-level systems work across these platforms.

History & Origins

The Acorn RISC Machine (1983-1985)

In the early 1980s, Acorn Computers was a successful British computer company, best known for the BBC Micro. Looking to design a more powerful processor for their next generation of computers, Acorn engineers evaluated existing 16-bit and 32-bit processors but found them unsuitable — too complex, too slow, or too expensive for their needs.

Sophie Wilson, a computer scientist at Acorn who had previously designed BBC BASIC, began studying the academic RISC research emerging from the University of California, Berkeley. The Berkeley RISC project, led by David Patterson, demonstrated that a processor with a small set of simple, uniform instructions could outperform complex processors with large instruction sets. Wilson was also deeply influenced by her experience with the MOS 6502 processor used in the BBC Micro — a simple, efficient chip with fast interrupt response and a hardwired (non-microcoded) design.

In October 1983, Wilson began designing the ARM instruction set, while Steve Furber took charge of the hardware implementation. The project was remarkably small: the core design team consisted of just Wilson and Furber, with support from a handful of other Acorn engineers. Wilson wrote an instruction set simulator on a BBC Micro to validate the design before any silicon was fabricated.

The first ARM1 chip was fabricated by VLSI Technology and delivered to Acorn on April 26, 1985. In a testament to the quality of the design, the ARM1 worked correctly on its very first power-up. The chip contained approximately 25,000 transistors — remarkably few even by 1985 standards (the contemporary Intel 80386 had approximately 275,000 transistors) — and ran at 6 MHz. The low transistor count was a direct consequence of the RISC philosophy: simple instructions meant simple decode logic, which meant fewer transistors, less power consumption, and less heat.

From Acorn to ARM Ltd (1986-1990)

The ARM2, introduced in 1986, added a 32-bit multiplier and ran at 8 MHz with approximately 30,000 transistors. It powered the Acorn Archimedes, launched in 1987 as the first commercial ARM-based personal computer. The Archimedes was marketed as the “first RISC-based home computer” and was notable for its performance relative to contemporary machines.

A pivotal moment came when Apple Computer became interested in the ARM architecture for a new handheld device project (which would become the Newton). Rather than simply licensing the chip, Apple proposed a joint venture. In November 1990, Advanced RISC Machines Ltd was founded as a partnership between Acorn Computers, Apple Computer, and VLSI Technology. Apple invested US$3 million for a 43% stake. The new company started with just 12 engineers and adopted a business model that would prove transformative: rather than manufacturing chips, ARM would license its processor designs as intellectual property to other companies, who would then manufacture their own ARM-based chips.

Growth and Market Dominance (1990s-2000s)

The IP licensing model allowed ARM to scale rapidly across the semiconductor industry without the enormous capital costs of chip fabrication. The ARM7TDMI core, released around 1994, became one of the most widely licensed processor cores in history. It introduced the Thumb compressed instruction set (ARMv4T), which used 16-bit instructions to achieve approximately 65% of the code size of equivalent 32-bit ARM code, a crucial advantage for memory-constrained embedded systems.

The ARM7TDMI found its way into an enormous range of products. The Nokia 6110, one of the first ARM-powered GSM phones, demonstrated ARM’s suitability for mobile devices — a market that would eventually become ARM’s dominant application. Texas Instruments, Samsung, and other semiconductor companies began licensing ARM cores for their own system-on-chip designs.

ARM Holdings went public with an IPO on April 17, 1998, on both the London Stock Exchange and NASDAQ.

In 2004, ARM introduced the Cortex product line, which organized ARM cores into three families: Cortex-A for high-performance applications (smartphones, tablets, laptops), Cortex-R for real-time systems (automotive, storage controllers), and Cortex-M for microcontrollers (IoT, embedded sensors). This segmentation allowed ARM to serve vastly different markets with cores optimized for each use case.

Design Philosophy

ARM assembly reflects several core design principles that have remained consistent since the original 1983 design:

RISC simplicity: ARM is a load-store architecture — data processing instructions operate only on registers, and separate load (LDR) and store (STR) instructions transfer data between registers and memory. This clear separation simplifies the processor pipeline and enables efficient execution.

Fixed-width instructions: In 32-bit ARM state (A32), all instructions are exactly 32 bits wide and word-aligned. This simplifies instruction fetch and decode compared to variable-length instruction sets like x86, where instructions can range from 1 to 15 bytes.
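Because every A32 instruction is a single 32-bit word, the k-th instruction of a straight-line sequence sits at base + 4k, and a decoder can pull each field out of fixed bit positions. The Python sketch below decodes one instruction class, data processing with an immediate operand; the helper names `ror32` and `decode_dp_imm` are hypothetical and other encodings are deliberately rejected. The two sample words are the A32 encodings of `ADD R1, R1, #1` and `MOV R0, #0xFF000000`.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def decode_dp_imm(word: int) -> dict:
    """Decode an A32 data-processing instruction with an immediate operand.

    Layout: cond[31:28] 00 I[25]=1 opcode[24:21] S[20] Rn[19:16] Rd[15:12]
            rotate[11:8] imm8[7:0]
    """
    assert (word >> 26) & 0b11 == 0b00, "not a data-processing instruction"
    assert (word >> 25) & 1 == 1, "this sketch only handles the immediate form"
    rotate = (word >> 8) & 0xF
    imm8 = word & 0xFF
    return {
        "cond": (word >> 28) & 0xF,   # condition field, 0xE = AL (always)
        "opcode": (word >> 21) & 0xF,  # 0b0100 = ADD, 0b1101 = MOV
        "s": (word >> 20) & 1,         # 1 = update the condition flags
        "rn": (word >> 16) & 0xF,      # first operand register
        "rd": (word >> 12) & 0xF,      # destination register
        # The 12-bit operand field is an 8-bit value rotated right by 2 * rotate.
        "imm": ror32(imm8, 2 * rotate),
    }


print(decode_dp_imm(0xE2811001))            # ADD R1, R1, #1
print(hex(decode_dp_imm(0xE3A004FF)["imm"]))  # MOV R0, #0xFF000000
```

The rotated-immediate scheme is also why some constants cannot be loaded with a single `MOV` and need a literal-pool `LDR` instead.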

Low power consumption: The original ARM design philosophy prioritized simplicity and low transistor count, which naturally resulted in low power consumption. This characteristic — a byproduct of good engineering rather than a primary design goal — became ARM’s most important competitive advantage as mobile computing emerged.

Orthogonal design: ARM instructions follow consistent patterns. Most data processing instructions can use any general-purpose register as a source or destination, accept a flexible second operand (an immediate, a register, or a shifted register), and optionally update the condition flags. This regularity makes ARM assembly relatively readable and predictable compared to architectures with more specialized instructions.

Key Features

Register File

A32 (32-bit ARM state):

Register    Purpose
R0-R12      General-purpose registers
R13 (SP)    Stack pointer
R14 (LR)    Link register (holds return addresses)
R15 (PC)    Program counter
CPSR        Current Program Status Register (N, Z, C, V condition flags)

AArch64 (64-bit ARM state, ARMv8-A and later):

Register    Purpose
X0-X30      31 general-purpose 64-bit registers (also accessible as 32-bit W0-W30)
X30 (LR)    Link register
SP          Stack pointer (not a general-purpose register)
PC          Program counter (not directly accessible as a general-purpose register)
NZCV        Condition flags
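The W registers in the table are views, not separate storage: reading Wn returns the low 32 bits of Xn, and writing Wn clears the upper 32 bits of Xn. The Python sketch below models only that aliasing rule; the class and method names are hypothetical, and SP, the zero register, and the flags are left out.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class XRegs:
    """Toy model of AArch64 X0-X30 and their 32-bit W views."""

    def __init__(self) -> None:
        self.x = [0] * 31

    def read_x(self, n: int) -> int:
        return self.x[n]

    def write_x(self, n: int, value: int) -> None:
        self.x[n] = value & MASK64

    def read_w(self, n: int) -> int:
        # Wn is the low 32 bits of Xn.
        return self.x[n] & MASK32

    def write_w(self, n: int, value: int) -> None:
        # Writing Wn zero-extends: the upper 32 bits of Xn become 0.
        self.x[n] = value & MASK32


regs = XRegs()
regs.write_x(0, 0x1122334455667788)
print(hex(regs.read_w(0)))   # low half of X0
regs.write_w(0, 0xDEADBEEF)
print(hex(regs.read_x(0)))   # upper half of X0 has been cleared
```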

Conditional Execution

One of the most distinctive features of classic ARM assembly (A32) is that almost every instruction can be conditionally executed. A 4-bit condition code field in each instruction allows the processor to skip execution based on the current state of the condition flags, without requiring a branch instruction:

@ A32: Compare and conditionally execute
CMP     R0, #10
ADDGT   R1, R1, #1      @ Add 1 to R1 only if R0 > 10 (signed comparison)
ADDLE   R1, R1, #2      @ Add 2 to R1 only if R0 <= 10 (signed comparison)

This reduces the number of branch instructions and can improve pipeline efficiency by avoiding branch mispredictions on short if/else sequences. AArch64 largely replaced per-instruction conditional execution with conditional select (CSEL and its variants) and conditional compare (CCMP) instructions.
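The decision the processor makes for each conditional instruction can be modeled in a few lines. The Python sketch below assumes the documented CMP behavior (subtract, discard the result, set N, Z, C and V) and covers only the condition codes needed here; the helper names are hypothetical. The two instruction words are the A32 encodings of the ADDGT and ADDLE lines above, so their top nibble is the 4-bit condition field.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a: int, b: int) -> dict:
    """Model CMP a, b: compute a - b, discard it, and set N, Z, C, V."""
    a &= MASK32
    b &= MASK32
    r = (a - b) & MASK32
    return {
        "N": r >> 31,                          # sign bit of the result
        "Z": int(r == 0),                      # result was zero
        "C": int(a >= b),                      # no unsigned borrow
        "V": (((a ^ b) & (a ^ r)) >> 31) & 1,  # signed overflow
    }


# A subset of the 4-bit condition codes (instruction bits 31:28).
CONDITIONS = {
    0x0: lambda f: f["Z"] == 1,                       # EQ
    0x1: lambda f: f["Z"] == 0,                       # NE
    0xA: lambda f: f["N"] == f["V"],                  # GE (signed)
    0xB: lambda f: f["N"] != f["V"],                  # LT (signed)
    0xC: lambda f: f["Z"] == 0 and f["N"] == f["V"],  # GT (signed)
    0xD: lambda f: f["Z"] == 1 or f["N"] != f["V"],   # LE (signed)
    0xE: lambda f: True,                              # AL (always)
}


def run_example(r0: int, r1: int) -> int:
    """Model CMP R0, #10 / ADDGT R1, R1, #1 / ADDLE R1, R1, #2."""
    flags = cmp_flags(r0, 10)
    for word in (0xC2811001, 0xD2812002):  # ADDGT ..., #1 and ADDLE ..., #2
        cond = word >> 28                  # condition field
        imm = word & 0xFF                  # rotation field is zero in both words
        if CONDITIONS[cond](flags):
            r1 = (r1 + imm) & MASK32
    return r1


print(run_example(11, 0))  # R0 > 10: only ADDGT executes
print(run_example(10, 0))  # R0 <= 10: only ADDLE executes
print(run_example(-5, 0))  # GT/LE compare signed values, so -5 <= 10
```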

Barrel Shifter

ARM’s barrel shifter allows one operand to be shifted or rotated before it is used in an arithmetic or logical operation, all within a single instruction:

@ Multiply R1 by 5 using shift and add in one instruction
ADD     R0, R1, R1, LSL #2    @ R0 = R1 + (R1 << 2) = R1 * 5

@ Available shift operations:
@ LSL - Logical Shift Left
@ LSR - Logical Shift Right
@ ASR - Arithmetic Shift Right
@ ROR - Rotate Right
@ RRX - Rotate Right with Extend

The barrel shifter is one of the features that gives ARM assembly its characteristic elegance — operations that require multiple instructions on other architectures can often be expressed in a single ARM instruction.
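The five shift types reduce to ordinary 32-bit bit manipulation. The Python sketch below is an illustrative model rather than an emulator: the function names are hypothetical, the carry-out of LSL, LSR, ASR and ROR is ignored, and only RRX reports its carry. `add_shifted` shows why `ADD R0, R1, R1, LSL #2` equals multiplication by 5 modulo 2^32.

```python
MASK32 = 0xFFFFFFFF


def lsl(x: int, n: int) -> int:
    """Logical shift left; bits shifted past bit 31 are lost."""
    return (x << n) & MASK32


def lsr(x: int, n: int) -> int:
    """Logical shift right; zeros enter from the top."""
    return (x & MASK32) >> n


def asr(x: int, n: int) -> int:
    """Arithmetic shift right; copies of the sign bit enter from the top."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x  # reinterpret as signed
    return (signed >> n) & MASK32                     # Python >> is arithmetic


def ror(x: int, n: int) -> int:
    """Rotate right; bits leaving bit 0 re-enter at bit 31."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rrx(x: int, carry_in: int) -> tuple:
    """Rotate right by one bit through the carry flag; returns (result, carry_out)."""
    x &= MASK32
    return (carry_in << 31) | (x >> 1), x & 1


def add_shifted(rn: int, rm: int, shift, amount: int) -> int:
    """Model ADD Rd, Rn, Rm, <shift> #amount with a 32-bit wrapping result."""
    return (rn + shift(rm, amount)) & MASK32


print(add_shifted(7, 7, lsl, 2))  # 7 + (7 << 2)
print(hex(asr(0xFFFFFFF0, 2)), hex(lsr(0xFFFFFFF0, 2)))
```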

Multiple Load/Store

ARM provides instructions to load or store multiple registers in a single instruction, which is particularly useful for function prologues, epilogues, and block data transfers:

@ Save registers on function entry
STMFD   SP!, {R4-R11, LR}     @ Push R4-R11 and LR onto the stack (same as PUSH {R4-R11, LR})

@ Restore registers on function exit
LDMFD   SP!, {R4-R11, PC}     @ Pop R4-R11 and load the saved LR into PC, which returns (same as POP {R4-R11, PC})
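The ordering rules behind these two lines are easy to lose track of: a full descending stack grows toward lower addresses, the lowest-numbered register always goes to the lowest address regardless of the order written in the list, and `!` writes the final address back to SP. The Python sketch below models those rules with a dictionary standing in for memory; all function names, addresses, and register values are hypothetical. Because LR is saved in the same slot that the epilogue later reloads into PC, the final LDMFD doubles as the function return.

```python
# Register numbers decide store order; SP, LR and PC are R13, R14 and R15.
REG_INDEX = {f"R{i}": i for i in range(13)}
REG_INDEX.update(SP=13, LR=14, PC=15)


def stmfd(mem: dict, regs: dict, sp: int, reglist: list) -> int:
    """Model STMFD SP!, {reglist} (full descending push with writeback).

    Registers are stored in register-number order, lowest number at the
    lowest address, in the words just below SP. Returns the new SP.
    """
    order = sorted(reglist, key=REG_INDEX.__getitem__)
    new_sp = sp - 4 * len(order)
    for i, name in enumerate(order):
        mem[new_sp + 4 * i] = regs[name]
    return new_sp


def ldmfd(mem: dict, regs: dict, sp: int, reglist: list) -> int:
    """Model LDMFD SP!, {reglist} (full descending pop with writeback)."""
    order = sorted(reglist, key=REG_INDEX.__getitem__)
    for i, name in enumerate(order):
        regs[name] = mem[sp + 4 * i]
    return sp + 4 * len(order)


# Hypothetical starting state for a function call.
regs = {f"R{i}": 0x100 + i for i in range(4, 12)}
regs["LR"] = 0x8004              # hypothetical return address
regs["PC"] = 0
mem = {}
sp = 0x1000
saved = [f"R{i}" for i in range(4, 12)]

sp = stmfd(mem, regs, sp, saved + ["LR"])   # prologue: pushes 9 words
for name in saved:                          # function body clobbers R4-R11
    regs[name] = 0
sp = ldmfd(mem, regs, sp, saved + ["PC"])   # epilogue: the LR slot lands in PC
print(hex(sp), hex(regs["PC"]))
```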

Thumb and Thumb-2

The Thumb instruction set, introduced with ARMv4T, provides 16-bit compressed instructions that achieve approximately 65% of the code size of equivalent ARM code. This was designed for systems where memory is constrained and code density matters more than raw performance.

Thumb-2, introduced with ARMv6T2 and standard in ARMv7, mixes 16-bit and 32-bit instructions, providing nearly the performance of full ARM code with the density benefits of Thumb. In ARMv7 and later, Thumb-2 is the recommended instruction set for most 32-bit code, and Cortex-M microcontroller cores execute Thumb instructions only.
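The 65% figure is easier to interpret with the arithmetic written out: halving the instruction width would give 50% only if every A32 instruction had a single 16-bit equivalent, but simpler 16-bit instructions often need more of them to do the same work. The Python sketch below computes the size ratio from instruction counts; the function name and the counts are hypothetical, chosen to show that a 65% ratio corresponds to roughly 30% more instructions when all of them are 16 bits wide.

```python
def code_size_ratio(a32_count: int, t16_count: int, t32_count: int = 0) -> float:
    """Return (Thumb bytes) / (A32 bytes) for the same piece of code.

    a32_count: instructions needed in A32 (4 bytes each)
    t16_count: 16-bit Thumb instructions needed (2 bytes each)
    t32_count: 32-bit Thumb-2 instructions needed (0 for original Thumb)
    """
    a32_bytes = 4 * a32_count
    thumb_bytes = 2 * t16_count + 4 * t32_count
    return thumb_bytes / a32_bytes


# Hypothetical counts: Thumb needs 30% more instructions than A32.
print(code_size_ratio(a32_count=100, t16_count=130))
```

Thumb-2's 32-bit encodings let the instruction count stay closer to the A32 count, which is how it recovers most of the performance while keeping much of the density.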

A32 Hello World Example

A minimal ARM assembly “Hello, World!” targeting 32-bit ARM Linux (EABI system call convention, with the syscall number in r7) and the GNU assembler (GAS):

@ hello.s - ARM Linux Hello World (GAS syntax)
.global _start

.section .data
message:
    .ascii "Hello, World!\n"
    msg_len = . - message

.section .text
_start:
    mov     r7, #4          @ sys_write
    mov     r0, #1          @ stdout
    ldr     r1, =message    @ pointer to message
    mov     r2, #msg_len    @ message length
    svc     #0              @ supervisor call (syscall); older syntax: swi #0

    mov     r7, #1          @ sys_exit
    mov     r0, #0          @ exit code 0
    svc     #0

AArch64 Hello World Example

The same program for AArch64 (ARMv8-A, 64-bit):

// hello.s - AArch64 Linux Hello World (GAS syntax)
.global _start

.section .data
message:
    .ascii "Hello, World!\n"
    msg_len = . - message

.section .text
_start:
    mov     x8, #64         // sys_write
    mov     x0, #1          // stdout
    ldr     x1, =message    // pointer to message
    mov     x2, #msg_len    // message length
    svc     #0              // supervisor call (syscall)

    mov     x8, #93         // sys_exit
    mov     x0, #0          // exit code 0
    svc     #0

Evolution

From 32-bit to 64-bit (ARMv8-A)

The most significant architectural change in ARM’s history came with ARMv8-A, announced in October 2011. Rather than simply extending the existing 32-bit instruction set, ARM designed AArch64 as a substantially new 64-bit instruction set alongside the existing 32-bit AArch32 state. Key changes in AArch64 include:

  • 31 general-purpose 64-bit registers (up from 16 registers in A32, of which R13-R15 serve as SP, LR, and PC)
  • New instruction encoding — AArch64 is not binary-compatible with A32
  • Removal of per-instruction conditional execution in favor of conditional select and conditional compare instructions
  • PC is no longer a general-purpose register, simplifying certain hardware optimizations
  • Advanced SIMD (NEON) is mandatory, not optional
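To see what replaced conditional execution, here is the ADDGT/ADDLE example from the Conditional Execution section rewritten for AArch64 around CSEL (conditional select), expressed as a Python model. The four-instruction AArch64 sequence in the docstring is one straightforward translation (a compiler may choose a different one, for example using CSINC); the Python helper names are hypothetical and model only the data flow.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def csel(if_true: int, if_false: int, cond: bool) -> int:
    """Model CSEL Rd, Rn, Rm, cond: Rd = Rn if cond holds, else Rm."""
    return if_true if cond else if_false


def aarch64_example(w0: int, w1: int) -> int:
    """Branch-free AArch64 version of the A32 ADDGT/ADDLE example:

        cmp  w0, #10
        add  w2, w1, #1
        add  w3, w1, #2
        csel w1, w2, w3, gt
    """
    gt = to_signed32(w0) > 10   # the GT condition after CMP w0, #10
    w2 = (w1 + 1) & MASK32      # both candidate results are computed...
    w3 = (w1 + 2) & MASK32
    return csel(w2, w3, gt)     # ...and CSEL picks one, with no branch


print(aarch64_example(11, 0), aarch64_example(10, 0))
```

Computing both candidates and selecting one trades a little extra work for a data dependency instead of a branch, which is the same benefit A32 conditional execution offered.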

The first consumer device with a 64-bit ARM processor was the Apple iPhone 5s (2013), using Apple’s custom A7 chip. This caught much of the industry by surprise and accelerated the transition to 64-bit ARM across the mobile ecosystem.

The Apple Silicon Transition (2020)

Apple’s announcement at WWDC in June 2020 that it would transition its entire Mac product line from Intel x86-64 to ARM-based Apple Silicon was a watershed moment for the ARM architecture. The M1 chip, which shipped in November 2020, demonstrated that ARM processors could compete with x86 processors in desktop and laptop performance while using significantly less power, with many benchmarks showing ARM leading in performance-per-watt.

ARMv9 and Beyond (2021-Present)

ARMv9, announced in March 2021, is the current major architecture generation. It adds:

  • Scalable Vector Extension 2 (SVE2): Variable-length SIMD that allows the same code to scale across implementations with different vector widths
  • Confidential Compute Architecture (CCA): Hardware-based security for protecting data in use
  • Memory Tagging Extension (MTE): Hardware support for detecting memory safety bugs (first introduced in ARMv8.5-A and carried forward into ARMv9)
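The "same code, different widths" property of SVE2 comes from writing loops that never hard-code the vector length. The Python sketch below models that idea for an element-wise add of 32-bit values: the lane count is derived from a `vector_bits` parameter (SVE permits implementations from 128 to 2048 bits, in multiples of 128), and a per-lane predicate, analogous to what SVE's WHILELT instruction produces, disables lanes beyond the end of the data. The function name and return shape are hypothetical.

```python
def vla_add(a: list, b: list, vector_bits: int) -> tuple:
    """Model a vector-length-agnostic add loop in the style of SVE/SVE2.

    The same loop body runs unchanged for any vector width; a predicate
    masks off lanes past the end of the arrays, so no scalar tail loop
    is needed. Returns (result, iterations).
    """
    lanes = vector_bits // 32       # 32-bit elements per vector
    n = len(a)
    out = [0] * n
    i = 0
    iterations = 0
    while i < n:
        predicate = [i + k < n for k in range(lanes)]  # active lanes
        for k, active in enumerate(predicate):
            if active:
                out[i + k] = a[i + k] + b[i + k]
        i += lanes                  # advance by the vector length
        iterations += 1
    return out, iterations


data_a = list(range(10))
data_b = [100] * 10
for bits in (128, 256, 512):
    result, steps = vla_add(data_a, data_b, bits)
    print(bits, steps, result)
```

Only the number of loop iterations changes with the vector width; the result is identical, which is what lets one binary run efficiently on cores with different vector lengths.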

The ARM architecture continues to receive annual updates with new extensions and refinements.

Current Relevance

ARM assembly remains critically important across multiple computing domains:

Mobile: ARM powers virtually all smartphones and tablets. While most mobile application development uses high-level languages, ARM assembly is essential for operating system kernels, device drivers, hardware abstraction layers, and performance-critical library routines.

Embedded and IoT: ARM Cortex-M microcontrollers dominate the embedded market. Firmware developers working on resource-constrained devices frequently write ARM assembly for startup code, interrupt handlers, and performance-critical routines where every cycle and byte matters.

Desktop and laptop: Apple’s transition to ARM-based Apple Silicon has brought ARM assembly into mainstream desktop computing. The M-series chips have shown that ARM can compete with x86 across a broad range of desktop and laptop workloads.

Servers and cloud: AWS Graviton processors, Ampere Altra, and other ARM-based server chips are gaining adoption in cloud computing. The Fujitsu A64FX processor powered the Fugaku supercomputer, which held the top position on the TOP500 list from June 2020 until Frontier overtook it in June 2022.

Education: ARM assembly, particularly on the Raspberry Pi, is widely used for teaching computer architecture and systems programming. The relative simplicity and regularity of the ARM instruction set make it more accessible to students than x86.

Why It Matters

The ARM architecture represents one of the most remarkable success stories in computing history. From a small team of two engineers at a British home computer company, ARM grew to become the most widely deployed processor architecture in the world. The licensing model that ARM pioneered — designing processor IP and licensing it to chip manufacturers rather than fabricating chips directly — became a template for the semiconductor industry.

ARM assembly embodies the RISC philosophy in its most commercially successful form. The original design principles — simple instructions, a large register file, load-store architecture, and fixed-width instruction encoding — proved to be not just academically elegant but practically powerful. The low transistor count and power efficiency that resulted from these choices positioned ARM perfectly for the mobile revolution that began with feature phones in the mid-1990s and exploded with smartphones in the late 2000s.

The architecture’s evolution from 32-bit to 64-bit with ARMv8-A, and its expansion from embedded systems into desktops and servers with Apple Silicon and AWS Graviton, demonstrate that good architectural foundations can scale across vastly different computing domains. ARM assembly, once a niche skill for embedded developers and British computer enthusiasts, is now relevant to systems programmers working across the full spectrum of modern computing — from the smallest IoT sensors to the largest cloud data centers.

Timeline

1983
Sophie Wilson begins designing the ARM instruction set at Acorn Computers in October; Steve Furber leads the hardware design for what is initially called the Acorn RISC Machine
1985
First ARM1 silicon delivered from VLSI Technology on April 26 and works correctly on first power-up; the processor contains approximately 25,000 transistors and runs at 6 MHz
1987
Acorn Archimedes launched as the first commercial ARM-based personal computer, using the ARM2 processor at 8 MHz with approximately 30,000 transistors
1990
Advanced RISC Machines Ltd founded in November as a joint venture between Acorn Computers, Apple Computer, and VLSI Technology, with Apple investing US$3 million
1994
ARM7TDMI core released, becoming one of the most widely licensed ARM cores; the 16-bit Thumb compressed instruction set introduced with ARMv4T
1998
ARM Holdings PLC goes public with an IPO on the London Stock Exchange and NASDAQ on April 17
2004
ARM Cortex series introduced, establishing the Cortex-A (application), Cortex-R (real-time), and Cortex-M (microcontroller) product line structure
2011
ARMv8-A architecture announced in October, introducing AArch64 — a new 64-bit execution state with 31 general-purpose registers and a redesigned instruction encoding
2013
Apple A7 chip in the iPhone 5s becomes the first 64-bit ARM processor in a consumer smartphone, implementing ARMv8-A
2020
Apple announces transition from Intel to ARM-based Apple Silicon at WWDC in June; the M1 chip ships in November in MacBook Air, MacBook Pro 13-inch, and Mac mini
2021
ARMv9 architecture announced in March, adding Scalable Vector Extension 2 (SVE2), Confidential Compute Architecture (CCA), and enhanced security features

Notable Uses & Legacy

Mobile Devices (iOS and Android)

ARM processors power virtually all smartphones and tablets worldwide, including Apple's A-series chips and Qualcomm Snapdragon, Samsung Exynos, and MediaTek Dimensity SoCs used in Android devices.

Apple Silicon (M-series chips)

Starting with the M1 in November 2020, Apple transitioned its entire Mac product line from Intel x86-64 to custom ARM-based Apple Silicon processors implementing the 64-bit AArch64 architecture.

Embedded Systems and IoT

ARM Cortex-M microcontrollers are the dominant architecture for embedded systems, used in automotive ECUs, industrial controllers, medical devices, and billions of IoT sensors worldwide.

Game Boy Advance and Nintendo DS

The GBA used an ARM7TDMI at 16.78 MHz and the Nintendo DS contained both an ARM7TDMI and an ARM9 processor; game developers used ARM assembly for performance-critical code.

Raspberry Pi

All Raspberry Pi single-board computers use Broadcom SoCs with ARM cores, making the Raspberry Pi one of the most popular platforms for learning ARM assembly programming.

Cloud Computing (AWS Graviton)

Amazon's custom ARM-based Graviton processors power EC2 instances on AWS, reportedly offering competitive performance-per-watt for cloud server workloads.

Language Influence

Influenced By

MOS 6502, Berkeley RISC

Influenced

Thumb instruction set, Thumb-2, AArch64

Running Today

Run examples using the official Docker image:

docker pull