ARM: Cortex-M3 Thumb-2 instruction set

From ScienceZero

The instruction set of the ARM Cortex-M3 CPU, as used in STM32 microcontrollers such as the STM32F1 series.

Hardware registers

  • R0-R12 General purpose registers
  • R13 Stack pointer, also called SP (can be used as a general purpose register only with restrictions, e.g. bits 1:0 always read as zero)
  • R14 Link register, also called LR, keeps the return address for fast function calls (can be used as a general purpose register)
  • R15 Program counter, also called PC

Register names

  • Rd Destination register
  • Rn First operand register; the operation is applied to Rn using the second operand, e.g. for SUB, Rd = Rn - Rm
  • Rm Second operand register
  • SP Stack pointer (R13)
  • LR Link register (R14)
  • PC Program counter (R15)
  • <reglist> means a list of registers like {R0, R3, R7-R10} (R7-R10 is the range R7, R8, R9, R10)

Immediate constants

  • imm<n> means a constant of n bits (a value that is fixed at assembly time and cannot be changed during execution)
  • # tells the assembler that the following is an immediate constant

Required parameters

  • <x> means x is required
  • <x|y> means exactly one of x or y is required

Optional parameters

  • {x} means x or nothing
  • {x|y} means either x or y or nothing

Condition flags

  • Some instructions update the condition flags when the suffix S (set condition flags) is appended to the instruction name
  • N Negative: bit 31 of the result
  • Z Zero: 1 if all bits of the result are 0
  • C Carry: carry out of the ALU adder (after a subtraction or compare, C = 1 means no borrow), otherwise the last bit shifted out of the barrel shifter
  • V Overflow: signed overflow from the ALU adder, e.g. 0x7FFFFFFF + 0x7FFFFFFF adds two positive numbers, gives a negative result (0xFFFFFFFE) and therefore sets V
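A small C model can make the flag definitions concrete. This is a sketch, not ARM's reference pseudocode; the names `Flags`, `adds` and `subs` are made up for illustration. It computes N, Z, C and V for ADDS and SUBS, including the point that C = 1 after a subtraction means that no borrow occurred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four condition flags (a C model, not ARM's reference pseudocode). */
typedef struct { bool n, z, c, v; } Flags;

/* ADDS: result = a + b, flags set from the 33 bit sum. */
uint32_t adds(uint32_t a, uint32_t b, Flags *f) {
    assert(f != 0);
    uint64_t wide = (uint64_t)a + b;          /* bit 32 holds the carry out */
    uint32_t r = (uint32_t)wide;
    f->n = (r >> 31) & 1;                     /* N: bit 31 of the result */
    f->z = (r == 0);                          /* Z: all result bits are 0 */
    f->c = (wide >> 32) & 1;                  /* C: carry out of bit 31 */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1;  /* V: same-sign operands, result sign differs */
    return r;
}

/* SUBS (and CMP): result = a - b. C = 1 means that no borrow occurred. */
uint32_t subs(uint32_t a, uint32_t b, Flags *f) {
    assert(f != 0);
    uint32_t r = a - b;
    f->n = (r >> 31) & 1;
    f->z = (r == 0);
    f->c = (a >= b);                          /* unsigned a >= b: no borrow */
    f->v = (((a ^ b) & (a ^ r)) >> 31) & 1;   /* operand signs differ, result sign differs from a */
    return r;
}
```

With these, `adds(0x7FFFFFFF, 0x7FFFFFFF, &f)` returns 0xFFFFFFFE with N = 1, C = 0 and V = 1, matching the V example above.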
<cond>    Flag state        Integer ALU / Shifter                   Vector Floating Point coprocessor (no FPU on the Cortex-M3)
 EQ      Z = 1             Equal (to zero)                         Equal
 NE      Z = 0             Not equal                               Not equal, or unordered
 CS / HS C = 1             Carry Set / Unsigned higher or same     Greater than or equal, or unordered
 CC / LO C = 0             Carry Clear / Unsigned lower            Less than                           
 MI      N = 1             Negative                                Less than
 PL      N = 0             Positive or zero                        Greater than or equal, or unordered
 VS      V = 1             Overflow                                Unordered (at least one NaN operand)
 VC      V = 0             No overflow                             Not unordered
 HI      C = 1 and Z = 0   Unsigned higher                         Greater than, or unordered
 LS      C = 0 or Z = 1    Unsigned lower or same                  Less than or equal
 GE      N = V             Signed greater than or equal            Greater than or equal
 LT      N <> V            Signed less than                        Less than, or unordered
 GT      Z = 0 and N = V   Signed greater than                     Greater than
 LE      Z = 1 or N <> V   Signed less than or equal               Less than or equal, or unordered
 AL      Any               Always (normally omitted)               Always (normally omitted)  
 * If a two character condition code is appended to the instruction name,
   the assembler can generate the required IT (If-Then) instruction automatically,
   e.g. ADDEQ R0,R0,#1 (execute the instruction only if the zero flag is set)
        is converted by the assembler to
        IT    EQ
        ADDEQ R0,R0,#1
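The integer column of the table can be written as a C function. This is a sketch only; the `Cond` enumeration (in encoding order, EQ = 0 to AL = 14) and `condition_passed` are illustrative names, not a library API.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition codes in their 4 bit encoding order (EQ = 0 ... AL = 14). */
typedef enum {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS, COND_VC,
    COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL
} Cond;

/* True if an instruction with condition c executes for the given N, Z, C, V. */
bool condition_passed(Cond c, bool n, bool z, bool carry, bool v) {
    switch (c) {
    case COND_EQ: return z;
    case COND_NE: return !z;
    case COND_CS: return carry;             /* also written HS */
    case COND_CC: return !carry;            /* also written LO */
    case COND_MI: return n;
    case COND_PL: return !n;
    case COND_VS: return v;
    case COND_VC: return !v;
    case COND_HI: return carry && !z;
    case COND_LS: return !carry || z;
    case COND_GE: return n == v;
    case COND_LT: return n != v;
    case COND_GT: return !z && n == v;
    case COND_LE: return z || n != v;
    case COND_AL: return true;
    }
    assert(!"invalid condition code");
    return false;
}
```

For example, CMP R0,R1 with R0 = 1 and R1 = 2 leaves N = 1, Z = 0, C = 0, V = 0, so LT and LO pass while GE and HS fail.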
<Operand2> may be one of the following:
 #imm8<<imm5                           Any byte shifted left by a constant, as long as it stays within the 32 bits (e.g. 0x000FF000)
 #(imm8 imm8 imm8 imm8)                The same byte copied 4 times to create a 32 bit value (0xXYXYXYXY)
 #(   0 imm8    0 imm8)                Same, but with bytes 1 and 3 set to zero (0x00XY00XY)
 #(imm8    0 imm8    0)                Same, but with bytes 0 and 2 set to zero (0xXY00XY00)
 Rm                                    Normal register operation
 Rm, <LSL|LSR|ASR|ROR> #imm5           Register operation with constant shift
 Rm, RRX                               Register operation with rotate right with extend
{S}                                    Update the condition flags after the instruction has executed.
                                       Together with a condition code, S comes first: ADDSEQ R0,R1,R2

Shift and rotate operations

 LSL Logical shift left                0xFFFFFF00 LSL #4 = 0xFFFFF000 (shifts in zero at the bottom)
 LSR Logical shift right               0xFFFFFF00 LSR #4 = 0x0FFFFFF0 (shifts in zero at the top)
 ASR Arithmetic shift right            0xFFFFFF00 ASR #4 = 0xFFFFFFF0 (shifts in the original bit 31 at the top)
 ROR Rotate right                      0x12345678 ROR #4 = 0x81234567
 RRX Rotate right with extend          Rotates the operand one bit to the right through the carry as a 33 bit value:
                                       the old carry enters bit 31 and bit 0 becomes the new carry (Carry -> operand -> Carry)
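The shift examples above can be checked with a few C helpers. This is a sketch for shift amounts 0-31 with hypothetical helper names. ASR is built from unsigned operations so that it does not depend on how a C compiler shifts negative signed values.

```c
#include <assert.h>
#include <stdint.h>

/* Shift helpers for shift amounts 0-31 (hypothetical names). */
uint32_t lsl(uint32_t x, unsigned n) { assert(n < 32); return x << n; }  /* zeros in at the bottom */
uint32_t lsr(uint32_t x, unsigned n) { assert(n < 32); return x >> n; }  /* zeros in at the top */

/* ASR: copies of the original bit 31 are shifted in at the top. */
uint32_t asr(uint32_t x, unsigned n) {
    assert(n < 32);
    uint32_t fill = (x >> 31) ? ~(UINT32_MAX >> n) : 0;
    return (x >> n) | fill;
}

/* ROR: bits shifted out at the bottom re-enter at the top. */
uint32_t ror(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* RRX: 33 bit rotate through the carry. The old carry becomes bit 31,
   bit 0 becomes the new carry. */
uint32_t rrx(uint32_t x, unsigned *carry) {
    uint32_t r = ((uint32_t)(*carry & 1u) << 31) | (x >> 1);
    *carry = x & 1u;
    return r;
}
```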

Thumb-2 instruction set

MOV{S} Rd, <Operand2>                   Move                          Rd = Operand2
MVN{S} Rd, <Operand2>                   Move not                      Rd = 0xFFFFFFFF EOR Operand2
MOV{W} Rd, #<imm16>                     Move wide                     Rd = imm16, the upper 16 bits are cleared
MOVT Rd, #<imm16>                       Move top                      Rd[31:16] = imm16,
                                                                      the constant is put in the upper 16 bits of Rd,
                                                                      the lower 16 bits are unaffected
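A common use of MOVW and MOVT is to build an arbitrary 32 bit constant in two instructions: MOVW for the low half followed by MOVT for the high half. The C sketch below, with hypothetical helper names, models the effect on Rd.

```c
#include <assert.h>
#include <stdint.h>

/* MOVW Rd, #imm16: Rd = imm16, the upper 16 bits are cleared. */
uint32_t movw(uint16_t imm16) { return imm16; }

/* MOVT Rd, #imm16: bits 31:16 are replaced, bits 15:0 are kept. */
uint32_t movt(uint32_t rd, uint16_t imm16) {
    return ((uint32_t)imm16 << 16) | (rd & 0xFFFFu);
}
```

For example, `movt(movw(0x5678), 0x1234)` gives 0x12345678, the same as MOVW R0,#0x5678 followed by MOVT R0,#0x1234.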

ADD{S} Rd, Rn, <Operand2>               Add                           Rd = Rn + Operand2
ADD{W} Rd, Rn, #<imm12>                 Add wide                      Rd = Rn + imm12 (0-4095)
ADC{S} Rd, Rn, <Operand2>               Add with carry                Rd = Rn + Operand2 + Carry

SUB{S} Rd, Rn, <Operand2>               Subtract                      Rd = Rn - Operand2
SBC{S} Rd, Rn, <Operand2>               Subtract with carry           Rd = Rn - Operand2 - (1 - Carry)
SUB{W} Rd, Rn, #<imm12>                 Subtract wide                 Rd = Rn - imm12 (0-4095)
RSB{S} Rd, Rn, <Operand2>               Reverse subtract              Rd = Operand2 - Rn
                                        (RSC, reverse subtract with carry, exists only in the ARM
                                        instruction set; Thumb-2 has no RSC)
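ADDS/ADC and SUBS/SBC are typically chained to do arithmetic wider than 32 bits. This C sketch (hypothetical function names; each 64 bit value is split into lo/hi words) models how the carry passes from the low word to the high word.

```c
#include <assert.h>
#include <stdint.h>

/* 64 bit add:
     ADDS lo, a_lo, b_lo   ; C = carry out of the low word
     ADC  hi, a_hi, b_hi   ; hi = a_hi + b_hi + C */
uint64_t add64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi) {
    uint32_t lo = a_lo + b_lo;
    uint32_t carry = lo < a_lo;               /* C flag after ADDS */
    uint32_t hi = a_hi + b_hi + carry;
    return ((uint64_t)hi << 32) | lo;
}

/* 64 bit subtract:
     SUBS lo, a_lo, b_lo   ; C = 1 means no borrow
     SBC  hi, a_hi, b_hi   ; hi = a_hi - b_hi - (1 - C) */
uint64_t sub64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi) {
    uint32_t lo = a_lo - b_lo;
    uint32_t carry = a_lo >= b_lo;            /* C flag after SUBS */
    uint32_t hi = a_hi - b_hi - (1 - carry);
    return ((uint64_t)hi << 32) | lo;
}
```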

MUL{S} Rd, Rm, Rs                       Multiply                      Rd = Rm * Rs            Keeps the 32 least significant bits
MLA Rd, Rm, Rs, Rn                      Multiply and accumulate       Rd = Rn + (Rm * Rs)     Keeps the 32 least significant bits
MLS Rd, Rm, Rs, Rn                      Multiply and subtract         Rd = Rn - (Rm * Rs)     Keeps the 32 least significant bits
UMULL RdLo, RdHi, Rm, Rs                Multiply unsigned long        RdHi:RdLo = Rm * Rs (64 bit result)
UMLAL RdLo, RdHi, Rm, Rs                Multiply unsigned accumulate long, RdHi:RdLo = RdHi:RdLo + Rm * Rs (64 bit result)
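The 32 bit and the 64 bit multiplies differ in which part of the product is kept. The following C sketch uses hypothetical names that mirror the mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* MUL: only the low 32 bits of the product are kept. */
uint32_t mul(uint32_t rm, uint32_t rs) { return rm * rs; }

/* MLA / MLS: Rd = Rn + Rm*Rs and Rd = Rn - Rm*Rs, modulo 2^32. */
uint32_t mla(uint32_t rm, uint32_t rs, uint32_t rn) { return rn + rm * rs; }
uint32_t mls(uint32_t rm, uint32_t rs, uint32_t rn) { return rn - rm * rs; }

/* UMULL RdLo, RdHi, Rm, Rs: full 64 bit unsigned product. */
void umull(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs) {
    uint64_t p = (uint64_t)rm * rs;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* UMLAL RdLo, RdHi, Rm, Rs: RdHi:RdLo += Rm * Rs. */
void umlal(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs) {
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    acc += (uint64_t)rm * rs;
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}
```

For example, 0x10000 * 0x10000 is 0 with MUL, but RdHi = 1, RdLo = 0 with UMULL.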

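SDIV Rd, Rn, Rm                         Signed division               Rd = Rn / Rm rounded towards zero, 0x80000000 / 0xFFFFFFFF = 0x80000000
UDIV Rd, Rn, Rm                         Unsigned division             Rd = Rn / Rm rounded towards zero
                                        For both, Rn / 0 = 0 unless divide-by-zero trapping (the DIV_0_TRP bit in the
                                        Configuration and Control Register) is enabled, in which case a UsageFault is raised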
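C division differs from SDIV/UDIV at the edge cases: dividing by zero and INT32_MIN / -1 are undefined behaviour in C. The sketch below uses hypothetical names and models the hardware results, assuming divide-by-zero trapping is disabled.

```c
#include <assert.h>
#include <stdint.h>

/* UDIV: quotient rounded towards zero; x / 0 gives 0. */
uint32_t udiv(uint32_t n, uint32_t m) { return m ? n / m : 0; }

/* SDIV: quotient rounded towards zero. The two cases that are undefined
   in C are handled explicitly with the results the hardware gives. */
int32_t sdiv(int32_t n, int32_t m) {
    if (m == 0) return 0;                              /* Rn / 0 = 0 */
    if (n == INT32_MIN && m == -1) return INT32_MIN;   /* 0x80000000 / 0xFFFFFFFF */
    return n / m;                                      /* C99 also truncates towards zero */
}
```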

ASR{S} Rd, Rm, <Rs|#imm5>               Arithmetic shift right, canonical form of MOV{S} Rd, Rm, ASR <Rs|#imm5>
LSL{S} Rd, Rm, <Rs|#imm5>               Logical shift left
LSR{S} Rd, Rm, <Rs|#imm5>               Logical shift right
ROR{S} Rd, Rm, <Rs|#imm5>               Rotate right
RRX{S} Rd, Rm                           Rotate right with extend, uses Carry as a 33rd bit

CLZ   Rd, Rm                            Count leading zeros
RBIT  Rd, Rm                            Reverse bits in register, so bit 0 becomes bit 31
REV   Rd, Rm                            Byte-Reverse Word, reverses the byte order in a 32-bit register
REV16 Rd, Rm                            Byte-Reverse Packed Halfword, reverses the byte order in each 16-bit halfword of a
                                        32-bit register
REVSH Rd, Rm                            Byte-Reverse Signed Halfword, reverses the byte order in the lower 16 bits of a
                                        32-bit register and sign extends the result to 32 bits
UXTB  Rd, Rm{, ROR #<8|16|24>}          Unsigned Extend Byte, optionally rotates Rm right, then zero extends the low 8 bits to 32 bits
UXTH  Rd, Rm{, ROR #<8|16|24>}          Unsigned Extend Halfword, optionally rotates Rm right, then zero extends the low 16 bits to 32 bits
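The byte and bit reversal instructions are easy to misread, so here is a portable C model of several of them. It is a sketch with hypothetical names. Compilers often provide intrinsics for these operations, but none are assumed here.

```c
#include <assert.h>
#include <stdint.h>

uint32_t rbit(uint32_t x) {                   /* bit 0 <-> bit 31, bit 1 <-> bit 30, ... */
    uint32_t r = 0;
    for (int i = 0; i < 32; i++, x >>= 1)
        r = (r << 1) | (x & 1u);
    return r;
}

uint32_t rev(uint32_t x) {                    /* 0xAABBCCDD -> 0xDDCCBBAA */
    return (x >> 24) | ((x >> 8) & 0xFF00u) | ((x << 8) & 0xFF0000u) | (x << 24);
}

uint32_t rev16(uint32_t x) {                  /* swap the bytes inside each halfword */
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}

uint32_t revsh(uint32_t x) {                  /* swap the low halfword, then sign extend */
    uint32_t h = ((x & 0xFFu) << 8) | ((x >> 8) & 0xFFu);
    return (h & 0x8000u) ? (h | 0xFFFF0000u) : h;
}

uint32_t uxtb(uint32_t x, unsigned ror_bits) {   /* ror_bits: 0, 8, 16 or 24 */
    assert(ror_bits % 8 == 0 && ror_bits < 32);
    uint32_t r = ror_bits ? (x >> ror_bits) | (x << (32 - ror_bits)) : x;
    return r & 0xFFu;
}
```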

CMP Rn, <Operand2>                      Compare: same as SUBS Rd, Rn, <Operand2>, but the result is discarded,
                                        only the condition flags are updated
CMN Rn, <Operand2>                      Compare negative: flags from Rn + Operand2 (like ADDS)
TST Rn, <Operand2>                      Test: flags from Rn AND Operand2 (like ANDS)
TEQ Rn, <Operand2>                      Test equivalence: flags from Rn EOR Operand2 (like EORS)

AND{S} Rd, Rn, <Operand2>               Bitwise AND, Rd = Rn AND <Operand2>
ORR{S} Rd, Rn, <Operand2>               Bitwise OR, Rd = Rn OR <Operand2>
EOR{S} Rd, Rn, <Operand2>               Bitwise Exclusive-OR, Rd = Rn EOR <Operand2>
ORN{S} Rd, Rn, <Operand2>               Or not, Rd = Rn OR NOT <Operand2>
BIC{S} Rd, Rn, <Operand2>               Bit clear, Rd = Rn AND NOT <Operand2>

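BFC  Rd, #<lsb>, #<width>               Bit field clear, clears <width> bits of Rd starting at bit <lsb>
BFI  Rd, Rn, #<lsb>, #<width>           Bit field insert, copies the low <width> bits of Rn into Rd starting at bit <lsb>
SBFX Rd, Rn, #<lsb>, #<width>           Signed bit field extract, Rd = <width> bits of Rn starting at bit <lsb>, sign extended
UBFX Rd, Rn, #<lsb>, #<width>           Unsigned bit field extract, same but zero extended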
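Here is a C sketch of the four bit field instructions, assuming 1 <= width and lsb + width <= 32 as the instructions require. The function names are hypothetical. SBFX uses the common XOR-and-subtract trick for sign extension.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with 'width' ones starting at bit 'lsb'. */
static uint32_t field_mask(unsigned lsb, unsigned width) {
    assert(width >= 1 && lsb + width <= 32);
    uint32_t ones = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1);
    return ones << lsb;
}

uint32_t bfc(uint32_t rd, unsigned lsb, unsigned width) {
    return rd & ~field_mask(lsb, width);
}

/* BFI: the low 'width' bits of Rn replace bits lsb..lsb+width-1 of Rd. */
uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width) {
    uint32_t m = field_mask(lsb, width);
    return (rd & ~m) | ((rn << lsb) & m);
}

/* UBFX: extract the field and zero extend it. */
uint32_t ubfx(uint32_t rn, unsigned lsb, unsigned width) {
    return (rn & field_mask(lsb, width)) >> lsb;
}

/* SBFX: extract the field and sign extend from its top bit. */
uint32_t sbfx(uint32_t rn, unsigned lsb, unsigned width) {
    uint32_t v = ubfx(rn, lsb, width);
    uint32_t sign = 1u << (width - 1);
    return (v ^ sign) - sign;
}
```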

<Address> can be one of the following   Example                     Action     
 [Rn {, #<-imm8|+imm12>}]               LDR R0, [R1, #8]            R0 = [R1 + 8]           
 [Rn {, #<+-imm8>}]!                    LDR R0, [R1, #8]!           R1 = R1 + 8, R0 = [R1]  
 [Rn], #<+-imm8>                        LDR R0, [R1], #4            R0 = [R1], R1 = R1 + 4   
 [Rn, Rm {, <LSL #0-3>}]                STR R0, [R1, R2, LSL #2]    [R1 + (R2 * 4)] = R0
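The addressing modes differ only in when the offset is added to Rn and whether the sum is written back to Rn. The C sketch below models memory as a small little-endian byte array. `mem`, `load32`, `store32` and the `ldr_*` functions are hypothetical names, and addresses are offsets into that array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64 byte little-endian memory; addresses are offsets into it. */
static uint8_t mem[64];

static uint32_t load32(uint32_t addr) {
    assert(addr <= sizeof mem - 4);
    return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8 |
           (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
}

static void store32(uint32_t addr, uint32_t value) {
    assert(addr <= sizeof mem - 4);
    for (unsigned i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t)(value >> (8 * i));
}

/* LDR Rt, [Rn, #off]: offset addressing, Rn is unchanged. */
uint32_t ldr_offset(uint32_t rn, int32_t off) { return load32(rn + (uint32_t)off); }

/* LDR Rt, [Rn, #off]!: pre-indexed, Rn is updated before the load. */
uint32_t ldr_pre(uint32_t *rn, int32_t off) {
    *rn += (uint32_t)off;
    return load32(*rn);
}

/* LDR Rt, [Rn], #off: post-indexed, load from Rn, then update Rn. */
uint32_t ldr_post(uint32_t *rn, int32_t off) {
    uint32_t value = load32(*rn);
    *rn += (uint32_t)off;
    return value;
}

/* LDR Rt, [Rn, Rm, LSL #s]: register offset scaled by 1, 2, 4 or 8. */
uint32_t ldr_reg(uint32_t rn, uint32_t rm, unsigned s) {
    assert(s <= 3);
    return load32(rn + (rm << s));
}
```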

LDR Rd, <Address>                       Load 32 bit word from memory
LDRH Rd, <Address>                      Load 16 bit half-word from memory
LDRSH Rd, <Address>                     Load signed 16 bit half-word from memory
LDRB Rd, <Address>                      Load 8 bit byte from memory
LDRSB Rd, <Address>                     Load signed 8 bit byte from memory

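STR Rd, <Address>                       Store 32 bit word to memory
STRH Rd, <Address>                      Store the low 16 bits of Rd as a half-word to memory
STRB Rd, <Address>                      Store the low 8 bits of Rd as a byte to memory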
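The signed and unsigned loads differ only in how the loaded byte or half-word is extended to 32 bits, and the narrow stores simply drop the upper bits. Here is a minimal C sketch with hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Value that ends up in the 32 bit destination register for each load. */
uint32_t ldrb (uint8_t  b) { return b; }                                     /* zero extend */
uint32_t ldrsb(uint8_t  b) { return (b & 0x80u)   ? (b | 0xFFFFFF00u) : b; } /* sign extend */
uint32_t ldrh (uint16_t h) { return h; }                                     /* zero extend */
uint32_t ldrsh(uint16_t h) { return (h & 0x8000u) ? (h | 0xFFFF0000u) : h; } /* sign extend */

/* STRB / STRH store only the low 8 / 16 bits of the register. */
uint8_t  strb(uint32_t rt) { return (uint8_t)rt; }
uint16_t strh(uint32_t rt) { return (uint16_t)rt; }
```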

<AddressDual> can be one of the following (the offset must be a multiple of 4 in the range -1020 to 1020)
 [<Rn>{, #+/-<imm8>}]
 [<Rn>], #+/-<imm8>
 [<Rn>, #+/-<imm8>]!

LDRD <Rt>, <Rt2>, <label>               Load register dual, literal (range -1020 to 1020)
LDRD <Rt>, <Rt2>, <AddressDual>         Load register dual, Rt = [address], Rt2 = [address + 4]
STRD <Rt>, <Rt2>, <AddressDual>         Store register dual, [address] = Rt, [address + 4] = Rt2

LDM{IA|DB} Rn{!}, <reglist>             Load/store multiple, can transfer any list of registers,
                                        ! will update Rn to point to the address after/before the last register
STM{IA|DB} Rn{!}, <reglist>             IA = increment after (default), DB = decrement before (action on address)
                                        (IB and DA exist only in the ARM instruction set, not in Thumb-2)

IT{pattern} <cond>                      If-Then, sets the execution conditions for up to 4 following instructions
                                        <pattern> can be any combination of up to three T (then) and E (else) letters,
                                        the first instruction following IT always uses <cond> (T)
                                        Instructions that can modify the program counter must be last in an IT block
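The T/E letters can be expanded mechanically. For the conditions EQ to LE, the inverse condition is the one whose 4 bit encoding differs only in bit 0 (EQ/NE, CS/CC, ..., GT/LE). Here is a C sketch with a hypothetical `it_expand` function.

```c
#include <assert.h>
#include <string.h>

/* Condition names in encoding order; cond ^ 1 is the inverse condition. */
static const char *const cond_names[] = {
    "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
    "HI", "LS", "GE", "LT", "GT", "LE"
};

/* Expand IT{pattern} <cond> into the conditions of the 1-4 following
   instructions. 'pattern' is the part after "IT" ("", "T", "E", "TE", ...).
   Returns the number of instructions in the IT block. */
int it_expand(const char *pattern, int cond, const char *out[4]) {
    size_t len = strlen(pattern);
    assert(cond >= 0 && cond <= 13 && len <= 3);
    out[0] = cond_names[cond];                  /* the first instruction is always T */
    for (size_t i = 0; i < len; i++) {
        assert(pattern[i] == 'T' || pattern[i] == 'E');
        out[i + 1] = cond_names[pattern[i] == 'T' ? cond : cond ^ 1];
    }
    return (int)len + 1;
}
```

For example, ITTE GE (pattern "TE", GE = 10) gives GE, GE, LT for the three following instructions.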

B <label>                               Unconditional jump
BL <label>                              R14 = address of next instruction, then jump to label
BX Rm                                   Branch and exchange (instruction sets); on the Cortex-M3, which only executes
                                        Thumb code, bit 0 of Rm must be 1. Use it to return from a function: BX LR
BLX Rm                                  R14 = address of next instruction, then jump to Rm
CB{N}Z Rn, <label>                      Compare and branch, branches forward (0 to 126 bytes) if Rn is zero (CBZ) or not zero (CBNZ);
                                        Rn must be R0-R7 and the condition flags are not changed
TBB [Rn, Rm]                            Table branch byte, loads an unsigned byte from (Rn + Rm) and adds twice its value to the PC
TBH [Rn, Rm, LSL #1]                    Table branch halfword, loads an unsigned half word (16 bit) from (Rn + (Rm << 1)) and adds twice its value to the PC
PUSH <reglist>                          Push registers on the stack pointed to by SP, decrement address before each store,
                                        lowest-numbered register to the lowest memory address
POP  <reglist>                          Restore them again, increment address after each load

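MRS Rd, <spec_reg>                      Rd = special register, e.g. APSR, IPSR, xPSR, PRIMASK, BASEPRI, FAULTMASK, CONTROL, MSP, PSP
MSR <spec_reg>, Rn                      Special register = Rn; for the APSR only the selected flags are written, e.g. MSR APSR_nzcvq, R0
                                        (unlike the ARM instruction set, there is no MSR form with an immediate operand)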

The stack

A stack is a last in, first out data structure used to store temporary variables and data. It grows from high to low memory addresses, and SP (R13) points to the last piece of data written. A set of registers is transferred with the lowest-numbered registers at the lowest addresses. Use the PUSH and POP instructions to transfer any set of registers: PUSH accepts R0-R12 and LR, and POP accepts R0-R12 and either LR or PC (popping into PC returns from a function).

If SP contains 0x8000 and we execute the instruction PUSH {R0,R1,R7} the result will be

 0x8000 .. <- Original address in SP 
 0x7ffc R7
 0x7ff8 R1
 0x7ff4 R0 <- SP points here now

If we now execute POP {R10-R12}

 0x8000 ..        <- SP points here now 
 0x7ffc R7 -> R12
 0x7ff8 R1 -> R11
 0x7ff4 R0 -> R10 <- Original address in SP
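The example above can be reproduced with a small C model of this full descending stack. This is a sketch under two assumptions: the register list is given in ascending order, and memory is a word array covering 0x7F00-0x7FFF. `push`, `pop` and `word` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_LOW   0x7F00u               /* lowest modelled address */
#define STACK_WORDS 64u                   /* covers 0x7F00-0x7FFF */
static uint32_t stack_mem[STACK_WORDS];

/* Word of the modelled memory at a word-aligned address. */
static uint32_t *word(uint32_t addr) {
    assert(addr % 4 == 0 && addr >= STACK_LOW && addr < STACK_LOW + 4 * STACK_WORDS);
    return &stack_mem[(addr - STACK_LOW) / 4];
}

/* PUSH {list}: regs[13] is SP, list holds register numbers in ascending order.
   The hardware decrements before each store; lowering SP by the whole block
   first and storing upwards gives the same memory layout. */
void push(uint32_t regs[16], const int *list, int count) {
    regs[13] -= 4u * (uint32_t)count;
    for (int i = 0; i < count; i++)
        *word(regs[13] + 4u * (uint32_t)i) = regs[list[i]];
}

/* POP {list}: lowest-numbered register from the lowest address, then SP moves up. */
void pop(uint32_t regs[16], const int *list, int count) {
    for (int i = 0; i < count; i++)
        regs[list[i]] = *word(regs[13] + 4u * (uint32_t)i);
    regs[13] += 4u * (uint32_t)count;
}
```

Pushing {R0,R1,R7} with SP = 0x8000 leaves SP = 0x7FF4 and R7 at 0x7FFC; popping {R10-R12} then restores SP to 0x8000 with R10 = old R0, R11 = old R1 and R12 = old R7.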

C language calling convention

Parameters are passed in R0-R3 (further parameters go on the stack) and results are returned in R0-R3 (usually just R0, or R0-R1 for a double-word value). A double-word sized type is passed in two consecutive registers, starting at an even-numbered register. A 128-bit containerized vector is passed in four consecutive registers. The content of the registers is as if the value had been loaded from memory with a single LDM instruction. A subroutine must preserve the contents of the registers R4-R8, R10, R11 and SP (and R9 in PCS variants that designate R9 as v6). Return with BX LR.

Thumb-2 variable instruction length

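It is important to have at least half the instructions encoded as 16 bit to get maximum performance from flash memory: the Cortex-M3 fetches 32 bits of code at a time, so with flash wait states a long run of 32 bit instructions can stall waiting for instruction fetches. An IT instruction can also be folded together with an adjacent 16 bit instruction so that it costs no extra cycle.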

The general rules for generating the 16 bit form of the instructions

  • Use registers in the range R0-R7
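  • Set the condition flags (S suffix) outside IT blocks, and do not set them inside IT blocks; the 16 bit data processing encodings set the flags only outside IT blocks
  • Use small immediate constants: 0-7 or 0-255 depending on the instruction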

Instructions encoded in 16 bit when using registers R0-R7

 ADR Rd, <label> (range 0 to 1020)
 <ADDS|SUBS|MOVS> Rd, #imm8
 <ADDS|SUBS> Rd, Rn, #imm3
 <ADDS|SUBS> Rd, Rn, Rm
 RSBS Rd, Rn, #0
 <ASRS|LSRS|LSLS> Rd, Rm, #imm5
 CMP Rn, #imm8
 <CMP|CMN|TST> Rn, Rm (Rm can be any register for CMP)
 <LDM|STM> Rn!, <registers>
 <LDR|STR>{H|B} Rt, [Rn{, #imm5}]
 <LDR|STR>{H|B} Rt, [Rn, Rm]
 LDRS<H|B> Rt, [Rn, Rm]
 LDR Rt, <label> (0-1020)
 <PUSH|POP> <registers> (R0-R7, plus LR for PUSH or PC for POP)
 IT{x{y{z}}} <cond>
 CB{N}Z Rn, <label> (range 0 to 126)
 B<cond>    <label> (range -256 to 254) 
 B          <label> (range -2048 to 2046)

Instructions encoded in 16 bit using registers R0-R15

 MOV Rd, Rm
 ADD Rd, Rm
 BX  Rm

How to enumerate the legal immediate constants for <Operand2>

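'abcdnnnnnnnn' is the 12 bit bitfield to be expanded into the 32 bit constant imm32:

 if 'ab' = '00'
   case 'cd'
     when '00'
       imm32 = '00000000 00000000 00000000 nnnnnnnn'  (always encode 0 like this)
     when '01'
       imm32 = '00000000 nnnnnnnn 00000000 nnnnnnnn'
     when '10'
       imm32 = 'nnnnnnnn 00000000 nnnnnnnn 00000000'
     when '11'
       imm32 = 'nnnnnnnn nnnnnnnn nnnnnnnn nnnnnnnn'
 else
   imm32 = ROR('1nnnnnnn', 'abcdn')   (the rotation 'abcdn', 8 to 31, uses the top n bit;
                                       '1nnnnnnn' is a fixed 1 followed by the lower seven n bits)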
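The pseudocode translates directly into C. The sketch below uses hypothetical names and also answers the enumeration question by brute force: it expands all 4096 encodings and checks whether a value appears among them. Encodings with 'cd' not equal to '00' and nnnnnnnn = 0 are not valid on real hardware, but they only produce 0, which the first case already covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Expand the 12 bit field 'abcdnnnnnnnn' into the 32 bit constant. */
uint32_t thumb_expand_imm(unsigned imm12) {
    assert(imm12 < 4096);
    uint32_t n8 = imm12 & 0xFFu;
    if ((imm12 >> 10) == 0) {                      /* 'ab' = '00' */
        switch ((imm12 >> 8) & 3u) {               /* 'cd' */
        case 0:  return n8;                        /* 0x000000XY */
        case 1:  return (n8 << 16) | n8;           /* 0x00XY00XY */
        case 2:  return (n8 << 24) | (n8 << 8);    /* 0xXY00XY00 */
        default: return n8 * 0x01010101u;          /* 0xXYXYXYXY */
        }
    }
    /* '1nnnnnnn' (a 1 plus the low seven n bits) rotated right by 'abcdn' */
    return ror32(0x80u | (imm12 & 0x7Fu), (imm12 >> 7) & 0x1Fu);
}

/* Brute force enumeration: is value one of the 4096 expansions? */
bool is_operand2_imm(uint32_t value) {
    for (unsigned imm12 = 0; imm12 < 4096; imm12++)
        if (thumb_expand_imm(imm12) == value)
            return true;
    return false;
}
```

For example, 0x00FF0000 is encodable (as 0x87F), while 0x00000101 is not, because its set bits span nine bit positions.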

Example code

Condition flags

 It is important to make full use of the condition flags to write efficient code.
 This code will set R0 to 0 or -1 depending on whether R1 + R2 is 0 or not.
           ADD      R0,R1,R2
           CMP      R0,#0
           BEQ      zero
           MOV      R0,#-1
   zero    ...
 The optimised code using the condition flags becomes easier to read, more compact and faster.	
           ADDS     R0,R1,R2
           MOVNE    R0,#-1
 A branch is better if more than a few instructions are to be skipped (an IT block can cover at most 4 instructions).
           ADDS     R0,R1,R2
           BEQ      zero
           ... Block of code to be skipped ...	
   zero    ...


 The IT instruction makes 1 to 4 following instructions conditional. The letter T specifies <cond> and E specifies the inverse of <cond>.
 The first letter of the pattern is always T so the first conditional instruction will always have the condition <cond>. 
           IT       EQ        Read this as If EQual Then ADD R0,R0,#1
           ADDEQ    R0,R0,#1  <- This will only be executed if the Z condition flag is 1
           ITE      EQ        Read this as If EQual Then ADD R0,R0,#1 Else ADD R1,R1,#1
           ADDEQ    R0,R0,#1  <- This will only be executed if the Z condition flag is 1
           ADDNE    R1,R1,#1  <- This will only be executed if the Z condition flag is 0
 It is easier to let the assembler generate IT instructions automatically: just append the condition to the end of the instruction name.
 The assembler requires this suffix on every instruction inside an IT block anyway.
           ADDEQ    R0,R0,#1
           ADDNE    R1,R1,#1

Table branch

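 The table branch byte instruction loads a byte from (Rn + Rm) and adds twice its value to the program counter.
 Here Rn is the PC, which reads as the address of the TBB instruction + 4, i.e. the address of the table that follows it.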
          TBB      [PC,R0]
   table  dcb      (case0 - table) >> 1  We divide by 2 here because the instruction will multiply by 2
          dcb      (case1 - table) >> 1 
          dcb      (case2 - table) >> 1
          align                          Align here because instructions must start at an even address
   case0  nop                            If R0 = 0 we arrive here  
   case1  nop                            If R0 = 1 we arrive here
   case2  nop                            If R0 = 2 we arrive here
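The offsets work because, with Rn = PC, the PC reads as the address of the 32 bit TBB instruction plus 4, which is exactly where the table starts. Here is a C sketch of the target calculation with hypothetical names. The test layout assumes TBB at 0x100, the table at 0x104 and case0 to case2 at 0x108, 0x10A and 0x10C.

```c
#include <assert.h>
#include <stdint.h>

/* TBB [PC, Rm]: target = (TBB address + 4) + 2 * table[Rm]. */
uint32_t tbb_target(uint32_t tbb_addr, const uint8_t *table, uint32_t index) {
    assert(table != 0);
    uint32_t pc = tbb_addr + 4;          /* PC value seen by the instruction */
    return pc + 2u * table[index];       /* entries are unsigned: forward only */
}

/* TBH [PC, Rm, LSL #1]: same calculation with 16 bit entries. */
uint32_t tbh_target(uint32_t tbh_addr, const uint16_t *table, uint32_t index) {
    assert(table != 0);
    uint32_t pc = tbh_addr + 4;
    return pc + 2u * table[index];
}
```

With that layout, the byte table holds (case - table) >> 1 = 2, 3, 4, and index 1 lands on case1 at 0x10A.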

Finding the span of the leftmost and rightmost ones

         CLZ      R1,R0                 R1 now contains the number of zeros to the left of the leftmost 1 in R0
         RBIT     R0,R0                 R0 is now mirrored
         CLZ      R0,R0                 R0 now contains the number of zeros to the right of the rightmost 1 in the original value
         ADD      R0,R1                 R0 now contains the number of bits that are not part of the span
         RSB      R0,R0,#32             R0 now contains the span (R0 = 32 - R0)
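Here is the same computation in portable C, mirroring the instruction sequence one step at a time (a sketch with hypothetical names). For R0 = 0, both CLZ results are 32, so the sequence returns 32 - 64 = -32 rather than 0.

```c
#include <assert.h>
#include <stdint.h>

/* Portable CLZ: returns 32 for x == 0, as the CLZ instruction does. */
unsigned clz32(uint32_t x) {
    unsigned n = 0;
    for (uint32_t bit = 0x80000000u; bit && !(x & bit); bit >>= 1)
        n++;
    return n;
}

uint32_t rbit32(uint32_t x) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++, x >>= 1)
        r = (r << 1) | (x & 1u);
    return r;
}

/* Span from the leftmost to the rightmost 1, inclusive. */
int32_t ones_span(uint32_t x) {
    unsigned left  = clz32(x);            /* CLZ  R1,R0            */
    unsigned right = clz32(rbit32(x));    /* RBIT R0,R0 / CLZ R0,R0 */
    return 32 - (int32_t)(left + right);  /* ADD R0,R1 / RSB R0,R0,#32 */
}
```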