ARM cores are found in billions of electronic devices, such as mobile phones, TVs, set-top boxes, cameras, and portable gaming machines. This article lists the basic and advanced features of several generations of ARM cores and describes how an OS can use them.
This article is not a comprehensive study; I plan to cover the details in a number of subsequent articles.
ARM (Generic)
- 37 registers
- 30 general purpose (r0-r14 plus the banked copies for FIQ, IRQ, SVC, Undef and Abort) + 1 CPSR (shared by all modes) + 5 SPSR (_irq, _fiq, _svc, _undef, _abort) + 1 pc
- States – ARM, Thumb, Jazelle
- 7 CPU modes
o User
o SVC
o FIQ
o IRQ
o Undef
o Abort
o System
- 7 Exceptions
o Reset
o Data abort (execute stage) – return with SUBS pc, lr, #8
o FIQ (decode stage) – return with SUBS pc, lr, #4
o IRQ (decode stage) – return with SUBS pc, lr, #4
o Prefetch abort (decode stage) – return with SUBS pc, lr, #4
o SWI – return with MOVS pc, lr
o Undef – return with MOVS pc, lr
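The return instructions in the exception list above differ only in how much is subtracted from the banked link register. The following is a minimal sketch of that arithmetic for ARM state; the table name `RETURN_OFFSET` and the function `return_address` are hypothetical names chosen for illustration. Reset is omitted because it has no return.

```python
# Offset subtracted from the banked link register (lr) by the
# exception-return instruction, in ARM state.
RETURN_OFFSET = {
    "data_abort": 8,      # SUBS pc, lr, #8  (re-execute the aborted access)
    "prefetch_abort": 4,  # SUBS pc, lr, #4  (re-execute the aborted fetch)
    "irq": 4,             # SUBS pc, lr, #4
    "fiq": 4,             # SUBS pc, lr, #4
    "swi": 0,             # MOVS pc, lr      (continue after the SWI)
    "undef": 0,           # MOVS pc, lr      (continue after the instruction)
}


def return_address(exception: str, lr: int) -> int:
    """Address loaded into pc when returning from the given exception."""
    return (lr - RETURN_OFFSET[exception]) & 0xFFFFFFFF


# A load at 0x8000 that aborts leaves lr_abt = 0x8008,
# so the handler returns to 0x8000 and the load is retried.
retry_pc = return_address("data_abort", 0x8008)
```

For a data abort the handler returns to the faulting instruction itself, which is what lets an OS fix up the page tables and transparently retry the access.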
ARM7
- 3 stage pipeline
o Fetch, Decode, Execute
- Von Neumann architecture: a single instruction / data bus, so only one kind of access is active at a time
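To see what a 3 stage pipeline buys, here is a small sketch of an ideal pipeline in which every stage takes one cycle and nothing stalls. That is an assumption: real cores stall on memory accesses and branches. The names `ideal_cycles` and `pipeline_chart` are hypothetical helpers for illustration.

```python
STAGES = ["F", "D", "E"]  # Fetch, Decode, Execute


def ideal_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when one instruction enters the pipeline per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def pipeline_chart(num_instructions: int, stages=STAGES) -> list:
    """One row per instruction, one column per cycle; '.' means idle."""
    total = ideal_cycles(num_instructions, len(stages))
    rows = []
    for i in range(num_instructions):
        cells = ["."] * total
        for s, name in enumerate(stages):
            cells[i + s] = name  # instruction i reaches stage s at cycle i + s
        rows.append(" ".join(cells))
    return rows


for row in pipeline_chart(3):
    print(row)
```

Under the same assumptions the formula applies to the deeper pipelines of later cores: only the fill time grows, while steady-state throughput stays at one instruction per cycle.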
ARM9
- 5 stage pipeline
o Fetch, Decode, Execute, Memory, Writeback
o Fetch – instruction fetch
o Decode – instruction decode, register read
o Execute – ALU, shift
o Memory – memory access
o Writeback – register write / cache write
- Instruction / data TCM (tightly coupled memory) interface support
- Harvard architecture: separate instruction and data buses
S – Synthesizable
- Instruction / data caches can be configured
ARM10E
- 6 stage pipeline
o Fetch, Issue, Decode, Execute, Memory, Write
o Fetch – instruction fetch
o Issue – ARM / Thumb instruction decode
o Decode – register read
o Execute – ALU, shift
o Memory – memory access
o Write – register write
- Static branch prediction at the fetch stage
o Conditional backward branches are predicted taken
- The fetch stage can fetch 2 instructions at a time from the instruction cache
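The static branch prediction rule above can be sketched in a few lines. The list only states that conditional backward branches are predicted taken; predicting forward conditional branches as not taken is the usual complement of that rule and is an assumption here, as are the hypothetical names `predict_taken` and `static_accuracy`.

```python
def predict_taken(branch_pc: int, target: int, conditional: bool) -> bool:
    """Static prediction: decided from the branch direction alone."""
    if not conditional:
        return True                # unconditional branches are always taken
    return target < branch_pc      # backward (loop) branch -> predict taken


def static_accuracy(branch_pc: int, target: int, outcomes: list) -> float:
    """Fraction of actual outcomes that match the static prediction."""
    prediction = predict_taken(branch_pc, target, conditional=True)
    hits = sum(1 for taken in outcomes if taken == prediction)
    return hits / len(outcomes)


# A loop that runs 10 times: the closing branch at 0x8010 jumps back
# to 0x8000 nine times and falls through once.
loop_outcomes = [True] * 9 + [False]
accuracy = static_accuracy(0x8010, 0x8000, loop_outcomes)
```

The rule works well for loops because the closing branch of a loop is backward and is taken on every iteration except the last.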
ARM11 (ARMv6 architecture)
- 8 stage pipeline
o Predicted fetch1
o Predicted fetch2
o Decode
o Issue
o ALU / MAC / data cache access (three stages, in parallel ALU, multiply-accumulate and load/store pipelines)
o Writeback
Cache
- Data is copied between memory and the cache one line at a time
- 4-way or 64-way set-associative cache
- The PA tag RAM stores the physical address (PA) of each cache line
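As an illustration of how a set-associative cache locates a line, the sketch below splits an address into tag, set index and byte offset. The geometry (16 KB, 4 ways, 32-byte lines) is an assumed example, not a figure for any particular core, and `split_address` is a hypothetical helper. All sizes are assumed to be powers of two.

```python
def split_address(addr: int, cache_bytes: int = 16 * 1024,
                  ways: int = 4, line_bytes: int = 32) -> tuple:
    """Split an address into (tag, set index, byte offset within the line)."""
    num_sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 16 KB, 4-way, 32-byte lines -> 128 sets: 5 offset bits, 7 index bits.
tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))
```

With 64 ways and the same total size there are only 8 sets, so fewer address bits select the set and more of them end up in the tag that the tag RAM must store and compare.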
MMU
- Translates virtual addresses (VA) to physical addresses (PA)
- The TLB caches the PA for recently translated VAs
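The relationship between the page tables and the TLB can be sketched as a lookup with a cache in front of it. This is a simplified model that assumes a single level of 4 KB pages and a flat dictionary as the page table; the class `TinyMMU` and its fields are hypothetical.

```python
PAGE_SHIFT = 12  # assumed 4 KB pages


class TinyMMU:
    def __init__(self, page_table: dict):
        self.page_table = dict(page_table)  # virtual page -> physical frame
        self.tlb = {}                       # cached translations
        self.walks = 0                      # number of page-table walks

    def translate(self, va: int) -> int:
        vpn = va >> PAGE_SHIFT
        offset = va & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.tlb:
            # TLB miss: walk the page table (a KeyError here would
            # correspond to a translation fault) and cache the result.
            self.walks += 1
            self.tlb[vpn] = self.page_table[vpn]
        return (self.tlb[vpn] << PAGE_SHIFT) | offset


mmu = TinyMMU({0x8: 0x40})
first = mmu.translate(0x8123)   # miss: walks the page table
second = mmu.translate(0x8FFF)  # hit: same page, no walk
```

Only the first access to a page pays for the walk; later accesses to the same page are served from the TLB.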
OS – MMU
ARMv5 picture (virtually indexed, virtually tagged cache)
Case 1: one VA maps to two different PAs in two different processes
On a context switch:
- The TLB needs to be flushed
- The VA and its data must be removed from the cache
Case 2: two VAs map to one PA in two different processes (shared memory)
On a context switch:
- The same data may be cached under two different VAs, so the copies can become inconsistent
- The cache must be flushed
Hence a process context switch is slower than a thread switch
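The first case above (one VA, two PAs) can be demonstrated with a toy model of a virtually tagged cache. This is a deliberately simplified sketch: each VA stands for a whole cache line, and the class `VivtCache`, the addresses and the data strings are all made up for illustration.

```python
class VivtCache:
    """Toy cache that is looked up by virtual address only."""

    def __init__(self):
        self.lines = {}  # VA -> cached data

    def read(self, va: int, memory: dict, page_table: dict):
        if va not in self.lines:
            # Miss: translate the VA and fill the line from memory.
            self.lines[va] = memory[page_table[va]]
        return self.lines[va]

    def flush(self):
        self.lines.clear()


memory = {0x1000: "data of A", 0x2000: "data of B"}
page_table_a = {0x8000: 0x1000}  # process A: VA 0x8000 -> PA 0x1000
page_table_b = {0x8000: 0x2000}  # process B: same VA -> PA 0x2000

cache = VivtCache()
seen_by_a = cache.read(0x8000, memory, page_table_a)
stale = cache.read(0x8000, memory, page_table_b)   # no flush: hits A's line
cache.flush()
correct = cache.read(0x8000, memory, page_table_b)
```

Without the flush, process B hits the line that process A cached under the same VA, which is why the cache has to be cleaned on every process switch in this model.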
ARMv6 picture (virtually indexed, physically tagged cache)
The MMU splits the address map into 2 parts
- TTBR0 and TTBR1
- An 8-bit ASID tags each TLB entry, so the TLB need not be flushed on a context switch
- Page table entries have an execute-never (XN) bit, which protects against buffer overrun attacks etc.
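The effect of the ASID can be sketched by tagging every TLB entry with the address space that created it. The class `AsidTlb`, the page numbers and the ASID values below are hypothetical; the model assumes one page table per process, passed in at lookup time.

```python
class AsidTlb:
    """Toy TLB whose entries are keyed by (ASID, virtual page)."""

    def __init__(self):
        self.entries = {}       # (asid, vpn) -> physical frame
        self.current_asid = 0
        self.misses = 0

    def switch_to(self, asid: int):
        if not 0 <= asid < 256:  # 8-bit ASID
            raise ValueError("ASID must fit in 8 bits")
        self.current_asid = asid  # note: no entries are discarded

    def lookup(self, vpn: int, page_table: dict) -> int:
        key = (self.current_asid, vpn)
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = page_table[vpn]
        return self.entries[key]


page_table_a = {0x8: 0x40}
page_table_b = {0x8: 0x80}

tlb = AsidTlb()
tlb.switch_to(1)
a_first = tlb.lookup(0x8, page_table_a)   # miss
tlb.switch_to(2)
b_first = tlb.lookup(0x8, page_table_b)   # miss, but A's entry survives
tlb.switch_to(1)
a_again = tlb.lookup(0x8, page_table_a)   # hit: no flush was needed
```

Both processes use the same virtual page, yet their entries coexist because the ASID is part of the lookup key.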
Symbian EKA2 uses this in its multiple memory model
- One mapping for the kernel (2GB)
- The remaining 2GB for user processes
- The ASID avoids a TLB flush on a context switch (hence Symbian allows at most 256 concurrent processes)
- Physical tagging avoids a cache flush on a context switch
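A 2GB / 2GB split like this follows from how ARMv6 chooses between the two translation table base registers: with the boundary field N of the control register set to 1, the top bit of the VA selects the table. The sketch below models that selection; `select_ttbr` is a hypothetical helper, and placing the kernel in the upper half (TTBR1) is an assumption, not something the list above states.

```python
def select_ttbr(va: int, n: int = 1) -> str:
    """Pick the translation table base register for a 32-bit VA.

    n is the boundary field (0..7): if the top n bits of the VA are all
    zero the per-process table (TTBR0) is used, otherwise TTBR1.
    """
    if not 0 <= n <= 7:
        raise ValueError("n must be in 0..7")
    if n == 0:
        return "TTBR0"                      # no split: TTBR0 covers everything
    top_bits = (va & 0xFFFFFFFF) >> (32 - n)
    return "TTBR0" if top_bits == 0 else "TTBR1"


# n = 1 splits the 4GB address space into two 2GB halves.
user_side = select_ttbr(0x7FFFFFFF)    # lower half
kernel_side = select_ttbr(0x80000000)  # upper half (assumed kernel mapping)
```

On a context switch only TTBR0 and the ASID need to change; the mapping reached through TTBR1 stays the same for every process.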