Introduction to Multicore Software Design

Introduction

Multi-core programming is the branch of computer science concerned with processors that have more than one CPU core on the same chip. The cores work together to deliver more processing power than a single core could.

Multicore systems can be classified in several ways, based on the hardware cores that make up the system and on the software running on those cores. In this post, we describe two major taxonomies. The first is from the hardware point of view (heterogeneous vs. homogeneous systems). The second is from the point of view of the software running on the cores (symmetric vs. asymmetric multiprocessing systems).

Figure 1. Generic multicore processor design, illustrating the general concept of a multicore processor

Heterogeneous vs Homogeneous

Heterogeneous and homogeneous describe the kinds of cores found on the processor chip.
In a heterogeneous chip, the cores differ from each other, while in a homogeneous chip, the cores are identical.

Homogeneous systems

The following example is a multicore chip whose cores are all homogeneous: the Freescale i.MX6 Quad processor (quad core, high performance, advanced 3D graphics, HD video, advanced multimedia, ARM® Cortex®-A9 core).

Figure 2. Freescale i.MX6 Quad Processor
It is composed of four ARM Cortex-A9 cores. In this system, the boot core (core0) starts first and then starts the other cores. All of them have the same CPU context.
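
The boot sequence just described can be sketched as a toy Python model. Threads play the role of cores, and a per-core `threading.Event` stands in for the SoC-specific mechanism that releases a parked secondary core. That release mechanism is hypothetical here; on a real i.MX6 it is done through the chip's own registers. The function name `simulate_smp_boot` is illustrative. The point of the model is that every core ends up in the same entry function, as in an SMP system.

```python
import threading


def simulate_smp_boot(num_cores: int = 4) -> list:
    """Toy model of an SMP boot: core0 starts first, then releases the others.

    All cores run the same entry function, mirroring an SMP system in which
    every core executes one image with the same CPU context layout.
    """
    log = []
    log_lock = threading.Lock()
    # One "release" flag per core. This stands in for the SoC-specific
    # mechanism (hypothetical here) that lets a parked secondary core run.
    release = [threading.Event() for _ in range(num_cores)]

    def kernel_entry(core_id: int) -> None:
        # Shared entry point: the same code runs on every core.
        with log_lock:
            log.append(f"core{core_id}: running")

    def secondary(core_id: int) -> None:
        release[core_id].wait()  # parked until the boot core lets it go
        kernel_entry(core_id)

    kernel_entry(0)  # core0 is the boot core and runs before the others start
    workers = [threading.Thread(target=secondary, args=(i,))
               for i in range(1, num_cores)]
    for w in workers:
        w.start()
    for i in range(1, num_cores):
        release[i].set()  # the boot core releases each secondary core
    for w in workers:
        w.join()
    return log


print(simulate_smp_boot())  # secondary cores may report in any order
```

Core0 always logs first. The order of the secondary cores depends on thread scheduling, just as real secondary cores come up in no guaranteed order.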

Heterogeneous systems

The following is an example of a heterogeneous multicore chip from Texas Instruments: the TCI6630K2L, a multicore DSP+ARM KeyStone II System-on-Chip.

Figure 3. TCI6630K2L SoC, Multicore DSP+ARM KeyStone

It is composed of two ARM Cortex-A15 cores and four C66x DSP cores. In this system, an ARM core starts as the boot core (core0). Once the software on the ARM side is ready, it starts the DSP cores, which then execute an executable written specifically for the DSP using the DSP's instruction set.
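
The key constraint in this heterogeneous boot is that the image handed to a DSP core must be built for the DSP's instruction set. The minimal model below captures only that rule. The `Image`/`Core` classes and `load_and_release` are hypothetical names, and the ISA labels `"c66x"` and `"armv7-a"` are plain strings chosen for illustration. A real SoC loads and releases remote cores through vendor-specific boot and reset controls that are not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Image:
    name: str
    isa: str  # instruction set the image was compiled for


@dataclass
class Core:
    name: str
    isa: str  # instruction set the core executes
    image: Optional[Image] = None
    running: bool = False


def load_and_release(core: Core, image: Image) -> None:
    """Boot-core side: place an image in a remote core's memory, then release it.

    Hypothetical model: a real SoC does this through vendor-specific boot,
    power and reset controls.
    """
    if image.isa != core.isa:
        # A DSP cannot execute ARM machine code, and vice versa.
        raise ValueError(f"{image.name} targets {image.isa}, "
                         f"but {core.name} executes {core.isa}")
    core.image = image
    core.running = True


dsps = [Core(f"dsp{i}", "c66x") for i in range(4)]
for dsp in dsps:
    load_and_release(dsp, Image("dsp_app", "c66x"))
print([d.running for d in dsps])  # [True, True, True, True]
```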

Note that each group of cores (the ARM cores or the DSP cores) forms a homogeneous system within the group.

Homogeneous Multicore hardware communication

At the hardware level, SoC designers usually provide several levels of communication channels between the cores. We list here the techniques available on the ARM Cortex-A9 as a generic illustration of inter-core communication. Similar mechanisms exist on other hardware platforms, such as Intel's. We will not go into ARM-specific details, so that the information carries over to other platforms.

Snoop Control Unit (SCU)

The goal of this unit is to keep the caches of the ARM cores coherent. Each core works mainly with its own L1 cache. That cache is isolated from the L2 cache shared by the cores, and from main memory, which is the main interface between the cores and the external world. The SCU keeps the L1 caches consistent with each other, with the L2 cache, and with memory, following a well-defined hardware protocol known as a snooping protocol. Because the caches are kept coherent, threads running on different cores can exchange information through shared memory without the software designer having to manage cache synchronization: a shared variable changed in one core's L1 cache will be seen by the other cores. Ordering between accesses still requires proper synchronization, such as locks or memory barriers.
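
To make the snooping idea concrete, here is a toy write-invalidate protocol with three states, in the style of MSI. M means Modified (the only valid copy; memory is stale), S means Shared (a clean copy), and a missing entry means Invalid. This is a teaching simplification, not the exact protocol implemented by the Cortex-A9 SCU. The class name `SnoopingCaches` is hypothetical, and caches are modelled as one dictionary per core.

```python
class SnoopingCaches:
    """Toy write-invalidate snooping protocol with MSI-style states."""

    def __init__(self, num_cores: int):
        self.memory = {}  # address -> value
        # One private cache per core: address -> (state, value)
        self.caches = [{} for _ in range(num_cores)]

    def _snoop(self, requester: int, addr: int, invalidate: bool) -> None:
        # Every other cache "snoops" the request on the shared interconnect.
        for core, cache in enumerate(self.caches):
            if core == requester or addr not in cache:
                continue
            state, value = cache[addr]
            if state == "M":
                self.memory[addr] = value  # write the dirty copy back first
            if invalidate:
                del cache[addr]  # a writer needs exclusive ownership
            else:
                cache[addr] = ("S", value)  # a reader only forces a downgrade

    def read(self, core: int, addr: int) -> int:
        line = self.caches[core].get(addr)
        if line is not None:  # hit in M or S
            return line[1]
        self._snoop(core, addr, invalidate=False)
        value = self.memory.get(addr, 0)
        self.caches[core][addr] = ("S", value)
        return value

    def write(self, core: int, addr: int, value: int) -> None:
        self._snoop(core, addr, invalidate=True)
        self.caches[core][addr] = ("M", value)


bus = SnoopingCaches(num_cores=2)
bus.write(0, 0x100, 42)    # core0 now holds the line in M
print(bus.read(1, 0x100))  # 42: core1's miss snoops core0's dirty copy
```

The reader never touches core0's cache directly. The protocol forces the write-back, which is why software on top of a coherent system can treat shared variables as plain memory.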

Generic Interrupt Controller (GIC)

The GIC handles interrupts coming from the system to the multicore ARM Cortex-A9 and can serve one or more of its cores. Each core configures the GIC with the interrupts it is interested in. When an interrupt occurs, the interested cores receive a generic interrupt signal. This signal alone does not tell a core which peripheral or internal device caused the interrupt. Each interested core therefore reads the GIC's status to learn the details of the interrupt, such as its source and parameters.
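
The enable, signal, acknowledge, end-of-interrupt flow can be sketched as a toy model. `ToyGic` and its method names are hypothetical, loosely modelled on the GIC CPU interface's acknowledge and end-of-interrupt registers. The simplifications are deliberate assumptions. Every interested core is signalled, whereas a real GIC routes each shared interrupt according to its target settings. Pending interrupts are served lowest ID first instead of by programmed priority. The value 1023 is the GIC's spurious ID, read back when nothing is pending.

```python
SPURIOUS_ID = 1023  # ID read back when no interrupt is pending for the core


class ToyGic:
    """Toy model of the GIC flow: enable -> signal -> acknowledge -> end."""

    def __init__(self, num_cores: int):
        self.enabled = [set() for _ in range(num_cores)]  # IDs each core wants
        self.pending = [set() for _ in range(num_cores)]
        self.active = [set() for _ in range(num_cores)]
        self.irq_line = [False] * num_cores  # the bare "interrupt" signal

    def enable(self, core: int, irq_id: int) -> None:
        self.enabled[core].add(irq_id)

    def raise_irq(self, irq_id: int) -> None:
        for core, ids in enumerate(self.enabled):
            if irq_id in ids:
                self.pending[core].add(irq_id)
                self.irq_line[core] = True  # core learns only "something fired"

    def acknowledge(self, core: int) -> int:
        # Reading the acknowledge register tells the core which source fired.
        if not self.pending[core]:
            return SPURIOUS_ID
        irq_id = min(self.pending[core])  # simplification: lowest ID first
        self.pending[core].remove(irq_id)
        self.active[core].add(irq_id)
        self.irq_line[core] = bool(self.pending[core])
        return irq_id

    def end_of_interrupt(self, core: int, irq_id: int) -> None:
        self.active[core].discard(irq_id)  # the handler has finished


gic = ToyGic(num_cores=2)
gic.enable(0, 40)
gic.enable(1, 40)
gic.enable(1, 41)
gic.raise_irq(41)
gic.raise_irq(40)
irq = gic.acknowledge(1)
print(irq)  # 40
gic.end_of_interrupt(1, irq)
```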

The GIC manages three types of interrupts:

Software Generated Interrupt (SGI): This interrupt is generated explicitly by software by writing to a dedicated distributor register, the Software Generated Interrupt Register. It is most commonly used for inter-core communication. SGIs can be targeted at all cores or at a selected group of cores in the system. Interrupt numbers 0-15 are reserved for SGIs, and the software decides which of these numbers to use for each kind of communication.
Private Peripheral Interrupt (PPI): This interrupt is generated by a peripheral that is private to an individual core, such as the core's private timer or watchdog. Interrupt numbers 16-31 are reserved for PPIs. A PPI identifies an interrupt source private to the core and is independent of the same source on another core; for example, each core has its own timer interrupt.
Shared Peripheral Interrupt (SPI): This interrupt is generated by a peripheral that the interrupt controller can route to more than one core. The GIC architecture allows interrupt numbers 32-1019 for SPIs (a given implementation supports fewer), while IDs 1020-1023 are reserved for special purposes. SPIs signal interrupts from the various peripherals accessible across the whole system.
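
The ID ranges above translate directly into a lookup. The second function sketches how software might compose the value written to the distributor's SGI register to send an SGI to other cores. It assumes the GICv2 field layout (bits 25:24 target filter, bits 23:16 CPU target list, bits 3:0 SGI ID). We assume the Cortex-A9's GIC uses the same layout, but check your SoC's reference manual. Both function names are hypothetical helpers, not a real driver API, and the security-related bit 15 is simply left at 0.

```python
def classify_gic_interrupt(irq_id: int) -> str:
    """Map a GIC interrupt ID to its type using the architectural ID ranges."""
    if 0 <= irq_id <= 15:
        return "SGI"
    if 16 <= irq_id <= 31:
        return "PPI"
    if 32 <= irq_id <= 1019:
        return "SPI"
    if 1020 <= irq_id <= 1023:
        return "special"  # reserved IDs; 1023 means "spurious, nothing pending"
    raise ValueError(f"invalid interrupt ID {irq_id}")


def encode_sgir(sgi_id: int, target_cores=(), target_filter: int = 0) -> int:
    """Build the value written to the distributor's SGI register (GICv2 layout).

    target_filter: 0 = send to the cores listed in target_cores,
                   1 = send to all cores except the requesting one,
                   2 = send only to the requesting core.
    """
    if not 0 <= sgi_id <= 15:
        raise ValueError("SGI IDs are 0-15")
    if target_filter not in (0, 1, 2):
        raise ValueError("target_filter must be 0, 1 or 2")
    cpu_list = 0
    for core in target_cores:
        if not 0 <= core <= 7:
            raise ValueError("the CPU target list covers cores 0-7")
        cpu_list |= 1 << core  # one bit per target core
    return (target_filter << 24) | (cpu_list << 16) | sgi_id


print(classify_gic_interrupt(29))   # PPI
print(hex(encode_sgir(3, [1, 2])))  # 0x60003
```

Because SGI numbers are a software convention, a design would typically fix them in one shared header, for example "SGI 3 means new message in the mailbox".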

Heterogeneous Multicore hardware communication

On heterogeneous systems, communication between cores uses SoC-specific interfaces. Examples are memory shared between the cores, or an interrupt controller dedicated to inter-core communication. The important point is that heterogeneous communication is much more complex and SoC-specific than communication between homogeneous cores.
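
A common pattern combining both mechanisms is a mailbox: messages go into a ring buffer in shared memory, and an inter-core interrupt (a "doorbell") tells the other side to look. The sketch simulates this with Python threads. `SharedMemoryMailbox` is a hypothetical name, and `threading.Event` stands in for the SoC-specific doorbell interrupt. It is a single-producer/single-consumer ring: only the sender writes `head` and only the receiver writes `tail`, so neither index needs a lock.

```python
import threading
import time


class SharedMemoryMailbox:
    """Single-producer/single-consumer ring in a simulated shared memory region.

    On real hardware you would also need memory barriers and, for
    non-coherent memory, cache maintenance around each access.
    """

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0  # total messages written (producer-owned)
        self.tail = 0  # total messages read (consumer-owned)
        self.doorbell = threading.Event()  # stand-in for the inter-core IRQ

    def send(self, msg) -> bool:
        if self.head - self.tail == self.capacity:
            return False  # ring full; the caller retries later
        self.slots[self.head % self.capacity] = msg
        self.head += 1  # publish only after the slot is filled
        self.doorbell.set()  # "interrupt" the other core
        return True

    def receive_all(self) -> list:
        msgs = []
        while self.tail != self.head:
            msgs.append(self.slots[self.tail % self.capacity])
            self.tail += 1
        return msgs


def run_demo(count: int = 20) -> list:
    mailbox = SharedMemoryMailbox(capacity=8)
    received = []

    def consumer() -> None:  # e.g. the DSP side
        while len(received) < count:
            mailbox.doorbell.wait()
            mailbox.doorbell.clear()  # clear before draining so no ring is lost
            received.extend(mailbox.receive_all())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(count):  # e.g. the ARM side
        while not mailbox.send(i):
            time.sleep(0.001)  # back off while the ring is full
    t.join()
    return received


print(run_demo())
```

Clearing the doorbell before draining matters. A message published after the drain re-arms the doorbell, so the receiver can never sleep while a message is waiting.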

SMP vs AMP

Overview

SMP stands for Symmetric Multiprocessing.
AMP stands for Asymmetric Multiprocessing.

SMP and AMP describe how code threads execute on the cores: they look at the system from the point of view of the processing done on the cores. SMP and AMP are therefore more a matter of software architecture than of hardware architecture, describing how the running software uses the underlying hardware. In the SMP case, the software relies on the underlying cores being symmetric, so it uses the same instruction set for all the cores. The running threads are therefore all alike, both in the instruction set used to build them and in the CPU context saved for each thread.

In the AMP case, on the other hand, each core (or group of cores) runs its own isolated group of threads. Each thread group may therefore be built for the instruction set of its underlying core(s), and each group may have different context data to save for those cores.
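
One practical consequence of this difference shows up in load sharing. Under SMP, a single scheduler can put any thread on any core. Under AMP, work is partitioned ahead of time. The toy scheduler below treats tasks as plain durations and ignores everything else, including the instruction-set differences an AMP system may have. The function names and the example workload are hypothetical.

```python
import heapq


def smp_schedule(durations, num_cores):
    """SMP: one shared ready queue; the first core to become free takes the next task."""
    free_at = [(0, core) for core in range(num_cores)]  # (time core is free, core id)
    heapq.heapify(free_at)
    assignment = {}
    for task, duration in enumerate(durations):
        t, core = heapq.heappop(free_at)
        assignment[task] = core
        heapq.heappush(free_at, (t + duration, core))
    return assignment, max(t for t, _ in free_at)


def amp_schedule(durations, binding, num_cores):
    """AMP: each task is statically bound to one core (binding[task] = core)."""
    busy = [0] * num_cores
    for task, duration in enumerate(durations):
        busy[binding[task]] += duration
    return dict(enumerate(binding)), max(busy)


work = [4, 4, 4, 4]
print(smp_schedule(work, 2))                # ({0: 0, 1: 1, 2: 0, 3: 1}, 8)
print(amp_schedule(work, [0, 0, 0, 1], 2))  # ({0: 0, 1: 0, 2: 0, 3: 1}, 12)
```

With an unlucky static partition, the AMP system finishes later (12 vs. 8 time units) even though the hardware is identical. The partitioning becomes a design decision rather than something the operating system balances at run time.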

Mixed SMP and AMP

In many cases the hardware contains several groups of identical cores, as with the TCI6630K2L SoC. The system can then be SMP and AMP at the same time: SMP within each group of identical cores, and AMP at the level of the whole SoC.

Impact on the software design

Selection of the build system

Because the generated machine code depends on the underlying architecture, the designer has to take both the hardware and the software architecture into account when choosing the development tools. For example, in an SMP system the generated code is usually linked into a single executable, while in an AMP system each core has its own executable. Similarly, a homogeneous system can use the same compiler toolchain for all its SMP and AMP software. A heterogeneous system needs a specific toolchain, or specific compilation parameters, to generate machine code for each type of core. The designer has to check that the chosen toolchain can generate executables for the selected hardware/software architecture.
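
These rules can be captured in a small build-planning sketch: one image per SMP group, one image per core in an AMP group, and a compiler chosen by instruction set. The `CoreGroup` description and `build_plan` function are hypothetical. The toolchain names in `TOOLCHAINS` are only examples (a GNU Arm Linux cross compiler and TI's C6000 compiler driver), so substitute the tools your vendor supplies. The example description of the TCI6630K2L (ARM cores as SMP, each DSP running its own image) is an assumed configuration, not the only possible one.

```python
from dataclasses import dataclass

# Example mapping from instruction set to compiler driver (an assumption).
TOOLCHAINS = {
    "armv7-a": "arm-linux-gnueabihf-gcc",
    "c66x": "cl6x",
}


@dataclass
class CoreGroup:
    name: str
    isa: str
    num_cores: int
    mode: str  # "smp": one image for the group, "amp": one image per core


def build_plan(groups):
    """Return (output image, compiler, cores that run it) for every image."""
    plan = []
    for g in groups:
        compiler = TOOLCHAINS[g.isa]  # heterogeneous: toolchain follows the ISA
        if g.mode == "smp":
            plan.append((f"{g.name}.elf", compiler, list(range(g.num_cores))))
        elif g.mode == "amp":
            for core in range(g.num_cores):
                plan.append((f"{g.name}_core{core}.elf", compiler, [core]))
        else:
            raise ValueError(f"unknown mode {g.mode!r}")
    return plan


soc = [CoreGroup("arm", "armv7-a", 2, "smp"), CoreGroup("dsp", "c66x", 4, "amp")]
for image, compiler, cores in build_plan(soc):
    print(image, compiler, cores)
```

This configuration is also an instance of the mixed SMP/AMP case: the two ARM cores share one image, while the four DSP images are built separately with a different toolchain.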

Selection of the operating system

Modern software designs use an operating system to manage hardware and software resources. Selecting a suitable operating system for the intended software architecture can be a tedious and difficult task if no requirements have been specified for the system. The designer has to consider whether the operating system will run in SMP or AMP mode. If the system is heterogeneous, can the same operating system be used on the different core architectures, or is a specific operating system needed for each one? Modern hardware also provides a Memory Management Unit (MMU) for memory protection, address virtualization, and paging. The designer has to decide whether the operating system will use the MMU, and how memory management will be done in the AMP and SMP software architectures. Other hardware features, such as DMA and hardware cryptography units, also play a significant role. For example, security hardware (such as ARM TrustZone) and cryptography accelerators are simple to use from a single SMP operating system. In an AMP system they are harder to use, especially when several cores need to share them. All these points, and more, should be addressed while specifying the requirements for the operating system, so that the most suitable one can be selected. The choice of operating system is also constrained by the choice of build system.

To Virtualize or not to Virtualize? That is a big question

Virtualization means that the underlying hardware can be completely abstracted from the running software. In the ideal case, virtualization software (a hypervisor, also known as a virtual machine monitor) allows machine code written for one CPU architecture to run on a very different hardware architecture. QEMU is an example (http://wiki.qemu.org/Main_Page), although running code built for a different architecture is, strictly speaking, emulation. Virtualization can also let the running software assume a different number of cores from the number the hardware actually provides. The software designer should therefore consider carefully whether or not to use virtualization. Some multicore architectures, such as the ARM Cortex-A15, support virtualization in hardware, with the hardware itself assisting the hypervisor in its virtualization functions. This is called hardware-assisted virtualization, and its performance is expected to be close to native. However, most embedded systems lack hardware-assisted virtualization, and in that case the performance of the hypervisor might not be acceptable. In principle, a hypervisor can allow a system written for SMP, for example, to run on all cores of a heterogeneous system.

That is why this is a big question that designers must ask themselves before deciding: what will be sacrificed versus what will be gained, what the underlying hardware supports and what it does not, and what the hypervisor supports versus what is expected of it. For example, running an operating system like Linux on QEMU is very common for architectures like ARM. For the Zilog Z80 (an 8-bit microprocessor), QEMU gains support through an external patch and can emulate Z80 code on your PC, but that does not mean Linux can run on a Z80. So a hypervisor is not necessarily the best option for running software on a given architecture.


And as usual, thanks for reading :)
