Interaction of core with the underlying kernel

Core is the root of the component tree. It is initialized and started directly by the underlying kernel and has two purposes. First, it makes the low-level physical resources of the machine available to other components in the form of services. These resources are physical memory, processing time, device resources, initial boot modules, and protection mechanisms (such as the MMU, IOMMU, and virtualization extensions). It thereby hides the peculiarities of the underlying kernel behind an API that is uniform across all kernels supported by Genode. Second, core creates the init component by using its own services and following the steps described in Section Component creation.

Even though core is executed in user mode, its role as the root of the component tree makes it just as critical as the kernel; it merely happens to be executed in a different processor mode. Whereas regular components interact with the kernel solely when performing inter-component communication, core interacts with the kernel much more closely. The following subsections go into detail about this interplay.

The description tries to be general across the various kernels supported by Genode. Note, however, that a particular kernel may deviate from the general description.

System-image assembly

A Genode-based system consists of potentially many boot modules. But boot loaders - in particular on ARM platforms - usually support the loading of a single system image only. To unify the boot procedure across kernels and CPU architectures, on all kernels except Linux, Genode merges boot modules together with the core component into a single image.

The core component is actually built as a library. The library description file is specific to each platform and located at lib/mk/spec/<pf>/core.mk, where <pf> corresponds to the hardware platform used. It includes the platform-agnostic lib/mk/core.inc file. The library contains everything core needs (including the C++ runtime and the core code) except the following symbols:

_boot_modules_headers_begin and _boot_modules_headers_end

Between those symbols, core expects an array of boot-module header structures. A boot-module header contains the name, core-local address, and size of a boot module. This metadata is used by core's initialization code in src/core/platform.cc to populate the ROM service with modules, as sketched in the example below.

_boot_modules_binaries_begin and _boot_modules_binaries_end

Between those symbols, core expects the actual module data. This range is outside the core image (beyond _prog_img_end). In contrast to the boot-module headers, the modules reside in a separate section that remains unmapped within core's virtual address space. Only when access to a boot module is required by core (i.e., the ELF binary of init during the creation of the init component), core makes the module visible within its virtual address space.
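
To make the traversal of the boot-module headers more concrete, the following sketch shows how core's initialization code might walk the array between the two header symbols and register each module at the ROM service. The structure layout, the symbol types, and the register_rom_module function are assumptions made for illustration; they do not mirror core's actual declarations.

  /* illustrative sketch -- layout and names are assumptions, not core's API */

  #include <cstddef>

  using addr_t = unsigned long;

  /* hypothetical layout of one boot-module header as emitted by boot_modules.s */
  struct Boot_module_header
  {
      addr_t      name;  /* core-local address of the null-terminated module name */
      addr_t      base;  /* start of the module data */
      std::size_t size;  /* size of the module data in bytes */
  };

  /* symbols defined by the linked-in boot_modules.s */
  extern Boot_module_header _boot_modules_headers_begin;
  extern Boot_module_header _boot_modules_headers_end;

  /* hypothetical hook that announces one module at core's ROM service */
  void register_rom_module(char const *name, addr_t base, std::size_t size);

  /* walk the header array and announce each boot module as a ROM module */
  void register_boot_modules()
  {
      for (Boot_module_header const *h = &_boot_modules_headers_begin;
           h < &_boot_modules_headers_end; h++)

          /* the module data itself stays unmapped until it is actually needed */
          register_rom_module(reinterpret_cast<char const *>(h->name),
                              h->base, h->size);
  }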

Making the boot modules invisible to core has two benefits. First, the integrity of the boot modules does not depend on core: even in the presence of a bug in core, the boot modules cannot be accidentally overwritten. Second, no page-table entries are needed to map the modules into the virtual address space of core. This is particularly beneficial when using large boot modules such as a complete disk image. If incorporated into the core image, page-table entries for the entire disk image would need to be allocated at core's initialization time.

These symbols are defined in an assembly file called boot_modules.s. When building core stand-alone, the final linking stage combines the core library with the dummy boot_modules.s file located at src/core/boot_modules.s. But when using the run tool (Section Run tool) to integrate a bootable system image, the run tool dynamically generates a version of boot_modules.s depending on the boot modules listed in the run script and repeats the final linking stage of core by combining the core library with the generated boot_modules.s file. The generated file is placed at <build-dir>/var/run/<scenario>/ and incorporates the boot modules using the assembler's .incbin directive. The result of the final linking stage is an executable ELF binary that contains both core and the boot modules.

Bootstrapping and allocator setup

At boot time, the kernel passes information about the physical resources and the initial system state to core. Even though the mechanism and format of this information varies from kernel to kernel, it generally covers the available physical memory, the location of the boot modules, the number of CPUs, and the information needed by the initial thread to perform kernel operations.

Core's allocators

Core's kernel-specific platform-initialization code (src/core/platform.cc) uses this information to initialize the allocators used for keeping track of physical resources. Those allocators, whose setup is sketched in the example below, are:

RAM allocator

contains the ranges of the available physical memory

I/O memory allocator

contains the physical address ranges of unused memory-mapped I/O resources. In general, all ranges not initially present in the RAM allocator are considered to be I/O memory.

I/O port allocator

contains the I/O ports on x86-based platforms that are currently not in use. This allocator is initialized with the entire I/O port range of 0 to 0xffff.

IRQ allocator

contains the IRQs that are available for association with IRQ sessions. This allocator is initialized with the entirety of the available IRQ numbers.

Core-region allocator

contains the virtual memory regions of core that are not in use.
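
The interplay of these allocators during platform initialization can be illustrated with a short sketch. The Range_allocator interface and the Boot_info structure are simplified assumptions; the point of interest is that I/O memory is derived as the complement of the RAM ranges reported by the kernel.

  /* simplified sketch -- interfaces and names are assumptions for illustration */

  #include <cstddef>
  #include <vector>

  using addr_t = unsigned long;

  /* hypothetical allocator interface for keeping track of address ranges */
  struct Range_allocator
  {
      virtual void add_range   (addr_t base, std::size_t size) = 0;
      virtual void remove_range(addr_t base, std::size_t size) = 0;
      virtual ~Range_allocator() { }
  };

  /* hypothetical digest of the boot information passed by the kernel */
  struct Boot_info
  {
      struct Region { addr_t base; std::size_t size; };
      std::vector<Region> ram;       /* usable physical-memory ranges */
      unsigned            num_irqs;  /* number of available IRQ lines */
  };

  void init_allocators(Boot_info const &info,
                       Range_allocator &ram_alloc,     Range_allocator &io_mem_alloc,
                       Range_allocator &io_port_alloc, Range_allocator &irq_alloc)
  {
      /* initially, the whole physical address space counts as I/O memory */
      io_mem_alloc.add_range(0, ~0UL);

      for (Boot_info::Region const &r : info.ram) {
          ram_alloc.add_range(r.base, r.size);        /* usable RAM ...        */
          io_mem_alloc.remove_range(r.base, r.size);  /* ... is not I/O memory */
      }

      io_port_alloc.add_range(0, 0x10000);    /* x86 I/O ports 0 to 0xffff */
      irq_alloc.add_range(0, info.num_irqs);  /* all available IRQ numbers */
  }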

The RAM allocator and core-region allocator are subsumed in the so-called core-memory allocator. In addition to aggregating both allocators, the core-memory allocator allows for the allocation of core-local virtual-memory regions that can be used for holding core-local objects. Each region allocated from the core-memory allocator has to satisfy four conditions:

  1. It must be backed by a physical memory range (as allocated from the RAM allocator)

  2. It must have a core-local virtual-memory range assigned (as allocated from the core-region allocator)

  3. The physical-memory range must have the same size as the virtual-memory range

  4. The virtual memory range must be mapped to the physical memory range using the MMU

Internally, the core-memory allocator maintains a so-called mapped-memory allocator that contains ranges of ready-to-use core-local memory. If a new allocation exceeds the available capacity, the core-memory allocator expands its capacity by allocating a new physical memory region from the RAM allocator, allocating a new core-virtual memory region from the core-region allocator, and installing a mapping from the virtual region to the physical region.

All memory allocations mentioned above are performed at the granularity of physical pages, i.e., 4 KiB.

The core-memory allocator is expanded on demand but never shrunk. This makes it unsuitable for allocating objects on behalf of core's clients because such allocations could not be reverted when closing a session. It is solely used for dynamic memory allocations at startup (e.g., the memory needed for keeping the information about the boot modules) and for keeping metadata for the allocators themselves.
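
A simplified sketch of this on-demand expansion is given below. The allocator interfaces, the map_local function, and the error handling are assumptions for illustration, not core's actual code.

  /* simplified sketch of the core-memory allocator's expansion step;
     interfaces and names are illustrative assumptions */

  #include <cstddef>

  using addr_t = unsigned long;

  enum { PAGE_SIZE = 4096 };

  /* hypothetical allocator interfaces */
  struct Phys_allocator   { addr_t alloc(std::size_t size); };  /* RAM allocator         */
  struct Virt_allocator   { addr_t alloc(std::size_t size); };  /* core-region allocator */
  struct Mapped_allocator { void   add_range(addr_t base, std::size_t size); };

  /* hypothetical kernel-specific primitive that installs the MMU mapping */
  bool map_local(addr_t virt, addr_t phys, std::size_t num_pages);

  struct Core_mem_allocator
  {
      Phys_allocator   &_phys;    /* physical memory           */
      Virt_allocator   &_virt;    /* core-local virtual memory */
      Mapped_allocator  _mapped;  /* ready-to-use core memory  */

      /* expand the pool of mapped core memory by at least 'size' bytes */
      bool grow(std::size_t size)
      {
          std::size_t const num_pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;
          std::size_t const bytes     = num_pages * PAGE_SIZE;

          addr_t const phys = _phys.alloc(bytes);  /* physical backing store          */
          addr_t const virt = _virt.alloc(bytes);  /* core-local virtual range        */

          if (!map_local(virt, phys, num_pages))   /* equal-sized ranges, MMU mapping */
              return false;

          _mapped.add_range(virt, bytes);          /* now available for allocations   */
          return true;
      }
  };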

Kernel-object creation

Kernel objects are objects maintained within the kernel and used by the kernel. The exact notion of what a kernel object represents depends on the actual kernel, as the various kernels differ with respect to the abstractions they provide. Typical kernel objects are threads and protection domains. Some kernels have kernel objects for memory mappings while others provide page tables as kernel objects. Whereas some kernels represent scheduling parameters as distinct kernel objects, others treat scheduling parameters as attributes of threads. What all kernel objects have in common, though, is that they consume kernel memory. Most kernels of the L4 family preserve a fixed pool of memory for the allocation of kernel objects.

If an arbitrary component were able to perform a kernel operation that triggers the creation of a kernel object, the memory consumption of the kernel would depend on the good behavior of all components. A misbehaving component may exhaust the kernel memory.

To counter this problem, on Genode, only core triggers the creation of kernel objects and thereby guards the consumption of kernel memory. Note, however, that not all kernels are able to prevent the creation of kernel objects outside of core.

Page-fault handling

Each time a thread within the Genode system triggers a page fault, the kernel reflects the page fault along with the fault information as a message to the user-level page-fault handler residing in core. The fault information comprises the identity and instruction pointer of the faulting thread, the page-fault address, and the fault type (read, write, execute). The page-fault handler represents each thread as a so-called pager object, which encapsulates the subset of the thread's interface that is needed to handle page faults. For handling the page fault, the page-fault handler first looks up the pager object that belongs to the faulting thread's identity, analogously to how an RPC entrypoint looks up the RPC object for an incoming RPC request. Given the pager object, the fault is handled by calling the pager function with the fault information as argument. This function is implemented by the so-called Rm_client (repos/base/src/core/region_map_component.cc), which represents the association of the pager object with its virtual address space (region map). Given the context information about the region map of the thread's PD, the pager function looks up the region of the region map where the page fault occurred. The lookup results in one of the following three cases (a simplified code sketch of the overall flow follows their descriptions):

Region is populated with a dataspace

If a dataspace is attached at the fault address, the backing store of the dataspace is determined. Depending on the kernel, the backing store may be a physical page, a core-local page, or another reference to a physical memory page. The pager function then installs a memory mapping from the virtual page where the fault occurred to the corresponding part of the backing store.

Region is populated with a managed dataspace

If the fault occurred within a region where a managed dataspace is attached, the fault handling is forwarded to the region map that represents the managed dataspace.

Region is empty

If no dataspace could be found at the fault address, the fault cannot be resolved. In this case, core submits a region-map-fault signal to the region map where the fault occurred. This way, the region-map client has the chance to detect and possibly respond to the fault. Once the signal handler receives a fault signal, it is able to query the fault address from the region map. As a response to the fault, the region-map client may attach a dataspace at this address. This attach operation, in turn, will prompt core to wake up the thread (or multiple threads) that faulted within the attached region. Unless a dataspace is attached at the page-fault address, the faulting thread remains blocked. If no signal handler for region-map faults is registered for the region map, core prints a diagnostic message and blocks the faulting thread forever.
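
The following sketch condenses the described flow into code. All types and members (Fault_info, Region, Region_map, Pager_object) are assumptions made for illustration and do not mirror core's actual class interfaces.

  /* simplified sketch of the fault-handling flow -- all types and names are
     illustrative assumptions, not core's actual interfaces */

  #include <cstddef>
  #include <cstdint>

  using addr_t = std::uint64_t;

  struct Fault_info { unsigned long badge; addr_t ip; addr_t addr; bool write; };

  struct Dataspace { addr_t phys_base; std::size_t size; };

  struct Region_map;

  /* a region is backed by a plain dataspace, a managed dataspace, or nothing */
  struct Region
  {
      addr_t      base;
      std::size_t size;
      Dataspace  *ds;       /* plain dataspace, or nullptr   */
      Region_map *managed;  /* nested region map, or nullptr */
  };

  struct Region_map
  {
      Region *lookup(addr_t addr);          /* region at 'addr', or nullptr */
      void    submit_fault_signal(addr_t);  /* inform the region-map client */
  };

  /* counterpart of the Rm_client: a pager object tied to its region map */
  struct Pager_object
  {
      Region_map &rm;

      void install_mapping(addr_t virt, Dataspace &ds, addr_t offset);

      /* entry point called with the fault information of the faulting thread */
      void pager(Fault_info const &fault) { handle(rm, fault); }

      void handle(Region_map &map, Fault_info const &fault)
      {
          Region *r = map.lookup(fault.addr);

          if (!r) {
              /* empty region: signal the region-map client, thread stays blocked */
              map.submit_fault_signal(fault.addr);
              return;
          }

          if (r->managed) {
              /* managed dataspace: forward the fault to the nested region map,
                 translated to the corresponding address within that map */
              Fault_info nested = fault;
              nested.addr = fault.addr - r->base;
              handle(*r->managed, nested);
              return;
          }

          /* plain dataspace: map the faulting page to its backing store */
          install_mapping(fault.addr, *r->ds, fault.addr - r->base);
      }
  };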

To optimize the TLB footprint and the use of kernel memory, region maps do not merely operate at the granularity of memory pages but on address ranges whose size and alignment are arbitrary power-of-two values (at least as large as the smallest physical page). The sources and destinations of memory mappings may span many pages. This way, depending on the kernel and the architecture, multiple pages may be mapped at once, or large page-table mappings can be used.
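
As an illustration, the following sketch computes the largest power-of-two-sized mapping that covers a fault address, constrained by the alignment of the corresponding virtual and physical addresses and by the bounds of the attached range. It is a simplified stand-in for the corresponding logic in core; the parameter names are assumptions.

  /* simplified stand-in for core's mapping-size logic */

  #include <cstddef>

  using addr_t = unsigned long;

  enum : std::size_t { PAGE_SIZE = 4096 };

  /* largest power-of-two mapping size that covers 'fault_addr', for a region
     attached at virtual address 'virt_base', backed by physical memory at
     'phys_base', and spanning 'range_size' bytes */
  std::size_t mapping_size(addr_t virt_base, addr_t phys_base,
                           std::size_t range_size, addr_t fault_addr)
  {
      std::size_t size = PAGE_SIZE;

      for (;;) {
          std::size_t const next = size << 1;

          /* start of the candidate block in virtual memory */
          addr_t const block = fault_addr & ~(addr_t)(next - 1);

          /* the physical counterpart must be aligned to the candidate size, too */
          bool const aligned = ((phys_base - virt_base) & (next - 1)) == 0;

          /* the candidate block must lie completely within the attached range */
          bool const contained = next <= range_size
                              && block >= virt_base
                              && block + next <= virt_base + range_size;

          if (!aligned || !contained)
              return size;

          size = next;
      }
  }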