Core - the root of the component tree - Genode OS Framework Foundations

Core - the root of the component tree

Core is the first user-level component, which is directly created by the kernel. It thereby represents the root of the component tree. It has access to the raw physical resources such as memory, CPUs, memory-mapped devices, interrupts, I/O ports, and boot modules. Core exposes those low-level resources as services so that they can be used by other components. For example, physical memory is made available as so-called RAM dataspaces allocated from core's PD service, interrupts are represented by the IRQ service, and CPUs are represented by the CPU service. In order to access a resource, a component has to establish a session to the corresponding service. Thereby the access to physical resources is subjected to the routing of session requests as explained in Section Services and sessions. Moreover, the resource-trading concept described in Section Trading memory between clients and servers applies to core services in the same way as for any other service.

In addition to making hardware resources available as services, core provides all prerequisites to bootstrap the component tree. These prerequisites comprise services for creating protection domains, for managing address-space layouts, and for creating object identities.

Core is almost free from policy. There are no configuration options. The only policy of core is the startup of the init component, to which core grants all available resources. Init, in turn, uses those resources to spawn further components according to its configuration.

Section Dataspaces introduces dataspaces as containers of memory or memory-like resources. Dataspaces form the foundation for most of the core services described in the subsequent sections. The section is followed by the introduction of each individual service provided by core. In the following, a component that has established a session to such a service is called client. For example, a component that obtained a session to core's CPU service is a CPU client.

Dataspaces

A dataspace is an RPC object1 that resides in core and represents a contiguous physical address-space region with an arbitrary size. Its base address and size are subjected to the granularity of physical pages as dictated by the memory-management unit (MMU) hardware. Typically the granularity is 4 KiB.

Dataspaces are created and managed via core's services. Because each dataspace is a distinct RPC object, the authority over the contained physical address range is represented by a capability and can thereby be delegated between components. Each component in possession of a dataspace capability can make the dataspace content visible in its local address space. Hence, by the means of delegating dataspace capabilities, components can establish shared memory.

On Genode, only core deals with physical memory pages. All other components use dataspaces as a uniform abstraction for memory, memory-mapped I/O regions, and ROM modules.

Region maps

A region map1 represents the layout of a virtual address space. The size of the virtual address space is defined at its creation time. Region maps are created implicitly as part of a PD session (Section Protection domains (PD)) or manually via the RM service (Section Region-map management (RM)).

Populating an address space

The concept behind region maps is a generalization of the MMU's page-table mechanism. Analogously to how a page table is populated with physical page frames, a region map is populated with dataspaces. Under the hood, core uses the MMU's page-table mechanism as a cache for region maps. The exact way of how MMU translations are installed depends on the underlying kernel and is opaque to Genode components. On most base platforms, memory mappings are established in a lazy fashion by core's page-fault resolution mechanism described in Section Page-fault handling.

A region-map client in possession of a dataspace capability is able to attach the dataspace to the region map. Thereby the content of the dataspace becomes visible within the region map's virtual address space. When attaching a dataspace to a region map, core selects an appropriate virtual address range that is not yet populated with dataspaces. Alternatively, the client can specify a designated virtual address. It also has the option to attach a mere window of the dataspace to the region map. Furthermore, the client can specify whether the content of the dataspace should be executable or not.

The counterpart of the attach operation is the detach operation, which enables the region-map client to remove dataspaces from the region map by specifying a virtual address. Under the hood, this operation flushes the MMU mappings of the corresponding virtual address range so that the dataspace content becomes invisible.

Note that a single dataspace may be attached to any number of region maps. A dataspace may also be attached multiple times to one region map. In this case, each attach operation populates a distinct region of the virtual address space.

Access to boot modules (ROM)

During the initial bootstrap phase of the machine, a boot loader loads the kernel's binary and additional chunks of data called boot modules into the physical memory. After those preparations, the boot loader passes control to the kernel. Examples of boot modules are the ELF images of the core component, the init component, the components created by init, and the configuration of the init component. Core makes each boot module available as a ROM session1. Because boot modules are read-only memory, they are generally called ROM modules. On session construction, the client specifies the name of the ROM module as session argument. Once created, the ROM session allows its client to obtain a ROM dataspace capability. Using this capability, the client can make the ROM module visible within its local address space. The ROM session interface is described in more detail in Section Read-only memory (ROM).

Protection domains (PD)

A protection domain (PD) corresponds to a unit of protection within the Genode system. Typically, there is a one-to-one relationship between a component and a PD session1. Each PD consists of a virtual memory address space, a capability space (Section Capability spaces, object identities, and RPC objects), and a budget of physical memory and capabilities. Core's PD service also plays the role of a broker for asynchronous notifications on kernels that lack the semantics of Genode's signalling API.

Physical memory and capability allocation

Each PD session contains quota-bounded allocators for physical memory and capabilities. At session-creation time, its quota is zero. To make an allocator functional, it must first receive quota from another already existing PD session, which is called the reference account. Once the reference account is defined, quota can be transferred back and forth between the reference account and the new PD session.

Provided that the PD session is equipped with sufficient quota, the PD client can allocate RAM dataspaces from the PD session. The size of each RAM dataspace is defined by the client at the time of allocation. The location of the dataspace in physical memory is defined by core. Each RAM dataspace is physically contiguous and can thereby be used as DMA buffer by a user-level device driver. In order to set up DMA transactions, such a device driver can request the physical address of a RAM dataspace by invoking the dataspace capability.

Closing a PD session destroys all dataspaces allocated from the PD session and restores the original quota. This implies that these dataspaces disappear in all components. The quota of a closed PD session is transferred to the reference account.

Virtual memory and capability space

At the hardware-level, the CPU isolates different virtual memory address spaces via a memory-management unit. Each domain is represented by a different page directory, or an address-space ID (ASID). Genode provides an abstraction from the underlying hardware mechanism in the form of region maps as introduced in Section Region maps. Each PD is readily equipped with three region maps. The address space represents the layout of the PD's virtual memory address space, the stack area represents the portion of the PD's virtual address space where stacks are located, and the linker area is designated for dynamically linked shared objects. The stack area and linker area are attached to the address space at the component initialisation time.

The capability space is provided as a kernel mechanism. Note that not all kernels provide equally good mechanisms to implement Genode's capability model as described in Section Capability-based security. On kernels with support for kernel-protected object capabilities, the PD session interface allows components to create and manage kernel-protected capabilities. Initially, the PD's capability space is empty. However, the PD client can install a single capability - the parent capability - using the assign-parent operation at the creation time of the PD.

Region-map management (RM)

As explained in Section Protection domains (PD), each PD session is equipped with three region maps by default. The RM service allows components to create additional region maps manually. Such manually created region maps are also referred to as managed dataspaces. A managed dataspace is not backed by a range of physical addresses but its content is defined by its underlying region map. This makes region maps a generalization of nested page tables. A region-map client can obtain a dataspace capability for a given region map and use this dataspace capability in the same way as any other dataspace capability, i.e., attaching it to its local address space, or delegating it to other components.

Managed dataspaces are used in two ways. First, they allow for the manual management of portions of a component's virtual address space. For example, the so-called stack area of a protection domain is a dedicated virtual-address range preserved for stacks. Between the stacks, the virtual address space must remain empty so that stack overflows won't silently corrupt data. This is achieved by using a dedicated region map that represents the complete stack area. This region map is attached as a dataspace to the component's virtual address space. When creating a new thread along with its corresponding stack, the thread's stack is not directly attached to the component's address space but to the stack area's region map. Another example is the virtual-address range managed by a dynamic linker to load shared libraries into.

The second use of managed dataspaces is the provision of on-demand-populated dataspaces. A server may hand out dataspace capabilities that are backed by region maps to its clients. Once the client has attached such a dataspace to its address space and touches it's content, the client triggers a page fault. Core responds to this page fault by blocking the client thread and delivering a notification to the server that created the managed dataspace along with the information about the fault address within the region map. The server can resolve this condition by attaching a dataspace with real backing store at the fault address, which prompts core to resume the execution of the faulted thread.

Processing-time allocation (CPU)

A CPU session1 is an allocator for processing time that allows for the creation, the control, and the destruction of threads of execution. At session-construction time, the affinity of a CPU session with CPU cores can be defined via session arguments.

Once created, the session can be used to create, control, and kill threads. Each thread created via a CPU session is represented by a thread capability. The thread capability is used for subsequent thread-control operations. The most prominent thread-control operation is the start of the thread, which takes the thread's initial stack pointer and instruction pointer as arguments.

During the lifetime of a thread, the CPU client can retrieve and manipulate the state of the thread. This includes the register state as well as the execution state (whether the thread is paused or running). Those operations are primarily designated for realizing user-level debuggers.

To aid the graceful destruction of threads, the CPU client can issue a cancel-blocking operation, which causes the specified thread to cancel a current blocking operation such as waiting for an RPC response or the attempt to acquire a contended lock.

Access to device resources (IO_MEM, IO_PORT, IRQ)

Core's IO_MEM, IO_PORT, and IRQ services enable the realization of user-level device drivers as Genode components.

Memory mapped I/O (IO_MEM)

An IO_MEM session1 provides a dataspace representation for a non-memory part of the physical address space such as memory-mapped I/O regions or BIOS areas. In contrast to a memory block that is used for storing information, of which the physical location in memory is of no concern, a non-memory object has special semantics attached to its location within the physical address space. Its location is either fixed (by standard) or can be determined at runtime, for example by scanning the PCI bus for PCI resources. If the physical location of such a non-memory object is known, an IO_MEM session can be created by specifying the physical base address, the size, and the write-combining policy of the memory-mapped resource as session arguments. Once an IO_MEM session is created, the IO_MEM client can request a dataspace containing the specified physical address range.

Core hands out each physical address range only once. Session requests for ranges that intersect with physical memory are denied. Even though the granularity of memory protection is limited by the MMU page size, the IO_MEM service accepts the specification of the physical base address and size at the granularity of bytes. The rationale behind this contradiction is the unfortunate existence of platforms that host memory-mapped resources of unrelated devices on the same physical page. When driving such devices from different components, each of those components requires access to its corresponding device. So the same physical page must be handed out to multiple components. Of course, those components must be trusted to not touch any portion of the page that is unrelated to its own device.

Port I/O (IO_PORT)

For platforms that rely on I/O ports for device access, core's IO_PORT service enables the fine-grained assignment of port ranges to individual components. Each IO_PORT session1 corresponds to the exclusive access right to a port range specified as session arguments. Core creates the new IO_PORT session only if the specified port range does not overlap with an already existing session. This ensures that each I/O port is driven by only one IO_PORT client at a time. The IO_PORT session interface resembles the physical I/O port access instructions. Reading from an I/O port can be performed via an 8-bit, 16-bit, or 32-bit access. Vice versa, there exist operations for writing to an I/O port via an 8-bit, 16-bit, or 32-bit access. The read and write operations take absolute port addresses as arguments. Core performs the I/O-port operation only if the specified port address lies within the port range of the session.

Reception of device interrupts (IRQ)

Core's IRQ service enables device-driver components to respond to device interrupts. Each IRQ session1 corresponds to an interrupt. The physical interrupt number is specified as session argument. Each physical interrupt number can be specified by only one session. The IRQ session interface provides an operation to wait for the next interrupt. Only while the IRQ client is waiting for an interrupt, core unmasks the interrupt at the interrupt controller. Once the interrupt occurs, core wakes up the IRQ client and masks the interrupt at the interrupt controller until the driver has acknowledged the completion of the IRQ handling and waits for the next interrupt.

Logging (LOG)

The LOG service is used by the lowest-level system components such as the init component for printing diagnostic output. Each LOG session1 takes a label as session argument, which is used to prefix the output of this session. This enables developers to distinguish the output of different components with each component having a unique label. The LOG client transfers the to-be-printed characters as payload of plain RPC messages, which represents the simplest possible communication mechanism between the LOG client and core's LOG service.

Event tracing (TRACE)

The TRACE service provides a light-weight event-tracing facility. It is not fundamental to the architecture. However, as the service allows for the inspection and manipulation of arbitrary threads of a Genode system, TRACE sessions must not be granted to untrusted components.