Genode on seL4 - Building a simple root task from scratch

This document is a loose collection of notes about the exploration of the seL4 and the port of the Genode system to this kernel. The seL4 kernel is a modern microkernel jointly developed by NICTA and General Dynamics.

http://sel4.systems: Website of the seL4 project

A fresh Genode source-code repository

Following the convention to host each kernel in its dedicated base-<kernel> source repository, the seL4-specific code will go to base-sel4. This way, we can cleanly hold the seL4-specific code apart from generic Genode code.

For the start, the new repository will contain two things: This notes document at doc/ and the port-description file for downloading the seL4 kernel (ports/sel4.port) accompanied with the corresponding hash file (ports/sel4.hash).

Since seL4 is hosted on GitHub, writing a port-description file is easy. We can simply use base-nova/ports/nova.port as a template and adapt it:

 LICENSE   := GPLv2
 VERSION   := git
 DOWNLOADS := sel4.git

 URL(sel4) := https://github.com/seL4/seL4.git
 # experimental branch
 REV(sel4) := b6fbb78cb1233aa8549ea3acb90524306f49a8d2
 DIR(sel4) := src/kernel/sel4

There are two branches of seL4. The master branch is the stable branch that contains the formally verified version of the kernel. The experimental branch contains features that are not yet ready to be included in the master branch. Among those features is the support for virtualization. To anticipate current developments of the seL4 community, I am going with the experimental branch. For the ports file, I pick the commit ID of the most recent commit of the experimental branch at the time of porting. Once, the ports/sel4.port file is in place, we can generate the matching port hash as follows (from the base directory of the Genode source tree):

 # create an empty hash file
 touch repos/base-sel4/ports/sel4.hash

 # update the hash for the current version of the sel4.port file
 ./tool/ports/update_hash sel4

With the sel4.port file in place, we can download the seL4 kernel via:

 ./tool/ports/prepare_port sel4

After invoking this command, the kernel source will be located at contrib/sel4-<hash>/src/kernel/sel4.

Building the kernel for the first time

For getting acquainted with the code base, the README.md file provides a good starting point. It seems to contain exactly the information that I need at this point. As a first test, I am going to build the kernel for the pc99 platform using the Genode tool chain. According to the README, the following command should work

 make TOOLPREFIX=/usr/local/genode-gcc/bin/genode-x86- \
      ARCH=ia32 PLAT=pc99

On the first attempt, the following error occurs:

 Traceback (most recent call last):
   File "./tools/invocation_header_gen.py", line 18, in <module>
     import tempita
 ImportError: No module named tempita
 make: *** [arch/api/invocation.h] Error 1

This problem could be easily solved by installing the python-tempita package. However, further down the road, the build process stops with the following message:

 src/arch/ia32/kernel/boot_sys.c:75:26: error: ‘CONFIG_MAX_NUM_IOAPIC’ undeclared here (not in a function)
 src/plat/pc99/machine/hardware.c: In function ‘maskInterrupt’:
 src/plat/pc99/machine/hardware.c:36:9: error: implicit declaration of function ‘pic_mask_irq’ [-Werror=implicit-function-declaration]
 src/plat/pc99/machine/hardware.c:36:9: error: nested extern declaration of ‘pic_mask_irq’ [-Werror=nested-externs]

A grep for MAX_NUM_IOAPIC reveals that this definition is normally generated by the kernel configuration program as there is a match in src/plat/pc99/Kconfig. At this point, I am wondering if I am on the wrong track altogether. On the seL4 download page, the kernel is mentioned as just one of several "projects". But since it is the only one of the list of projects that I am actually interested in, I wanted to focus on only this one. But apparently, attempts to configure the kernel via make menuconfig are not supposed to work if checking out the repository in a free-standing fashion.

Sooner or later, I would have to look behind the curtain of the seL4 build system. So why not now? Passing BUILD_VERBOSE=1 V=1 to make is quite helpful in this situation. A look into include/config.h reveals that the use of the autoconf-generated header autoconf.h is optional. This is nice. So I can leave the kernel-configuration magic out of the loop and just manually specify the config definitions at the build command line. After a few iterations, I came up with the following command-line arguments:

 make TOOLPREFIX=/usr/local/genode-gcc/bin/genode-x86- \
      ARCH=ia32 PLAT=pc99 BUILD_VERBOSE=1 V=1 \
      LDFLAGS+=-Wl,-melf_i386 \
      LDFLAGS+=-nostdlib \
      LDFLAGS+=-Wl,-nostdlib \
      CFLAGS+=-m32 \
      CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_MAX_NUM_IOAPIC=1 \
      CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_IRQ_IOAPIC=1

The CONFIG_IRQ_IOAPIC flag is needed to specify whether the legacy PIC or the IOPIC should be used. It is picked up by for conditionally compiling the code of src/plat/pc99/machine/hardware.c.

As the result of the build process, we get a freshly baked kernel.elf file.

Of course, we don't want Genode users to manually build the kernel in this way. So we add a "kernel" target to our base-sel4 repository. The kernel target comes in the form of a src/kernel/target.mk file and a library lib/mk/x86_32/kernel.mk. The target.mk file is just a pseudo target with a dependency on the kernel.mk library. There may be multiple versions of the kernel.mk library. The build-system configuration determines the version that is used. E.g., when we set up the build directory for x86_32, the lib/mk/x86_32/kernel.mk one will be used. The kernel.mk file looks as follows:

 SEL4_DIR = $(call select_from_ports,sel4)/src/kernel/sel4

 ifeq ($(called_from_lib_mk),yes)
 all: build_kernel
 endif

 LINKER_OPT_PREFIX := -Wl,

 build_kernel:
   $(VERBOSE)$(MAKE) TOOLPREFIX=$(CROSS_DEV_PREFIX) \
             ARCH=ia32 PLAT=pc99 \
             LDFLAGS+=-nostdlib LDFLAGS+=-Wl,-nostdlib \
             $(addprefix LDFLAGS+=$(LINKER_OPT_PREFIX),$(LD_MARCH)) \
             $(addprefix CFLAGS+=,$(CC_MARCH)) \
             CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_MAX_NUM_IOAPIC=1 \
             CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_IRQ_IOAPIC=1 \
             SOURCE_ROOT=$(SEL4_DIR) -f$(SEL4_DIR)/Makefile

The pseudo target base-sel4/src/kernel/target.mk exists merely for making the result from the seL4 build directory visible in the install directory (bin/).

 TARGET = sel4
 LIBS   = kernel

 $(INSTALL_DIR)/$(TARGET):
   $(VERBOSE)ln -sf $(LIB_CACHE_DIR)/kernel/kernel.elf $@

Genode's build system works in two stages. At the first (light-weight) stage, it merely determines library dependencies. At the second stage, the actual build steps are performed. The condition around the all target ensures that the build_kernel target will be visited only at the second build stage with the current working directory set to the library location and all build variables (such as CROSS_DEV_PREFIX) defined. Fortunately, the seL4 build system supports out-of-tree builds by defining the SOURCE_ROOT variable.

To test drive the kernel target, we first need to create Genode build directory prepared for seL4. Later, we will add seL4 support to the regular create_builddir tool. But for now, it suffices to create one manually:

Create a new directory with an etc/ subdirectory.
Add an etc/specs.conf file with the following content:
```
 SPECS = sel4 x86_32
```
This tells the build system the so-called build specification.
Add an etc/build.conf file with the following content:
```
 GENODE_DIR   := /path/to/your/genode.git
 BASE_DIR     := $(GENODE_DIR)/repos/base
 REPOSITORIES := $(GENODE_DIR)/repos/base-sel4 \
                 $(GENODE_DIR)/repos/base
 CONTRIB_DIR  := $(GENODE_DIR)/contrib
```
GENODE_DIR must point to the root of Genode's source tree. BASE_DIR is used by the build system to find the build-system scripts. The CONTRIB_DIR is needed to enable the build system to find the location of the seL4 source code. The REPOSITORIES declaration lists all source repositories that should be included in the build. Note that the order is important. By listing base-sel4 before base, we allow base-sel4 to override the content of the base repository.
Symlink the <genode-dir>/too/builddir/build.mk file to a Makefile.
```
 ln -s <genode-dir>/tool/builddir/build.mk Makefile
```

With the build directory, we can trigger the kernel build via make kernel. The seL4 build process will be invoked from within the kernel library. Hence, the library's directory var/libcache/kernel will be used as the build directory of the kernel.

Starting the kernel in Qemu

To test-drive the kernel, we need to create a bootable image including a boot loader, the boot-loader configuration, and the kernel. To spare us the manual work, Genode comes with a so-called run tool that automates this procedure. We merely need to supplement the run tool with a script snippet that describes how a boot image is assembled for the seL4 base platform. This snippet has to reside at tool/run/boot_dir/sel4. To create it, we can take inspiration from the versions for the other platforms such as NOVA. It comes down to implementing the function run_boot_dir according to our needs. For the most part, the function contains the generation of the boot-loader configuration, i.e., GRUB's menu.lst file. At the current stage, we simply load the seL4 ELF image as multi-boot kernel. For using the run tool for the initial test, the following small run script at base-sel4/run/test.run does the trick:

 create_boot_directory
 build_boot_image ""
 append qemu_args " -nographic -m 64 "
 run_genode_until forever

To invoke it, all we have to do is issuing make run/test from the build directory. All we see, however, is, well, nothing.

Normally we would expect from the kernel to print a life sign when booting up. This is suggested by a printf in arch/arm/kernel/boot.c. So the lack of output is most certainly a configuration issue. A look in include/plat/pc99/plat/machine/io.h indicates that kernel_putchar, which is used as the back end of printf (src/machine/io.c) is compiled-in only if DEBUG mode is enabled. Adding DEBUG=1 to the sel4 build command (in our lib/mk/x86_32/kernel.mk file) enables this mode.

But enabling the DEBUG option comes with a strange surprise. The kernel build would fail with the following error:

 kernel.o: In function `insert_dev_p_reg':
 kernel_final.c:(.boot.text+0x131a): undefined reference to `putchar'

The message is strange because the seL4 source code is void of any call to putchar. It turned out that the compiler automatically turned a printf with a one-character string ("/n") into a call of putchar. Fortunately, this built-in heuristic can be disabled by passing -fno-builtin-printf to the CFLAGS when compiling seL4. With this fix, we can successfully compile the kernel in debug mode, boot it in Qemu, and observe the first life sign:

 Boot config: parsing cmdline '/sel4'
 Boot config: console_port of node #0 = 0x3f8
 Boot config: debug_port of node #0 = 0x3f8
 Boot config: max_num_nodes = 1
 Boot config: num_sh_frames = 0x0
 seL4 failed assertion '_ndks_end - _ndks_start <= NDKS_SIZE'
   at src/arch/ia32/kernel/boot_sys.c:424 in function try_boot_sys

The NDKS section apparently contains those parts of the image that should always be mapped at the upper part of the virtual address space. The assertion triggers because the section of the binary is larger (40900 bytes) than the assumed limit NDKS_SIZE (12288 bytes). According to the linker script (src/plat/pc99/linker.lds), the section includes the kernel stack, the BSS, and "COMMON" symbols. When looking at the list of those symbols using 'nm | sort', the ksReadyQueues symbol raises my eyebrows because it is quite large (32768 bytes) compared to all others. This symbol belongs to an array, which is dimensioned as CONFIG_NUM_DOMAINS * CONFIG_NUM_PRIORITIES. As we are using the default configuration values of include/config.h, CONFIG_NUM_DOMAINS is 16 and CONFIG_NUM_PRIORITIES is 256. Hence, the array has 4096 entries with 8 bytes for each entry (tcb_queue_t with two pointers). Interestingly, the default value of NUM_DOMAINS in the Kconfig is 1, which would result in a much smaller ksReadyQueues array. In fact, passing the following argument to the kernel build fixes the assertion:

 CONFIG_KERNEL_EXTRA_CPPFLAGS+=-DCONFIG_NUM_DOMAINS=1

Now, the kernel bootstraps successfully, detects one CPU and tries to obtain information about the boot modules, which we don't have provided yet. Hence, the kernel backs out with the following message:

 Boot loader did not provide information about boot modules
 seL4 called fail at src/arch/ia32/kernel/boot_sys.c:711
   in function boot_sys, saying "boot_sys failed for some reason :(

Well, this is expected. So it is time to take the first baby step of providing a root task to the system.

A root task for exercising the kernel interface

At this point, there are two options. We could either start with a very minimalistic hello-world program that is completely unrelated to Genode. Such a root-task from scratch could be used to explore the individual system calls before jumping into Genode. The second option would be to go directly for a Genode program, which includes a proper C++ runtime and Genode's generic linker script.

I went for the latter option and created a simple test program at base-sel4/src/test/sel4/. The target.mk file looks as follows:

 TARGET = test-sel4
 SRC_CC = main.cc
 LIBS   = cxx startup

The main function of the main.cc file does not much except for writing at an illegal (yet known) address:

 int main()
 {
   *(int *)0x1122 = 0;
   return 0;
 }

When attempting to build the program by issuing make test/sel4 from within our build directory, the build system will attempt to compile the C++ runtime, which, in turn, depends on a few Genode headers. Some headers are readily provided by the base/ repository but others are expected to be supplied by the base-<kernel> repository. The compile errors look like this:

     COMPILE  malloc_free.o
 In file included from repos/base/include/parent/capability.h:17:0,
                  from repos/base/include/base/env.h:20,
                  from repos/base/src/base/cxx/malloc_free.cc:17:
 repos/base/include/base/capability.h:21:31:
   fatal error: base/native_types.h: No such file or directory

For now, we can supply a dummy version of this header, which contain preliminary type definitions. To see, which types are expected from which header file, we can take cues from the other base-<kernel> platforms.

At link time, we will observe plenty of undefined references originating from the startup code. Most of those references concern basic data structures such as the AVL tree. Normally, those symbols are provided by the so-called base-common library, which is used both core and non-core programs. We can take a copy of a base-common.mk file from one of the other base platforms as a starting point. I decided to go for the version provided by base-nova. With the base-common library present, we can replace the cxx and startup libs by the base-common library in the LIBS declaration of our test-sel4 program. Because base-common already depends on both cxx and startup, there is no need to specify those libraries twice.

On the attempt to compile the base-common library, we will stumble over further missing headers such as ipc_msgbuf.h. Here the base-linux version of this file becomes handy because it is free-standing.

When compiling lock.cc, the compiler complaints about the missing lock_helper.h file. This platform-specific file that is internally used by the lock implementation is normally expected at base-<kernel>/src/base/lock/. For now, just for getting the binary linked, we provide empty dummies:

 static inline void thread_yield() { }
 static inline void thread_switch_to(Genode::Thread_base *thread_base) { }
 static inline void thread_stop_myself() { }

 static inline bool thread_check_stopped_and_restart(Genode::Thread_base *)
 {
   return false;
 }

The next missing piece is the platform-specific implementation of the IPC API (ipc/ipc.cc and ipc/pager.cc), which are expected to reside at base-sel4/src/base/ipc/. We just take the simplest version of the other base platforms (in this case base-codezero/src/base/ipc/) and strip it down to become a mere dummy.

By adding the base-common library, the number of unresolved references decreased by a great amount. Some functions are still unresolved. There are many references to printf, which is not part of the base-common library because core uses a different back end (typically a kernel debugger) than non-core processes (using a session to core's LOG service). Because our test plays the role of a root task, we can include core's version by adding base/src/base/console/core_printf.cc as source to our target description file.

 SRC_CC += base/console/core_printf.cc
 vpath %.cc $(BASE_DIR)/src

The back end of core_printf.cc has to be provided by a header at base-sel4/src/base/console/core_console.h. For now, we just provide an empty implementation of _out_char. To let the compiler find the header, we need to extend the include search path as follows:

 INC_DIR += $(REP_DIR)/src/base/console

For the unresolved references of env_context_area_ram_session and env_context_area_rm_session, the following dummy will do:

 #include <ram_session/ram_session.h>
 #include <rm_session/rm_session.h>

 namespace Genode {
   Rm_session  *env_context_area_rm_session()  { return nullptr; }
   Ram_session *env_context_area_ram_session() { return nullptr; }
 }

The remaining piece of the puzzle is the Genode::env() function, which is an accessor to the Genode environment. Because the Genode environment does not exist yet, we provide a dummy env called mini_env. Initially, this implementation of the Genode::Env interface merely returns dummy values (null pointers and invalid capabilies).

After this step, the test-sel4 target links successfully. To use it as roottask, we have to modify our test.run script by

Adding a build step
```
 build { test/sel4 }
```
Specifying the test-sel4 binary as boot module so that the our run environment includes it in the boot image and appends the file as module to the boot-loader configuration.
```
 build_boot_image "test-sel4"
```

When issuing make run/test now, we get the following messages from the kernel:

 Detected 1 boot module(s):
   module #0: start=0x169000 end=0x186754 size=0x1d754
              name='/genode/test-sel4'
 ELF-loading userland images from boot modules:
   module #0 for node #0: size=0x27000 v_entry=0x136cc
                          v_start=0x0 v_end=0x27000
                          p_start=0x187000 p_end=0x1ae000
 Moving loaded userland images to final location:
                          from=0x187000 to=0x15e000 size=0x27000

 Starting node #0
 Caught cap fault in send phase at address 0x0
 while trying to handle:
 vm fault on data at address 0x9090c3fb with status 0x4
 in thread 0xe0189880 at address 0x314a

Assuming that v_start and v_end stand for the virtual address range of the loaded binary, those numbers look weird. Normally, Genode binaries are linked at a much higher address, e.g., 0x1000000. By inspecting the binary via the readelf command, it turns out that we haven't yet declared the link address for the seL4 platform. So its time to introduce a so-called spec file for the "sel4" build-spec value. The new file base-sel4/mk/spec-sel4.mk will be incorporated into the build process whenever the SPEC declaration of the build directory has the value "sel4" listed. To define the default link address for the platform, the file has the following content:

 LD_TEXT_ADDR ?= 0x01000000

When issuing make test/sel4 VERBOSE=, we can see the link address specified at the linker command line. Another look at the binary via readelf confirms that the location of the text segment looks good now. Re-executing the run script produces the same result as before though. But the last message is quite helpful:

 vm fault on data at address 0x0 with status 0x4
 in thread 0xe0189880 at address 0x100312b

The last address lies well within the text segment of our binary. It seems that the kernel has kicked off the execution of our root task (presumably at the entry point address as found in the ELF binary). At some point, our "root task" de-references a null pointer. Given that several of the dummy functions that we just created, return null pointers, this is hardly a surprise. So let us have a look how far we have come by inspecting the fault address 0x100312b using

 objdump -lSd bin/test-sel4 | less

In less, we search for the pattern "100312b". We see that the surrounding code belongs to the heap implementation (Heap::_allocate_dataspace). It seems that someone is trying to use Genode::env()->heap(). To confirm this assumption, we can use Qemu's GDB stub to obtain a backtrace. We have to take the following steps:

We want to halt the execution at the point where the fault would happen instead of triggering a page fault. The easiest way to accomplish that is to insert an infinite loop at the right spot. The infinite loop serves us as a poor man's break point. In our case, we add a for (;;); statement right at the beginning of the _allocate_dataspace function in repos/base/src/base/heap/heap.cc. When re-executing the run script after this change, we can see that the fault message won't appear.
We cancel the execution of the run script and start Qemu manually using the command-line arguments that are found in the log output (the message "spawn qemu ...". To enable Qemu's GDB stub, we have to append "-s" as argument.
```
 qemu-system-i386 -nographic -m 64 -cdrom var/run/test.iso -s
```
When qemu is running, we start GDB from another shell (changing to our build directory) as follows:
```
 gdb bin/test-sel4 -ex "target remote :1234"
```

Listing the backtrace via the bt command is quite revealing (output slightly edited for brevity):

 (gdb) bt
 #0  Genode::Heap::_allocate_dataspace
     at genode/repos/base/src/base/heap/heap.cc:57
 #1  0x010030b9 in Genode::Heap::_unsynchronized_alloc
     at genode/repos/base/src/base/heap/heap.cc:181
 #2  0x01003155 in Genode::Heap::alloc
     at genode/repos/base/src/base/heap/heap.cc:199
 #3  0x01010732 in malloc
 #4  0x0100f539 in start_fde_sort
 #5  init_object
     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2-fde.c:768
 #6  search_object
     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2-fde.c:958
 #7  0x01010375 in _Unwind_Find_registered_FDE
     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2-fde.c:1022
 #8  _Unwind_Find_FDE
     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2-fde-dip.c:440
 #9  0x0100d4c3 in uw_frame_state_for
 #10 0x0100dff8 in uw_init_context_1
     at ../../../../../../contrib/gcc-4.7.2/libgcc/unwind-dw2.c:1500
 #11 0x0100e3ea in _Unwind_RaiseException
 #12 0x01011f51 in __cxa_throw ()
 #13 0x01013751 in init_main_thread ()
     at genode/repos/base/src/platform/init_main_thread.cc:119
 #14 0x010134fe in _start ()
     at genode/repos/base/src/platform/x86_32/crt0.s:47

We can see that Genode's startup code tries to throw the first C++ exception. There is a lengthy comment at the corresponding code portion that explains the rationale behind throwing an exception right from the startup code. The exception handling, in particular the stack unwinding) is done by the GCC support library, which eventually calls malloc to allocate some backing store for metadata. Genode has no malloc function. But for making the GCC support library happy (which expects a C runtime to be present), our Genode-specific C++ support code in the form of the cxx library comes with a library-local version of malloc (base/src/base/cxx/malloc_free.cc). This malloc implementation uses a separate Genode::Heap instance. The instance has an initial capacity of 512 bytes. If the allocations exceed this capacity, this heap will try to expand using env()->ram_session() as backing store. This is why the _allocate_dataspace function was called. At the current stage, we don't have an implementation of env()->ram_session(). To work around this early allocation issue, we can simply increase the capacity of the initial_block, let's say by factor 10. When re-executing the run script now, _allocate_dataspace won't be called. Hence, we execution won't get stuck in our infinite loop (we can remove the loop either way now). Instead, we get another null-pointer dereference:

 vm fault on data at address 0x0 with status 0x4
 in thread 0xe0189880 at address 0x1003ab7

The procedure to investigate the reason for this page fault is exactly the same as for the first one, using objdump, infinite loops, and Qemu's GDB stub.

This time, the null-pointer dereference occurs in base/src/base/thread/thread.cc on the attempt to allocate so-called thread context for the main thread. Such thread contexts contain the stacks and meta data of threads. They are placed in a dedicated virtual memory area that is manually managed by the process.

For our minimalistic root task, we don't have Genode's address-space management facilities at our disposal yet. So we have to side-step the _alloc_context function somehow. This can be accomplished by using custom version of thread.cc instead of the one provided by the base/ repository. So we remove thread.cc from the base-common library for now and add a new thread.cc file to our test-sel4 target. The following dummy stub will do for now:

 static Thread_base::Context *main_context()
 {
   enum { STACK_SIZE = sizeof(long)*4*1024 };

   static long buffer[STACK_SIZE/sizeof(long)];

   /* the context is located beyond the top of the stack */
   addr_t const context_addr = (addr_t)buffer + sizeof(buffer)
                             - sizeof(Thread_base::Context);

   Thread_base::Context *context = (Thread_base::Context *)context_addr;
   context->stack_base = (addr_t)buffer;
   return context;
 }

 Thread_base *Thread_base::myself() { return nullptr; }

 Thread_base::Thread_base(const char *name, size_t stack_size, Type type,
                          Cpu_session *cpu_session)
 :
   _cpu_session(cpu_session), _context(main_context())
 {
   strncpy(_context->name, name, sizeof(_context->name));
   _context->thread_base = this;

   _init_platform_thread(type);
 }

 Thread_base::Thread_base(const char *name, size_t stack_size, Type type)
 : Thread_base(name, stack_size, type, nullptr) { }

 Thread_base::~Thread_base() { _deinit_platform_thread(); }

Genode's startup code will change the stack of the main thread prior calling the main function. This way, the stack can be placed at a dedicated area of the virtual address space (thread-context area). By not placing the stack close to the program image, stack overflows won't silently corrupt data but trigger a page fault. The code above, however, allocates a dummy context for the main thread in the BSS segment (the buffer used as backing store for the context is a static variable). This code above is intended as an interim solution for the initialization of the main thread only.

When executing our run script again, we will get the following message:

 vm fault on data at address 0x1122 with status 0x6
 in thread 0xe0189880 at address 0x1000190

This message is caused by the main function of our test program! In this function, we deliberately triggered a fault at the address 0x1122 via the statement *(int *)0x1122 = 0;.

Issuing the first system call

We have successfully started our custom root task but we have not interacted with the kernel yet. So it is time to take a look at seL4's system-call interface. The interfaces comes in the form of several header files within sel4's libsel4/ directory. At the first glance, the directory layout looks straight forward. The generic parts reside in libsel4/include/ whereas the architecture-depending parts are located at libsel4/arch_include/. However, when skimming over the headers, it becomes apparent that some of them are generated from XML files. Also, some headers are including a top-level header <autoconf.h>.

To make sel4's kernel interface visible to the Genode world, we use a pseudo library called platform.mk. The platform library is built before all other libraries and targets and thereby gives us a hook to populate the build directory with a custom include-directory structure. Because the selection of the kernel-interface header depends on the architecture, we place the platform.mk file at lib/mk/x86_32. To create the platform library, we can take OKL4's version as a blue print. When the library gets built, the include directory structure will be created as a side effect. However, we cannot implement the removal of the include/ directory of a side effect of a clean rule because library description files have no clean rule (the build system just wipes the respective library directory when cleaning). To complement the creation of the include/ directory structure with a corresponding clean rule, the base-sel4/mk/spec-sel4.mk file can be extended with such a rule, which will be globally visible.

The platform library is used as a mere hook to create the include directory structure within the build directory. To allow a program to actually use those headers, we'd need to extend the include-search path accordingly. One way would be to have each target specify the build-directory's local include path via INC_DIR += $(BUILD_BASE_DIR)/include. Not too bad. However, to make the use of the syscall headers more convenient, we introduce just another library called syscall. The library is solely used for providing a so-called library-import file to all targets that use the library. The import file contains the extension of the include-search path. Additionally, we extend the REP_INC_DIR with the value "include/sel4". This way, we can place custom headers (such as a version of autoconf.h) within one of the Genode source repositories under include/sel4/. Those headers will appear to be located at the inlude root scope for such targets. Speaking of autoconf.h, this header is expected by some sel4 includes to distinguish the debug mode from the non-debug mode. As we want to enable the debug functionality, we supply our version of the autoconf.h file at base-sel4/include/sel4/autoconf.h with the definition:

 #define SEL4_DEBUG_KERNEL 1

Besides the autoconf.h file, the kernel-interface headers also require an stdint.h to be present at the include root scope. So we place a version of this file at base-sel4/include/sel4/stdint.h.

 #include <base/fixed_stdint.h>

 typedef genode_uint32_t uint32_t;

 #ifndef NULL
 #define NULL ((void *)0)
 #endif

For trying out the access to the kernel-interface headers, we let our target use the syscall library by extending the target.mk file with LIBS += syscall. Then, we change the main program as follows.

 /* Genode includes */
 #include <util/string.h>

 /* seL4 includes */
 #include <sel4/arch/syscalls.h>

 int main()
 {
   char const *string = "\nMessage printed via the kernel\n";
   for (unsigned i = 0; i < Genode::strlen(string); i++)
     seL4_DebugPutChar(string[i]);

   *(int *)0x1122 = 0;
   return 0;
 }

Here we see three things. First, we can use Genode's usual utilities such as the string functions, which is quite convenient. Second, we include one of seL4 headers. And third, we try to invoke the seL4_DebugPutChar system call to print a string before triggering the page fault at address 0x1122. Before we can run the program, through, we just have to overcome yet another problem. On the attempt to build it, we get the following message:

     COMPILE  main.o
 In file included from include/sel4/arch/syscalls.h:15:0,
                  from genode/repos/base-sel4/src/test/sel4/main.cc:18:
 include/sel4/types.h:16:28: fatal error: sel4/types_gen.h: No such file or directory

A look at the top-level Makefile of seL4 reveals that this file is generated from a "types.bf" file using a python script called tools/bitfield_gen.py. So we have to add a rule for generating this file to our platform library.

 $(BUILD_BASE_DIR)/include/sel4/types_gen.h: $(LIBSEL4_DIR)/include/sel4/types.bf
   $(VERBOSE)python $(LIBSEL4_DIR)/tools/bitfield_gen.py \
                    --environment libsel4 "$<" $@

The next missing header that we stumble over is assert.h. So we have to add a simple version of assert.h to base-sel4/include/sel4/. This version uses Genode's PDBG facility to print error messages, which is, of course, not expected to work yet.

On the next attempt to build the program, the compilation fails because of a missing header again:

 In file included from include/sel4/arch/syscalls.h:15:0,
                  from genode/repos/base-sel4/src/test/sel4/main.cc:18:
 include/sel4/types.h:17:26: fatal error: sel4/syscall.h: No such file or directory

The libsel4/Makefile contains the rule to generate this file. At this point, I am wondering whether to use this Makefile to add those rules to our platform library. I decided for the latter because there are not too many files to generate, I will need to look behind the scenes sooner or later anyway, and I would need to supply some some additional environment (such as providing a common.mk file in order to invoke the original Makefile. The rules for generating headers like syscall.h look similar to the rule for types_gen.h above.

With sel4/syscall.h in place, we get confronted with another problem:

 include/sel4/arch/syscalls.h: In function ‘void seL4_Send(seL4_CPtr, seL4_MessageInfo_t)’:
 include/sel4/arch/syscalls.h:32:26: error: ‘seL4_GetMR’ was not declared in this scope

Fortunately, this error is simply caused by a missing include directive of syscalls.h. The seL4_GetMR function is provided by sel4/arch/functions.h but this file is never included except for sel4/sel4.h. I assume that seL4 users are expected to always include the sel4/sel4.h file instead of including individual kernel headers. By prepending the include of <sel4/arch/functions.h> to the include of '<sel4/arch/syscalls.h>, the problem goes away and clears the path for the next one:

 include/sel4/arch/syscalls.h: In function ‘void seL4_DebugPutChar(char)’:
 include/sel4/arch/syscalls.h:478:16: error: ‘seL4_SysDebugPutChar’ was not declared in this scope

This message is accompanied with similar errors for "seL4_SysDebugHalt", "seL4_SysDebugSnapshot", and "seL4_SysDebugCapIdentify". A look in the generated include/sel4/syscall.h reveals that those declarations exists only when building in DEBUG mode. Hence, we need to add the following line to our autoconf.h file:

 #define DEBUG 1

The next problem is more tricky:

 include/sel4/arch/syscalls.h: In function ‘int main()’:
 include/sel4/arch/syscalls.h:481:6: error: impossible register constraint in ‘asm’

The error refers to the system-call binding for seL4_DebugPutChar. After twiddling with the asm constraints, it turns out that the error is caused by the use of an enum value as input argument. The C++ compiler is free to pick any integer type that it sees fit for representing an enum value. Even though the seL4 developers use a helper macro (SEL4_FORCE_LONG_ENUM) to force a certain minimum bit width for the used type, the C++ compiler complains. By explicitly casting the input argument to int, this ambiguity can be resolved and the compiler becomes happy. Unfortunately, this means that I will have to patch the system-call bindings to make them fit for the use with C++. But looking at the bindings, I think that a patch won't be avoidable anyway because the bindings clobber the EBX register. This means that we won't be able to use the headers for compiling position-independent code (as is the case for Genode). For now, we have are not compiling with -fPIC enabled but this issue is clear in front of us.

Patches for the seL4 code will be reside at base-sel4/src/kernel/. E.g., we just added the current modification of the syscalls.h header by copying a git diff to the file base-sel4/src/kernel/syscalls.patch. To apply the patch automatically when preparing the seL4 port, we need to modify the base-sel4/ports/sel4.port file by adding the following lines:

 PATCHES   := src/kernel/syscalls.patch
 PATCH_OPT := -p1 -d src/kernel/sel4

Since we modified the port-description file, we need to update the accompanied hash via ./tool/ports/update_hash sel4.

Edit: After consulting the seL4 mailing list, Adrian Danis pointed out that the actual issue is an off-by-one bug in the SEL4_FORCE_LONG_ENUM macro. So instead of explicitly casting all opcodes to integers, the macro can be fixed at one place. Hence, I replaced the syscalls.patch by a macros.patch until the fix appears upstream.

Anyway, after all the steps, our test-sel4 program can be successfully built. Executing the run script produces the result that we longed for:

 Message printed via the kernel
 Caught cap fault in send phase at address 0x0
 while trying to handle:
 vm fault on data at address 0x1122 with status 0x6
 in thread 0xe0189880 at address 0x10001d8

The first line is produced by our test program. Knowing how to print characters using the kernel's debug interface, filling out the empty stub function Genode::Core_console::_out_char in core_console.h is easy. We can replace the main program with this version:

 #include <base/printf.h>

 int main()
 {
   PDBG("a message printed via Genode's PDBG");

   *(int *)0x1122 = 0;
   return 0;
 }

When running it via make run/test, it produces the expected result:

 int main(): a message printed via Genode's PDBG

Exploration of the kernel interface

Now that we have a nice playground in place, it is time to explore the actual kernel interface step by step. The goal is to get a tangible feeling for the kernel and to exercise the functionality that is needed by Genode. Those functionalities are:

Access to boot information such as the memory layout and the boot modules
Multi-threading
Process-local synchronization
Synchronous inter-process communication between threads
Address-space creation
Page-fault handling

At this point, it is useful to take a look at the excellent seL4 reference manual to learn the concepts behind the kernel interface:

http://sel4.systems/Docs/seL4-manual.pdf: seL4 reference manual

Obtaining boot information

The manual mentions the function seL4_GetBootInfo to obtain a pointer to a boot-info structure. The function implementation, however, requires a prior call of seL4_InitBootInfo from the startup code. According to the manual, the startup code gets the pointer passed in a CPU register. It presumably registers this pointer via seL4_InitBootInfo. Of course, we don't want to change Genode's kernel-agnostic startup code by inserting a call to the seL4_InitBootInfo function. Fortunately, this is not the first time, Genode has to pick up a register value passed by the kernel to root task via a CPU register. The startup code already saves the initial stack pointer, EAX and EDI. The seL4 manual does not state, which register is used. In the kernel code (create_initial_thread), the register is denoted at "capRegister". A look into ia32/arch/machine/registerset.h reveals that this register is actually EBX on the x86 architecture. Since EBX is not saved by Genode's startup code yet, we need to enhance the startup code a bit to save the initial EBX value in the variable _initial_bx.

The boot-info type is declared in sel4/bootinfo.h. When including the header, we have to expand our sel4/stdint.h version by a definition of uint8_t. Also, the header expects CONFIG_MAX_NUM_BOOTINFO_UNTYPED_CAPS to be defined. We are just using the kernel's default config value, which we copy to our sel4/autoconf.h file.

 #define CONFIG_MAX_NUM_BOOTINFO_UNTYPED_CAPS 800

With those changes in place, we can access the boot info with a utility like this:

 #include <sel4/bootinfo.h>

 static seL4_BootInfo const *boot_info()
 {
   extern Genode::addr_t __initial_bx;
   return (seL4_BootInfo const *)__initial_bx;
 }

While writing a simple dump_boot_info function, I noticed that some boot-info fields mentioned in the manual and present on the master branch, have disappeared in the experimental branch, i.e., numDeviceRegions.

Anyway, the output of dump_boot_info function looks like this:

 --- boot info at 102c000 ---
 ipcBuffer:               102b000
 initThreadCNodeSizeBits: 12
 untyped:                 [38,4d)
                          [38] [00100000,00107fff]
                          [39] [00108000,00109fff]
                          [3a] [001a0000,001bffff]
                          [3b] [001c0000,001fffff]
                          [3c] [00200000,003fffff]
                          [3d] [00400000,007fffff]
                          [3e] [00800000,00ffffff]
                          [3f] [01000000,01ffffff]
                          [40] [02000000,02ffffff]
                          [41] [03000000,037fffff]
                          [42] [03800000,03bfffff]
                          [43] [03c00000,03dfffff]
                          [44] [03e00000,03efffff]
                          [45] [03f00000,03f7ffff]
                          [46] [03f80000,03fbffff]
                          [47] [03fc0000,03fdffff]
                          [48] [03fe0000,03feffff]
                          [49] [03ff0000,03ff7fff]
                          [4a] [03ff8000,03ffbfff]
                          [4b] [03ffc000,03ffdfff]
                          [4c] [00189000,001897ff]

Creating a thread

On seL4, kernel objects are created by turning untyped memory into kernel memory using the seL4_Untype_Retype function. As a first test. I am going to manually define the parameters for this function using the information from the boot info.

But before I can start using this function, we first need to generate some stub code. We need to generate sel4/invocation.h and its corresponding sel4/arch/invocation.h. The rules in the platform library are quickly added, taking the seL4 Makefile as inspiration. We also need to generate the interfaces/sel4_client.h stub code. In the stub code, we find the function seL4_Untyped_RetypeAtOffset, which pretty much matches the signature of seL4_Untype_Retype explained in the manual. This is just another difference from the master branch.

I decided to proceed with invoking seL4_Untyped_RetypeAtOffset by manually specifying its parameters. At this point, I am having a hard time with wrapping my mind around seL4's kernel-resource management, in particular the CNode addressing. My first attempt looked like this:

 {
   seL4_Untyped const service     = 0x38; /* untyped */
   int          const type        = seL4_TCBObject;
   int          const offset      = 0;
   int          const size_bits   = 0;
   seL4_CNode   const root        = seL4_CapInitThreadCNode;
   int          const node_index  = 0;
   int          const node_depth  = 32;
   int          const node_offset = 0x100;
   int          const num_objects = 1;

   int const ret = seL4_Untyped_RetypeAtOffset(service,
                                               type,
                                               offset,
                                               size_bits,
                                               root,
                                               node_index,
                                               node_depth,
                                               node_offset,
                                               num_objects);
   PDBG("seL4_Untyped_RetypeAtOffset returned %d", ret);
 }

Admittedly, I feel a bit flabbergasted by the amount of arguments and not 100% certain what I am doing here. I put the individual arguments into named constants instead of directly supplying them to the function to make their meaning easier to memorize. The service argument refers to one of the untyped memory capabilities reported by the boot info. This memory will be turned into a thread control block (TCB). The node_offset is a presumably free capability slot of the root CScope that is supposed to be free. This is where we want to store the capability for the newly created thread.

When executing the code, the kernel reports an error like this:

 vm fault on data at address 0x1e8 with status 0x6
 in thread 0xe0189880 at address 0x10002ed

The fault is triggered by the function seL4_SetCap, more precisely by the instruction:

  mov %eax,%gs:0x1e8(,%ecx,4)

It appears that the seL4 bindings rely on a thread-local-storage facility via GS-relative addressing. When the kernel switches thread contexts, it loads a segment with a thread-specific memory location. Since we have not initialized the GS register with a particular value, we end up addressing the first page, which is not mapped. The issue could be resolved by initializing the GS register as follows:

 static inline void init_ipc_buffer()
 {
   asm volatile ("movl %0, %%gs" :: "r"(IPCBUF_GDT_SELECTOR) : "memory");
 }

The IPCBUF_GDT_SELECTOR is defined by the seL4 headers. On the next attempt to execute the code, we get a much nicer kernel message:

 <<seL4 [decodeUntypedInvocation/136 Te0189880 @100035a]:
         Untyped Retype: Destination cap invalid or read-only.>>

In reality, the message looks even much better - it is in color! The message tells us that the kernel has actually received our request for retyping untyped memory but the arguments are messed up. The message comes from src/object/untyped.c:

 /* Lookup the destination CNode (where our caps will be placed in). */
 if (nodeDepth == 0) {
     nodeCap = extraCaps.excaprefs[0]->cap;
 } else {
     cap_t rootCap = extraCaps.excaprefs[0]->cap;
     lu_ret = lookupTargetSlot(rootCap, nodeIndex, nodeDepth);
     if (lu_ret.status != EXCEPTION_NONE) {
         userError("Untyped Retype: Invalid destination address.");
         return lu_ret.status;
     }
     nodeCap = lu_ret.slot->cap;
 }

Apparently, by specifying the value 32 for the depth argument, we entered the code path for traversing a CNode tree instead of just inserting a capability into the root CScope. By changing node_depth to 0, system call returns successfully:

 int main(): seL4_Untyped_RetypeAtOffset returned 0

Even though the new thread has no valid register set and no defined space, let us see what happens when we try to start it anyway. This can be done via the seL4_TCB_Resume call with our just created TCB capability as argument.

 seL4_TCB_Resume(0x100);

This results in the following exciting output:

 Caught cap fault in send phase at address 0x0
 while trying to handle:
 vm fault on data at address 0x1122 with status 0x6
 in thread 0xe0189880 at address 0x10003a5
 Caught cap fault in send phase at address 0x0
 while trying to handle:
 vm fault on data at address 0x0 with status 0x4
 in thread 0xe0100080 at address 0x0

We see two threads faulting! The main thread faults at our "breakpoint" 0x1122. But there is also another thread, which faults at 0x0. We can see that the TCB creation via seL4_Untyped_RetypeAtOffset was successful!

Now, turning the situation into something useful seems like a walk in the park: We need to allocate a stack for the new thread and set up the initial program counter and stack pointer of the new thread. At this point, I decide to give the number 0x100 a proper name "SECOND_THREAD_CAP" because it will be repeatedly used.

 enum { SECOND_THREAD_CAP = 0x100 };

Following the manual, we have to call seL4_TCB_WriteRegisters and seL4_TCB_SetSpace. The following code snippet allocates the stack for the new thread on the stack of the main thread, initializes the stack pointer and program counter of the new thread, assigns the new thread to the same address space as the main thread, and kicks off the execution of the new thread.

  long stack[0x1000];
  {
    seL4_UserContext regs;
    Genode::memset(&regs, 0, sizeof(regs));

    regs.eip = (uint32_t)&second_thread_entry;
    regs.esp = (uint32_t)&stack[0] + sizeof(stack);
    int const ret = seL4_TCB_WriteRegisters(SECOND_THREAD_CAP, false,
                                            0, 2, &regs);
    PDBG("seL4_TCB_WriteRegisters returned %d", ret);
  }

  {
    seL4_CapData_t no_cap_data = { { 0 } };
    int const ret = seL4_TCB_SetSpace(SECOND_THREAD_CAP, 0,
                      seL4_CapInitThreadCNode, no_cap_data,
                      seL4_CapInitThreadPD, no_cap_data);
    PDBG("seL4_TCB_SetSpace returned %d", ret);
  }

  seL4_TCB_Resume(SECOND_THREAD_CAP);

The entry function of the new thread is supposed to produce a page fault at the predefined address 0x2244:

 void second_thread_entry()
 {
   *(int *)0x2244 = 0;
 }

When executing the code, we get the desired result:

 vm fault on data at address 0x1122 with status 0x6
 ...
 vm fault on data at address 0x2244 with status 0x6

From these messages, we can see that both the main thread and the second thread are faulting at their designated fault addresses.

With two threads in place, we can test-drive the preemptive scheduling of the kernel by changing the second_thread_entry function to:

 static int volatile cnt = 0;

 void second_thread_entry()
 {
   for (;;)
     cnt++;
 }

At the end of the main function, we repeatedly print the counter value:

 for (;;)
   PDBG("cnt = %d", cnt);

When executing the code, the counter values surprisingly stays at the value 0. This is because the just-created new thread has a lower priority than the main thread. By explicitly assigning the maximum priority to the second thread, we can enable the preemptive round-robin scheduling:

 seL4_TCB_SetPriority(SECOND_THREAD_CAP, 0xff);

Now, we can see the counter value nicely increasing:

 int main(): cnt = 0
 ...
 int main(): cnt = 0
 int main(): cnt = 2908738
 ...
 int main(): cnt = 2908738
 int main(): cnt = 5876191
 ...
 int main(): cnt = 5876191
 ...

Each thread consumes its entire time slice. This way, the second thread has the chance to increment the counter circa 3 million times per time slice after which the main thread has the chance to print the counter about 50 times before being preempted again.

Sections