Maze Light-Tracer Game

Navigate the dungeons of a procedurally generated maze with real-time traced light effects. Noise textures and interactive light sources mark the way.

This small game combines a procedurally generated dungeon maze with noise textures and fully ray traced light effects. Lighting is not statically precomputed but dynamic, as all lights sources can be moved by the player for example to mark a junction of interest.

Certain parts of this project base on learnings from the previous generic CPU ray tracer implementation.

Apart from playing with ray traced lighting, the performance goal is to provide a smooth real-time experience without GPU support. For example, achieving even up to 100 frames per second or overall FHD output resolution should easily be possible on contemporary hardware.

Usage and Gameplay

The goal is to navigate the dark dungeon maze by leveraging light sources to find the final emitting black hole. Collect light sources along the way to increase the power of the own torch. Sacrifice the visible range by strategically dropping lights again at will, in order to leave beacons as breadcrumbs that mark (un)visited paths for backtracking.

The locally built binary can be directly invoked:

./maze-tracer [-b] [-f] [-s seed] [-x width] [-y height]

-s seed: Define the level scene to draw as unsigned integer. Randomly chosen if omitted. Each level is fully reproducible by this number in any case.
-x width, -y height: Output resolution in pixels. The window is padded with black bars if both values do not match up.
-f: Disable subsampling, trace the full frame for pixel-correct results. This will reduce aliasing but increases the internal resolution by a factor of four. See also the USE_UPSAMPLE option.
-b: Do not accept any movement input, draw a predefined set of scenes per level and exit. This benchmark mode should produce comparable framerate performance statistics for different settings.

Depending on the USE_WASD option, movement is controlled by the corresponding input or arrow keys. Picking up and dropping lights is done via e or space, respectively. Pressing F5 will restart the level, F3 skips to generating the next level, Esc for exit.

Installation

Building the binary is done by the provided Makefile and only requires a standard GCC C++17 toolchain with X11 library development headers (e.g., libx11-dev and libxext-dev). A default non-debug build is done simply by:

make

For development, debugging, and benchmarking, there are also targets that wrap valgrind, callgrind, gdb, and clang-tidy. Documentation via doxygen can be created by make docs.

For best performance, a profile-guided build can be triggered by make profiled. The corresponding profiles of a run in benchmarking mode must be created by make profile beforehand.

Technical Background

In a first independent step, a maze is procedurally generated from the given seed, using a variant of the recursive division algorithm. This datastructure is the basis for generating the actual level scene of “dungeons”, which consists of tracing primitives such as planes, textures, and lights. For better orientation, a level-wide Perlin noise provides structured textures.

The fundamentals of the threaded processing pipeline are also derived from more or less well-known approaches:

The main loop waits for key input, rate-limited by the targeted framerate that is adapted incrementally.
Current state is updated according to the movement vector and scene intersection for clipping.
For the new viewport, possibly visible and not pairwise occluded objects/lights are determined. This step can be relatively lengthy, but is overall worth it as all later stages vastly profit from a reduced scene to trace.
Scene update and ray tracing pre-checks use AABB (axis-aligned bounding box) volumes in order to avoid costly computations for common “miss” cases.
Ray casting from the camera through the viewport onto the scene determines the hit point, its normal, and texture uv-coordinates.
For each hit, the contribution of all non-occluded light sources is accumulated.
Determine the final color value by texture coordinate lookup and lighting.
Mapped to the 24bit RGB space, the result is written to a frame buffer and passed to the X server connection.
Feed the overall processing delay to a sliding average which is provided as feedback for framerate auto-scaling at the main loop.

Given this simplified intuition, the pipeline consists of a linear queue-based producer/consumer threading model. The queue of work items that connects the four main threads is actually a single-item sized stack that maintains the most recent frame context. Enqueue operations thus succeed in any case – possibly replacing an outdated task but producing “frame drop” events. Note that multiple frames can still be handled at once to a reasonable extent. The tracing and interpolation steps are in turn processed by their own thread pool, with one thread for each of the four frame corners.

+---------+      +-------+               +-------+               +-------+              
|Movement/|   +->|Trace A|--+         +->|Light A|--+         +->|Frame A|--+           
|Clip/    |   |  |-------|  |         |  |-------|  |         |  |-------|  |           
|Objects/ |   +->|Trace B|--+         +->|Light B|--+         +->|Frame B|--+           
|Lights   |   |  |-------|  |         |  |-------|  |         |  |-------|  |           
+---------+   +->|Trace C|--+         +->|Light C|--+         +->|Frame C|--+           
 ^       |    |  |-------|  |         |  |-------|  |         |  |-------|  |           
 |       |    +->|Trace D|--+         +->|Light D|--+         +->|Frame D|--+           
 |       |    |  +-------+  |         |  +-------+  |         |  +-------+  |           
 |       v    |             v         |             v         |             v           
 |      +-+  +---------------+  +-+  +---------------+  +-+  +---------------+  +------+
 |      |Q|->| Tracer Thread |->|Q|->| Lights Thread |->|Q|->| Render Thread |->|Draw X|
 |      +-+  +---------------+  +-+  +---------------+  +-+  +---------------+  +------+
+------+                                                             |                  
|Main: |<----------------------Pipeline duration feedback------------+          +------+
|Delay/|                                                                        |Input |
|Input |<------------------------------Key presses------------------------------|Thread|
+------+                                                                        +------+
                                                                                        
|--Scene-----|------Objects------|---------Light---------|--Texture--|--Sample--|-X11--|

In Xlib, the default is to “auto-repeat” keystrokes, i.e., to instantly send press and release events. This is hardly suitable for a smooth but precise movement – also in any non-axis-aligned direction. Unfortunately, the corresponding XAutoRepeatOff keyboard setting is not per-window but affects the whole X display. In order to workaround this, a special thread continuously watches the XQueryKeymap state and accumulates keypress durations.

Configuration Details

Apart from the commandline arguments for resolution, the following settings in config.hpp might be of main interest:

raster_per_second: Determines the movement speed on the basis of keypress duration.
min_fps, max_fps: Interval of the target framerate. Directly defines the delay between updates, which gets incrementally adjusted according to the sliding average of current pipeline performance.
USE_WASD: Use WASD+E instead of arrow keys for movement.
USE_UPSAMPLE: Enable bilinear interpolation when subsampling, nearest neighbour otherwise.
USE_TEXTURE_SATURATION: Saturation of texture colors, 1.0 for full intensity but varying lighting.
USE_LIGHT_PICKUP: Overall “game mode”: Explore the level with highest visibility, enable pickup and dropping of lights, and grow/shrink player light when doing so.
NO_CLIP

Tweaking Appearance

camera_z, viewport_z: Camera and viewport raster z-distance. Determines the DOV and FOV, respectively.
USE_LIGHTS: Generate light sources in levels.
USE_LIGHTBOX: Try to optimize light volume boxes by cutting down scene overlaps.
USE_TEXTURE: Use uniform, colored, or Perlin noise textures, respectively.
USE_PERLIN_INTERPOLATE: Enable polynomial noise interpolation, avoiding blocky artifacts.
USE_TEXTURE_GRADIENT: Blend textures between levels (“doors”).

Float SIMD Instructions

SIMD vector instruction optimizations on the main vertex datastructure are configured by USE_SIMD. When enabled, explicit GCC x86 extensions are used, in contrast to the immintrin.h intrinsics wrappers, which are not constexpr-compatible.

0: No manual optimizations in this regard, sticking with a standard 3-dimensional floating point structure.
1: Use aligned 4-component vector for SIMD and generic operators on it, enabling automatic “compiler-native” optimizations.
2: Allow the explicit use of __builtin_ia32_* extensions.
3, 4, 5: Use explicit GCC builtins for more complex implementations, 3D comparisons, and also for 2D comparison operators, respectively.

Not all target architectures might profit equally from an increased level of SIMD tweaks, some might even perform noticeably worse – YMMV. The -b benchmark mode can help with finding the local sweet spot.

Output control

USE_LOG_STATS: Log verboseness per-frame timing statistics, 0 – 3. Useful for determining which part of the pipeline is limiting.
SQUEEZE_FACTOR: When less than 1, overlap frames by premature enqueuing, thereby increasing the chance of more than one frame being processed at once. Increases the pipeline saturation when >> 4 threads are supported.
USE_X: Enable X image output and key input, should only be disabled for “headless” benchmarking.
USE_PPM: If X is disabled, write frames as PPM images for debugging instead.
USE_X_SHM: Enable X11 SHM extension for the frame raster image, avoiding the need for copy to the X server connection. Can have a noticeable performance impact. When disabled, the -lXext linker flag can be omitted.