stdgba

stdgba is a C++23 library for Game Boy Advance development.

It keeps the hardware-first model of classic GBA development, but exposes it through strongly-typed, constexpr-friendly APIs instead of macro-heavy C interfaces.

What stdgba is

  • A zero-heap-friendly library for real GBA hardware constraints.
  • A typed register/peripheral API built around inline constexpr objects.
  • A consteval-first toolkit for things that benefit from compile-time validation.
  • A practical replacement for low-level C-era patterns when writing modern C++.

stdgba is not a game engine

You still decide your main loop, memory layout, rendering strategy, and frame budget. stdgba focuses on safer and more expressive building blocks.

Core design goals

  1. Zero-cost abstractions - generated code should match hand-written low-level intent.
  2. Compile-time validation - invalid asset/pattern/config inputs should fail at compile time when possible.
  3. Typed hardware access - peripheral use should be explicit, discoverable, and hard to misuse.
  4. Practical migration path - where meaningful, docs map familiar tonclib-era workflows to stdgba equivalents.

What you get

  • registral<T> register wrappers with designated initialisers
  • fixed-point and angle types with literal support
  • BIOS wrappers for sync, math, memory, compression, affine setup
  • compile-time image embedding and conversion (gba/embed)
  • pattern-based PSG music composition (gba/music)
  • static ECS (gba/ecs) with fixed capacity and deterministic iteration

Quick taste

#include <gba/interrupt>
#include <gba/peripherals>
#include <gba/keyinput>
#include <gba/bios>

int main() {
    // Initialise interrupt handler
    gba::irq_handler = {};

    // Set video mode 0, enable BG0
    gba::reg_dispcnt = { .video_mode = 0, .enable_bg0 = true };

    // Enable VBlank interrupt
    gba::reg_dispstat = { .enable_irq_vblank = true };
    gba::reg_ie = { .vblank = true };
    gba::reg_ime = true;

    gba::keypad keys;
    for (;;) {
        keys = gba::reg_keyinput;
        if (keys.pressed(gba::key_a)) {
            // ...
        }
        gba::VBlankIntrWait();
    }
}

Book roadmap

Who this is for

  • GBA developers who want modern C++ without losing hardware control
  • C++ programmers learning GBA development
  • Existing tonclib/libtonc users migrating to typed APIs

Hello VBlank

The simplest GBA program that actually does something is a VBlank loop. This is the heartbeat of every GBA game - wait for the display to finish drawing, then update your game state.

The code

#include <gba/bios>
#include <gba/interrupt>
#include <gba/peripherals>

int main() {
    // Step 1: Initialise the interrupt handler
    gba::irq_handler = {};

    // Step 2: Tell the display hardware to fire an interrupt each VBlank
    gba::reg_dispstat = { .enable_irq_vblank = true };

    // Step 3: Tell the CPU to accept VBlank interrupts
    gba::reg_ie = { .vblank = true };
    gba::reg_ime = true;

    // Step 4: Main loop
    for (;;) {
        gba::VBlankIntrWait();
        // Your game logic goes here
    }
}

What is happening?

The GBA display draws 160 lines of pixels (the “active” period), then enters a 68-line “vertical blank” period where no pixels are drawn. The VBlank is your window to safely update video memory without visual tearing.

gba::VBlankIntrWait() puts the CPU to sleep (saving battery) until the VBlank interrupt fires. This is the BIOS SWI 0x05.
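To put numbers on the frame budget, the timing above can be worked out at compile time. This is a minimal sketch in plain C++ that does not use stdgba. The constant names are illustrative, and the values are the commonly documented GBA hardware figures (a 2^24 Hz clock and 1232 CPU cycles per scanline).

```cpp
#include <cstdint>

// Commonly documented GBA display timing (hardware figures, not stdgba names).
constexpr std::uint32_t cpu_hz          = 16'777'216; // 2^24 Hz system clock
constexpr std::uint32_t cycles_per_line = 1232;       // 308 dots x 4 cycles
constexpr std::uint32_t visible_lines   = 160;
constexpr std::uint32_t vblank_lines    = 68;

constexpr std::uint32_t lines_per_frame  = visible_lines + vblank_lines;
constexpr std::uint32_t cycles_per_frame = lines_per_frame * cycles_per_line;
constexpr std::uint32_t vblank_cycles    = vblank_lines * cycles_per_line;

// Refresh rate in millihertz, truncated (59727 means roughly 59.73 Hz).
constexpr std::uint64_t refresh_millihertz =
    std::uint64_t{cpu_hz} * 1000 / cycles_per_frame;

static_assert(lines_per_frame == 228);
static_assert(cycles_per_frame == 280'896); // total budget per frame
static_assert(vblank_cycles == 83'776);     // budget for safe VRAM updates
```

Roughly 30% of each frame is VBlank, which is why VRAM-heavy work is usually scheduled right after gba::VBlankIntrWait() returns.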

Step by step

  1. gba::irq_handler = {} installs the default interrupt dispatcher. Without this, BIOS interrupt-wait functions will hang forever.

  2. gba::reg_dispstat = { .enable_irq_vblank = true } writes to the DISPSTAT register using a designated initialiser. Only the .enable_irq_vblank bit is set; all other fields default to zero.

  3. gba::reg_ie = { .vblank = true } enables the VBlank interrupt in the interrupt enable register. gba::reg_ime = true is the master interrupt switch.

  4. gba::VBlankIntrWait() is a BIOS call that halts the CPU until a VBlank interrupt occurs.

tonclib comparison

The equivalent tonclib code:

#include <tonc.h>

int main() {
    irq_init(NULL);
    irq_add(II_VBLANK, NULL);

    for (;;) {
        VBlankIntrWait();
    }
}

The key difference is that stdgba uses designated initialisers ({ .vblank = true }) instead of bit-mask macros (II_VBLANK). A misspelt field name is a compile error, whereas a valid macro used with the wrong register still compiles and silently writes the wrong bits.

Putting something on screen

The VBlank loop itself produces a blank screen. To prove the program is running, here is a minimal extension that draws a white rectangle in Mode 3:

#include <gba/bios>
#include <gba/interrupt>
#include <gba/video>

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {.video_mode = 3, .enable_bg2 = true};

    // Draw a white 40x20 rectangle centered on the 240x160 screen
    for (int y = 70; y < 90; ++y) {
        for (int x = 100; x < 140; ++x) {
            gba::mem_vram[x + y * 240] = 0x7FFF;
        }
    }

    while (true) {
        gba::VBlankIntrWait();
    }
}

[Screenshot: Hello VBlank]
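The two magic numbers in the rectangle loop, 0x7FFF and x + y * 240, follow from the Mode 3 layout: a 240-pixel-wide framebuffer of 15-bit colours with red in the low five bits. The helpers below are a sketch in plain C++, and rgb15 and mode3_index are hypothetical names rather than stdgba functions.

```cpp
#include <cstdint>

constexpr int mode3_width = 240;

// Pack three 5-bit channels into the GBA's 15-bit colour word
// (red in bits 0-4, green in bits 5-9, blue in bits 10-14).
constexpr std::uint16_t rgb15(unsigned r, unsigned g, unsigned b) {
    return static_cast<std::uint16_t>((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10));
}

// Index of pixel (x, y) in the Mode 3 framebuffer, counted in 16-bit pixels.
constexpr int mode3_index(int x, int y) {
    return x + y * mode3_width;
}

static_assert(rgb15(31, 31, 31) == 0x7FFF);   // white, as used in the demo
static_assert(rgb15(31, 0, 0) == 0x001F);     // pure red
static_assert(mode3_index(100, 70) == 16900); // top-left corner of the rectangle
```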

Next steps

Hello Graphics and Keypad

Now that you have a stable VBlank loop, the next step is drawing a visible shape and moving it.

This page pairs two tiny demos that share the same consteval circle sprite:

  • demo_hello_graphics.cpp: draw the sprite in the centre.
  • demo_hello_keypad.cpp: move the same sprite with the D-pad.

Part 1: draw a shape

#include <gba/bios>
#include <gba/color>
#include <gba/interrupt>
#include <gba/shapes>
#include <gba/video>

#include <cstring>

using namespace gba::shapes;
using gba::operator""_clr;

namespace {

    constexpr auto spr_ball = sprite_16x16(circle(8.0, 8.0, 7.0));

} // namespace

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {
        .video_mode = 0,
        .linear_obj_tilemap = true,
        .enable_obj = true,
    };

    gba::pal_bg_mem[0] = "#102040"_clr;
    gba::pal_obj_bank[0][1] = "white"_clr;

    auto* objDst = gba::memory_map(gba::mem_vram_obj);
    std::memcpy(objDst, spr_ball.data(), spr_ball.size());
    const auto tileIdx = gba::tile_index(objDst);

    auto obj = spr_ball.obj(tileIdx);
    obj.x = (240 - 16) / 2;
    obj.y = (160 - 16) / 2;
    obj.palette_index = 0;
    gba::obj_mem[0] = obj;

    for (int i = 1; i < 128; ++i) {
        gba::obj_mem[i] = {.disable = true};
    }

    while (true) {
        gba::VBlankIntrWait();
    }
}

What is happening?

  1. The setup is the same as Hello VBlank: initialise interrupts and wait on gba::VBlankIntrWait() in the main loop.
  2. sprite_16x16(circle(...)) creates the sprite tile data at compile time (consteval).
  3. We copy that tile data into OBJ VRAM, then place it with obj_mem[0].
  4. The display runs in Mode 0 with objects enabled (.enable_obj = true).
  5. Colours use _clr literals for readability ("#102040"_clr, "white"_clr).

[Screenshot: Hello Graphics]

Part 2: move it with keypad

#include <gba/bios>
#include <gba/color>
#include <gba/interrupt>
#include <gba/keyinput>
#include <gba/shapes>
#include <gba/video>

#include <algorithm>
#include <cstring>

using namespace gba::shapes;
using gba::operator""_clr;

namespace {

    constexpr int screen_width = 240;
    constexpr int screen_height = 160;
    constexpr int sprite_size = 16;

    constexpr auto spr_ball = sprite_16x16(circle(8.0, 8.0, 7.0));

    int clamp(int value, int lo, int hi) {
        if (value < lo) {
            return lo;
        }
        if (value > hi) {
            return hi;
        }
        return value;
    }

} // namespace

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {
        .video_mode = 0,
        .linear_obj_tilemap = true,
        .enable_obj = true,
    };

    gba::pal_bg_mem[0] = "#102040"_clr;
    gba::pal_obj_bank[0][1] = "white"_clr;

    auto* objDst = gba::memory_map(gba::mem_vram_obj);
    std::memcpy(objDst, spr_ball.data(), spr_ball.size());
    const auto tileIdx = gba::tile_index(objDst);

    auto obj = spr_ball.obj(tileIdx);
    obj.palette_index = 0;

    int spriteX = (screen_width - sprite_size) / 2;
    int spriteY = (screen_height - sprite_size) / 2;
    obj.x = static_cast<unsigned short>(spriteX);
    obj.y = static_cast<unsigned short>(spriteY);
    gba::obj_mem[0] = obj;

    gba::object disabled{.disable = true};
    std::fill(std::begin(gba::obj_mem) + 1, std::end(gba::obj_mem), disabled);

    gba::keypad keys;

    while (true) {
        gba::VBlankIntrWait();
        keys = gba::reg_keyinput;

        spriteX += keys.xaxis();
        spriteY += keys.i_yaxis();

        spriteX = clamp(spriteX, 0, screen_width - sprite_size);
        spriteY = clamp(spriteY, 0, screen_height - sprite_size);

        obj.x = static_cast<unsigned short>(spriteX);
        obj.y = static_cast<unsigned short>(spriteY);
        gba::obj_mem[0] = obj;
    }
}
  • keys.xaxis() handles left/right.
  • keys.i_yaxis() handles up/down in screen-space coordinates.
  • Position is clamped to keep the sprite inside the 240x160 screen.
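For intuition, the axis helpers can be modelled on the raw KEYINPUT value, which is active-low (a cleared bit means the key is held). This is a sketch in plain C++, not stdgba's implementation. The function names are hypothetical, and it assumes the screen-space Y axis is positive when Down is held.

```cpp
#include <cstdint>

// KEYINPUT bit positions for the D-pad (hardware layout).
constexpr std::uint16_t key_right = 1u << 4;
constexpr std::uint16_t key_left  = 1u << 5;
constexpr std::uint16_t key_up    = 1u << 6;
constexpr std::uint16_t key_down  = 1u << 7;

// KEYINPUT is active-low: a key is held when its bit reads 0.
constexpr bool is_held(std::uint16_t keyinput, std::uint16_t mask) {
    return (keyinput & mask) == 0;
}

// +1 for Right, -1 for Left, 0 for neither or both.
constexpr int x_axis(std::uint16_t keyinput) {
    return int(is_held(keyinput, key_right)) - int(is_held(keyinput, key_left));
}

// Screen-space Y (assumed): +1 for Down, -1 for Up.
constexpr int screen_y_axis(std::uint16_t keyinput) {
    return int(is_held(keyinput, key_down)) - int(is_held(keyinput, key_up));
}

// Move one step and keep the result inside [lo, hi].
constexpr int step_clamped(int pos, int delta, int lo, int hi) {
    return pos + delta < lo ? lo : (pos + delta > hi ? hi : pos + delta);
}
```

With a 16-pixel sprite, the horizontal range is [0, 240 - 16], so step_clamped(224, 1, 0, 224) stays at 224.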

Next step

Continue to Hello Audio to trigger a PSG jingle on button press.

Hello Audio

Now that you can draw and move a sprite, the next step is sound.

This demo plays a short PSG jingle when you press A.

The code

#include <gba/bios>
#include <gba/interrupt>
#include <gba/keyinput>
#include <gba/music>
#include <gba/peripherals>

using namespace gba::music;
using namespace gba::music::literals;

namespace {

    // One-shot PSG jingle (SQ1). Press A to restart playback.
    // .press() applies staccato: each note plays for half duration, rest for half.
    // Compiled at 2_cps (2 cycles per second) for a snappy tempo.
    static constexpr auto jingle = compile<2_cps>(note("c5 e5 g5 c6").channel(channel::sq1).press());

} // namespace

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    // Basic PSG routing for SQ1 on both speakers.
    gba::reg_soundcnt_x = {.master_enable = true};
    gba::reg_soundcnt_l = {
        .volume_right = 7,
        .volume_left = 7,
        .enable_1_right = true,
        .enable_1_left = true,
    };
    gba::reg_soundcnt_h = {.psg_volume = 2};

    gba::keypad keys;
    auto player = music_player<jingle>{};

    while (true) {
        gba::VBlankIntrWait();
        keys = gba::reg_keyinput;

        if (keys.pressed(gba::key_a)) {
            player = {};
        }

        player();
    }
}

What is happening?

  1. We set up VBlank + interrupts as in earlier chapters.
  2. We enable PSG output with reg_soundcnt_x, reg_soundcnt_l, and reg_soundcnt_h.
  3. note("c5 e5 g5 c6").channel(channel::sq1).press() builds a staccato pattern (each note plays half duration, rests half), ensuring the jingle ends in silence naturally.
  4. compile<2_cps>(...) compiles at 2 cycles per second (4x faster than the default 0.5 cps), making the jingle snappy and brief.
  5. music_player<jingle> advances once per frame, dispatching note events.
  6. Pressing A resets the player with player = {}, restarting the jingle from the beginning.
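To see what the routing in step 2 amounts to at the register level, here is a sketch that packs the same fields into a raw SOUNDCNT_L value. It is plain C++ and not stdgba's type. The struct and function names are hypothetical, and the bit positions are the commonly documented hardware layout.

```cpp
#include <cstdint>

// Hypothetical stand-in for the fields used in the demo above.
struct psg_mix_sketch {
    unsigned volume_right = 0; // bits 0-2
    unsigned volume_left = 0;  // bits 4-6
    bool enable_1_right = false; // bit 8: channel 1 (SQ1) on the right speaker
    bool enable_1_left = false;  // bit 12: channel 1 (SQ1) on the left speaker
};

// Pack the fields into the 16-bit SOUNDCNT_L layout.
constexpr std::uint16_t pack_soundcnt_l(psg_mix_sketch m) {
    return static_cast<std::uint16_t>((m.volume_right & 7u)
                                      | ((m.volume_left & 7u) << 4)
                                      | (m.enable_1_right ? 0x0100u : 0u)
                                      | (m.enable_1_left ? 0x1000u : 0u));
}

// SOUNDCNT_X master enable is bit 7.
constexpr std::uint16_t soundcnt_x_master_enable = 1u << 7;

static_assert(pack_soundcnt_l({.volume_right = 7,
                               .volume_left = 7,
                               .enable_1_right = true,
                               .enable_1_left = true}) == 0x1177);
```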

Next step

Move on to Registers & Peripherals, then dive deeper into Music Composition.

Registers & Peripherals

Every piece of GBA hardware - the display, sound, timers, DMA, buttons - is controlled through memory-mapped registers. In tonclib, these are #define macros to raw addresses. In stdgba, they are inline constexpr objects with real C++ types.

The registral<T> wrapper

registral<T> is a zero-cost wrapper around a hardware address. It provides type-safe reads and writes through operator overloads:

#include <gba/peripherals>

// Write a struct with designated initialisers
gba::reg_dispcnt = { .video_mode = 3, .enable_bg2 = true };

// Read the current value
auto dispcnt = gba::reg_dispcnt.value();

// Write a raw integer directly (for non-integral register types)
gba::reg_dispcnt = 0x0403u;

How it compiles

registral<T> stores the hardware address as a data member. Every operation compiles to a single ldr/str instruction - exactly what you would write in assembly.

// This:
gba::reg_dispcnt = { .video_mode = 3, .enable_bg2 = true };

// Compiles to the same code as:
*(volatile uint16_t*) 0x4000000 = 0x0403u;
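The value 0x0403 can be derived by hand from the DISPCNT bit layout. The sketch below is plain C++ and not stdgba's real type: display_control_sketch and pack_dispcnt are hypothetical, and only the fields used in this book are modelled.

```cpp
#include <cstdint>

// Hypothetical model of a few DISPCNT fields, in ascending bit order.
struct display_control_sketch {
    unsigned video_mode = 0;         // bits 0-2
    bool linear_obj_tilemap = false; // bit 6 (1D OBJ mapping)
    bool enable_bg0 = false;         // bit 8
    bool enable_bg1 = false;         // bit 9
    bool enable_bg2 = false;         // bit 10
    bool enable_bg3 = false;         // bit 11
    bool enable_obj = false;         // bit 12
};

// Pack the fields into the 16-bit register layout.
constexpr std::uint16_t pack_dispcnt(display_control_sketch d) {
    return static_cast<std::uint16_t>((d.video_mode & 7u)
                                      | (d.linear_obj_tilemap ? 1u << 6 : 0u)
                                      | (d.enable_bg0 ? 1u << 8 : 0u)
                                      | (d.enable_bg1 ? 1u << 9 : 0u)
                                      | (d.enable_bg2 ? 1u << 10 : 0u)
                                      | (d.enable_bg3 ? 1u << 11 : 0u)
                                      | (d.enable_obj ? 1u << 12 : 0u));
}

// Mode 3 with BG2 enabled: 0x0003 | 0x0400.
static_assert(pack_dispcnt({.video_mode = 3, .enable_bg2 = true}) == 0x0403);
```

Because the packing is a constant expression, a compiler can fold the whole initialiser into a single immediate store, which is the behaviour described above.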

Writing raw integers

When a register stores a non-integral type (a struct with bitfields), you can still write a raw integer value when needed:

// Normal: designated initialiser
gba::reg_dispcnt = { .video_mode = 3, .enable_bg2 = true };

// Raw: write an integer directly
gba::reg_dispcnt = 0x0403u; // Same effect, but less readable

This allows some compatibility with tonclib and similar C libraries that treat registers as raw integers.

The memory_map() helper

When you need a raw pointer (for DMA, memcpy, pointer arithmetic, or interop), use gba::memory_map(...) instead of hard-coded addresses.

#include <gba/peripherals>
#include <gba/video>

// Register pointer
auto* dispcnt = gba::memory_map(gba::reg_dispcnt);

// VRAM pointer (BG tile/map region)
auto* vram_bg = gba::memory_map(gba::mem_vram_bg);

This keeps code tied to named hardware mappings while still compiling to direct memory access.

Read-only and write-only registers

The GBA has registers that are read-only, write-only, or read-write. stdgba encodes this in the type:

Qualifier              Behaviour
registral<T>           Read-write
registral<const T>     Read-only
registral<volatile T>  Write-only

For example, gba::reg_keyinput is read-only (you cannot write to the keypad), while gba::reg_bg_hofs is write-only (the hardware does not let you read back scroll values).

Array registers

Some registers are arrays (e.g., timer control, DMA channels, palette RAM):

// Timer 0 control
gba::reg_tmcnt_h[0] = { .prescaler = 3, .enable = true };

// BG0 horizontal scroll
gba::reg_bg_hofs[0] = 120;

// Palette memory (256 BG colours + 256 OBJ colours)
gba::pal_bg_mem[0] = { .red = 31 };   // Red
gba::pal_obj_mem[1] = { .blue = 31 }; // Blue

These compile to indexed memory stores with no overhead.

Using std algorithms with array registers

Array registers support range-based iteration and are compatible with <algorithm>:

#include <algorithm>
#include <gba/peripherals>

// Initialise all 4 timers to zero
std::fill(gba::reg_tmcnt_l.begin(), gba::reg_tmcnt_l.end(), 0);

// Copy a preset palette from EWRAM into OBJ palette
std::copy(preset_palette.begin(), preset_palette.end(), gba::pal_obj_mem.begin());

// Check if any timer is running
bool any_running = std::any_of(gba::reg_tmcnt_h.begin(), gba::reg_tmcnt_h.end(),
    [] (auto tmcnt) { return tmcnt.enabled; });

// Initialise all background control registers at once
std::fill(gba::reg_bgcnt.begin(), gba::reg_bgcnt.end(),
          gba::background_control{.priority = 0, .screenblock = 31});

The array wrapper provides the standard range interface: .begin(), .end(), .size(), and forward iterators that work with any <algorithm> call accepting forward iterators.

registral_cast

When you need to access the same memory region through a different type - for example, interpreting palette RAM as typed color entries rather than raw short values - use gba::registral_cast.

#include <gba/color>

// mem_pal_bg is registral<short[256]> (raw shorts)
// pal_bg_mem is the same address, reinterpreted as color[256]
inline constexpr auto pal_bg_mem = gba::registral_cast<gba::color[256]>(gba::mem_pal_bg);

The cast preserves the hardware address and stride. It works for all combinations:

From       To         Example
Non-array  Non-array  registral_cast<color>(raw_short_reg)
Non-array  Array      registral_cast<color[4]>(raw_reg)
Array      Array      registral_cast<color[256]>(short_array_reg)
Array      Non-array  registral_cast<color>(color_array_reg)

Palette example

using namespace gba::literals;

// Write palette entries as typed colors
gba::pal_bg_mem[0] = "#000000"_clr;  // transparent/backdrop
gba::pal_bg_mem[1] = "red"_clr;

// 4bpp: access as 16 banks of 16 colours each
gba::pal_bg_bank[0][0] = "black"_clr;
gba::pal_bg_bank[1][3] = "cornflowerblue"_clr;

VRAM example

#include <gba/video>

// VRAM as typed tile arrays
auto tile_ptr = gba::memory_map(gba::mem_tile_4bpp);
// Equivalent to registral_cast internally:
// registral<tile4bpp[4][512]> at 0x6000000

registral_cast is a zero-cost cast: it produces a new registral<To> at exactly the same base address, with no runtime overhead.

Designated initialisers

The biggest ergonomic win is designated initialisers. Instead of remembering which bit is which:

// tonclib: which bits are these?
REG_DISPCNT = DCNT_MODE0 | DCNT_BG0 | DCNT_BG1 | DCNT_OBJ | DCNT_OBJ_1D;

You write self-documenting code:

// stdgba: every field is named
gba::reg_dispcnt = {
    .video_mode = 0,
    .linear_obj_tilemap = true,
    .enable_bg0 = true,
    .enable_bg1 = true,
    .enable_obj = true,
};

Any field you omit keeps its default member initialiser, or is zero-initialised if it has none.

Fixed-Point Math

The GBA ARM7TDMI has no floating-point unit. Floating-point is emulated in software, so fixed-point arithmetic is the usual choice for gameplay math, camera transforms, and register-facing values.

The fixed<> type

#include <gba/fixed_point>
using namespace gba::literals;

// 8.8 format (good for small ranges, fine sub-pixel steps)
gba::fixed<short> position = 3.5_fx;

// 16.16 format (high precision for world-space values)
gba::fixed<int> velocity = 0.125_fx;

// 24.8 format (common GBA-friendly choice, tonclib-style)
gba::fixed<int, 8> angle = 1.5_fx;

fixed<Rep, FracBits, IntermediateRep> stores a scaled integer in Rep.

  • Rep controls storage width and sign.
  • FracBits controls precision (step = 1 / 2^FracBits).
  • IntermediateRep controls multiply/divide intermediate width.

Precision and range

For a signed representation:

  • precision step: 1 / (1 << FracBits)
  • minimum: -2^(integer_bits)
  • maximum: 2^(integer_bits) - step

where integer_bits = numeric_limits<Rep>::digits - FracBits (digits excludes the sign bit).

For unsigned representations, minimum is 0.
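These formulas are easy to check at compile time. The sketch below is plain C++ and independent of stdgba. fixed_format_sketch is a hypothetical helper that computes step, minimum and maximum from Rep and FracBits exactly as defined above.

```cpp
#include <limits>

// Hypothetical helper: derives the numeric properties of a fixed-point format.
template <class Rep, int FracBits>
struct fixed_format_sketch {
    // digits excludes the sign bit, so this is the count of integer magnitude bits.
    static constexpr int integer_bits = std::numeric_limits<Rep>::digits - FracBits;
    static constexpr double step = 1.0 / static_cast<double>(1LL << FracBits);
    static constexpr double maximum = static_cast<double>(1LL << integer_bits) - step;
    static constexpr double minimum =
        std::numeric_limits<Rep>::is_signed ? -static_cast<double>(1LL << integer_bits) : 0.0;
};

// 8.8 in a short: 7 integer magnitude bits plus the sign bit.
static_assert(fixed_format_sketch<short, 8>::integer_bits == 7);
static_assert(fixed_format_sketch<short, 8>::minimum == -128.0);
static_assert(fixed_format_sketch<short, 8>::maximum == 127.99609375);

// 12.4 in a short.
static_assert(fixed_format_sketch<short, 4>::maximum == 2047.9375);

// Unsigned storage starts at zero and gains one magnitude bit.
static_assert(fixed_format_sketch<unsigned short, 8>::minimum == 0.0);
static_assert(fixed_format_sketch<unsigned short, 8>::maximum == 255.99609375);
```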

Common formats

Type             Format  Approx range                  Precision step
fixed<short>     8.8     -128 to 127.99609375          1/256
fixed<int>       16.16   -32768 to 32767.9999847412    1/65536
fixed<int, 8>    24.8    -8388608 to 8388607.99609375  1/256
fixed<short, 4>  12.4    -2048 to 2047.9375            1/16

Introspecting format traits

#include <gba/fixed_point>
#include <type_traits>

using fx = gba::fixed<int, 8>;
using traits = gba::fixed_point_traits<fx>;

static_assert(traits::frac_bits == 8);
static_assert(std::is_same_v<traits::rep, int>);

The _fx literal

The _fx suffix creates fixed-point literals at compile time:

using namespace gba::literals;

gba::fixed<short> a = 3.14_fx;
gba::fixed<short> b = 2_fx;

auto c = a + b;
auto d = a * b;

_fx is format-agnostic until assignment, then converts to the destination fixed<> type.

Arithmetic and overflow behaviour

Standard operators are supported:

gba::fixed<short> a = 10.5_fx;
gba::fixed<short> b = 3.25_fx;

auto sum = a + b;
auto diff = a - b;
auto prod = a * b;
auto quot = a / b;

auto neg = -a;
bool gt = a > b;

Multiplication and division use IntermediateRep internally.

  • fixed<short> uses a 32-bit intermediate by default.
  • fixed<int> defaults to int intermediate (faster on ARM, lower headroom).

If you need safer large products/quotients, use precise<>, which switches to a 64-bit intermediate:

using fast = gba::fixed<int, 16>;
using safe = gba::precise<int, 16>;

fast a = 100.0_fx;
fast b = 300.0_fx;
auto fast_prod = a * b;   // int intermediate: may overflow

safe x = 100.0_fx;
safe y = 300.0_fx;
auto safe_prod = x * y;   // 64-bit intermediate; 30000 fits the 16.16 range
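The reason the intermediate width matters shows up when the multiply is written out on raw 16.16 integers. The sketch below is plain C++ and not stdgba's implementation. It assumes the textbook approach of multiplying the raw values and shifting right by the fractional bit count, and all names are illustrative.

```cpp
#include <cstdint>
#include <limits>

constexpr int frac_bits = 16;

// Raw 16.16 representation of a whole number.
constexpr std::int32_t to_raw(int whole) {
    return whole * (1 << frac_bits);
}

// Full-width product of two raw values, before the corrective shift.
constexpr std::int64_t wide_product(std::int32_t a, std::int32_t b) {
    return std::int64_t{a} * b;
}

constexpr bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min()
        && v <= std::numeric_limits<std::int32_t>::max();
}

// Multiply with a 64-bit intermediate, then shift back to 16.16.
constexpr std::int32_t mul_wide(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(wide_product(a, b) >> frac_bits);
}

// 100 * 300: the unshifted product needs far more than 32 bits...
static_assert(!fits_in_int32(wide_product(to_raw(100), to_raw(300))));
// ...but the final result, 30000, fits the 16.16 range.
static_assert(mul_wide(to_raw(100), to_raw(300)) == to_raw(30000));
```

A wider intermediate cannot rescue a result that is itself out of range, so operands still have to be chosen with the destination format in mind.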

Mixed-type arithmetic and promotion API

Operations require compatible types. For different fixed<> formats, use the promotion wrappers in <gba/fixed_point> to make intent explicit.

Why wrappers exist

using fix8 = gba::fixed<int, 8>;
using fix4 = gba::fixed<int, 4>;

fix8 a = 3.5_fx;
fix4 b = 1.25_fx;

// auto bad = a + b; // incompatible formats
auto ok  = gba::as_lhs(a) + b;

Promotion wrappers

Wrapper                 Result steering                          Typical use
as_lhs(x)               convert other operand to wrapped type    keep left-hand format
as_rhs(x)               convert wrapped operand to other type    match right-hand format
as_widening(x)          keep higher fractional precision         avoid precision loss
as_narrowing(x)         match the narrower side                  intentional truncation
as_average_frac(x)      average fractional bits                  balanced precision
as_average_int(x)       average integer-range bits               balanced range
as_next_container(x)    promote storage to next wider container  headroom for mixed small types
as_word_storage(x)      use int/unsigned int storage             ARM-friendly word math
as_signed(x)            force signed storage type                sign-aware operations
as_unsigned(x)          force unsigned storage type              non-negative domains only
with_rounding(wrapper)  rounding meta-wrapper for conversions    explicit rounding policy path

Practical examples

using fix8 = gba::fixed<int, 8>;
using fix4 = gba::fixed<int, 4>;

fix8 hi = 3.53125_fx;
fix4 lo = 1.25_fx;

auto keep_hi = gba::as_lhs(hi) + lo;        // fix8 result
auto keep_lo = gba::as_rhs(hi) + lo;        // fix4 result
auto wide    = gba::as_widening(lo) + hi;   // fix8 result
auto narrow  = gba::as_narrowing(hi) + lo;  // fix4 result (truncating conversion)
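What the choice of result format costs can be shown on raw integers. This sketch is plain C++ and not stdgba code. It assumes that converting between formats is a plain shift (left to widen, truncating right shift to narrow), and the helper names are hypothetical.

```cpp
// Raw value of v in a format with the given number of fractional bits.
constexpr int to_raw(double v, int frac_bits) {
    return static_cast<int>(v * (1 << frac_bits));
}

// Real value represented by a raw integer.
constexpr double to_real(int raw, int frac_bits) {
    return static_cast<double>(raw) / (1 << frac_bits);
}

constexpr int q4_to_q8(int raw_q4) { return raw_q4 * 16; } // widen: exact
constexpr int q8_to_q4(int raw_q8) { return raw_q8 >> 4; } // narrow: drops 4 bits

constexpr int hi = to_raw(3.53125, 8); // 904
constexpr int lo = to_raw(1.25, 4);    // 20

// Keep the 8-bit fraction: the sum is exact.
constexpr int keep_hi = hi + q4_to_q8(lo);
// Keep the 4-bit fraction: 3.53125 is first truncated to 3.5.
constexpr int keep_lo = q8_to_q4(hi) + lo;

static_assert(to_real(keep_hi, 8) == 4.78125);
static_assert(to_real(keep_lo, 4) == 4.75);
```

The 0.03125 difference between the two results is the precision given up by narrowing, which is why the wrappers make that choice explicit.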

Container promotion example:

using small = gba::fixed<char, 4>;
using med   = gba::fixed<short, 4>;

small a = 3.5_fx;
med   b = 2.0_fx;

auto r1 = gba::as_next_container(a) + b;
auto r2 = gba::as_word_storage(a) + b;

Converting to and from integers

gba::fixed<short> pos = 3.75_fx;

int whole = static_cast<int>(pos);   // truncates toward zero
short raw = gba::bit_cast(pos);      // raw scaled storage bits

bit_cast is useful for register writes that expect fixed-point bit patterns.

tonclib comparison

stdgba                        tonclib
fixed<int, 8> x = 3.5_fx;     FIXED x = float2fx(3.5f);
auto y = x * z;               FIXED y = fxmul(x, z);
auto q = x / z;               FIXED q = fxdiv(x, z);
int i = static_cast<int>(x);  int i = fx2int(x);

stdgba uses operators plus explicit promotion wrappers, so expressions stay readable while still making precision/range trade-offs visible in code.

Angles

stdgba provides type-safe angle types optimised for GBA hardware. Angles use a binary representation in which the full range of an integer maps to one full revolution (360 degrees).

Angle types

angle - intermediate type

The angle type is a 32-bit unsigned integer where the full 0 to 2^32 range represents 0 to 360 degrees. Natural integer overflow handles wraparound:

#include <gba/angle>
using namespace gba::literals;

gba::angle heading = 90_deg;
heading += 45_deg;    // 135 degrees
heading = heading * 2; // 270 degrees
heading += 180_deg;   // 90 degrees (wraps around)
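The same sequence can be replayed on a bare 32-bit unsigned integer to see why no explicit modulo is needed. This is a sketch in plain C++ rather than stdgba code, and from_degrees is a hypothetical helper.

```cpp
#include <cstdint>

// Map whole degrees onto the full 32-bit range (2^32 units per revolution).
// Truncating to 32 bits also wraps inputs of 360 degrees or more.
constexpr std::uint32_t from_degrees(std::uint64_t deg) {
    return static_cast<std::uint32_t>((deg << 32) / 360);
}

constexpr std::uint32_t h0 = from_degrees(90);
constexpr std::uint32_t h1 = h0 + from_degrees(45);  // 135 degrees
constexpr std::uint32_t h2 = h1 * 2;                 // 270 degrees
constexpr std::uint32_t h3 = h2 + from_degrees(180); // 450 degrees wraps to 90

static_assert(h0 == 0x40000000u);
static_assert(h1 == 0x60000000u);
static_assert(h2 == 0xC0000000u);
static_assert(h3 == h0); // unsigned overflow performs the wraparound
```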

packed_angle<Bits> - storage type

For memory-efficient storage, use packed_angle with a specific bit width. These convert implicitly to angle for arithmetic:

gba::packed_angle<16> stored_heading;  // 2 bytes
gba::packed_angle<8> coarse_dir;       // 1 byte

// Promote to angle for arithmetic
gba::angle heading = stored_heading;
heading += 45_deg;

// Store back (truncates to precision)
stored_heading = heading;

Common aliases:

  • packed_angle8 - 8-bit (256 steps, ~1.4 degree resolution)
  • packed_angle16 - 16-bit (65536 steps, ~0.005 degree resolution)

Literals

The gba::literals namespace provides degree and radian literals:

using namespace gba::literals;

gba::angle a = 90_deg;
gba::angle b = 1.5708_rad;  // ~90 degrees

BIOS integration

The GBA BIOS angle functions use 16-bit angles where 0x10000 = 360 degrees. Use packed_angle16 for BIOS results:

gba::packed_angle16 dir = gba::ArcTan2(dx, dy);

// Or keep full precision for further arithmetic
gba::angle precise_dir = gba::ArcTan2(dx, dy);

bit_cast - raw access

gba::bit_cast extracts the underlying integer from an angle without any computation. The full 0..2^32 range represents one complete revolution.

using namespace gba::literals;

gba::angle a = 90_deg;
unsigned int raw = gba::bit_cast(a);  // 0x40000000

gba::packed_angle16 pa = 90_deg;
uint16_t raw16 = gba::bit_cast(pa);  // 0x4000

This is useful when interacting with hardware registers or lookup tables that expect raw integer angles.

Utility functions

lut_index<TableBits> - lookup table index

Converts an angle to an index into a power-of-two-sized lookup table. The full 0..360 degree range maps uniformly onto [0, 2^TableBits) with no gaps.

using namespace gba::literals;

// 256-entry sine table (8-bit indexing)
gba::angle theta = 45_deg;
auto idx = gba::lut_index<8>(theta);  // 0..255

// 512-entry table (9-bit indexing)
auto idx9 = gba::lut_index<9>(theta);  // 0..511
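One simple way to obtain such an index is to keep only the top TableBits bits of the raw 32-bit angle. The sketch below assumes that approach, which may differ from what stdgba does internally. lut_index_sketch is a hypothetical name.

```cpp
#include <cstdint>

// Keep the top TableBits bits of a 32-bit binary angle.
template <int TableBits>
constexpr std::uint32_t lut_index_sketch(std::uint32_t raw_angle) {
    static_assert(TableBits > 0 && TableBits < 32, "table must have 2..2^31 entries");
    return raw_angle >> (32 - TableBits);
}

constexpr std::uint32_t raw_45_deg = 0x20000000u; // one eighth of a revolution

static_assert(lut_index_sketch<8>(raw_45_deg) == 32);   // 256 / 8
static_assert(lut_index_sketch<9>(raw_45_deg) == 64);   // 512 / 8
static_assert(lut_index_sketch<8>(0xFFFFFFFFu) == 255); // last entry, never out of range
```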

as_signed - signed range view

Reinterprets the angle as a signed integer, treating the range as [-180, +180) degrees rather than [0, 360). Useful for comparisons and threshold tests.

using namespace gba::literals;

gba::angle facing_left = 270_deg;
int s = gba::as_signed(facing_left);  // negative (left of centre)

gba::angle facing_right = 90_deg;
int sr = gba::as_signed(facing_right); // positive (right of centre)

ccw_distance and cw_distance - arc distances

Measure the angular distance between two angles travelling in a specific direction. Both return unsigned values that handle wraparound correctly.

using namespace gba::literals;

// How far is it from 90 to 270 going counter-clockwise?
auto ccw = gba::ccw_distance(90_deg, 270_deg);  // 180 degrees

// How far is it from 270 to 90 going clockwise?
auto cw = gba::cw_distance(270_deg, 90_deg);    // 180 degrees

// Going the short way vs the long way
auto short_way = gba::ccw_distance(0_deg, 90_deg);   // 90 degrees
auto long_way  = gba::cw_distance(0_deg, 90_deg);    // 270 degrees

is_ccw_between - arc containment test

Tests whether an angle lies within a counter-clockwise arc from start to end. Handles wraparound automatically.

using namespace gba::literals;

// Is 90 degrees within the CCW arc from 0 to 180?
bool yes = gba::is_ccw_between(0_deg, 180_deg, 90_deg);   // true
bool no  = gba::is_ccw_between(0_deg, 180_deg, 270_deg);  // false

// Wraparound arc: from 315 to 45 degrees (passing through 0)
bool in_arc = gba::is_ccw_between(315_deg, 45_deg, 0_deg);  // true
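On raw binary angles, the distance and containment helpers reduce to unsigned subtraction. The sketch below reproduces the results from the examples above. It assumes counter-clockwise means increasing angle and that both arc endpoints are included, and the _sketch names are hypothetical stand-ins rather than stdgba functions.

```cpp
#include <cstdint>

using raw_angle = std::uint32_t;

constexpr raw_angle deg(std::uint64_t d) {
    return static_cast<raw_angle>((d << 32) / 360);
}

// Unsigned subtraction wraps modulo 2^32, which is exactly one revolution.
constexpr raw_angle ccw_distance_sketch(raw_angle from, raw_angle to) { return to - from; }
constexpr raw_angle cw_distance_sketch(raw_angle from, raw_angle to) { return from - to; }

// x lies on the CCW arc if it is no further from start than end is.
constexpr bool is_ccw_between_sketch(raw_angle start, raw_angle end, raw_angle x) {
    return ccw_distance_sketch(start, x) <= ccw_distance_sketch(start, end);
}

static_assert(ccw_distance_sketch(deg(90), deg(270)) == deg(180));
static_assert(cw_distance_sketch(deg(0), deg(90)) == deg(270));
static_assert(is_ccw_between_sketch(deg(0), deg(180), deg(90)));
static_assert(!is_ccw_between_sketch(deg(0), deg(180), deg(270)));
static_assert(is_ccw_between_sketch(deg(315), deg(45), deg(0))); // arc crosses 0
```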

tonclib comparison

stdgba                 tonclib
gba::angle             u32 (raw integer)
gba::packed_angle<16>  u16 (raw integer)
90_deg                 0x4000 (magic constant)
gba::ArcTan2(x, y)     ArcTan2(x, y)

stdgba wraps the raw integers in distinct types, so angles cannot be mixed up with ordinary integers. The overflow arithmetic is identical.

Interrupts

The GBA uses interrupts to notify the CPU about hardware events: VBlank, HBlank, timer overflow, DMA completion, serial communication, and keypad input.

For the raw register bitfields, see Interrupt Peripheral Reference.

Setting up interrupts

Before any BIOS wait function will work, you must install an IRQ handler. The normal stdgba path is the high-level dispatcher exposed as gba::irq_handler:

#include <gba/bios>
#include <gba/interrupt>
#include <gba/peripherals>

// Install the default dispatcher / empty stdgba IRQ stub
gba::irq_handler = {};

// Enable specific interrupt sources
gba::reg_dispstat = { .enable_irq_vblank = true };
gba::reg_ie = { .vblank = true };
gba::reg_ime = true;

// Now VBlankIntrWait() works
gba::VBlankIntrWait();

The three switches

Interrupts require three things to be enabled:

  1. Source - the hardware peripheral must be configured to fire an interrupt (for example reg_dispstat.enable_irq_vblank)
  2. reg_ie - the Interrupt Enable register must have the corresponding bit set
  3. reg_ime - the Interrupt Master Enable must be true

All three must be set for the interrupt to reach the handler.
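The three switches behave like a logical AND. The sketch below models that rule for VBlank in plain C++. The struct and function are hypothetical and touch no hardware.

```cpp
#include <cstdint>

constexpr std::uint16_t irq_bit_vblank = 1u << 0; // VBlank is bit 0 of IE and IF

// Hypothetical snapshot of the three switches.
struct irq_switches_sketch {
    bool source_enabled; // e.g. DISPSTAT VBlank IRQ enable
    std::uint16_t ie;    // interrupt enable register
    bool ime;            // interrupt master enable
};

// A VBlank interrupt is delivered only when all three switches are on.
constexpr bool vblank_reaches_handler(irq_switches_sketch s) {
    return s.source_enabled && (s.ie & irq_bit_vblank) != 0 && s.ime;
}

static_assert(vblank_reaches_handler({true, irq_bit_vblank, true}));
static_assert(!vblank_reaches_handler({false, irq_bit_vblank, true})); // source off
static_assert(!vblank_reaches_handler({true, 0, true}));               // IE bit clear
static_assert(!vblank_reaches_handler({true, irq_bit_vblank, false})); // IME off
```

If VBlankIntrWait() never returns, checking these three conditions in order is a quick first diagnostic.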

High-level custom handlers

You can provide a callable (lambda, function pointer, etc.) to gba::irq_handler:

volatile int vblank_count = 0;

gba::irq_handler = [](gba::irq irq) {
    if (irq.vblank) {
        ++vblank_count;
    }
};

The handler receives a gba::irq bitfield with named boolean fields for each interrupt source. stdgba’s internal IRQ wrapper acknowledges REG_IF and the BIOS IRQ flag for you before calling the handler, so BIOS wait functions continue to work.

Multiple interrupt sources

Because the handler receives the full gba::irq bitfield, a single callable can dispatch to different logic based on which flags are set:

volatile int vblank_count = 0;
volatile int timer2_count = 0;

gba::irq_handler = [](gba::irq irq) {
    if (irq.vblank) ++vblank_count;
    if (irq.timer2) ++timer2_count;
};

gba::reg_dispstat = { .enable_irq_vblank = true };
gba::reg_ie       = { .vblank = true, .timer2 = true };
gba::reg_ime      = true;

Querying the current handler

// bool conversion -- true when a handler is installed
if (gba::irq_handler) { /* handler is set */ }

// has_value() is equivalent
if (gba::irq_handler.has_value()) { /* handler is set */ }

// Retrieve a const reference to the stored callable
const gba::handler<gba::irq>& h = gba::irq_handler.value();

Swapping handlers

swap exchanges the stored callable with a local gba::handler<gba::irq>, useful for temporarily replacing a handler and then restoring it:

gba::handler<gba::irq> my_handler = [](gba::irq irq) {
    if (irq.timer0) { /* ... */ }
};

// Swap in; old handler is now in my_handler
gba::irq_handler.swap(my_handler);

// ... do work ...

// Restore the original
gba::irq_handler.swap(my_handler);

Uninstalling the dispatcher

To uninstall the stdgba user handler and restore the built-in empty acknowledgement stub, use any of these:

gba::irq_handler = gba::nullisr;
// or
gba::irq_handler.reset();
// or
gba::irq_handler = {};

This removes the current callable, but still leaves a valid low-level IRQ stub installed so BIOS wait functions remain usable.

What a raw handler must do itself

If you install a low-level handler directly, you are responsible for the work normally done by stdgba’s internal wrapper:

  • acknowledge REG_IF
  • acknowledge the BIOS IRQ flag (0x03FFFFF8)
  • preserve the registers and CPU state your handler clobbers
  • restore any IRQ masking state you change
  • keep BIOS wait functions (VBlankIntrWait(), IntrWait()) working correctly

If you skip the acknowledgements, the interrupt may immediately retrigger or BIOS wait functions may stop working.

Uninstalling a low-level custom handler

If you want to remove a raw handler and go back to stdgba’s safe empty stub, use:

gba::irq_handler.reset();

If instead you want to return to the normal high-level dispatcher path, assign a callable again:

gba::irq_handler = [](gba::irq irq) {
    if (irq.vblank) {
        // ...
    }
};

Important note about irq_handler state queries

gba::irq_handler.has_value() reports whether the low-level vector currently points at something other than stdgba’s empty handler. That means it will also report true for a raw handler installed directly.

However, gba::irq_handler.value() only returns your callable when the vector points at stdgba’s own dispatcher wrapper. If you install a raw handler directly, value() behaves as if no user callable is installed.

Available interrupt sources

Field        Source
.vblank      Vertical blank
.hblank      Horizontal blank
.vcounter    V-counter match
.timer0      Timer 0 overflow
.timer1      Timer 1 overflow
.timer2      Timer 2 overflow
.timer3      Timer 3 overflow
.serial      Serial communication
.dma0-.dma3  DMA channel completion
.keypad      Keypad interrupt
.gamepak     Game Pak interrupt

tonclib comparison

stdgba                                              tonclib
gba::irq_handler = {};                              irq_init(NULL);
gba::irq_handler = my_fn;                           irq_set(II_VBLANK, my_fn);
gba::irq_handler = gba::nullisr;                    (no direct equivalent)
gba::irq_handler.reset();                           (no direct equivalent)
gba::registral<void(*)()>{0x3007FFC} = my_raw_irq;  direct IRQ vector write
gba::reg_ie = { .vblank = true };                   irq_enable(II_VBLANK);

Timers

The GBA has four hardware timers (0-3). Each is a 16-bit counter that increments at a configurable rate and can trigger an interrupt on overflow. Timers can cascade - timer N+1 increments when timer N overflows - enabling periods far longer than a single 16-bit counter allows.

Compile-time timer configuration

stdgba configures timers at compile time using std::chrono durations. The compiler selects the best prescaler and cascade chain automatically:

#include <gba/timer>
#include <gba/peripherals>
#include <algorithm>

using namespace std::chrono_literals;

// A 1-second timer with overflow IRQ
constexpr auto timer_1s = gba::compile_timer(1s, true);

// Write the cascade chain to hardware starting at timer 0
std::copy(timer_1s.begin(), timer_1s.end(), gba::reg_tmcnt.begin());

compile_timer returns a std::array of timer register values. A simple duration might need only one timer; a long duration might cascade two or three. The array size is determined at compile time.

You can also start timers at a specific index:

// Use timers 2 and 3 for a long-duration timer
constexpr auto timer_10s = gba::compile_timer(10s, false);  // No IRQ
std::copy(timer_10s.begin(), timer_10s.end(), gba::reg_tmcnt.begin() + 2);

And disable timers by clearing their control registers:

// Disable timer 0
gba::reg_tmcnt_h[0] = {};

Supported durations

Any std::chrono::duration works:

#include <gba/timer>
#include <gba/peripherals>
#include <algorithm>

using namespace std::chrono_literals;

constexpr auto fast = gba::compile_timer(16ms);
constexpr auto slow = gba::compile_timer(30s, true);
constexpr auto precise = gba::compile_timer(100us);

// Give each configuration its own timer indices so the chains do not overlap.
// This layout assumes fast and precise each fit in one timer and slow cascades
// across two; check .size() on each result to confirm.
std::copy(fast.begin(), fast.end(), gba::reg_tmcnt.begin() + 0);        // Timer 0
std::copy(precise.begin(), precise.end(), gba::reg_tmcnt.begin() + 1);  // Timer 1
std::copy(slow.begin(), slow.end(), gba::reg_tmcnt.begin() + 2);        // Timers 2-3

If the duration cannot be represented exactly, compile_timer picks the closest possible configuration. Use compile_timer_exact if you need an exact match (compile error if impossible).
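The arithmetic behind picking a prescaler can be sketched on the host. This is not stdgba's implementation: `cycles_for_us`, `pick_single_timer`, and `single_timer_config` are hypothetical names, the sketch only covers the single-timer case, and it drops fractional ticks. It assumes the 2^24 Hz (about 16.78 MHz) system clock and the four hardware dividers 1, 64, 256, and 1024.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of stdgba.
constexpr std::uint64_t cpu_hz = 16'777'216; // 2^24 Hz system clock

struct single_timer_config {
    std::uint32_t divider; // 1, 64, 256 or 1024
    std::uint16_t reload;  // counter start value; overflow happens at 65536
    bool fits;             // false when the duration needs a cascade
};

// Convert a duration in microseconds to CPU cycles (fractions are dropped).
constexpr std::uint64_t cycles_for_us(std::uint64_t us) {
    return cpu_hz * us / 1'000'000;
}

// Pick the smallest divider whose tick count fits one 16-bit counter.
constexpr single_timer_config pick_single_timer(std::uint64_t cycles) {
    constexpr std::uint32_t dividers[] = {1, 64, 256, 1024};
    for (const std::uint32_t divider : dividers) {
        const std::uint64_t ticks = cycles / divider;
        if (ticks >= 1 && ticks <= 65536) {
            return {divider, static_cast<std::uint16_t>(65536 - ticks), true};
        }
    }
    return {1024, 0, false};
}

// 1 s is exactly 65536 ticks at divider 256, so one timer with reload 0 is enough.
static_assert(pick_single_timer(cycles_for_us(1'000'000)).divider == 256);
// 30 s is 491520 ticks even at divider 1024, so it needs a cascade.
static_assert(!pick_single_timer(cycles_for_us(30'000'000)).fits);
```

Choosing the smallest divider that fits keeps the tick as fine as possible, which is one plausible reading of "closest possible configuration"; the real selection rule may differ.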

Raw timer registers

For manual control, write directly to the timer registers:

#include <gba/peripherals>

// Timer 0: 1024-cycle prescaler, enable interrupt
gba::reg_tmcnt_l[0] = 0;                                      // Reload value (auto-reload on overflow)
gba::reg_tmcnt_h[0] = {
    .cycles = gba::cycles_1024,
    .overflow_irq = true,
    .enabled = true
};

// Timer 1: cascade from timer 0 (counts overflows)
gba::reg_tmcnt_l[1] = 0;
gba::reg_tmcnt_h[1] = {
    .cascade = true,
    .overflow_irq = true,
    .enabled = true
};
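To see what this cascade buys you, the overflow periods can be worked out on the host. The helper names below (`overflow_cycles`, `cascade_cycles`) are hypothetical and only do the arithmetic; they assume a 2^24 Hz system clock and that a timer overflows when its 16-bit counter passes 65535.

```cpp
#include <cstdint>

constexpr std::uint64_t cpu_hz = 16'777'216; // 2^24 Hz system clock

// CPU cycles between overflows of a prescaled timer.
constexpr std::uint64_t overflow_cycles(std::uint32_t divider, std::uint16_t reload) {
    return static_cast<std::uint64_t>(divider) * (65536u - reload);
}

// CPU cycles between overflows of a cascaded timer that counts upstream overflows.
constexpr std::uint64_t cascade_cycles(std::uint64_t upstream_cycles, std::uint16_t reload) {
    return upstream_cycles * (65536u - reload);
}

constexpr auto timer0_cycles = overflow_cycles(1024, 0);
constexpr auto timer1_cycles = cascade_cycles(timer0_cycles, 0);

static_assert(timer0_cycles / cpu_hz == 4);       // timer 0 overflows every 4 s
static_assert(timer1_cycles / cpu_hz == 262'144); // timer 1 overflows every 65536 * 4 s
```

With both reload values at 0, timer 0 raises its IRQ every 4 seconds and timer 1 roughly every three days.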

Polling timer state

Read the current timer counter (careful: this captures the live counter value):

// Get current count of timer 0
unsigned short count = gba::reg_tmcnt_l_stat[0];

// Check if timer 2 is running
bool timer2_enabled = (gba::reg_tmcnt_h[2].enabled);

Note: reg_tmcnt_l_stat is a read-only view of the counter registers. The counter keeps running while you read it, so two consecutive reads can return different values; read it once and reuse that snapshot within a calculation.

Prescaler values

Value  Divider  Frequency
0      1        16.78 MHz
1      64       262.1 kHz
2      256      65.5 kHz
3      1024     16.4 kHz

tonclib comparison

stdgba                            tonclib
compile_timer(1s)                 Manual prescaler + reload calculation
gba::reg_tmcnt_h[0] = { ... };    REG_TM0CNT = TM_FREQ_1024 | TM_ENABLE;
Automatic cascade chain           Manual multi-timer setup

Demo: Analogue Clock with Timer

This demo combines compile-time timer setup, timer IRQ handling, shapes-generated OBJ sprites, and BIOS affine transforms for clock-hand rotation:

#include <gba/angle>
#include <gba/bios>
#include <gba/color>
#include <gba/interrupt>
#include <gba/peripherals>
#include <gba/shapes>
#include <gba/timer>
#include <gba/video>

#include <array>
#include <cstdint>
#include <cstring>

using namespace std::chrono_literals;
using namespace gba::shapes;
using namespace gba::literals;
using namespace gba;

namespace {

    constexpr auto second_timer = compile_timer(1s, true);
    static_assert(second_timer.size() == 1);

    constexpr int clock_center_x = 120;
    constexpr int clock_center_y = 80;
    constexpr int sprite_half_extent = 32;

    // Clock face: visible outline, hour markers, and center hub.
    constexpr auto clock_face = sprite_64x64(palette_idx(1), circle_outline(32.0, 32.0, 30.0, 2), palette_idx(1),
                                             rect(31, 4, 2, 6), palette_idx(1), rect(31, 54, 2, 6), palette_idx(1),
                                             rect(4, 31, 6, 2), palette_idx(1), rect(54, 31, 6, 2), palette_idx(1),
                                             circle(32.0, 32.0, 2.5));

    // Hands are authored pointing straight up.
    // ObjAffineSet rotates visually anti-clockwise for positive angles, so the
    // runtime clock update negates angles to get normal clockwise clock motion.
    constexpr auto hand_hour = sprite_64x64(palette_idx(3), rect(30, 18, 4, 15));

    constexpr auto hand_minute = sprite_64x64(palette_idx(3), rect(31, 12, 2, 21));

    constexpr auto hand_second = sprite_64x64(palette_idx(2), rect(31, 8, 2, 25));

} // namespace

int main() {
    // Set up IRQ.
    std::uint32_t elapsed_seconds = 0;
    irq_handler = {[&elapsed_seconds](irq flags) {
        if (flags.timer2) {
            elapsed_seconds += 1;
        }
    }};
    reg_dispstat = {.enable_irq_vblank = true};
    reg_ie = {.vblank = true, .timer2 = true};
    reg_ime = true;

    // Start a 1-second timer on timer 2.
    reg_tmcnt[2] = second_timer[0];

    // Set up video mode 0 with sprites.
    reg_dispcnt = {
        .video_mode = 0,
        .linear_obj_tilemap = true,
        .enable_obj = true,
    };

    // Bank 0, colour 0 stays transparent for all sprites.
    pal_obj_bank[0][0] = "black"_clr;
    pal_obj_bank[0][1] = "firebrick"_clr;
    pal_obj_bank[0][2] = "lime"_clr;
    pal_obj_bank[0][3] = "royalblue"_clr;

    // Copy sprite data to OBJ VRAM using byte offsets.
    auto* objVram = reinterpret_cast<std::uint8_t*>(memory_map(mem_vram_obj));
    const auto baseTileIndex = tile_index(memory_map(mem_vram_obj));
    std::uint16_t vramOffset = 0;

    std::memcpy(objVram + vramOffset, clock_face.data(), clock_face.size());
    const auto tileIdxFace = static_cast<unsigned short>(baseTileIndex + vramOffset / sizeof(tile4bpp));
    vramOffset += static_cast<std::uint16_t>(clock_face.size());

    std::memcpy(objVram + vramOffset, hand_hour.data(), hand_hour.size());
    const auto tileIdxHour = static_cast<unsigned short>(baseTileIndex + vramOffset / sizeof(tile4bpp));
    vramOffset += static_cast<std::uint16_t>(hand_hour.size());

    std::memcpy(objVram + vramOffset, hand_minute.data(), hand_minute.size());
    const auto tileIdxMinute = static_cast<unsigned short>(baseTileIndex + vramOffset / sizeof(tile4bpp));
    vramOffset += static_cast<std::uint16_t>(hand_minute.size());

    std::memcpy(objVram + vramOffset, hand_second.data(), hand_second.size());
    const auto tileIdxSecond = static_cast<unsigned short>(baseTileIndex + vramOffset / sizeof(tile4bpp));

    auto faceObj = clock_face.obj(tileIdxFace);
    faceObj.x = clock_center_x - sprite_half_extent;
    faceObj.y = clock_center_y - sprite_half_extent;
    obj_mem[0] = faceObj;

    auto hourObj = hand_hour.obj_aff(tileIdxHour);
    hourObj.x = clock_center_x - sprite_half_extent;
    hourObj.y = clock_center_y - sprite_half_extent;
    hourObj.affine_index = 0;
    obj_aff_mem[1] = hourObj;

    auto minuteObj = hand_minute.obj_aff(tileIdxMinute);
    minuteObj.x = clock_center_x - sprite_half_extent;
    minuteObj.y = clock_center_y - sprite_half_extent;
    minuteObj.affine_index = 1;
    obj_aff_mem[2] = minuteObj;

    auto secondObj = hand_second.obj_aff(tileIdxSecond);
    secondObj.x = clock_center_x - sprite_half_extent;
    secondObj.y = clock_center_y - sprite_half_extent;
    secondObj.affine_index = 2;
    obj_aff_mem[3] = secondObj;

    // Disable remaining OAM entries.
    for (int i = 4; i < 128; ++i) {
        obj_mem[i] = {.disable = true};
    }

    std::array<object_parameters, 3> affineParams{
        {
         {.sx = 1.0_fx, .sy = 1.0_fx, .alpha = 0_deg},
         {.sx = 1.0_fx, .sy = 1.0_fx, .alpha = 0_deg},
         {.sx = 1.0_fx, .sy = 1.0_fx, .alpha = 0_deg},
         }
    };

    ObjAffineSet(affineParams.data(), memory_map(mem_obj_aff), affineParams.size(), 8);

    while (true) {
        VBlankIntrWait();

        const std::uint32_t secs = elapsed_seconds;
        const auto hours = static_cast<unsigned int>((secs / 3600U) % 12U);
        const auto mins = static_cast<unsigned int>((secs / 60U) % 60U);
        const auto secUnits = static_cast<unsigned int>(secs % 60U);

        affineParams[0].alpha = -(30_deg * hours + 0.5_deg * mins);
        affineParams[1].alpha = -(6_deg * mins + 0.1_deg * secUnits);
        affineParams[2].alpha = -(6_deg * secUnits);

        ObjAffineSet(affineParams.data(), memory_map(mem_obj_aff), affineParams.size(), 8);
    }
}

[Image: Timer clock demo screenshot]

Key points shown in the demo:

  • compile_timer(1s, true) configures a 1-second overflow interrupt at compile time.
  • The timer IRQ increments a seconds counter used for hand angles.
  • ObjAffineSet(...) writes affine matrices each frame to rotate hour/minute/second hands.
  • Angle literals are used directly in runtime math (30_deg * hours + 0.5_deg * mins).
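The hand-angle arithmetic from the demo can be checked on the host with plain integers. The sketch below is an illustration only: `hand_angles` and `angles_for` are hypothetical names, and angles are kept in tenths of a degree so that the 0.5 and 0.1 degree steps stay exact. The demo additionally negates each angle to get clockwise motion.

```cpp
#include <cstdint>

// Hypothetical helper; angles are in tenths of a degree to stay in integers.
struct hand_angles {
    std::uint32_t hour;
    std::uint32_t minute;
    std::uint32_t second;
};

constexpr hand_angles angles_for(std::uint32_t elapsed_seconds) {
    const std::uint32_t hours = (elapsed_seconds / 3600u) % 12u;
    const std::uint32_t mins = (elapsed_seconds / 60u) % 60u;
    const std::uint32_t secs = elapsed_seconds % 60u;
    return {
        300u * hours + 5u * mins, // 30 deg per hour + 0.5 deg per minute
        60u * mins + 1u * secs,   // 6 deg per minute + 0.1 deg per second
        60u * secs,               // 6 deg per second
    };
}

// 01:02:05 after start: hour hand at 31.0 deg, minute hand at 12.5 deg,
// second hand at 30.0 deg.
static_assert(angles_for(3725).hour == 310);
static_assert(angles_for(3725).minute == 125);
static_assert(angles_for(3725).second == 300);
```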

Key Input

The GBA has 10 buttons: A, B, L, R, Start, Select, and the 4-direction D-pad.

gba::keypad gives you:

  • level checks (held)
  • edge checks (pressed, released)
  • axis helpers (xaxis, i_xaxis, yaxis, i_yaxis, lraxis, i_lraxis)
  • a predefined combo constant named gba::reset_combo

Reading keys

#include <gba/bios>
#include <gba/keyinput>
#include <gba/peripherals>

gba::keypad keys;

// In your game loop:
for (;;) {
    gba::VBlankIntrWait();
    keys = gba::reg_keyinput;  // One sample per frame

    if (keys.held(gba::key_a)) {
        // A is currently held down
    }

    if (keys.pressed(gba::key_b)) {
        // B was just pressed this frame (edge detection)
    }

    if (keys.released(gba::key_start)) {
        // Start was just released this frame
    }
}

Frame update contract

gba::keypad stores previous and current state internally. Each assignment from gba::reg_keyinput updates that state (normally once per frame). This is what powers pressed() and released().

Recommended pattern: call keys = gba::reg_keyinput; exactly once per game frame (usually right before game state needs to be updated).

If you sample multiple times in the same frame, edge checks can appear inconsistent because you advanced the internal history more than once.

The keypad hardware register itself is active-low (0 means pressed), but gba::keypad normalizes this so held(key) reads naturally.
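The previous/current bookkeeping described above can be modelled in a few lines of host-side C++. This is a sketch, not stdgba's implementation: `key_history` and its members are hypothetical, and treating a mask as "all bits in the mask" is an assumption. It does rely on two hardware facts: KEYINPUT is active-low, and A is bit 0.

```cpp
#include <cstdint>

// Hypothetical model of the previous/current bookkeeping; not stdgba's code.
struct key_history {
    std::uint16_t prev = 0; // normalised state from the previous sample (1 = down)
    std::uint16_t curr = 0; // normalised state from the latest sample (1 = down)

    // KEYINPUT is active-low, so invert it and keep the 10 button bits.
    constexpr key_history sample(std::uint16_t raw_keyinput) const {
        return {curr, static_cast<std::uint16_t>(~raw_keyinput & 0x03FF)};
    }
    constexpr bool held(std::uint16_t mask) const { return (curr & mask) == mask; }
    constexpr bool pressed(std::uint16_t mask) const {
        return held(mask) && (prev & mask) != mask;
    }
    constexpr bool released(std::uint16_t mask) const {
        return !held(mask) && (prev & mask) == mask;
    }
};

constexpr std::uint16_t bit_a = 0x0001;      // A is bit 0 of KEYINPUT
constexpr std::uint16_t raw_idle = 0x03FF;   // all bits set: nothing pressed
constexpr std::uint16_t raw_a_down = 0x03FE; // bit 0 clear: A pressed

constexpr auto frame1 = key_history{}.sample(raw_idle).sample(raw_a_down);
static_assert(frame1.pressed(bit_a) && frame1.held(bit_a));

// Sampling again with the same input consumes the edge: still held, not "pressed".
constexpr auto frame2 = frame1.sample(raw_a_down);
static_assert(!frame2.pressed(bit_a) && frame2.held(bit_a));
```

The second sample is why sampling twice in one frame is a problem: the edge is visible only until the next sample replaces the stored previous state.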

Practical patterns

// One-shot action: only fires on the transition frame.
if (keys.pressed(gba::key_a)) {
    jump();
}

// Release-triggered action: useful for menus and drag/release interactions.
if (keys.released(gba::key_b)) {
    close_menu();
}

D-pad axes

For movement, use the axis helpers. yaxis() uses the mathematical convention where up is positive:

int dx = keys.xaxis();  // -1 (left), 0, or 1 (right)
int dy = keys.yaxis();  // -1 (down), 0, or 1 (up)

These return a tri-state value based on the D-pad. If both left and right are held simultaneously, they cancel out to 0.

Inverted axes

The inverted variants flip the sign. i_xaxis() is useful when your camera or gameplay logic expects right-negative coordinates, and i_yaxis() matches screen coordinates where Y increases downward:

int dx = keys.i_xaxis();  // -1 (right), 0, or 1 (left)
int dy = keys.i_yaxis();  // -1 (up), 0, or 1 (down)

player_x += dx;
player_y += dy;

For most gameplay movement, i_yaxis() is the convenient choice because screen-space Y grows downward.
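The sign conventions are easy to mix up, so here is a host-side model of the four axis helpers. The free functions below are hypothetical stand-ins that mirror the documented behaviour; they take an already-normalised key mask (1 = down) and use the KEYINPUT bit positions for the D-pad (Right 4, Left 5, Up 6, Down 7).

```cpp
#include <cstdint>

// KEYINPUT bit positions after normalising to 1 = down.
constexpr std::uint16_t bit_right = 1u << 4;
constexpr std::uint16_t bit_left = 1u << 5;
constexpr std::uint16_t bit_up = 1u << 6;
constexpr std::uint16_t bit_down = 1u << 7;

// +1 if the positive key is down, -1 if the negative key is down, 0 if both or neither.
constexpr int tri(std::uint16_t keys, std::uint16_t positive, std::uint16_t negative) {
    return ((keys & positive) ? 1 : 0) - ((keys & negative) ? 1 : 0);
}

constexpr int xaxis(std::uint16_t keys) { return tri(keys, bit_right, bit_left); }
constexpr int yaxis(std::uint16_t keys) { return tri(keys, bit_up, bit_down); }
constexpr int i_xaxis(std::uint16_t keys) { return -xaxis(keys); }
constexpr int i_yaxis(std::uint16_t keys) { return -yaxis(keys); }

static_assert(yaxis(bit_up) == 1);                // up is positive
static_assert(i_yaxis(bit_down) == 1);            // down is positive (screen space)
static_assert(xaxis(bit_left | bit_right) == 0);  // opposite directions cancel
```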

Shoulder axis

The L and R buttons can also be read as an axis:

int lr = keys.lraxis();    // -1 (L), 0, or 1 (R)
int ilr = keys.i_lraxis(); // -1 (R), 0, or 1 (L)

Key constants

Constant           Button
gba::key_a         A
gba::key_b         B
gba::key_l         L shoulder
gba::key_r         R shoulder
gba::key_start     Start
gba::key_select    Select
gba::key_up        D-pad up
gba::key_down      D-pad down
gba::key_left      D-pad left
gba::key_right     D-pad right

Combos and reset_combo

Use operator| to combine button masks:

auto combo = gba::key_a | gba::key_b;
if (keys.held(combo)) {
    // Both A and B are held
}

stdgba also provides gba::reset_combo, defined as A + B + Select + Start:

if (keys.held(gba::reset_combo)) {
    // Enter your reset path
}

Rationale: this is the long-standing GBA soft-reset convention. Requiring four buttons reduces accidental resets during normal play while still giving a predictable emergency-exit combo.

If you use it for reset, wait until the combo is released before returning to normal flow to avoid immediate retrigger:

if (keys.held(gba::reset_combo)) {
    request_reset();
    do {
        keys = gba::reg_keyinput;
    } while (keys.held(gba::reset_combo));
}

Common Pitfalls

  • Sampling keys = gba::reg_keyinput; multiple times in one frame: this advances history repeatedly and can break pressed()/released() expectations.
  • Using pressed() for continuous movement: pressed() is edge-only, so movement usually belongs on held() or axis helpers.
  • Mixing yaxis() and screen-space coordinates: yaxis() treats up as +1; use i_yaxis() when down-positive screen coordinates are what you want.
  • Forgetting that i_xaxis() is also available: if horizontal math is inverted in your coordinate system, use i_xaxis() instead of manually negating xaxis().
  • Forgetting release-wait after reset combo handling: without the short hold-until-release loop, reset paths can retrigger immediately.
  • Treating the hardware register as active-high in custom low-level code: KEYINPUT is active-low; prefer gba::keypad unless you intentionally handle bit inversion yourself.

tonclib comparison

stdgba                         tonclib
keys = gba::reg_keyinput;      key_poll();
keys.held(gba::key_a)          key_is_down(KEY_A)
keys.pressed(gba::key_a)       key_hit(KEY_A)
keys.released(gba::key_a)      key_released(KEY_A)
keys.xaxis()                   key_tri_horz()
keys.i_xaxis()                 -key_tri_horz()
keys.yaxis()                   -key_tri_vert()
keys.i_yaxis()                 key_tri_vert()
keys.held(gba::reset_combo)    key_is_down(KEY_A|KEY_B|KEY_SELECT|KEY_START)

tonclib's key_tri_vert() treats down as positive (screen space), so it lines up with keys.i_yaxis(); keys.yaxis() treats up as positive. For screen-space movement where Y increases downward, use keys.i_yaxis().

For keypad API details (gba::keypad, key masks, edge and axis methods), see book/src/reference/keypad.md.

For keypad register details (including active-low hardware semantics), see book/src/reference/peripherals/keypad.md.

Demo: Visual button layout

This demo renders a simple GBA-style button layout and updates each button colour from pressed(), released(), and held() state:

#include <gba/bios>
#include <gba/color>
#include <gba/interrupt>
#include <gba/keyinput>
#include <gba/shapes>
#include <gba/video>

#include <array>
#include <cstdint>
#include <cstring>

using namespace gba::shapes;
using gba::operator""_clr;

namespace {

    // D-pad directional buttons: 16x16 squares with direction labels
    constexpr auto dpad_up_button = sprite_16x16(rect(2, 2, 12, 12), palette_idx(0), text(6, 6, "U"));

    constexpr auto dpad_down_button = sprite_16x16(rect(2, 2, 12, 12), palette_idx(0), text(6, 6, "D"));

    constexpr auto dpad_left_button = sprite_16x16(rect(2, 2, 12, 12), palette_idx(0), text(6, 6, "L"));

    constexpr auto dpad_right_button = sprite_16x16(rect(2, 2, 12, 12), palette_idx(0), text(6, 6, "R"));

    // A button: 16x16 circle with label
    constexpr auto a_button = sprite_16x16(circle(8.0, 8.0, 6.0), // Filled circle
                                           palette_idx(0), text(7, 6, "A"));

    // B button: 16x16 circle with label
    constexpr auto b_button = sprite_16x16(circle(8.0, 8.0, 6.0), // Filled circle
                                           palette_idx(0), text(7, 6, "B"));

    // L button: 32x16 wide rectangle
    constexpr auto l_button = sprite_32x16(rect(2, 3, 28, 10), palette_idx(0), text(13, 5, "L"));

    // R button: 32x16 wide rectangle
    constexpr auto r_button = sprite_32x16(rect(2, 3, 28, 10), palette_idx(0), text(13, 5, "R"));

    // Start button: 32x16 oval with label
    constexpr auto start_button = sprite_32x16(oval(2, 3, 28, 10), palette_idx(0), text(10, 5, "Str"));

    // Select button: 32x16 oval with label
    constexpr auto select_button = sprite_32x16(oval(2, 3, 28, 10), palette_idx(0), text(9, 5, "Sel"));

    // Controller layout: buttons with different shapes
    struct ButtonDef {
        int obj_index;   // Which OAM object
        gba::key mask;   // Associated key mask
        int sprite_type; // 0=dpad_up, 1=dpad_down, 2=dpad_left, 3=dpad_right, 4=a, 5=b, 6=l, 7=r, 8=start, 9=select
    };

    // Map out the 10 GBA buttons in OAM space
    std::array<ButtonDef, 10> buttons{
        {
         {0, gba::key_up, 0},     // Up - dpad_up
            {1, gba::key_down, 1},   // Down - dpad_down
            {2, gba::key_left, 2},   // Left - dpad_left
            {3, gba::key_right, 3},  // Right - dpad_right
            {4, gba::key_a, 4},      // A - a_button
            {5, gba::key_b, 5},      // B - b_button
            {6, gba::key_l, 6},      // L - l_button
            {7, gba::key_r, 7},      // R - r_button
            {8, gba::key_start, 8},  // Start - start_button
            {9, gba::key_select, 9}, // Select - select_button
        }
    };

    // Position data for each button (arranged in a GBA-like layout)
    // Adjusted for larger sprite sizes
    struct Position {
        int x, y;
    };

    std::array<Position, 10> positions{
        {
         {56, 60},  // Up - dpad top
            {56, 84},  // Down - dpad bottom
            {40, 72},  // Left - dpad left
            {72, 72},  // Right - dpad right (meet in middle)
            {160, 96}, // A - circle
            {144, 96}, // B - circle
            {16, 16},  // L - left shoulder
            {176, 16}, // R - right shoulder
            {72, 128}, // Start - bottom row, right of Select
            {24, 128}, // Select - bottom row, left
        }
    };

} // namespace

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    // Video mode 0, objects enabled
    gba::reg_dispcnt = {
        .video_mode = 0,
        .linear_obj_tilemap = true,
        .enable_obj = true,
    };

    // Set up palette banks (shared across all button types)
    // Palette 0: untouched (gray)
    gba::pal_obj_bank[0][0] = "#888888"_clr; // background
    gba::pal_obj_bank[0][1] = "#CCCCCC"_clr; // untouched button
    gba::pal_obj_bank[0][2] = "#999999"_clr; // text placeholder

    // Palette 1: pressed (bright green)
    gba::pal_obj_bank[1][0] = "#888888"_clr;
    gba::pal_obj_bank[1][1] = "#00FF00"_clr; // pressed (bright green)
    gba::pal_obj_bank[1][2] = "#FFFFFF"_clr; // text

    // Palette 2: released (red)
    gba::pal_obj_bank[2][0] = "#888888"_clr;
    gba::pal_obj_bank[2][1] = "#FF0000"_clr; // released (red)
    gba::pal_obj_bank[2][2] = "#FFFFFF"_clr; // text

    // Palette 3: held (medium green)
    gba::pal_obj_bank[3][0] = "#888888"_clr;
    gba::pal_obj_bank[3][1] = "#00AA00"_clr; // held (medium green)
    gba::pal_obj_bank[3][2] = "#FFFFFF"_clr; // text

    auto* objVRAM = gba::memory_map(gba::mem_vram_obj);
    auto* vramPtr = reinterpret_cast<std::uint8_t*>(objVRAM);

    // Copy all button sprite shapes to VRAM and track tile indices
    std::uint16_t baseTileIdx = gba::tile_index(objVRAM);
    std::uint16_t tileOffset = 0;

    // D-pad buttons (16x16 squares, each with its own label).
    // A 4bpp tile is 32 bytes, so byte offset / 32 gives the tile offset.
    std::memcpy(vramPtr + tileOffset, dpad_up_button.data(), dpad_up_button.size());
    const auto dpad_up_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += dpad_up_button.size();

    std::memcpy(vramPtr + tileOffset, dpad_down_button.data(), dpad_down_button.size());
    const auto dpad_down_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += dpad_down_button.size();

    std::memcpy(vramPtr + tileOffset, dpad_left_button.data(), dpad_left_button.size());
    const auto dpad_left_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += dpad_left_button.size();

    std::memcpy(vramPtr + tileOffset, dpad_right_button.data(), dpad_right_button.size());
    const auto dpad_right_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += dpad_right_button.size();

    // A button (16x16 circle)
    std::memcpy(vramPtr + tileOffset, a_button.data(), a_button.size());
    const auto a_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += a_button.size();

    // B button (16x16 circle)
    std::memcpy(vramPtr + tileOffset, b_button.data(), b_button.size());
    const auto b_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += b_button.size();

    // L button (32x16 rectangle)
    std::memcpy(vramPtr + tileOffset, l_button.data(), l_button.size());
    const auto l_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += l_button.size();

    // R button (32x16 rectangle)
    std::memcpy(vramPtr + tileOffset, r_button.data(), r_button.size());
    const auto r_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += r_button.size();

    // Start button (32x16 oval)
    std::memcpy(vramPtr + tileOffset, start_button.data(), start_button.size());
    const auto start_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += start_button.size();

    // Select button (32x16 oval)
    std::memcpy(vramPtr + tileOffset, select_button.data(), select_button.size());
    const auto select_tile = static_cast<std::uint16_t>(baseTileIdx + tileOffset / 32);
    tileOffset += select_button.size();

    // Store tile indices for use in rendering
    std::array<std::uint16_t, 10> spritesTiles{
        {
         dpad_up_tile, dpad_down_tile,
         dpad_left_tile, dpad_right_tile,
         a_tile, b_tile,
         l_tile, r_tile,
         start_tile, select_tile,
         }
    };

    // Store sprite data for each button (sprite, tile)
    struct SpriteData {
        gba::object obj;
        int x, y;
    };
    std::array<SpriteData, 10> buttonSprites;

    // Initialize all button sprites once
    for (int i = 0; i < 10; ++i) {
        const auto& btn = buttons[i];
        const auto& pos = positions[i];

        gba::object obj;

        switch (btn.sprite_type) {
            case 0: // D-pad Up
                obj = dpad_up_button.obj(spritesTiles[0]);
                break;
            case 1: // D-pad Down
                obj = dpad_down_button.obj(spritesTiles[1]);
                break;
            case 2: // D-pad Left
                obj = dpad_left_button.obj(spritesTiles[2]);
                break;
            case 3: // D-pad Right
                obj = dpad_right_button.obj(spritesTiles[3]);
                break;
            case 4: // A button
                obj = a_button.obj(spritesTiles[4]);
                break;
            case 5: // B button
                obj = b_button.obj(spritesTiles[5]);
                break;
            case 6: // L button
                obj = l_button.obj(spritesTiles[6]);
                break;
            case 7: // R button
                obj = r_button.obj(spritesTiles[7]);
                break;
            case 8: // Start button
                obj = start_button.obj(spritesTiles[8]);
                break;
            case 9: // Select button
                obj = select_button.obj(spritesTiles[9]);
                break;
            default: obj = dpad_up_button.obj(spritesTiles[0]);
        }

        obj.x = pos.x;
        obj.y = pos.y;
        obj.palette_index = 0; // Start with palette 0 (untouched)

        buttonSprites[i] = {obj, pos.x, pos.y};
        gba::obj_mem[i] = obj;
    }

    // Disable remaining OAM entries
    for (int i = 10; i < 128; ++i) {
        gba::obj_mem[i] = {.disable = true};
    }

    gba::keypad keys;

    while (true) {
        gba::VBlankIntrWait();

        keys = gba::reg_keyinput;

        // Update each button's palette based on current state
        for (int i = 0; i < 10; ++i) {
            const auto& btn = buttons[i];
            auto& sprite = buttonSprites[i];

            // Determine palette based on key state
            if (keys.pressed(btn.mask)) {
                // Just pressed this frame (bright green)
                sprite.obj.palette_index = 1;
            } else if (keys.released(btn.mask)) {
                // Just released this frame (red)
                sprite.obj.palette_index = 2;
            } else if (keys.held(btn.mask)) {
                // Currently held (medium green)
                sprite.obj.palette_index = 3;
            } else {
                // Not held (gray)
                sprite.obj.palette_index = 0;
            }

            gba::obj_mem[i] = sprite.obj;
        }
    }
}

[Image: Keypad buttons demo screenshot]

Video Modes

The GBA has 6 video modes (0-5), split into two categories:

  • Tile modes (0-2) - the display is built from 8x8 pixel tiles arranged on background layers
  • Bitmap modes (3-5) - the display is a framebuffer you write pixels to directly

Setting the video mode

#include <gba/peripherals>

// Mode 3: 240x160 bitmap, 15-bit colour, 1 layer
gba::reg_dispcnt = { .video_mode = 3, .enable_bg2 = true };

// Mode 0: 4 tile backgrounds, no rotation
gba::reg_dispcnt = {
    .video_mode = 0,
    .enable_bg0 = true,
    .enable_bg1 = true,
};

Mode summary

Mode  Type    BG layers                     Resolution       Colours
0     Tile    BG0-BG3 (all regular)         Up to 512x512    4bpp or 8bpp
1     Tile    BG0-BG1 regular, BG2 affine   Up to 1024x1024  4bpp/8bpp + 8bpp
2     Tile    BG2-BG3 (both affine)         Up to 1024x1024  8bpp
3     Bitmap  BG2                           240x160          15-bit direct
4     Bitmap  BG2 (page flip)               240x160          8-bit indexed
5     Bitmap  BG2 (page flip)               160x128          15-bit direct

Mode 3: the simplest mode

Mode 3 is a raw 240x160 framebuffer at 0x06000000. Each pixel is a 15-bit colour:

#include <gba/bios>
#include <gba/interrupt>
#include <gba/video>

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {.video_mode = 3, .enable_bg2 = true};

    // Draw a red pixel at (120, 80) - center of screen
    gba::mem_vram[120 + 80 * 240] = 0x001F;

    // Draw a green pixel one to the right
    gba::mem_vram[121 + 80 * 240] = 0x03E0;

    // Draw a blue pixel one below
    gba::mem_vram[120 + 81 * 240] = 0x7C00;

    while (true) {
        gba::VBlankIntrWait();
    }
}

[Image: Mode 3 pixels]

This is the easiest mode to learn with, but it uses the most VRAM (75 KB of the available 96 KB), leaving little room for sprites or other data.
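The index arithmetic and raw colour constants used in the Mode 3 example can be verified on the host. The names below (`pixel_index`, `rgb15`, `framebuffer_bytes`) are hypothetical helpers for illustration, not stdgba API; they assume a row-major 240x160 buffer of 16-bit pixels with red in the low five bits.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int screen_width = 240;
constexpr int screen_height = 160;

// Index of pixel (x, y) in a row-major array of 16-bit pixels.
constexpr std::size_t pixel_index(int x, int y) {
    return static_cast<std::size_t>(x + y * screen_width);
}

// Pack 5-bit channels: red in bits 0-4, green in bits 5-9, blue in bits 10-14.
constexpr std::uint16_t rgb15(unsigned r, unsigned g, unsigned b) {
    return static_cast<std::uint16_t>((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10));
}

constexpr std::size_t framebuffer_bytes =
    screen_width * screen_height * sizeof(std::uint16_t);

static_assert(pixel_index(120, 80) == 19'320);   // centre of the screen
static_assert(rgb15(31, 0, 0) == 0x001F);        // red
static_assert(rgb15(0, 31, 0) == 0x03E0);        // green
static_assert(rgb15(0, 0, 31) == 0x7C00);        // blue
static_assert(framebuffer_bytes == 75 * 1024);   // 75 KB of VRAM
```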

Tile modes for games

Most GBA games use mode 0 or mode 1. Tiles are memory-efficient (a 256x256 background uses only ~2 KB for the map + shared tile data), and the hardware handles scrolling, flipping, and palette lookup in zero CPU time.

See Tiles & Maps for details on tile-based rendering.

Colours & Palettes

The GBA uses 16-bit colours: 5 bits each for red, green, and blue in bits 0-14.

"..."_clr lives in gba::literals and accepts both hex ("#RRGGBB") and CSS web colour names (for example "cornflowerblue").

Named-colour list: MDN CSS named colors.

Colour format

Bit:  15      14-10  9-5    4-0
      grn_lo  Blue   Green  Red

Most software treats bit 15 as unused and works with 15-bit colour (5-5-5). This is perfectly fine for general use.

#include <gba/video>

// Write colours to background palette
gba::pal_bg_mem[0] = { .red = 0 };                  // Black (background colour)
gba::pal_bg_mem[1] = { .red = 31 };                  // Red   (5 bits max = 31)
gba::pal_bg_mem[2] = { .green = 31 };                // Green (5-bit, range 0-31)
gba::pal_bg_mem[3] = { .blue = 31 };                 // Blue
gba::pal_bg_mem[4] = { .red = 31, .green = 31, .blue = 31 }; // White

// Hex colour literals (grn_lo is derived from the green channel)
using namespace gba::literals;
gba::pal_bg_mem[5] = "#FF8040"_clr;
gba::pal_bg_mem[6] = "cornflowerblue"_clr;

Here are several colours displayed as palette swatches using Mode 0 tiles:

[Image: Colour swatches]

#include <gba/bios>
#include <gba/interrupt>
#include <gba/video>

static void fill_tile_solid(int tile_idx) {
    // Fill every nibble with palette index 1 (0x11111111 per row)
    gba::mem_tile_4bpp[0][tile_idx] = {
        0x11111111, 0x11111111, 0x11111111, 0x11111111, 0x11111111, 0x11111111, 0x11111111, 0x11111111,
    };
}

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {
        .video_mode = 0,
        .enable_bg0 = true,
    };

    // Use charblock 0 for tiles, screenblock 31 for map
    gba::reg_bgcnt[0] = {.screenblock = 31};

    // Create a solid tile (palette index 1 everywhere)
    fill_tile_solid(1);

    // Set up 8 color swatches across the top row
    using namespace gba;
    using namespace gba::literals;
    pal_bg_bank[0][1] = "red"_clr;            // CSS: red
    pal_bg_bank[1][1] = "lime"_clr;           // CSS: lime (pure green)
    pal_bg_bank[2][1] = "blue"_clr;           // CSS: blue
    pal_bg_bank[3][1] = "gold"_clr;           // CSS: gold
    pal_bg_bank[4][1] = "cyan"_clr;           // CSS: cyan
    pal_bg_bank[5][1] = "magenta"_clr;        // CSS: magenta
    pal_bg_bank[6][1] = "white"_clr;          // CSS: white
    pal_bg_bank[7][1] = "cornflowerblue"_clr; // CSS: cornflowerblue

    // Background color (palette 0, index 0)
    pal_bg_mem[0] = {.red = 2, .green = 2, .blue = 4};

    // Place 3x3 blocks of the solid tile across screen row 8-10
    for (int swatch = 0; swatch < 8; ++swatch) {
        for (int dy = 0; dy < 3; ++dy) {
            for (int dx = 0; dx < 3; ++dx) {
                int map_x = 1 + swatch * 4 + dx;
                int map_y = 8 + dy;
                mem_se[31][map_x + map_y * 32] = {
                    .tile_index = 1,
                    .palette_index = static_cast<unsigned short>(swatch),
                };
            }
        }
    }

    while (true) {
        gba::VBlankIntrWait();
    }
}

Palette memory layout

The GBA has 512 palette entries total (1 KB), split evenly:

Region        Address      Entries  Used by
mem_pal_bg    0x05000000   256      Background tiles
mem_pal_obj   0x05000200   256      Sprites (objects)

In 4bpp (16-colour) mode, the 256 entries are organised as 16 sub-palettes of 16 colours each. Each tile chooses which sub-palette to use.

In 8bpp (256-colour) mode, all 256 entries form one large palette.
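The relationship between a 4bpp sub-palette (bank), a colour slot, and the flat entry index can be written out as arithmetic. The helpers below are hypothetical and run on the host; they assume 16 entries per bank and 2 bytes per entry, as described above.

```cpp
#include <cstdint>

constexpr std::uint32_t pal_bg_base = 0x05000000;
constexpr std::uint32_t pal_obj_base = 0x05000200;

// Flat entry index of colour `colour` (0-15) in sub-palette `bank` (0-15).
constexpr std::uint32_t entry_index(unsigned bank, unsigned colour) {
    return bank * 16u + colour;
}

// Byte address of that entry; each entry is one 16-bit colour.
constexpr std::uint32_t entry_address(std::uint32_t base, unsigned bank, unsigned colour) {
    return base + entry_index(bank, colour) * 2u;
}

static_assert(entry_index(1, 1) == 17);                              // bank 1, colour 1
static_assert(entry_address(pal_bg_base, 1, 1) == 0x05000022);
static_assert(entry_address(pal_bg_base, 15, 15) == 0x050001FE);     // last BG entry
static_assert(entry_address(pal_bg_base, 15, 15) + 2 == pal_obj_base);
```

Under this layout, pal_bg_bank[1][1] and pal_bg_mem[17] would refer to the same entry, assuming both views map the same memory.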

Palette index 0

Palette index 0 is special: it is the transparent colour for both backgrounds and sprites. For the very first background palette (sub-palette 0, index 0), it also serves as the screen backdrop colour - the colour you see when no background or sprite covers a pixel.

// Set the backdrop to dark blue
gba::pal_bg_mem[0] = { .blue = 16 };

Bit 15 and hardware blending

Bit 15 (grn_lo) is usually safe to ignore for everyday palette work.

When colour effects are enabled (brighten, darken, or alpha blend), hardware treats green as an internal 6-bit value and may use grn_lo. This can create hardware-visible differences that many emulators do not reproduce.

For full details, demo code, and emulator-vs-hardware screenshots, see Advanced: Green Low Bit (grn_lo).

tonclib comparison

Colour construction

stdgba                                 tonclib           Notes
{ .red = r, .green = g, .blue = b }    RGB15(r, g, b)    5-bit channels (0-31)
"#RRGGBB"_clr                          RGB8(r, g, b)     8-bit channels (0-255)

RGB8 and "#RRGGBB"_clr are direct equivalents - both accept 8-bit per channel values and truncate to 5 bits.
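As a worked example of that truncation, here is a host-side sketch that converts 8-bit channels to the 15-bit layout. `to_5bit` and `rgb8_to_15` are hypothetical names, and the sketch leaves bit 15 clear, so it does not model how the literal derives grn_lo.

```cpp
#include <cstdint>

// Drop the low 3 bits of an 8-bit channel to get a 5-bit channel.
constexpr unsigned to_5bit(unsigned channel8) {
    return (channel8 & 0xFFu) >> 3;
}

// Pack truncated channels: red in bits 0-4, green in bits 5-9, blue in bits 10-14.
constexpr std::uint16_t rgb8_to_15(unsigned r8, unsigned g8, unsigned b8) {
    return static_cast<std::uint16_t>(to_5bit(r8) | (to_5bit(g8) << 5) | (to_5bit(b8) << 10));
}

// "#FF8040": red 0xFF -> 31, green 0x80 -> 16, blue 0x40 -> 8.
static_assert(rgb8_to_15(0xFF, 0x80, 0x40) == 0x221F);
// Truncation means 0xF8 through 0xFF all land on the same 5-bit value.
static_assert(to_5bit(0xF8) == to_5bit(0xFF));
```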

Named colour constants

tonclib defines a small set of CLR_* constants for the primary colours. The stdgba equivalents use CSS web colour names with _clr:

tonclib                stdgba                            Value
CLR_BLACK              "black"_clr                       #000000
CLR_RED                "red"_clr                         #FF0000
CLR_LIME               "lime"_clr                        #00FF00
CLR_YELLOW             "yellow"_clr                      #FFFF00
CLR_BLUE               "blue"_clr                        #0000FF
CLR_MAG                "magenta"_clr or "fuchsia"_clr    #FF00FF
CLR_CYAN               "cyan"_clr or "aqua"_clr          #00FFFF
CLR_WHITE              "white"_clr                       #FFFFFF
CLR_MAROON             "maroon"_clr                      #800000
CLR_GREEN              "green"_clr                       #008000
CLR_NAVY               "navy"_clr                        #000080
CLR_TEAL               "teal"_clr                        #008080
CLR_PURPLE             "purple"_clr                      #800080
CLR_OLIVE              "olive"_clr                       #808000
CLR_ORANGE             "orange"_clr                      #FFA500
CLR_GRAY / CLR_GREY    "gray"_clr or "grey"_clr          #808080
CLR_SILVER             "silver"_clr                      #C0C0C0

stdgba’s CSS colour set is a strict superset - all 147 CSS Color Level 4 names are supported, including colours like "cornflowerblue"_clr that have no tonclib constant.

Tiles & Maps

Tile modes (0-2) are the backbone of GBA graphics. The display hardware composites 8x8 pixel tiles from VRAM, using a tilemap to arrange them into backgrounds. This is extremely memory-efficient and the scrolling is handled entirely by hardware.

How it works

  1. Tile data (the pixel art) is stored in VRAM “character base blocks”
  2. Tilemap (which tile goes where) is stored in VRAM “screen base blocks”
  3. Palette maps pixel indices to colours
  4. The hardware reads the map, looks up each tile, applies the palette, and draws the scanline
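The block numbers used in steps 1 and 2 translate to VRAM addresses with fixed strides: character base blocks are 16 KB apart and screen base blocks are 2 KB apart. A minimal sketch, with hypothetical helper names:

```cpp
#include <cstdint>

constexpr std::uint32_t vram_base = 0x06000000;

// Hypothetical helpers (not stdgba APIs) for the block-to-address arithmetic.
constexpr std::uint32_t charblock_address(unsigned n) { return vram_base + n * 0x4000u; }
constexpr std::uint32_t screenblock_address(unsigned n) { return vram_base + n * 0x800u; }

static_assert(charblock_address(0) == 0x06000000);
static_assert(screenblock_address(31) == 0x0600F800);
// Screen block 31 is the last 2 KB of character block 3, so tile data and
// maps share the same 64 KB and must be placed so they do not collide.
static_assert(screenblock_address(31) + 0x800 == charblock_address(4));
```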

Loading tile data

Tile graphics are usually pre-converted at build time and copied into VRAM. Each 8x8 tile in 4bpp mode is 32 bytes (4 bits per pixel, 64 pixels):

#include <gba/peripherals>
#include <gba/dma>
#include <gba/video>

// Assuming tile_data is a const array in ROM
extern const unsigned short tile_data[];
extern const unsigned int tile_data_size;

// Copy tile data to character base block 0 (0x06000000)
gba::reg_dma[3] = gba::dma::copy(
    tile_data,
    gba::memory_map(gba::mem_vram_bg),
    tile_data_size / 4
);
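The 32-byte figure follows from the row layout: each tile row is one 32-bit word holding eight 4-bit palette indices, with the leftmost pixel in the lowest nibble. A sketch of that packing (`pack_row_4bpp` is a hypothetical helper, useful mainly for reasoning about hand-written tile data):

```cpp
#include <array>
#include <cstdint>

// Pack eight palette indices (listed left to right) into one 4bpp tile row.
constexpr std::uint32_t pack_row_4bpp(const std::array<std::uint8_t, 8>& px) {
    std::uint32_t row = 0;
    for (unsigned i = 0; i < 8; ++i) {
        row |= static_cast<std::uint32_t>(px[i] & 0xF) << (4 * i);
    }
    return row;
}

static_assert(pack_row_4bpp(std::array<std::uint8_t, 8>{1, 1, 1, 1, 1, 1, 1, 1}) == 0x11111111);
// The leftmost pixel (index 1) ends up in the lowest nibble.
static_assert(pack_row_4bpp(std::array<std::uint8_t, 8>{1, 2, 3, 4, 5, 6, 7, 8}) == 0x87654321);
// Eight rows of four bytes each make up one 32-byte tile.
static_assert(8 * sizeof(std::uint32_t) == 32);
```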

Setting up a background

// Configure BG0: 256x256, 4bpp tiles
// Character base = 0 (tile data at 0x06000000)
// Screen base = 31 (map at 0x0600F800)
gba::reg_bgcnt[0] = {
    .charblock = 0,
    .screenblock = 31,
    .size = 0,  // 256x256 (32x32 tiles)
};

// Scroll BG0
gba::reg_bgofs[0][0] = 0;
gba::reg_bgofs[0][1] = 0;

Background sizes

| Size value | Dimensions (pixels) | Dimensions (tiles) |
|---|---|---|
| 0 | 256x256 | 32x32 |
| 1 | 512x256 | 64x32 |
| 2 | 256x512 | 32x64 |
| 3 | 512x512 | 64x64 |

Scrolling

Scrolling is a single register write per axis:

gba::reg_bgofs[0][0] = scroll_x; // BG0 horizontal offset
gba::reg_bgofs[0][1] = scroll_y; // BG0 vertical offset

The hardware wraps seamlessly at the background boundaries. A 256x256 background scrolled past x=255 wraps back to x=0 - perfect for side-scrolling games.
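Because background dimensions are powers of two, the wrap can be modelled as a bit mask. The sketch below mirrors that behaviour on the CPU side; `wrap_coord` is a hypothetical helper, not something the hardware or stdgba requires you to call.

```cpp
// Background pixel coordinate sampled for a given screen coordinate and
// scroll offset, for a background that is `bg_size` pixels along this axis.
constexpr unsigned wrap_coord(int screen, int scroll, unsigned bg_size) {
    return static_cast<unsigned>(screen + scroll) & (bg_size - 1);
}

static_assert(wrap_coord(0, 256, 256) == 0);   // scrolled one full width
static_assert(wrap_coord(10, 250, 256) == 4);  // wraps past the right edge
static_assert(wrap_coord(0, -1, 256) == 255);  // negative scroll wraps too
```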

Here is a scrollable checkerboard built from two solid tiles:

#include <gba/bios>
#include <gba/interrupt>
#include <gba/video>

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {.video_mode = 0, .enable_bg0 = true};
    gba::reg_bgcnt[0] = {.screenblock = 31};

    // Palette
    gba::pal_bg_mem[0] = {.red = 2, .green = 2, .blue = 6};
    gba::pal_bg_bank[0][1] = {.red = 10, .green = 14, .blue = 20};
    gba::pal_bg_bank[0][2] = {.red = 4, .green = 6, .blue = 12};

    // Tile 1: solid light (palette index 1)
    gba::mem_tile_4bpp[0][1] = {
        0x11111111, 0x11111111, 0x11111111, 0x11111111, 0x11111111, 0x11111111, 0x11111111, 0x11111111,
    };

    // Tile 2: solid dark (palette index 2)
    gba::mem_tile_4bpp[0][2] = {
        0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222, 0x22222222,
    };

    // Fill the 32x32 tilemap with a checkerboard
    for (int ty = 0; ty < 32; ++ty)
        for (int tx = 0; tx < 32; ++tx)
            gba::mem_se[31][tx + ty * 32] = {
                .tile_index = static_cast<unsigned short>(((tx ^ ty) & 1) ? 2 : 1),
            };

    int scroll_x = 0, scroll_y = 0;

    while (true) {
        gba::VBlankIntrWait();

        ++scroll_x;
        ++scroll_y;

        gba::reg_bgofs[0][0] = static_cast<short>(scroll_x);
        gba::reg_bgofs[0][1] = static_cast<short>(scroll_y);
    }
}

Tile checkerboard

Sprites (Objects)

The GBA calls sprites “objects” (OBJ). Up to 128 sprites can be displayed simultaneously, each with independent position, size, palette, flipping, and priority. The hardware composites sprites automatically.

For field-by-field API details, see gba::object and gba::object_affine.

OAM (Object Attribute Memory)

Sprite attributes are stored in OAM at 0x07000000. Each entry is 8 bytes with three 16-bit attribute words (plus an affine parameter slot shared across entries).

#include <gba/video>

// Place sprite 0 at position (120, 80), using tile 0
gba::obj_mem[0] = {
    .y = 80,
    .x = 120,
    .tile_index = 0,
};
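For orientation, the raw layout behind that typed assignment can be sketched as follows. `raw_oam_entry` is a hypothetical stand-in for illustration; in stdgba code you would use gba::object rather than raw halfwords.

```cpp
#include <cstdint>

// Hypothetical raw view of one OAM entry (not a stdgba type).
struct raw_oam_entry {
    std::uint16_t attr0;        // y position, shape, mode flags
    std::uint16_t attr1;        // x position, size, flip flags
    std::uint16_t attr2;        // tile index, priority, palette bank
    std::uint16_t affine_param; // one element of a shared affine matrix
};
static_assert(sizeof(raw_oam_entry) == 8);

constexpr std::uint32_t oam_base = 0x07000000;
constexpr std::uint32_t oam_entry_address(unsigned index) { return oam_base + index * 8u; }

static_assert(oam_entry_address(0) == 0x07000000);
static_assert(oam_entry_address(127) == 0x070003F8);
// 128 entries of 8 bytes fill 1 KB; every four entries carry one 2x2 affine
// matrix between them, which gives 32 matrices in total.
static_assert(128 * sizeof(raw_oam_entry) == 1024);
```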

Important: OAM should only be written during VBlank or HBlank. Writing during the active display period can cause visual glitches. Use DMA or a shadow buffer for safe updates.

Sprite sizes

Sprites can be various sizes by combining shape and size fields:

| Shape | Size 0 | Size 1 | Size 2 | Size 3 |
|---|---|---|---|---|
| Square | 8x8 | 16x16 | 32x32 | 64x64 |
| Wide | 16x8 | 32x8 | 32x16 | 64x32 |
| Tall | 8x16 | 8x32 | 16x32 | 32x64 |
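The same table can be expressed as a lookup, which is handy when computing how many tiles a sprite occupies. The names below are hypothetical; the shape order (0 = square, 1 = wide, 2 = tall) follows the hardware encoding.

```cpp
#include <cstdint>

struct obj_dimensions {
    std::uint8_t width;
    std::uint8_t height;
};

// Indexed as [shape][size]; shape 0 = square, 1 = wide, 2 = tall.
constexpr obj_dimensions obj_size_table[3][4] = {
    {{8, 8}, {16, 16}, {32, 32}, {64, 64}},
    {{16, 8}, {32, 8}, {32, 16}, {64, 32}},
    {{8, 16}, {8, 32}, {16, 32}, {32, 64}},
};

// Number of 8x8 tiles covered by a sprite of the given shape and size.
constexpr unsigned obj_tile_count(unsigned shape, unsigned size) {
    return (obj_size_table[shape][size].width / 8u) * (obj_size_table[shape][size].height / 8u);
}

static_assert(obj_tile_count(0, 1) == 4);  // 16x16
static_assert(obj_tile_count(1, 3) == 32); // 64x32
static_assert(obj_tile_count(0, 3) == 64); // 64x64
```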

Sprite tile data

Sprite tiles live in the upper 32 KB of VRAM (starting at 0x06010000 in tile modes). Like background tiles, they can be 4bpp (16 colours) or 8bpp (256 colours) and use the object palette (pal_obj_mem).

1D vs 2D mapping

The .linear_obj_tilemap field in reg_dispcnt controls how multi-tile sprites index their tile data:

  • 1D mapping (linear_obj_tilemap = true): tiles are laid out sequentially in memory. A 16x16 sprite (4 tiles) uses tiles N, N+1, N+2, N+3.
  • 2D mapping (linear_obj_tilemap = false): tiles are addressed as a 32-tile-wide grid. A 4bpp 16x16 sprite uses tiles N, N+1, N+32, N+33.
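The difference between the two modes comes down to the row stride used when stepping through a sprite's tiles. A sketch for 4bpp sprites, using hypothetical helper names:

```cpp
// Tile number of the 8x8 tile at (tx, ty) inside a sprite whose first tile
// is `base`. In 1D mapping the row stride is the sprite's own width in tiles.
constexpr unsigned obj_tile_1d(unsigned base, unsigned tx, unsigned ty, unsigned width_tiles) {
    return base + ty * width_tiles + tx;
}

// In 2D mapping the row stride is always 32 tiles (4bpp).
constexpr unsigned obj_tile_2d(unsigned base, unsigned tx, unsigned ty) {
    return base + ty * 32u + tx;
}

// 16x16 sprite (2x2 tiles) starting at tile N = 8.
static_assert(obj_tile_1d(8, 1, 1, 2) == 11); // N + 3
static_assert(obj_tile_2d(8, 0, 1) == 40);    // N + 32
static_assert(obj_tile_2d(8, 1, 1) == 41);    // N + 33
```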

Most games use 1D mapping - it is simpler and wastes less VRAM:

gba::reg_dispcnt = {
    .video_mode = 0,
    .linear_obj_tilemap = true,
    .enable_bg0 = true,
    .enable_obj = true,
};

Hiding a sprite

Set the object disable flag to remove a sprite from the display without deleting its data:

gba::obj_mem[0] = { .disable = true };

Iterators and ranges can also be used to hide multiple sprites at once:

// Hides all sprites (std::ranges::fill is declared in <algorithm>)
std::ranges::fill(gba::obj_mem, gba::object{ .disable = true });

tonclib comparison

| stdgba | tonclib |
|---|---|
| gba::obj_mem[0] = { .y = 80, .x = 120, ... }; | obj_set_attr(&oam_mem[0], ...) |
| gba::pal_obj_mem[n] = color; | pal_obj_mem[n] = color; |

Text Rendering

stdgba provides a 4bpp BG text-layer renderer.

The core goal is to render formatted strings efficiently - including typewriter effects - without a full-screen redraw each frame.

Features

  • Bitmap fonts embedded from BDF files at compile time via <gba/embed>.
  • Compile-time font variant baking: with_shadow<dx, dy> and with_outline<thickness>.
  • Stream/tokenizer support for incremental rendering:
    • C-string tokenizer streams (cstr_stream).
    • Generator-backed streams from <gba/format> via stream(gen, ...).
  • Word wrapping using a lookahead to the next break character.
  • Incremental rendering via make_cursor(...) and next_visible() for typewriter effects.
  • Bitplane palette profiles for 2-colour, 3-colour, and full-colour (up to 15 colours) text.
  • Inline colour escape sequences for per-character palette switching in full-colour mode.

Quick start

The demo below embeds 9x18.bdf, configures the bitplane palette, and draws one visible glyph per frame.

#include <gba/bios>
#include <gba/embed>
#include <gba/format>
#include <gba/interrupt>
#include <gba/text>

#include <array>

int main() {
    using namespace gba::literals;

    static constexpr auto font = gba::text::with_shadow<1, 1>(gba::embed::bdf([] {
        return std::to_array<unsigned char>({
#embed "9x18.bdf"
        });
    }));
    static constexpr auto fmt = "The frame is: {value}"_fmt;

    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {.video_mode = 0, .enable_bg0 = true};
    gba::reg_bgcnt[0] = {.screenblock = 31};

    constexpr auto config = gba::text::bitplane_config{
        .profile = gba::text::bitplane_profile::two_plane_three_color,
        .palbank_0 = 1,
        .palbank_1 = 2,
        .start_index = 1,
    };

    gba::text::set_theme(config, {
                                      .background = "#304060"_clr,
                                      .foreground = "white"_clr,
                                      .shadow = "#102040"_clr,
                                  });
    gba::pal_bg_mem[0] = "#304060"_clr;

    unsigned int frame = 0;

    gba::text::linear_tile_allocator alloc{.next_tile = 1, .end_tile = 512};
    using layer_type = gba::text::bg4bpp_text_layer<240, 160>;
    static layer_type::cell_state_map cell_state{};
    layer_type layer{31, config, alloc, cell_state};

    gba::text::stream_metrics metrics{
        .letter_spacing_px = 1,
        .line_spacing_px = 2,
        .tab_width_px = 32,
        .wrap_width_px = 220,
    };

    auto make_cursor = [&] {
        auto gen = fmt.generator("value"_arg = [&] { return frame; });
        auto s = gba::text::stream(gen, font, metrics);
        return layer.make_cursor(font, s, 0, 0, metrics);
    };

    auto cursor = make_cursor();

    while (true) {
        gba::VBlankIntrWait();
        ++frame;

        if (!cursor.next_visible() && frame % 120 == 0) {
            alloc = {.next_tile = 1, .end_tile = 512};
            layer = layer_type{31, config, alloc, cell_state};
            cursor = make_cursor();
        }
    }
}

Text rendering demo


Bitplane profiles

bg4bpp_text_layer<Width, Height> multiplexes multiple palette layers onto 4bpp VRAM tiles using a mixed-radix encoding scheme. Choose the profile that matches how many colour roles your text needs.

| Profile | Planes | Palette entries | Colour roles |
|---|---|---|---|
| two_plane_binary | 2 | 4 | background, foreground |
| two_plane_three_color | 2 | 9 | background, foreground, shadow |
| three_plane_binary | 3 | 8 | background, foreground |
| one_plane_full_color | 1 | 16 | nibble = palette index directly |

two_plane_three_color is the most common choice: it provides foreground, shadow (or outline decoration), and background using only two VRAM tiles worth of palette space per 8x8 cell.

one_plane_full_color maps nibble values directly to palette entries, giving up to 15 distinct colours at the cost of one VRAM tile per cell (no cell sharing).
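The entry counts in the table are consistent with a mixed-radix code: two planes of two roles give 2 x 2 = 4 entries, two planes of three roles give 3 x 3 = 9, and three planes of two roles give 2 x 2 x 2 = 8. The sketch below shows the general idea for two planes with three roles. It is a guess at the principle, not stdgba's actual encoding, and every name in it is hypothetical.

```cpp
// Assumption (not confirmed by the source): each plane contributes one role
// digit (0 = background, 1 = foreground, 2 = shadow) to the stored nibble.
constexpr unsigned role_count = 3;

constexpr unsigned encode_nibble(unsigned plane0_role, unsigned plane1_role) {
    return plane0_role + role_count * plane1_role;
}

// Role that `plane` sees for a stored nibble. The palette bank bound to that
// plane would hold this role's colour at the nibble's palette entry.
constexpr unsigned decode_role(unsigned nibble, unsigned plane) {
    return plane == 0 ? nibble % role_count : (nibble / role_count) % role_count;
}

static_assert(encode_nibble(2, 2) == 8); // codes 0-8: nine entries in total
static_assert(decode_role(encode_nibble(1, 2), 0) == 1);
static_assert(decode_role(encode_nibble(1, 2), 1) == 2);
```

Under this reading, selecting a different palette bank for a tilemap entry is what chooses which plane becomes visible for that cell.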


Palette configuration

A bitplane_config binds a profile to concrete palette banks and a starting index:

constexpr auto config = gba::text::bitplane_config{
    .profile    = gba::text::bitplane_profile::two_plane_three_color,
    .palbank_0  = 1,   // plane 0 uses palette bank 1
    .palbank_1  = 2,   // plane 1 uses palette bank 2
    .start_index = 1,  // first occupied entry within each bank
};

Apply colours to palette RAM with set_theme:

gba::text::set_theme(config, {
    .background = "#304060"_clr,
    .foreground = "white"_clr,
    .shadow     = "#102040"_clr,
});

set_theme fills all active planes in one call. Call it again any time to change the entire colour scheme without re-rendering text.


Font variants

Font variants bake visual effects into the glyph bitmap data at compile time. The renderer then uses a separate decoration bitmap for the shadow/outline colour role, so no extra per-effect bitmap generation is done at runtime.

Drop shadow

// 1px shadow shifted right and down
static constexpr auto font_shadowed = gba::text::with_shadow<1, 1>(base_font);

The template arguments are <ShadowDX, ShadowDY>. The shadow pixels are only drawn where they do not overlap the foreground glyph, so they never occlude text.

Outline

// 1px outline around every glyph
static constexpr auto font_outlined = gba::text::with_outline<1>(base_font);

The template argument is <OutlineThickness>. Each glyph is expanded by thickness pixels in every direction; the outline pixels form a separate decoration mask that is drawn in the shadow colour role.

Both variants return a new font type compatible with all drawing functions - pass them wherever a plain font is accepted.


Streams

A stream wraps a text source and exposes single-character iteration plus a lookahead used by the word-wrap algorithm.

C-string stream

gba::text::stream_metrics metrics{.letter_spacing_px = 1};
auto s = gba::text::cstr_stream{gba::text::cstr_source{"HP: 42/99"}};

Format generator stream

static constexpr auto fmt = "HP: {hp}/{max}"_fmt;

auto gen = fmt.generator("hp"_arg = hp, "max"_arg = max_hp);
auto s   = gba::text::stream(gen, font, metrics);

The generator is copied for lookahead, so it must be copyable (all format generators are).

There is currently no stream(const char*, ...) convenience overload; use cstr_stream{cstr_source{...}} for C-strings.

Inline colour escapes

In one_plane_full_color mode, embed palette switches directly in the text using the literal escape sequence \x1B followed by a hex digit (0-F).

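// Hex digit = palette nibble: 0-9 = nibbles 0-9, A-F = nibbles 10-15
// Keep each escape in its own literal: in "\x1B2Error" the compiler would
// read 2 and E as further hex digits of the same escape sequence.
std::string msg = "Status: " "\x1B" "2Error" "\x1B" "3 - " "\x1B" "1OK";
//                            nibble 2 = red   nibble 3 = yellow   nibble 1 = white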

The escape code is consumed silently; it never appears as text and does not affect glyph counts or word-wrap measurements. The active nibble resets to 1 (foreground) at the start of each draw_stream or cursor call.
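A sketch of what "consumed silently" means for character counts: each ESC plus hex digit pair is skipped and everything else is counted. `visible_length` is a hypothetical helper, and the sketch assumes uppercase hex digits only.

```cpp
#include <cstddef>
#include <string_view>

constexpr bool is_palette_digit(char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
}

// Hypothetical helper (not a stdgba API): number of characters left once
// every ESC + hex-digit pair has been removed.
constexpr std::size_t visible_length(std::string_view text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\x1B' && i + 1 < text.size() && is_palette_digit(text[i + 1])) {
            ++i; // skip the palette digit along with the ESC byte
            continue;
        }
        ++count;
    }
    return count;
}

// The escape is written as its own literal so the following digit is not
// absorbed into the \x sequence.
static_assert(visible_length("HP: " "\x1B" "2" "42") == 6);
static_assert(visible_length("plain") == 5);
```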

See Full-colour mode for how to configure the palette and the layer to use one_plane_full_color.


Drawing

draw_stream - batch rendering

Renders a full stream in one call, with layout, word wrapping, and optional character limit for partial reveals:

gba::text::stream_metrics metrics{
    .letter_spacing_px = 1,
    .line_spacing_px   = 2,
    .tab_width_px      = 32,
    .wrap_width_px     = 220,
};

// Draw everything
auto count = layer.draw_stream(font, "HP: 42/99", /*x=*/8, /*y=*/16, metrics);

// Draw only the first 4 characters (typewriter snapshot)
auto partial = layer.draw_stream(font, "HP: 42/99", 8, 16, metrics, /*max_chars=*/4);

Returns the number of emitted characters (including whitespace/newlines). Inline colour escape sequences are consumed and are not included in the count.

draw_char - single glyph

// Returns the advance width in pixels
auto advance = layer.draw_char(font, static_cast<unsigned int>('A'), pen_x, baseline_y);

make_cursor + cursor object - incremental typewriter

make_cursor(...) returns a cursor object that draws one character per next() call, maintaining cursor position between calls. Use next_visible() to skip whitespace and advance the cursor in the same call, so a typewriter effect never wastes a frame on a space:

auto cursor = layer.make_cursor(font, s, /*start_x=*/0, /*start_y=*/0, metrics);

// In the update loop - one visible glyph per frame:
if (!cursor.next_visible()) {
    // stream exhausted - restart or do something else
}

The cursor also exposes:

| Method | Description |
|---|---|
| next() | Draws the next character step; returns true while characters remain |
| next_visible() | Draws the next non-whitespace character; skips layout whitespace in one call |
| emitted() | Total processed characters so far |
| done() | true when the stream is exhausted |
| operator()() | Shorthand for next() |

To restart a typewriter sequence, re-create the layer (to clear tile state) and construct a fresh cursor:

// Reset tile allocator and layer, then create a new cursor
alloc = {.next_tile = 1, .end_tile = 512};
layer = layer_type{31, config, alloc, cell_state};
cursor = layer.make_cursor(font, new_stream, 0, 0, metrics);

Full-colour mode

one_plane_full_color maps nibble values directly to palette entries, giving access to up to 15 distinct foreground colours in a single bg4bpp_text_layer.

constexpr auto config = gba::text::bitplane_config{
    .profile    = gba::text::bitplane_profile::one_plane_full_color,
    .palbank_0  = 3,
    .start_index = 0,   // must be 0 so nibble 0 = transparent
};

Inline colour escapes

Use the text-format palette extension (:pal) to emit inline colour escapes in generated text (see Streams – Inline colour escapes above for the escape semantics). At present, the :pal argument is emitted as a single character and decoded as a hex digit, so pass '1'..'9' or 'A'..'F' ('0' remains reserved for transparent).

using namespace gba::literals;

constexpr gba::text::text_format<"HP {fg:pal}{hp}/{max}"> fmt{};
auto gen = fmt.generator("fg"_arg = '2', "hp"_arg = hp, "max"_arg = max_hp);
auto s = gba::text::stream(gen, font, metrics);

Make sure the corresponding palette entries are populated. set_theme fills nibbles 1 (foreground) and 2 (shadow); write additional entries directly:

gba::text::set_theme(config, {
    .background = {},             // nibble 0 = transparent
    .foreground = "white"_clr,   // nibble 1
    .shadow     = "#FF4444"_clr, // nibble 2 -- repurposed as accent red
});

// Extra colours beyond the three theme roles
gba::pal_bg_mem[config.palbank_0 * 16 + 3] = "#FFFF00"_clr; // nibble 3 = yellow
gba::pal_bg_mem[config.palbank_0 * 16 + 4] = "#88FF88"_clr; // nibble 4 = green

API reference

bitplane_config

| Field | Type | Description |
|---|---|---|
| profile | bitplane_profile | Plane/colour role layout |
| palbank_0 | unsigned char | Palette bank for plane 0 (255 = unused) |
| palbank_1 | unsigned char | Palette bank for plane 1 (255 = unused) |
| palbank_2 | unsigned char | Palette bank for plane 2 (255 = unused) |
| start_index | unsigned char | First occupied entry within each bank |

stream_metrics

| Field | Default | Description |
|---|---|---|
| letter_spacing_px | 0 | Extra pixels between glyphs |
| line_spacing_px | 0 | Extra pixels between lines |
| tab_width_px | 32 | Width of a tab character in pixels |
| wrap_width_px | 0xFFFF | Maximum line width before wrapping |

linear_tile_allocator

Simple bump allocator over a VRAM tile range. Reset it by re-assigning the struct:

alloc = {.next_tile = 1, .end_tile = 512};
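For readers unfamiliar with the pattern, a bump allocator over a tile range can be sketched in a few lines. `tile_bump_allocator` and its `allocate()` member are hypothetical stand-ins; only the next_tile and end_tile fields appear in the stdgba examples above.

```cpp
#include <optional>

// Hypothetical stand-in, not the stdgba type.
struct tile_bump_allocator {
    unsigned next_tile;
    unsigned end_tile;

    // Hand out the next free tile index, or nothing once the range is used up.
    constexpr std::optional<unsigned> allocate() {
        if (next_tile >= end_tile) return std::nullopt;
        return next_tile++;
    }
};

constexpr bool bump_allocator_demo() {
    tile_bump_allocator a{1, 3};
    const bool ok = a.allocate() == 1u && a.allocate() == 2u && !a.allocate().has_value();
    a = {1, 3}; // reset by re-assigning the struct
    return ok && a.allocate() == 1u;
}
static_assert(bump_allocator_demo());
```

There is no per-tile free operation in this scheme, which is why a reset rewinds the whole range at once.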

bg4bpp_text_layer<Width, Height>

| Method | Description |
|---|---|
| draw_char(font, encoding, x, y) | Draw a single glyph; returns advance width |
| draw_stream(font, const char* str, x, y, metrics [, max_chars]) | Draw a full C-string with layout |
| make_cursor(font, s, x, y, metrics) | Create an incremental cursor object |
| clear() | Reset all tile allocations and clear the tilemap to background |
| uses_full_color() | true when the profile is one_plane_full_color |

Notes

  • Word wrapping only occurs at word starts (after a break character). Long tokens are allowed to overflow rather than wrapping one character per line.
  • The bitplane renderer uses mixed-radix encoding so multiple planes can share a 4bpp tile while selecting different palette banks.
  • start_index = 0 is required when using one_plane_full_color so that nibble 0 maps to palette index 0 (transparent in 4bpp tile mode).
  • with_shadow and with_outline bake the effect into separate decoration bitmaps at compile time; rendering cost is the same as a plain font plus one extra pass per glyph for the decoration pixels.
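
The wrapping rule in the first note can be sketched as a greedy pass that measures each word by lookahead before placing it. This is a simplified model, not the stdgba implementation: it assumes a fixed glyph width, treats the space as the only break character, and uses a hypothetical function name.

```cpp
#include <cstddef>
#include <string_view>

// Count the lines produced when wrapping `text` at `wrap_px`, with every
// glyph (including the space) `glyph_px` wide.
constexpr std::size_t wrapped_line_count(std::string_view text, unsigned glyph_px, unsigned wrap_px) {
    std::size_t lines = 1;
    unsigned pen_x = 0;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == ' ') {
            pen_x += glyph_px;
            ++i;
            continue;
        }
        // Lookahead: measure the word that starts here.
        std::size_t end = i;
        while (end < text.size() && text[end] != ' ') ++end;
        const unsigned word_px = static_cast<unsigned>(end - i) * glyph_px;
        // Wrap only at a word start, and only if the line already has content.
        if (pen_x > 0 && pen_x + word_px > wrap_px) {
            ++lines;
            pen_x = 0;
        }
        pen_x += word_px; // an over-long word simply overflows its line
        i = end;
    }
    return lines;
}

static_assert(wrapped_line_count("aaa bbb", 8, 40) == 2);    // "bbb" moves to line 2
static_assert(wrapped_line_count("aa bb", 8, 40) == 1);      // exactly fits
static_assert(wrapped_line_count("aaaaaaaaaa", 8, 40) == 1); // long token overflows
```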

Embedding Fonts (BDF)

stdgba embeds bitmap fonts at compile time from BDF files through gba::embed::bdf in <gba/embed>.

BDF format reference: Glyph Bitmap Distribution Format (Wikipedia).

This gives you a typed font object with:

  • per-glyph metrics and offsets,
  • packed 1bpp glyph bitmap data,
  • helpers for BIOS BitUnPack parameters,
  • lookup with fallback to DEFAULT_CHAR.

Quick start

#include <array>
#include <gba/embed>

static constexpr auto font = gba::embed::bdf([] {
    return std::to_array<unsigned char>({
#embed "9x18.bdf"
    });
});

static_assert(font.glyph_count > 0);

The returned type is gba::embed::bdf_font_result<GlyphCount, BitmapBytes>.

Demo

The demo below embeds multiple BDF files and renders them in one text layer.

Demo fonts used:

  • 6x13B.bdf
  • HaxorMedium-12.bdf

Font source: IT-Studio-Rech/bdf-fonts.

The demo applies with_shadow<1, 1> to both embedded fonts and uses the two_plane_three_color profile so the shadow pass is visible.


#include <gba/bios>
#include <gba/embed>
#include <gba/interrupt>
#include <gba/text>

#include <array>

int main() {
    using namespace gba::literals;

    static constexpr auto base_font_ui = gba::embed::bdf([] {
        return std::to_array<unsigned char>({
#embed "6x13B.bdf"
        });
    });

    static constexpr auto base_font_haxor = gba::embed::bdf([] {
        return std::to_array<unsigned char>({
#embed "HaxorMedium-12.bdf"
        });
    });

    static constexpr auto font_ui = gba::text::with_shadow<1, 1>(base_font_ui);
    static constexpr auto font_haxor = gba::text::with_shadow<1, 1>(base_font_haxor);

    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {.video_mode = 0, .enable_bg0 = true};
    gba::reg_bgcnt[0] = {.screenblock = 31};

    constexpr auto config = gba::text::bitplane_config{
        .profile = gba::text::bitplane_profile::two_plane_three_color,
        .palbank_0 = 1,
        .palbank_1 = 2,
        .start_index = 1,
    };

    constexpr auto theme = gba::text::bitplane_theme{
        .background = "#1A2238"_clr,
        .foreground = "#F6F7FB"_clr,
        .shadow = "#0A1020"_clr,
    };

    gba::text::set_theme(config, theme);
    gba::pal_bg_mem[0] = theme.background;

    gba::text::linear_tile_allocator alloc{.next_tile = 1, .end_tile = 512};
    using layer_type = gba::text::bg4bpp_text_layer<240, 160>;
    static layer_type::cell_state_map cell_state{};
    layer_type layer{31, config, alloc, cell_state};

    // Stream metrics for layout
    gba::text::stream_metrics title_metrics{
        .letter_spacing_px = 0,
        .line_spacing_px = 0,
        .tab_width_px = 32,
        .wrap_width_px = 224,
    };
    gba::text::stream_metrics body_metrics{
        .letter_spacing_px = 1,
        .line_spacing_px = 1,
        .tab_width_px = 32,
        .wrap_width_px = 224,
    };

    layer.draw_stream(font_haxor, "Embedded BDF fonts", 4, 8, title_metrics);

    layer.draw_stream(font_haxor, "HaxorMedium-12: ABC abc 0123", 4, 34, body_metrics);

    layer.draw_stream(font_ui, "6x13B: GBA text layer sample", 4, 64, body_metrics);

    layer.draw_stream(font_ui, "glyph_or_default + BitUnPack-ready rows", 4, 84, body_metrics);

    layer.flush_cache();

    while (true) {
        gba::VBlankIntrWait();
    }
}

Embedded fonts demo

What embed::bdf(...) parses

The parser expects standard text BDF structure and reads these fields:

  • font-level:
    • FONTBOUNDINGBOX
    • CHARS
    • FONT_ASCENT and FONT_DESCENT (from STARTPROPERTIES block)
    • DEFAULT_CHAR (optional, from STARTPROPERTIES)
  • per-glyph:
    • STARTCHAR / ENDCHAR
    • ENCODING
    • DWIDTH
    • BBX
    • BITMAP

It validates glyph counts and bitmap row sizes at compile time.

BDF to GBA bitmap packing

Each BITMAP row is packed to 1bpp bytes in a BIOS-friendly way:

  • leftmost source pixel is written to bit 0 (LSB),
  • rows are stored in row-major order,
  • byte width is (glyph_width + 7) / 8.

This layout is designed so BitUnPack can expand glyph rows directly.
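BDF source rows are hex bytes with the leftmost pixel in the most significant bit, so producing the layout above amounts to reversing the bits of each byte. A sketch of that step and of the byte-width formula, using hypothetical helper names rather than the parser's internals:

```cpp
#include <cstdint>

// Reverse the bit order of one byte: bit 7 (leftmost pixel in BDF) moves
// to bit 0, matching the LSB-first layout described above.
constexpr std::uint8_t reverse_bits(std::uint8_t b) {
    std::uint8_t out = 0;
    for (int i = 0; i < 8; ++i) {
        out = static_cast<std::uint8_t>((out << 1) | ((b >> i) & 1));
    }
    return out;
}

// Bytes needed to store one row of a glyph that is `glyph_width` pixels wide.
constexpr unsigned row_byte_width(unsigned glyph_width) {
    return (glyph_width + 7) / 8;
}

static_assert(reverse_bits(0x80) == 0x01); // leftmost pixel set
static_assert(reverse_bits(0xC0) == 0x03); // two leftmost pixels set
static_assert(row_byte_width(9) == 2);     // a 9-pixel-wide glyph needs two bytes per row
static_assert(row_byte_width(6) == 1);
```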

Using glyph metadata

const auto& g = font.glyph_or_default(static_cast<unsigned int>('A'));

auto width_px = g.width;
auto height_px = g.height;
auto advance_px = g.dwidth;

Useful members on glyph:

  • encoding
  • dwidth
  • width, height
  • x_offset, y_offset
  • bitmap_offset
  • bitmap_byte_width
  • bitmap_bytes()

Accessing bitmap data and BitUnPack headers

#include <gba/bios>

const auto& g = font.glyph_or_default(static_cast<unsigned int>('A'));
const unsigned char* src = font.bitmap_data(g);

auto unpack = g.bitunpack_header(
    /*dst_bpp=*/4,
    /*dst_ofs=*/1,
    /*offset_zero=*/false
);

// Example destination buffer for expanded glyph data
unsigned int expanded[128]{};

gba::BitUnPack(src, expanded, unpack);

You can also fetch by encoding directly:

const unsigned char* src = font.bitmap_data(static_cast<unsigned int>('A'));
auto unpack = font.bitunpack_header(static_cast<unsigned int>('A'));

Fallback behaviour

glyph_or_default(encoding) resolves in this order:

  1. exact glyph encoding,
  2. DEFAULT_CHAR (if present and found),
  3. glyph index 0.

This makes rendering robust when text includes characters not present in your BDF.
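The resolution order can be sketched over a plain glyph array. The types and function below are hypothetical stand-ins for illustration; the sketch passes a negative `default_char` to mean that the font declares no DEFAULT_CHAR, and it assumes the array is non-empty.

```cpp
#include <array>
#include <cstddef>

struct glyph_entry {
    unsigned encoding;
    unsigned dwidth;
};

template <std::size_t N>
constexpr const glyph_entry& glyph_or_default_sketch(const std::array<glyph_entry, N>& glyphs,
                                                     unsigned encoding, int default_char) {
    // 1. Exact encoding match.
    for (const auto& g : glyphs) {
        if (g.encoding == encoding) return g;
    }
    // 2. DEFAULT_CHAR, if the font declares one and it exists.
    if (default_char >= 0) {
        for (const auto& g : glyphs) {
            if (g.encoding == static_cast<unsigned>(default_char)) return g;
        }
    }
    // 3. Last resort: glyph index 0.
    return glyphs[0];
}

constexpr std::array<glyph_entry, 3> demo_glyphs{{{32, 4}, {63, 6}, {65, 7}}};
static_assert(glyph_or_default_sketch(demo_glyphs, 65, 63).dwidth == 7);     // 'A' found
static_assert(glyph_or_default_sketch(demo_glyphs, 0x3042, 63).dwidth == 6); // falls back to '?'
static_assert(glyph_or_default_sketch(demo_glyphs, 0x3042, -1).dwidth == 4); // falls back to index 0
```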

Font variants for text rendering

After embedding, you can generate compile-time variants for the text renderer:

#include <gba/text>

static constexpr auto font_shadow = gba::text::with_shadow<1, 1>(font);
static constexpr auto font_outline = gba::text::with_outline<1>(font);

These variants keep the same font-style API but add pre-baked decoration masks.

Embedding Images

The <gba/embed> header converts image files into GBA-ready data entirely at compile time. Combined with C23’s #embed directive, this replaces external asset pipelines like grit with a single #include and a constexpr variable.

For procedural sprite generation without source image files, see Shapes. For animated sprite-sheet workflows, see Animated Sprite Sheets. For type-level API details, see Embedded Sprite Type Reference.

This page focuses on still images: framebuffers, tilemaps, and single-frame sprites.

Supported formats

| Format | Variants | Transparency |
|---|---|---|
| PPM | 24-bit RGB | Index 0 |
| PNG | Grayscale, RGB, indexed, grayscale+alpha, RGBA (8-bit channels) | Alpha < 50% |
| TGA | Uncompressed, RLE, true-colour (15/16/24/32bpp), colour-mapped, grayscale | Alpha < 50% |

Format is auto-detected from the file header.

Conversion functions

| Function | Output | Best for |
|---|---|---|
| bitmap15 | Flat gba::color array | Mode 3 or software blitters |
| indexed4 | 4bpp sprite payload + 16-colour palette + tilemap | Backgrounds and 4bpp sprites |
| indexed8 | 8bpp tiles + 256-colour palette + tilemap | 8bpp backgrounds |
| indexed4_sheet<FrameW, FrameH> | sheet4_result | Animated OBJ sheets; covered on the next page |

All converters take a supplier lambda returning std::array<unsigned char, N>.

Quick start

#include <array>
#include <gba/embed>

static constexpr auto bg = gba::embed::indexed4([] {
    return std::to_array<unsigned char>({
#embed "background.png"
    });
});

static constexpr auto hero = gba::embed::indexed4<gba::embed::dedup::none>([] {
    return std::to_array<unsigned char>({
#embed "hero.png"
    });
});

Use dedup::none for OBJ sprites so tiles stay in 1D sequential order. Use the default dedup::flip for backgrounds to save VRAM when tiles repeat.

Example: scrollable background with sprite

This demo embeds a 512x256 background image and a 16x16 character sprite, both as PNG files. The D-pad scrolls the background, and holding A + D-pad moves the sprite:

#include <gba/bios>
#include <gba/embed>
#include <gba/interrupt>
#include <gba/keyinput>
#include <gba/video>

#include <array>
#include <cstring>

constexpr auto bg = gba::embed::indexed4([] {
    return std::to_array<unsigned char>({
#embed "bg_2x1.png"
    });
});

constexpr auto hero = gba::embed::indexed4<gba::embed::dedup::none>([] {
    return std::to_array<unsigned char>({
#embed "sprite.png"
    });
});

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {.video_mode = 0, .linear_obj_tilemap = true, .enable_bg0 = true, .enable_obj = true};
    gba::reg_bgcnt[0] = {.screenblock = 30, .size = 1}; // 512x256

    for (auto&& x : gba::obj_mem) {
        x = {.disable = true};
    }

    // Background palette + tiles
    std::memcpy(gba::memory_map(gba::pal_bg_mem), bg.palette.data(), sizeof(bg.palette));
    std::memcpy(gba::memory_map(gba::mem_tile_4bpp[0]), bg.sprite.data(), bg.sprite.size());

    // Background map: stored in screenblock order, memcpy directly
    std::memcpy(gba::memory_map(gba::mem_se[30]), bg.map.data(), sizeof(bg.map));

    // Sprite palette + tiles (no deduplication - sequential for 1D mapping)
    std::memcpy(gba::memory_map(gba::pal_obj_bank[0]), hero.palette.data(), sizeof(hero.palette));
    std::memcpy(gba::memory_map(gba::mem_vram_obj), hero.sprite.data(), hero.sprite.size());

    int scroll_x = 0, scroll_y = 0;
    int sprite_x = 112, sprite_y = 72;

    gba::object hero_obj = hero.sprite.obj();
    hero_obj.y = static_cast<unsigned short>(sprite_y & 0xFF);
    hero_obj.x = static_cast<unsigned short>(sprite_x & 0x1FF);
    gba::obj_mem[0] = hero_obj;

    gba::keypad keys;
    for (;;) {
        gba::VBlankIntrWait();
        keys = gba::reg_keyinput;

        if (keys.held(gba::key_a)) {
            // A + D-pad moves the sprite
            sprite_x += keys.xaxis();
            sprite_y += keys.i_yaxis();

            hero_obj.y = static_cast<unsigned short>(sprite_y & 0xFF);
            hero_obj.x = static_cast<unsigned short>(sprite_x & 0x1FF);
            gba::obj_mem[0] = hero_obj;
        } else {
            // D-pad scrolls the background
            scroll_x += keys.xaxis();
            scroll_y += keys.i_yaxis();

            gba::reg_bgofs[0][0] = static_cast<short>(scroll_x);
            gba::reg_bgofs[0][1] = static_cast<short>(scroll_y);
        }
    }
}

Scrollable background with sprite

How it works

The background uses a 2x1 screenblock layout (size = 1 in reg_bgcnt), giving 64x32 tiles (512x256 pixels). The indexed4 map is stored in GBA screenblock order, so the entire map can be written to VRAM with one std::memcpy.
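"Screenblock order" for a 64x32-tile map means the left 32x32 block is stored in full before the right one, rather than one 64-entry row after another. A sketch of the resulting index, using a hypothetical helper name:

```cpp
#include <cstddef>

// Index of tile (tx, ty) in a 64x32-tile map stored in screenblock order:
// the left 32x32 block (1024 entries) comes first, then the right one.
constexpr std::size_t se_index_64x32(unsigned tx, unsigned ty) {
    return (tx / 32u) * 1024u + ty * 32u + (tx % 32u);
}

static_assert(se_index_64x32(0, 0) == 0);
static_assert(se_index_64x32(31, 31) == 1023); // last entry of the left block
static_assert(se_index_64x32(32, 0) == 1024);  // first entry of the right block
static_assert(se_index_64x32(63, 31) == 2047);
```

With the map starting at screenblock 30, the right-hand block lands in screenblock 31.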

The sprite uses dedup::none so its tiles remain sequential - exactly what the GBA expects for 1D OBJ mapping. Without this, deduplication could merge mirrored tiles and break the sprite layout.

Transparent pixels (alpha < 128 in the PNG source) become palette index 0, so the hardware automatically shows the background through the sprite.

Tile deduplication

The indexed4 and indexed8 converters accept a dedup mode as a template parameter:

| Mode | Behaviour | Use case |
|---|---|---|
| dedup::flip (default) | Matches identity, horizontal flip, vertical flip, and both | Background tilemaps |
| dedup::identity | Matches exact duplicates only | Tilemaps without flip support |
| dedup::none | No deduplication; tiles stay sequential | OBJ sprites |

using gba::embed::dedup;

constexpr auto bg = gba::embed::indexed4(supplier);
constexpr auto obj = gba::embed::indexed4<dedup::none>(supplier);

When dedup::flip is active, matching tiles reuse an existing tile index and encode flip flags in the emitted screen_entry. This keeps map VRAM usage low for symmetric art.
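Two building blocks make flip-aware matching possible: mirroring a tile row, and recording the flip in the map entry. The sketch below shows both for 4bpp tiles. The helper names are hypothetical, and the bit positions are those of the hardware's text-mode screen entry, not a description of stdgba internals.

```cpp
#include <cstdint>

// Mirror one 4bpp tile row horizontally by reversing its eight nibbles.
constexpr std::uint32_t hflip_row(std::uint32_t row) {
    std::uint32_t out = 0;
    for (int i = 0; i < 8; ++i) {
        out = (out << 4) | ((row >> (4 * i)) & 0xFu);
    }
    return out;
}

// Screen entry: bits 0-9 tile index, bit 10 hflip, bit 11 vflip,
// bits 12-15 palette bank.
constexpr std::uint16_t make_screen_entry(unsigned tile, bool hflip, bool vflip, unsigned palbank) {
    return static_cast<std::uint16_t>((tile & 0x3FFu) | (hflip ? 1u << 10 : 0u) |
                                      (vflip ? 1u << 11 : 0u) | ((palbank & 0xFu) << 12));
}

static_assert(hflip_row(0x00000321) == 0x12300000);
static_assert(hflip_row(hflip_row(0x87654321)) == 0x87654321); // flipping twice is identity
static_assert(make_screen_entry(5, true, false, 0) == 0x0405);
```

A vertical flip needs no per-row work: it is the same eight rows in reverse order.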

Sprite OAM helpers

When image dimensions match a valid GBA sprite size, indexed4 returns a sprite payload with obj() and obj_aff() helpers:

constexpr auto sprite = gba::embed::indexed4<gba::embed::dedup::none>([] {
    return std::to_array<unsigned char>({
#embed "sprite.png"
    });
});

gba::obj_mem[0] = sprite.sprite.obj(0);
gba::obj_aff_mem[0] = sprite.sprite.obj_aff(0);

Valid sprite sizes:

| Shape | Sizes |
|---|---|
| Square | 8x8, 16x16, 32x32, 64x64 |
| Wide | 16x8, 32x8, 32x16, 64x32 |
| Tall | 8x16, 8x32, 16x32, 32x64 |

If the source image does not match one of those shapes, obj() and obj_aff() fail at compile time.

Transparency and palettes

  • PPM: palette index 0 is always reserved as transparent; the first visible colour becomes index 1.
  • PNG: RGBA/GA alpha maps transparent pixels (alpha < 128) to palette index 0.
  • TGA: 32bpp alpha and 16bpp attribute-bit transparency map transparent pixels (alpha < 128) to palette index 0.
  • indexed4: images may spread across multiple palette banks when background tiles use <= 15 opaque colours per tile.
  • indexed8: one 256-entry palette is shared across the whole image.

Constexpr evaluation limits

All image conversion happens at compile time. Large assets can hit GCC’s constexpr operation limit. If you see constexpr evaluation operation count exceeds limit, raise the limit for that target:

target_compile_options(my_target PRIVATE -fconstexpr-ops-limit=335544320)

Small sprites usually fit within default limits. Large backgrounds, especially 512x256 maps, often need a higher ceiling.

Animated Sprite Sheets

gba::embed::indexed4_sheet<FrameW, FrameH>() turns one sprite-sheet image into frame-packed OBJ tile data at compile time. It is the animation-oriented sibling to Embedding Images: same file formats, same supplier-lambda pattern, but a different output shape tuned for OBJ 1D mapping.

For procedural sprite generation without source image files, see Shapes. For type-level API details, see Animated Sprite Sheet Type Reference.

When to use indexed4_sheet

Use indexed4_sheet when:

  • one source image contains multiple animation frames
  • every frame has the same width and height
  • you want each frame’s tiles laid out contiguously in OBJ VRAM
  • you want compile-time flipbook helpers instead of manual tile math

Use plain indexed4<dedup::none>() when you only need one static sprite frame.

Quick start

#include <cstring>
#include <gba/embed>
#include <gba/video>

static constexpr auto actor = gba::embed::indexed4_sheet<16, 16>([] {
	return std::to_array<unsigned char>({
#embed "actor.png"
	});
});

static constexpr auto walk = actor.ping_pong<0, 3>();

const auto base_tile = gba::tile_index(gba::memory_map(gba::mem_vram_obj));
std::memcpy(gba::memory_map(gba::mem_vram_obj), actor.sprite.data(), actor.sprite.size());

unsigned int frame = walk.frame(tick / 8);  // tick: a counter incremented once per frame

gba::obj_mem[0] = actor.frame_obj(base_tile, frame, 0);

The converter validates at compile time that:

  • the full image width is a multiple of FrameW
  • the full image height is a multiple of FrameH
  • FrameW x FrameH is a valid GBA OBJ size
  • the whole sheet fits a single 15-colour palette plus transparent index 0

What sheet4_result gives you

Member / helper                  Purpose
palette                          Shared OBJ palette bank for every frame
sprite                           Frame-packed 4bpp tile payload ready for OBJ VRAM upload
tile_offset(frame)               Tile offset for a frame, useful with manual tile_index management
frame_obj(base, frame, pal)      Regular OAM helper for one frame
frame_obj_aff(base, frame, pal)  Affine OAM helper for one frame
forward<Start, Count>()          Compile-time sequential flipbook
ping_pong<Start, Count>()        Compile-time forward-then-reverse flipbook
sequence<"...">()                Explicit frame order via string literal
row<R>()                         Row-scoped flipbook builder for multi-row sheets

How frames are laid out

The important difference from plain indexed4() is tile order. indexed4_sheet() repacks tiles frame-by-frame so the GBA can step through animation frames with simple tile offsets.

Source sheet (2 rows x 4 columns, 16x16 frames)

+----+----+----+----+
| f0 | f1 | f2 | f3 |
+----+----+----+----+
| f4 | f5 | f6 | f7 |
+----+----+----+----+

OBJ tile payload emitted by indexed4_sheet

[f0 tiles][f1 tiles][f2 tiles][f3 tiles][f4 tiles][f5 tiles][f6 tiles][f7 tiles]

That means tile_offset(frame) is simply:

frame * tiles_per_frame

No runtime repacking step is needed.
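
As a worked instance of that arithmetic, here is a minimal sketch with free functions. The names tiles_per_frame and tile_offset as written here are illustrative stand-ins that take the frame size explicitly; they are not the library's member functions.

```cpp
// Illustrative stand-ins, not stdgba API.
// An OBJ tile is 8x8 pixels, so a frame covers (w / 8) * (h / 8) tiles.
constexpr unsigned tiles_per_frame(unsigned frame_w, unsigned frame_h) {
    return (frame_w / 8) * (frame_h / 8);
}

// Tile offset of a frame inside the frame-packed payload.
constexpr unsigned tile_offset(unsigned frame, unsigned frame_w, unsigned frame_h) {
    return frame * tiles_per_frame(frame_w, frame_h);
}

static_assert(tiles_per_frame(16, 16) == 4);
static_assert(tile_offset(5, 16, 16) == 20);  // f5 in the 16x16 sheet above
```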

Flipbook builders

Sequential animation

static constexpr auto idle = actor.forward<0, 4>();

Frames: 0, 1, 2, 3

Ping-pong animation

static constexpr auto walk = actor.ping_pong<0, 4>();

Frames: 0, 1, 2, 3, 2, 1
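
The two orderings above can be reproduced with a little modular arithmetic. The sketch below assumes the flipbook wraps around when the step counter runs past the end; forward_frame and ping_pong_frame are hypothetical names, not the library's implementation.

```cpp
// Hypothetical stand-ins for a flipbook lookup (not stdgba API).
constexpr unsigned forward_frame(unsigned start, unsigned count, unsigned step) {
    return start + step % count;              // 0, 1, 2, 3, 0, 1, ...
}

constexpr unsigned ping_pong_frame(unsigned start, unsigned count, unsigned step) {
    if (count < 2) return start;              // nothing to bounce between
    const unsigned period = 2 * count - 2;    // 6 steps when count == 4
    const unsigned pos = step % period;
    return start + (pos < count ? pos : period - pos);
}

static_assert(forward_frame(0, 4, 5) == 1);
static_assert(ping_pong_frame(0, 4, 3) == 3);
static_assert(ping_pong_frame(0, 4, 4) == 2);   // heading back down
static_assert(ping_pong_frame(0, 4, 5) == 1);
static_assert(ping_pong_frame(0, 4, 6) == 0);   // period restarts
```

The end frames are not repeated, which is why a four-frame ping-pong has a period of six steps rather than eight.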

Explicit frame order

static constexpr auto attack = actor.sequence<"01232100">();

Each character selects a frame index. 0-9 map to frames 0-9, a-z continue from 10 upward, and A-Z map the same way as lowercase.
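
The character mapping described above is small enough to sketch directly. frame_from_char is a hypothetical helper written for illustration; how the library reports an invalid character is not shown here, so the sketch simply returns -1.

```cpp
// Hypothetical helper (not stdgba API): map one sequence character to a
// frame index. Digits give 0-9; letters give 10-35, case-insensitively.
constexpr int frame_from_char(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return 10 + (c - 'a');
    if (c >= 'A' && c <= 'Z') return 10 + (c - 'A');
    return -1;  // not a valid frame character
}

static_assert(frame_from_char('3') == 3);
static_assert(frame_from_char('a') == 10);
static_assert(frame_from_char('A') == 10);
static_assert(frame_from_char('z') == 35);
```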

Row-based sheets

For RPG Maker style character sheets with one direction per row, use row<R>() to scope animations to a single row.

static constexpr auto actor = gba::embed::indexed4_sheet<16, 16>([] {
	return std::to_array<unsigned char>({
#embed "hero_walk.png"
	});
});

static constexpr auto down  = actor.row<0>().ping_pong<0, 3>();
static constexpr auto left  = actor.row<1>().ping_pong<0, 3>();
static constexpr auto right = actor.row<2>().ping_pong<0, 3>();
static constexpr auto up    = actor.row<3>().ping_pong<0, 3>();

Row helpers still produce sheet-global frame indices, so the result plugs directly into frame_obj() and tile_offset().

A practical render loop

#include <algorithm>
#include <cstring>
#include <gba/bios>
#include <gba/embed>
#include <gba/video>

static constexpr auto actor = gba::embed::indexed4_sheet<16, 16>([] {
	return std::to_array<unsigned char>({
#embed "actor.png"
	});
});

static constexpr auto walk = actor.ping_pong<0, 4>();

int main() {
	gba::reg_dispcnt = {
		.video_mode = 0,
		.linear_obj_tilemap = true,
		.enable_obj = true,
	};

	std::copy(actor.palette.begin(), actor.palette.end(), gba::pal_obj_bank[0]);
	std::memcpy(gba::memory_map(gba::mem_vram_obj), actor.sprite.data(), actor.sprite.size());

	unsigned int tick = 0;
	const auto base_tile = gba::tile_index(gba::memory_map(gba::mem_vram_obj));

	while (true) {
		gba::VBlankIntrWait();
		const unsigned int frame = walk.frame(tick / 8);
		auto obj = actor.frame_obj(base_tile, frame, 0);
		obj.x = 112;
		obj.y = 72;
		gba::obj_mem[0] = obj;
		++tick;
	}
}

Palette and colour limits

indexed4_sheet builds one shared 16-entry OBJ palette:

  • palette index 0 stays transparent
  • the whole sheet may use at most 15 opaque colours total
  • unlike background-oriented indexed4(), sheet conversion does not spread tiles across multiple palette banks

That trade-off keeps every frame interchangeable at one base tile and one OBJ palette bank.

Compile-time failure modes

Typical compile-time diagnostics are:

  • frame width or height not divisible into the source image
  • source image not aligned to 8x8 tile boundaries
  • frame dimensions not matching a legal OBJ size
  • more than 15 opaque colours across the whole sheet
  • invalid frame index in forward, ping_pong, sequence, or row

Choosing between the asset paths

Workflow          Best for
Shapes            Simple geometric sprites, HUD markers, debug art, zero external assets
Embedding Images  Static backgrounds, portraits, logos, and one-frame sprites
indexed4_sheet()  Animated sprite sheets with compile-time frame selection

Music Composition

The GBA has four PSG (Programmable Sound Generator) channels: two square waves, one wave (sample) channel, and one noise channel. Rather than manually writing register values, stdgba lets you compose music using Strudel notation (a text-based mini-language for patterns) and compiles it to an optimised event table at build time.

Quick Start

#include <gba/music>
#include <gba/peripherals>
#include <gba/bios>

using namespace gba::music;
using namespace gba::music::literals;

int main() {
    // Enable sound output
    gba::reg_soundcnt_x = { .master_enable = true };
    gba::reg_soundcnt_l = {
        .volume_right = 7, .volume_left = 7,
        .enable_1_right = true, .enable_1_left = true,
        .enable_2_right = true, .enable_2_left = true,
        .enable_3_right = true, .enable_3_left = true,
        .enable_4_right = true, .enable_4_left = true
    };
    gba::reg_soundcnt_h = { .psg_volume = 2 };

    // Compile a simple melody
    static constexpr auto music = compile(note("c4 e4 g4 c5"));

    // Play it in a loop
    auto player = music_player<music>{};
    while (player()) {
        gba::VBlankIntrWait();
    }
}

Pattern Syntax

Patterns use Strudel notation. Here’s the reference:

Syntax    Meaning                                            Example
c4 e4 g4  Sequence (space-separated notes)                   "c4 e4 g4"
~         Rest (silence)                                     "c4 ~ g4"
_         Hold/tie (sustain, no retrigger)                   "c4 _ _" (hold for 3 steps)
[a b]     Subdivision (fit into one parent step)             "[c4 d4] e4"
<a b c>   Alternating (cycle through each step)              "<c4 d4 e4>"
<a, b>    Parallel layers (commas create stacked voices)     "<c4, g3>"
a@3       Elongation (weight = 3)                            "c4@3 e4"
a!3       Replicate (repeat 3 times equally)                 "c4!3"
a*2       Fast (play 2x in one step)                         "c4*2"
a/2       Slow (stretch over 2 cycles)                       "c4/2"
(3,8)     Euclidean rhythm (Bjorklund: 3 pulses in 8 steps)  "c4(3,8)"
eb3       Flat notation (Eb3 = D#3)                          "eb3 f3 g3"
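
To make the Euclidean entry concrete, the sketch below spreads a number of pulses as evenly as possible over a fixed number of steps. It uses a simple modular (Bresenham-style) formulation rather than Bjorklund's recursive algorithm; for (3,8) it gives the familiar x . . x . . x . pattern, but for other inputs the result may be a rotation of what Strudel or stdgba produce. euclid is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not stdgba API): mark `pulses` onsets across
// `Steps` slots so that the gaps between them are as even as possible.
template <std::size_t Steps>
constexpr std::array<bool, Steps> euclid(std::size_t pulses) {
    std::array<bool, Steps> out{};
    for (std::size_t i = 0; i < Steps; ++i) {
        out[i] = (i * pulses) % Steps < pulses;
    }
    return out;
}

constexpr auto three_in_eight = euclid<8>(3);   // x . . x . . x .
static_assert(three_in_eight[0] && three_in_eight[3] && three_in_eight[6]);
static_assert(!three_in_eight[1] && !three_in_eight[2] && !three_in_eight[4]);
static_assert(!three_in_eight[5] && !three_in_eight[7]);
```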

Creating Melodies with note()

note() is the main function for creating pitched patterns:

// Single melody (auto-assigned to square 1)
auto melody = note("c4 e4 g4 c5");

// With modifiers
auto fast = note("c4*2 e4*2");  // Double speed
auto slow = note("c4/2");        // Stretch over 2 cycles
auto rests = note("c4 ~ ~ e4");  // With silences

All notes from C2 to B8 are supported. Octave-1 notes (C1-B1) are rejected at compile time because the PSG hardware cannot represent those frequencies.
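
The lower bound follows from how the square channels encode pitch: an 11-bit rate value n produces f = 131072 / (2048 - n) Hz, so the lowest reachable frequency is 64 Hz. C2 is about 65.4 Hz and B1 about 61.7 Hz, which is why octave 1 cannot be represented. The sketch below assumes equal temperament with A4 = 440 Hz and MIDI-style note numbers (C4 = 60); note_hz and psg_rate are hypothetical names, and the library's own rounding may differ.

```cpp
#include <cmath>

// Equal-tempered frequency for a MIDI note number (A4 = 69 = 440 Hz).
inline double note_hz(int midi) {
    return 440.0 * std::pow(2.0, (midi - 69) / 12.0);
}

// Square-channel rate value n, where f = 131072 / (2048 - n) Hz.
// Returns -1 when the note falls outside what the 11-bit field can express.
inline int psg_rate(int midi) {
    const long divisor = std::lround(131072.0 / note_hz(midi));
    if (divisor < 1 || divisor > 2048) return -1;
    return static_cast<int>(2048 - divisor);
}
```

With these definitions, psg_rate(36) (C2) yields a valid rate value while psg_rate(35) (B1) reports -1.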

Multi-Voice Patterns with Stacking

Create parallel voices using commas inside <>:

// Two voices: melody (sq1) + bass (sq2)
static constexpr auto music = compile(
    note("<c4 e4 g4 c5, c3 c3 c3 c3>")
);

// Or use the stack() combinator
static constexpr auto music = compile(
    stack(
        note("c4 e4 g4 c5"),
        note("c3 c3 c3 c3"),
        s("bd sd bd sd")  // Drums on noise channel
    )
);

The layers are auto-assigned to channels in order: square 1 -> square 2 -> wave -> noise.

PSG Channels (CH1-CH4)

Each PSG channel has its own reference page for hardware details.

Quick inline examples:

using namespace gba::music;
using namespace gba::music::literals;

auto lead = "c4 e4 g4 c5"_sq1;
auto bass = note("c3 c3 g2 g2").channel(channel::sq2);
auto pad = note("c4 _ g4 _").channel(channel::wav, waves::triangle);
auto drums = s("bd sd hh sd");

static constexpr auto song = compile(loop(stack(lead, bass, pad, drums)));

Drums with s()

The s() function creates drum patterns using Strudel percussion names. It auto-assigns to the noise channel:

// Kick + snare beat
auto beat = s("bd sd bd sd");

// Euclidean kick pattern
auto kick = s("bd(3,8)");

// Complex drum pattern
auto drums = s("bd [sd rim]*2 bd sd");

20 drum presets are supported: bd, sd, hh, oh, cp, rs, rim, lt, mt, ht, cb, cr, rd, hc, mc, lc, cl, sh, ma, ag.

Chaining with Sequential (seq())

Combine multiple patterns sequentially. Instrument changes are emitted at boundaries:

static constexpr auto music = compile(
    loop(
        seq(
            note("c4 e4 g4 c5"),
            note("d4 f4 a4 d5"),
            note("e4 g4 b4 e5")
        )
    )
);

Compile-Time Tempos

By default, compile() uses 0.5 cycles-per-second (120 BPM in 4/4). Override it:

// Explicit BPM
static constexpr auto music = compile<120_bpm>(note("c4 e4 g4"));

// Or cycles-per-second
static constexpr auto music = compile<1_cps>(note("c4 e4 g4"));

// Or cycles-per-minute
static constexpr auto music = compile<30_cpm>(note("c4 e4 g4"));
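
The three units relate by simple arithmetic. The sketch below assumes one cycle corresponds to one 4/4 bar (four beats), which is what makes 0.5 cycles per second equal 120 BPM; the helper names are hypothetical and unrelated to the library's literal operators.

```cpp
// Hypothetical conversion helpers (not stdgba API).
// One cycle is treated as one 4/4 bar, i.e. four beats.
constexpr double cps_from_bpm(double bpm, double beats_per_cycle = 4.0) {
    return bpm / 60.0 / beats_per_cycle;
}

constexpr double cps_from_cpm(double cpm) {
    return cpm / 60.0;
}

static_assert(cps_from_bpm(120.0) == 0.5);  // the default tempo
static_assert(cps_from_cpm(30.0) == 0.5);   // same tempo, per-minute form
```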

Pattern Functions

All patterns support transformation methods:

auto melody = note("c4 e4 g4 c5");

melody.add(12);       // Transpose up one octave
melody.sub(5);        // Transpose down 5 semitones
melody.rev();         // Reverse the sequence
melody.ply(2);        // Stutter (repeat each note 2x)
melody.press();       // Staccato (half duration + rest)
melody.late(1, 8);    // Shift 1/8 cycle later (swing)

User-Defined Literal Shorthands

For convenience, single-channel assignments use UDLs:

using namespace gba::music::literals;

auto melody = "c4 e4 g4"_sq1;   // Assign to square 1
auto bass = "c3 c3"_sq2;         // Assign to square 2
auto sample = "c4 d4"_wav;       // Use wave channel
auto drums = "bd sd hh"_s;       // Drums (noise channel)

WAV Channel & Custom Waveforms

The wave channel (CH3) can play 4-bit sampled audio, using either built-in waveforms or embedded .wav files. For a deeper guide to wav_embed(), resampling limits, and custom sample authoring, see Embedded WAV Samples.

// Built-in waveforms
using namespace gba::music::waves;

auto melody = note("c4 e4 g4").channel(channel::wav, sine);

// Embed a .wav file (requires C++26 #embed and GCC 15+)
static constexpr auto piano = gba::music::wav_embed([] {
    return std::to_array<unsigned char>({
#embed "Piano.wav"
    });
});

static constexpr auto music = compile(
    note("<c4 e4 g4, c3>")
        .channels(layer_cfg{channel::wav, piano}, channel::sq2)
);

Playing Music

Use music_player with NTTP (non-type template parameter) syntax:

static constexpr auto music = compile(note("c4 e4 g4 c5"));

auto player = music_player<music>{};  // Pass as template argument

// Play in VBlank loop
while (player()) {
    gba::VBlankIntrWait();
}

music_player::operator() returns false once a non-looping pattern has ended. For a looping pattern it never returns false, so the loop above runs forever.

Performance

Music playback uses tail-call recursive dispatch over compile-time batches. Per-frame cost:

  • Idle frame (no events): ~400 cycles (~0.6% of VBlank)
  • 4-channel batch dispatch: ~760 cycles (~1.1% of VBlank)

This leaves roughly 99% of the VBlank budget for game logic.

Embedded WAV Samples

The <gba/music> header provides consteval WAV parsing and resampling for the GBA’s wave channel (PWM output with 64 x 4-bit custom waveforms). Combined with the #embed directive (introduced in C23 and adopted by C++26), custom acoustic instruments and samples can be baked into the ROM at compile time.

For procedural sprite generation, see Shapes. For music composition with square-wave channels, see Music Composition.

Why embed WAV samples

The GBA wave channel (CH3) plays back a 64-sample, 4-bit waveform at a frequency determined by the timer reload value. Instead of generic square/triangle/saw tones, embedded PCM samples add:

  • Acoustic instruments: Piano, flute, bells, drums
  • Sound effects: Explosions, coins, hits, chimes
  • Complex timbres: Any 64-sample periodic waveform

Since the GBA only has 32 KB of IWRAM and 256 KB of EWRAM, samples must be highly compressed. The 4-bit quantization and 64-sample limit constrain audio to short, punchy instruments - not long-form music or speech.

WAV embedding API

Function            Input                                        Output                   Use case
wav_embed()         C-array or supplier lambda                   std::array<uint8_t, 64>  Parse .wav file + resample
wav_from_samples()  std::array<uint8_t, 64> (4-bit values 0-15)  std::array<uint8_t, 64>  Direct 4-bit waveform data
wav_from_pcm8()     const uint8_t (&data)[N] (8-bit PCM)         std::array<uint8_t, 64>  Resample 8-bit PCM to 64 samples

All three are consteval and produce compile-time waveform constants.

Built-in waveforms (no file needed):

Waveform  Access                       Description
Sine      gba::music::waves::sine      Smooth sine wave
Triangle  gba::music::waves::triangle  Continuous triangle
Sawtooth  gba::music::waves::saw       Linear sawtooth
Square    gba::music::waves::square    50% duty cycle

Simple example: embedded Piano

The demo_hello_audio_wav demo plays a four-note jingle using embedded Piano.wav:

#include <gba/bios>
#include <gba/interrupt>
#include <gba/keyinput>
#include <gba/music>
#include <gba/peripherals>

#include <array>

using namespace gba::music;
using namespace gba::music::literals;

namespace {

    // Embed Piano.wav sample data for the wav channel (64 x 4-bit waveform).
    // The wav_embed() function parses RIFF/WAV headers and resamples to GBA format.
    static constexpr auto piano = wav_embed([] {
        return std::to_array<unsigned char>({
#embed "Piano.wav"
        });
    });

    // A simple melodic phrase played on the wav channel with embedded Piano timbre.
    // Press A to restart playback.
    // .press() applies staccato: each note plays for half duration, rest for half.
    // Compiled at 1_cps (1 cycle per second), twice the default 0.5 cps tempo.
    static constexpr auto jingle = compile<1_cps>(note("c5 e5 g5 c6").channel(channel::wav, piano).press());

} // namespace

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    // Basic PSG routing for the WAV channel on both speakers.
    gba::reg_soundcnt_x = {.master_enable = true};
    gba::reg_soundcnt_l = {
        .volume_right = 7,
        .volume_left = 7,
        .enable_3_right = true,
        .enable_3_left = true,
    };
    gba::reg_soundcnt_h = {.psg_volume = 2};

    gba::keypad keys;
    auto player = music_player<jingle>{};

    while (true) {
        gba::VBlankIntrWait();
        keys = gba::reg_keyinput;

        if (keys.pressed(gba::key_a)) {
            player = {};
        }

        player();
    }
}

Place Piano.wav in the demos directory. The #embed directive is placed on its own line inside the compound initialiser braces.

Resampling and quantization

wav_embed() performs nearest-neighbor resampling: PCM samples are read from the RIFF/WAV data (mono or stereo, 8-bit or 16-bit) and resampled to exactly 64 x 4-bit samples for the GBA hardware. Stereo input is mixed to mono, because the wave channel plays a single waveform.

Quantization from N-bit to 4-bit uses simple scaling: (sample >> (N - 4)). Complex samples (speech, noise) lose clarity; sine waves and simple acoustic timbres sound best.
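
A minimal sketch of those two steps for unsigned 8-bit PCM follows. It is not the library's implementation: resample_to_wave is a hypothetical name, the source index is chosen by flooring, and WAV header parsing, 16-bit input, and stereo mixing are left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch (not stdgba API): pick 64 evenly spaced samples
// from the input, then quantize each from 8 bits to 4 bits with a shift.
template <std::size_t N>
constexpr std::array<std::uint8_t, 64>
resample_to_wave(const std::array<std::uint8_t, N>& pcm) {
    static_assert(N > 0, "need at least one input sample");
    std::array<std::uint8_t, 64> out{};
    for (std::size_t i = 0; i < 64; ++i) {
        const std::size_t src = i * N / 64;                       // source index
        out[i] = static_cast<std::uint8_t>(pcm[src] >> (8 - 4));  // 8-bit -> 4-bit
    }
    return out;
}
```

Because the shift discards the low four bits, two input samples that differ by less than 16 can land on the same output level, which is where the loss of detail in complex samples comes from.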

Built-in waveforms

For fast prototyping without external .wav files:

#include <gba/music>

using namespace gba::music;

// Use compiled sine wave (always available)
auto sine_melody = compile(
    note("c4 e4 g4 c5").channel(channel::wav, waves::sine)
);

// Mix instruments: sine bass layer, square melody layer  
auto layered = compile(
    stack(
        note("c2 c2 c2 c2").channel(channel::wav, waves::sine),
        note("c5 e5 g5 c6").channel(channel::sq1)
    )
);

Advanced: custom waveforms from samples

For hand-crafted 4-bit waveforms, use wav_from_samples():

#include <gba/music>

// Organ pipe sound: 64 custom 4-bit values
static constexpr auto organ = gba::music::wav_from_samples(
    std::array<uint8_t, 64>{
        // First 16 samples of a custom profile
        15, 14, 12, 10, 8, 6, 5, 4, 4, 4, 5, 6, 8, 10, 12, 14,
        // Continue pattern...
        15, 14, 12, 10, 8, 6, 5, 4, 4, 4, 5, 6, 8, 10, 12, 14,
        15, 14, 12, 10, 8, 6, 5, 4, 4, 4, 5, 6, 8, 10, 12, 14,
        15, 14, 12, 10, 8, 6, 5, 4, 4, 4, 5, 6, 8, 10, 12, 14,
    }
);

auto synth = compile(
    note("c4 e4 g4 c5").channel(channel::wav, organ)
);

Values are clamped to 0-15 (4-bit range). Each full period should smoothly loop back to avoid clicks at the waveform boundary.
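
Instead of typing 64 values by hand, a waveform table can also be generated in a constexpr function. The sketch below builds a triangle whose last sample matches its first, so the loop point has no jump; make_triangle is a hypothetical name, and passing the result to wav_from_samples() is assumed to work because that function is documented above as taking a std::array<uint8_t, 64>.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical generator (not stdgba API): a 64-sample triangle in 0-15.
constexpr std::array<std::uint8_t, 64> make_triangle() {
    std::array<std::uint8_t, 64> w{};
    for (std::size_t i = 0; i < 64; ++i) {
        const std::size_t ramp = i < 32 ? i : 63 - i;   // 0..31 then 31..0
        w[i] = static_cast<std::uint8_t>(ramp / 2);     // scale 0..31 to 0..15
    }
    return w;
}

constexpr auto triangle64 = make_triangle();
static_assert(triangle64[0] == 0);
static_assert(triangle64[31] == 15 && triangle64[32] == 15);
static_assert(triangle64[63] == triangle64[0]);  // wraps without a jump
```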

Practical constraints

  • 64 samples maximum: The GBA hardware uses a fixed waveform buffer for CH3 (64 x 4-bit samples, 32 bytes in total).
  • 4-bit quantization: ~24 dB dynamic range. Loud timpani and quiet pizzicato do not mix well.
  • No polyphony: Only one waveform plays at a time on CH3. Combine with stack() to play multiple square-wave channels simultaneously.
  • Frequency limits: WAV channel operates from ~32 Hz (timer reload = 255) to ~131 kHz (reload = 0). Most musical pitches fall in the 32 Hz-8 kHz range due to the timer’s integer reload values.

See Music Composition for combining WAV with square-wave and noise channels, and Channel WAV/CH3 for register-level details.

DMA Transfers

<gba/dma> gives you two layers of control:

  • raw register access (reg_dmasad, reg_dmadad, reg_dmacnt_l, reg_dmacnt_h, reg_dma)
  • helper constructors on gba::dma for common transfer patterns

Use the helper layer for most gameplay code, then drop to raw registers when you need an exact hardware setup.

For full register/type tables, see DMA Peripheral Reference.

Why DMA matters on GBA

DMA moves data without per-element CPU loops. Typical wins:

  • bulk tile/map/palette uploads
  • repeated clears/fills
  • VBlank/HBlank timed updates
  • DirectSound FIFO streaming

The ARM7TDMI is fast enough for logic, but memory traffic can eat frame budget quickly. DMA is the default path for larger copies.

Note: stdgba provides a hand-tuned implementation of std::memset/memclr (via the __aeabi_memset* entry points).

For large contiguous buffers in RAM (especially EWRAM), this can be faster than an immediate DMA fill.

API map

API               What it represents                         Typical use
reg_dmasad[4]     source address register per channel        manual setup
reg_dmadad[4]     destination address register per channel   manual setup
reg_dmacnt_l[4]   transfer unit count per channel            manual setup
reg_dmacnt_h[4]   dma_control flags per channel              timing, size, repeat, enable
reg_dma[4]        combined volatile dma[4] descriptor write  one-shot configuration
dma_control       low-level control bitfield                 explicit register programming
dma::copy()       immediate 32-bit copy                      VRAM/OAM/block copies
dma::copy16()     immediate 16-bit copy                      palette or halfword tables
dma::fill()       immediate 32-bit fill (src fixed)          clears/pattern fills
dma::fill16()     immediate 16-bit fill                      halfword fills
dma::on_vblank()  VBlank-triggered repeating transfer        per-frame buffered updates
dma::on_hblank()  HBlank-triggered repeating transfer        scanline effects
dma::to_fifo_a()  repeating FIFO A stream setup              DirectSound A
dma::to_fifo_b()  repeating FIFO B stream setup              DirectSound B

Choosing helper vs raw registers

Use gba::dma helpers when:

  • transfer pattern is standard (copy/fill/vblank/hblank/fifo)
  • you want fewer control-bit mistakes
  • you do not need unusual flag combinations

Use raw registers when:

  • you need custom dma_control fields not covered by helper defaults
  • you are debugging exact channel state
  • you are doing unusual timing/control experiments

Immediate transfer examples

32-bit copy

#include <gba/dma>

// Copy 256 words now.
gba::reg_dma[3] = gba::dma::copy(src, dst, 256);

16-bit copy

#include <gba/dma>

// Copy 256 halfwords now.
gba::reg_dma[3] = gba::dma::copy16(src16, dst16, 256);

32-bit fill

#include <gba/dma>

static constexpr unsigned int zero = 0;
gba::reg_dma[3] = gba::dma::fill(&zero, dst, 1024);

fill() and fill16() use fixed-source mode; the source points at the value to repeat.

Timed transfer examples

VBlank repeating transfer

Useful for per-frame buffered copies such as OAM shadow updates.

#include <gba/dma>

// Run once per VBlank until disabled.
gba::reg_dma[3] = gba::dma::on_vblank(shadow_oam, oam_dst, 128);

// Later, stop channel 3.
gba::reg_dmacnt_h[3] = {};

HBlank repeating transfer (HDMA)

Useful for scanline effects (scroll gradients, wave distortions, etc.).

#include <gba/dma>

// One halfword per HBlank from a scanline table.
gba::reg_dma[0] = gba::dma::on_hblank(scanline_values, bg_hofs_reg_ptr, 1);

// Later, stop channel 0.
gba::reg_dmacnt_h[0] = {};

DirectSound FIFO streaming

#include <gba/dma>

// Common convention: DMA1 -> FIFO A, DMA2 -> FIFO B.
gba::reg_dma[1] = gba::dma::to_fifo_a(samples_a);
gba::reg_dma[2] = gba::dma::to_fifo_b(samples_b);

These helpers set fixed destination, repeat, 32-bit units, and sound FIFO timing.

Manual register setup (raw path)

Equivalent to helper-style configuration when you need full control:

#include <gba/dma>

gba::reg_dmasad[3] = src;
gba::reg_dmadad[3] = dst;
gba::reg_dmacnt_l[3] = 256;
gba::reg_dmacnt_h[3] = {
    .dest_op = gba::dest_op_increment,
    .src_op = gba::src_op_increment,
    .dma_type = gba::dma_type::word,
    .dma_cond = gba::dma_cond_now,
    .enable = true,
};
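
Underneath the named fields, the control register is a 16-bit halfword. The sketch below shows how such fields could be packed, using the bit positions commonly documented for the GBA DMA control register. addr_op, timing, and dma_control_bits are hypothetical names for illustration; stdgba's dma_control type may be organised differently.

```cpp
#include <cstdint>

// Hypothetical encoder for the 16-bit DMA control halfword (not stdgba API).
// Note: addr_op::reload is only meaningful for the destination.
enum class addr_op : std::uint16_t { increment = 0, decrement = 1, fixed = 2, reload = 3 };
enum class timing  : std::uint16_t { now = 0, vblank = 1, hblank = 2, special = 3 };

constexpr std::uint16_t dma_control_bits(addr_op dest, addr_op src, bool repeat,
                                         bool word, timing when, bool irq, bool enable) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned>(dest)   << 5)  |   // bits 5-6: destination adjust
        (static_cast<unsigned>(src)    << 7)  |   // bits 7-8: source adjust
        (static_cast<unsigned>(repeat) << 9)  |   // bit 9: repeat
        (static_cast<unsigned>(word)   << 10) |   // bit 10: 0 = 16-bit, 1 = 32-bit
        (static_cast<unsigned>(when)   << 12) |   // bits 12-13: start timing
        (static_cast<unsigned>(irq)    << 14) |   // bit 14: IRQ on completion
        (static_cast<unsigned>(enable) << 15));   // bit 15: enable
}

// Same settings as the manual setup above: increment both, 32-bit, start now.
static_assert(dma_control_bits(addr_op::increment, addr_op::increment, false,
                               true, timing::now, false, true) == 0x8400);
```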

Safety and correctness notes

  • count/units means transfer units, not bytes.
    • dma_type::half -> halfwords
    • dma_type::word -> words
  • For fill() and repeating transfers, source memory must remain valid while DMA can still run.
  • Repeating channels keep firing until disabled (reg_dmacnt_h[n] = {}).
  • Channel conventions are common practice, not hard rules:
    • DMA0: HBlank effects
    • DMA1/DMA2: DirectSound FIFO
    • DMA3: bulk/general transfers
  • For VRAM/OAM writes, prefer VBlank/HBlank-safe timing patterns.
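
Because the count is in transfer units, a byte size has to be divided by the unit width before it is passed to a helper or written to a count register. A minimal sketch, with dma_units as a hypothetical helper name:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not stdgba API): convert a byte size into DMA
// transfer units. Unit is std::uint16_t for halfword transfers and
// std::uint32_t for word transfers.
template <typename Unit>
constexpr std::size_t dma_units(std::size_t bytes) {
    static_assert(sizeof(Unit) == 2 || sizeof(Unit) == 4,
                  "DMA moves halfwords or words");
    return bytes / sizeof(Unit);
}

static_assert(dma_units<std::uint32_t>(1024) == 256);  // 1 KiB as words
static_assert(dma_units<std::uint16_t>(512) == 256);   // 256 palette entries as halfwords
```

Passing the byte size directly would transfer two or four times as much data as intended.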

Shapes

stdgba provides a consteval API for generating sprite pixel data from geometric shapes. All pixel data is computed at compile time and stored directly in ROM.

For file-based asset pipelines, see Embedding Images.

Quick start

#include <cstring>
#include <gba/shapes>
#include <gba/video>
using namespace gba::shapes;

// Define 16x16 sprite geometry
constexpr auto sprite = sprite_16x16(
    circle(8.0, 8.0, 4.0),   // palette index 1
    rect(2, 2, 12, 12)        // palette index 2
);

// Load colours into palette memory
gba::pal_obj_bank[0][1] = { .red = 31 };    // red circle
gba::pal_obj_bank[0][2] = { .green = 31 };  // green rectangle

// Copy pixel data to VRAM
auto* dest = gba::memory_map(gba::mem_vram_obj);
std::memcpy(dest, sprite.data(), sprite.size());

// Set OAM attributes
gba::obj_mem[0] = sprite.obj(gba::tile_index(dest));

How it works

Each sprite_WxH() call takes a list of shape groups. Each group is assigned a sequential palette index starting from 1 (palette index 0 is transparent). The shapes within each group are rasterized into 4bpp pixel data.
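
For reference, 4bpp means two pixels share each byte, so an 8x8 tile occupies 32 bytes. The sketch below packs one tile of palette indices into that layout, with the left pixel of each pair in the low nibble as in GBA tile data. pack_tile_4bpp is a hypothetical helper, not the library's rasterizer.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical packer (not stdgba API): 64 palette indices, row-major
// 8x8, become 32 bytes of 4bpp tile data.
constexpr std::array<std::uint8_t, 32>
pack_tile_4bpp(const std::array<std::uint8_t, 64>& px) {
    std::array<std::uint8_t, 32> out{};
    for (std::size_t i = 0; i < 32; ++i) {
        const unsigned left  = px[2 * i]     & 0x0Fu;  // low nibble
        const unsigned right = px[2 * i + 1] & 0x0Fu;  // high nibble
        out[i] = static_cast<std::uint8_t>(left | (right << 4));
    }
    return out;
}
```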

Available sprite sizes

Size   Function        Bytes
8x8    sprite_8x8()    32
16x16  sprite_16x16()  128
16x32  sprite_16x32()  256
32x16  sprite_32x16()  256
32x32  sprite_32x32()  512
32x64  sprite_32x64()  1024
64x32  sprite_64x32()  1024
64x64  sprite_64x64()  2048

Shape types

Shape           Signature                             Notes
Circle          circle(cx, cy, r)                     Float centre + radius for pixel alignment
Oval            oval(x, y, w, h)                      Bounding box coordinates
Rectangle       rect(x, y, w, h)                      Bounding box coordinates
Triangle        triangle(x1, y1, x2, y2, x3, y3)      Three vertices
Line            line(x1, y1, x2, y2, thickness)       Endpoints + thickness
Circle Outline  circle_outline(cx, cy, r, thickness)  Hollow circle
Oval Outline    oval_outline(x, y, w, h, thickness)   Hollow oval
Rect Outline    rect_outline(x, y, w, h, thickness)   Hollow rectangle
Text            text(x, y, "string")                  Built-in 3x5 font

Circle pixel alignment

The float centre and radius control how circles align to the pixel grid:

circle(8.0, 8.0, 4.0)   // 8px even diameter, centre between pixels
circle(8.0, 8.0, 3.5)   // 7px odd diameter, centre on pixel 8
oval(4, 4, 8, 8)         // Same 8px circle via bounding box

Erasing with palette index 0

Palette index 0 is transparent. Switch to it to cut holes in shapes:

constexpr auto donut = sprite_16x16(
    circle(8.0, 8.0, 6.0),     // Filled circle (palette 1)
    palette_idx(0),              // Switch to transparent
    circle(8.0, 8.0, 3.0)       // Erase inner circle
);

Grouping shapes

Use group() to assign multiple shapes to the same palette index:

constexpr auto sprite = sprite_16x16(
    group(circle(8.0, 8.0, 3.0), line(0, 0, 16, 16, 1)),  // Both palette 1
    group(rect(0, 0, 16, 16))                               // Palette 2
);

OAM attributes

Each sprite result provides an obj() method that returns an OAM entry pre-filled with the correct shape, size, and colour depth:

auto obj_attrs = sprite.obj(gba::tile_index(dest));
obj_attrs.x = 120;
obj_attrs.y = 80;
gba::obj_mem[0] = obj_attrs;

Example output

Several consteval shapes rendered as sprites:

#include <gba/bios>
#include <gba/interrupt>
#include <gba/shapes>
#include <gba/video>

#include <cstring>

using namespace gba::shapes;

// Compile-time sprites
constexpr auto spr_circle = sprite_16x16(circle(8.0, 8.0, 7.0));

constexpr auto spr_donut = sprite_16x16(circle(8.0, 8.0, 7.0), palette_idx(0), circle(8.0, 8.0, 3.0));

constexpr auto spr_rect = sprite_16x16(rect(1, 1, 14, 14));

constexpr auto spr_triangle = sprite_16x16(triangle(8, 1, 15, 14, 1, 14));

constexpr auto spr_face = sprite_32x32(circle(16.0, 16.0, 14.0), // Head (palette 1)
                                       group(                    // Eyes (palette 2)
                                           circle(11.0, 12.0, 2.5), circle(21.0, 12.0, 2.5)),
                                       group( // Mouth (palette 3)
                                           oval(10, 20, 12, 4)),
                                       palette_idx(0),     // Erase
                                       oval(11, 21, 10, 2) // Inner mouth cutout
);

constexpr auto spr_label = sprite_64x32(text(2, 2, "stdgba"),
                                        group(),                      // Reserve palette 2
                                        rect_outline(0, 0, 64, 14, 1) // Border (palette 3)
);

int main() {
    gba::irq_handler = {};
    gba::reg_dispstat = {.enable_irq_vblank = true};
    gba::reg_ie = {.vblank = true};
    gba::reg_ime = true;

    gba::reg_dispcnt = {
        .video_mode = 0,
        .linear_obj_tilemap = true,
        .enable_obj = true,
    };

    // Background
    gba::pal_bg_mem[0] = {.red = 4, .green = 6, .blue = 10};

    // Sprite palettes
    gba::pal_obj_bank[0][1] = {.red = 28, .green = 8, .blue = 8};  // Red
    gba::pal_obj_bank[1][1] = {.red = 8, .green = 28, .blue = 8};  // Green
    gba::pal_obj_bank[2][1] = {.red = 8, .green = 8, .blue = 28};  // Blue
    gba::pal_obj_bank[3][1] = {.red = 28, .green = 28, .blue = 8}; // Yellow

    // Face palette
    gba::pal_obj_bank[4][1] = {.red = 31, .green = 25, .blue = 12}; // Skin
    gba::pal_obj_bank[4][2] = {.red = 4, .green = 4, .blue = 8};    // Eyes
    gba::pal_obj_bank[4][3] = {.red = 24, .green = 8, .blue = 8};   // Mouth

    // Label palette
    gba::pal_obj_bank[5][1] = {.red = 31, .green = 31, .blue = 31}; // Text
    gba::pal_obj_bank[5][3] = {.red = 16, .green = 20, .blue = 28}; // Border

    // Copy tile data to OBJ VRAM
    auto* dest = gba::memory_map(gba::mem_vram_obj);
    auto* base = dest;

    auto copy_sprite = [&](const auto& spr) {
        auto idx = gba::tile_index(dest);
        std::memcpy(dest, spr.data(), spr.size());
        dest += spr.size() / sizeof(*dest);
        return idx;
    };

    auto idx_circle = copy_sprite(spr_circle);
    auto idx_donut = copy_sprite(spr_donut);
    auto idx_rect = copy_sprite(spr_rect);
    auto idx_triangle = copy_sprite(spr_triangle);
    auto idx_face = copy_sprite(spr_face);
    auto idx_label = copy_sprite(spr_label);

    // Place sprites across the screen
    auto place = [](int slot, auto spr_data, unsigned short tile_idx, unsigned short x, unsigned short y,
                    unsigned short pal) {
        auto obj = spr_data.obj(tile_idx);
        obj.x = x;
        obj.y = y;
        obj.palette_index = pal;
        gba::obj_mem[slot] = obj;
    };

    place(0, spr_circle, idx_circle, 20, 64, 0);
    place(1, spr_donut, idx_donut, 52, 64, 1);
    place(2, spr_rect, idx_rect, 84, 64, 2);
    place(3, spr_triangle, idx_triangle, 116, 64, 3);
    place(4, spr_face, idx_face, 156, 56, 4);
    place(5, spr_label, idx_label, 88, 120, 5);

    // Hide remaining sprites
    for (int i = 6; i < 128; ++i) {
        gba::obj_mem[i] = {.disable = true};
    }

    while (true) {
        gba::VBlankIntrWait();
    }
}

[Figure: Shapes demo]

BIOS Functions

The GBA BIOS provides built-in routines accessible through software interrupts (SWI). stdgba wraps these in C++ functions, some of which are constexpr - the compiler evaluates them at compile time when possible and falls back to the BIOS call at runtime.

Common functions

Halting and waiting

#include <gba/bios>

// Wait for VBlank interrupt (most common - used every frame)
gba::VBlankIntrWait();

// Halt CPU until any interrupt
gba::Halt();

// Halt CPU until a specific interrupt
gba::IntrWait(true, { .vblank = true });

Math

// Square root (constexpr when argument is known at compile time)
auto root = gba::Sqrt(144u);  // 12

// Arc tangent
auto angle = gba::ArcTan2(dx, dy);

// Division (avoid - the compiler's division is usually better)
auto [quot, rem] = gba::Div(100, 7);

Memory copy

// CpuSet: 32-bit word copy/fill via BIOS
gba::CpuSet(src, dst, { .count = 256, .set_32bit = true });

// CpuFastSet: 32-bit copy in 8-word chunks (must be aligned, count multiple of 8)
gba::CpuFastSet(src, dst, { .count = 256 });

Note: For general memory copying, prefer standard memcpy/memset - stdgba’s optimised ARM assembly implementations are faster than the BIOS routines in most cases.

Decompression

// Decompress LZ77 data to work RAM (byte writes)
gba::LZ77UnCompWram(compressed_data, dest);

// Decompress LZ77 data to video RAM (halfword writes)
gba::LZ77UnCompVram(compressed_data, dest);

// Huffman decompression
gba::HuffUnCompReadNormal(compressed_data, dest);

// Run-length decompression
gba::RLUnCompReadNormalWrite8bit(compressed_data, dest);

Reset

// Soft reset (restart the ROM)
gba::SoftReset();  // [[noreturn]]

// Clear specific memory regions
gba::RegisterRamReset({
    .ewram = true,
    .iwram = true,
    .palette = true,
    .vram = true,
    .oam = true,
});

Constexpr BIOS functions

Several BIOS math functions are constexpr in stdgba. When called with compile-time arguments, the compiler evaluates them directly and embeds the result:

// Evaluated at compile time - no SWI at runtime
constexpr auto root = gba::Sqrt(256u);  // 16

// Evaluated at runtime - SWI 0x08
volatile unsigned int x = 256;
auto root2 = gba::Sqrt(x);  // BIOS call

This is possible because stdgba provides constexpr implementations of the algorithms alongside the SWI wrappers. The compiler chooses the appropriate path automatically.
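
As an illustration of the kind of algorithm that can sit alongside a SWI wrapper, here is a constexpr integer square root using the classic digit-by-digit method. It is a sketch only: isqrt is a hypothetical name, this is not stdgba's implementation, and the compile-time/runtime dispatch itself is not shown.

```cpp
#include <cstdint>

// Hypothetical constexpr integer square root (floor of the real root).
constexpr std::uint16_t isqrt(std::uint32_t n) {
    std::uint32_t result = 0;
    std::uint32_t bit = std::uint32_t{1} << 30;  // highest power of four in 32 bits
    while (bit > n) bit >>= 2;                   // skip leading zero digit pairs
    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return static_cast<std::uint16_t>(result);
}

static_assert(isqrt(144u) == 12);
static_assert(isqrt(256u) == 16);
```

The same function body works at runtime, so a wrapper can evaluate it during constant evaluation and issue the BIOS call otherwise.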

tonclib comparison

stdgba                  tonclib
gba::VBlankIntrWait()   VBlankIntrWait()
gba::Sqrt(n)            Sqrt(n)
gba::CpuSet(s, d, cfg)  CpuSet(s, d, mode)
gba::SoftReset()        SoftReset()
gba::ArcTan2(x, y)      ArcTan2(x, y)

The API names match the BIOS function names from the community documentation. The main difference is type safety: stdgba uses structs with named fields for configuration instead of raw integers with magic bit patterns.

Save Data

The GBA supports three save memory types. stdgba provides APIs for all three: SRAM, Flash, and EEPROM.

SRAM (32KB)

SRAM is the simplest save type - byte-addressable static RAM at 0x0E000000. Read and write directly through the gba::mem_sram registral:

#include <gba/save>

// Write a byte
gba::mem_sram[0] = std::byte{0x42};

// Read it back
auto val = gba::mem_sram[0];

SRAM must be accessed one byte at a time (no 16/32-bit access). Data persists as long as the cartridge battery lasts.
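
Because of the byte-only rule, a multi-byte save record has to be moved one byte at a time. A minimal sketch, assuming a trivially copyable record: `save_record`, `store_bytes`, and `load_bytes` are hypothetical helpers rather than stdgba APIs, and the storage is passed in as a plain byte pointer so the code can be exercised against an ordinary array on a host machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical save record used only for illustration.
struct save_record {
    unsigned short level;
    unsigned short hp;
    unsigned int gold;
};

// Copy a record into byte-addressable storage, one byte per store.
template <typename T>
void store_bytes(volatile std::byte* dest, const T& value) {
    static_assert(std::is_trivially_copyable_v<T>);
    unsigned char raw[sizeof(T)];
    std::memcpy(raw, &value, sizeof(T));
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        dest[i] = std::byte{raw[i]};
    }
}

// Read a record back, again one byte per load.
template <typename T>
T load_bytes(const volatile std::byte* src) {
    static_assert(std::is_trivially_copyable_v<T>);
    unsigned char raw[sizeof(T)];
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        raw[i] = std::to_integer<unsigned char>(src[i]);
    }
    T value;
    std::memcpy(&value, raw, sizeof(T));
    return value;
}
```

The `volatile` qualifier keeps the compiler from merging the per-byte accesses into wider loads or stores, which is the property the hardware needs.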

Flash (64KB / 128KB)

Flash memory uses sector-erased NOR storage. Unlike SRAM, Flash requires a command protocol - you cannot write directly. stdgba provides two chip-family APIs that compile command sequences at build time:

  • gba::flash::standard - Macronix, Panasonic, Sanyo, SST chips
  • gba::flash::atmel - Atmel chips (128-byte page writes, no separate erase)

Standard Flash example

#include <gba/save>

namespace sf = gba::flash::standard;

// Define callbacks for writing and reading sector data
void fill(sf::sector_span buf) {
    buf[0] = std::byte{0x42};
}

void recv(sf::const_sector_span buf) {
    // process loaded data...
}

// Compile a command sequence at build time
constexpr auto cmds = sf::compile(
    sf::erase_sector(0),
    sf::write_sector(0, fill),
    sf::read_sector(0, recv)
);

// Execute at runtime
auto err = cmds.execute();

Flash detection

Before using Flash, detect the chip to populate the global state:

auto info = gba::flash::detect();
// info.mfr      - manufacturer (macronix, panasonic, sanyo, sst, atmel)
// info.chip_size - flash_64k or flash_128k

Flash specifics

  • Writing is slow (milliseconds per byte)
  • Flash has a limited number of erase cycles (~100,000)
  • Flash and ROM share the same bus - interrupts that read ROM must be disabled during Flash operations

EEPROM (512B / 8KB)

EEPROM is serial memory accessed via DMA3 in 8-byte blocks. Two APIs cover the two sizes:

  • gba::eeprom::eeprom_512b - 64 blocks, 6-bit addressing
  • gba::eeprom::eeprom_8k - 1024 blocks, 14-bit addressing

Both provide raw block access and sequential stream types:

#include <gba/save>

namespace ee = gba::eeprom::eeprom_512b;

// Stream-based write
ee::ostream out;
ee::block data = {std::byte{0xAA}};
out.write(&data, 1);

// Stream-based read
ee::istream in;
ee::block buf;
in.read(&buf, 1);

Memory Utilities

<gba/memory> collects the low-level allocation and data-layout helpers that show up repeatedly in real GBA projects:

  • bitpool for fixed-capacity VRAM or RAM allocation
  • unique<T> and make_unique() for RAII ownership
  • bitpool_buffer_resource for std::pmr containers backed by a bitpool
  • plex<Ts...> for trivially copyable tuple-like register payloads
  • optimised memcpy, memmove, and memset wrappers tuned for ARM7TDMI

For raw VRAM addresses and palette/OAM memory maps, see Video Memory.

Why this module exists

The GBA gives you tight, fixed memory regions instead of a desktop-style heap:

  • 32 KiB IWRAM for hot code and stack
  • 256 KiB EWRAM for larger runtime data
  • 32 KiB OBJ VRAM and 64 KiB BG VRAM with hardware-specific layout rules

That environment pushes you toward fixed-capacity allocators, predictable ownership, and careful copy/fill paths. <gba/memory> packages those patterns into APIs that stay small enough for the platform.

API map

| API | What it does | Typical use |
| --- | --- | --- |
| bitpool | 32-chunk bitmap allocator over a caller-owned region | OBJ VRAM tiles, BG blocks, arena-style RAM |
| bitpool::allocate() | Raw byte allocation | Reserve tile or buffer space |
| bitpool::allocate_unique() | Raw allocation + RAII deallocation | Temporary VRAM ownership |
| bitpool::make_unique() | Placement-new object + RAII destruction | Pool-owned runtime objects |
| bitpool::subpool() | Carve one pool out of another | Reserve a sheet- or scene-local arena |
| bitpool_buffer_resource | PMR adapter over bitpool | std::pmr::vector or std::pmr::string |
| unique<T> | Small owning pointer with type-erased deleter | Resource ownership without std::unique_ptr |
| plex<Ts...> | Tuple-like object guaranteed to fit in 32 bits | Register pairs like timer reload + control |
| memcpy / memmove / memset | Fast wrappers over specialized AEABI back ends | Bulk transfers and clears |

bitpool - a 32-chunk allocator

bitpool manages a contiguous region using a 32-bit mask. Each bit represents one chunk of equal size.

chunk 0  chunk 1  chunk 2  ...  chunk 31
  bit0     bit1     bit2           bit31

That means every pool has exactly 32 allocatable chunk positions. You choose the chunk size to fit the memory region you care about.

Examples:

| Region | Total size | Sensible chunk size | Why |
| --- | --- | --- | --- |
| OBJ VRAM | 32 KiB | 1024 bytes | 32 chunks exactly cover the whole region |
| Small scratch arena | 4 KiB | 128 bytes | Good for many tiny fixed blocks |
| BG map staging | 8 KiB | 256 bytes | One chunk per eighth of a screenblock |

#include <gba/memory>
#include <gba/video>

gba::bitpool obj_vram{gba::memory_map(gba::mem_vram_obj), 1024};

auto tiles = obj_vram.allocate_unique<unsigned char>(2048);
if (tiles) {
    std::memcpy(tiles.get(), sprite_data, 2048);
}

Core queries

| Function | Meaning |
| --- | --- |
| bitpool::capacity() | Always 32 chunks |
| chunk_size() | Bytes per chunk |
| size() | Total bytes managed (capacity() * chunk_size()) |

Raw allocation

allocate(bytes) rounds up to whole chunks and returns the first contiguous run that fits.

alignas(4) unsigned char buffer[1024];
gba::bitpool pool{buffer, 32};

void* a = pool.allocate(32);  // 1 chunk
void* b = pool.allocate(64);  // 2 contiguous chunks

pool.deallocate(a, 32);
pool.deallocate(b, 64);

Important properties:

  • allocation is simple and deterministic: scan the 32-bit mask for a free run
  • deallocation is O(1): clear the matching bits
  • chunk size must be a power of two
  • large requests can fail if the free space is split into non-contiguous holes

So bitpool is not a general heap replacement. It is best when you deliberately size chunks around your asset granularity.
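
The mask-scanning idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: `mini_pool` is a hypothetical class, it ignores alignment requests, and it uses a simple linear scan for the first free run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-chunk bitmap allocator; bit i set means chunk i is in use.
class mini_pool {
public:
    mini_pool(void* base, std::size_t chunk_size)
        : base_(static_cast<unsigned char*>(base)), chunk_size_(chunk_size) {
        // Chunk size must be a non-zero power of two.
        assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);
    }

    void* allocate(std::size_t bytes) {
        const std::size_t chunks = chunks_for(bytes);
        if (chunks == 0 || chunks > 32) return nullptr;
        const std::uint32_t run = run_mask(chunks);
        for (std::size_t first = 0; first + chunks <= 32; ++first) {
            const std::uint32_t mask = run << first;
            if ((used_ & mask) == 0) {
                used_ |= mask;
                return base_ + first * chunk_size_;
            }
        }
        return nullptr;  // no contiguous free run is large enough
    }

    void deallocate(void* ptr, std::size_t bytes) {
        const std::size_t first =
            static_cast<std::size_t>(static_cast<unsigned char*>(ptr) - base_) / chunk_size_;
        used_ &= ~(run_mask(chunks_for(bytes)) << first);  // clear the matching bits
    }

private:
    std::size_t chunks_for(std::size_t bytes) const {
        return (bytes + chunk_size_ - 1) / chunk_size_;  // round up to whole chunks
    }
    static std::uint32_t run_mask(std::size_t chunks) {
        return chunks >= 32 ? 0xFFFFFFFFu : (std::uint32_t{1} << chunks) - 1u;
    }

    unsigned char* base_;
    std::size_t chunk_size_;
    std::uint32_t used_ = 0;
};
```

With 32-byte chunks, allocating 32 and then 64 bytes occupies chunks 0 to 2. Freeing the first allocation leaves a one-chunk hole at chunk 0, so the next 64-byte request is placed at chunk 3 instead. This is the fragmentation behaviour listed above.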

Alignment-aware allocation

allocate(bytes, chunkAlignment) steps the search in chunk-sized increments derived from chunkAlignment.

alignas(32) unsigned char buffer[512];  // 32 chunks * 16 bytes
gba::bitpool pool{buffer, 16};

void* aligned = pool.allocate(16, 32);

The alignment is effectively rounded up to chunk boundaries. If your chunks are already 1024 bytes wide, asking for 4-byte alignment changes nothing.

VRAM workflow

bitpool is especially useful when OBJ tile ownership changes at runtime.

#include <gba/memory>
#include <gba/video>

gba::bitpool obj_tiles{gba::memory_map(gba::mem_vram_obj), 1024};

auto slot = obj_tiles.allocate_unique<unsigned char>(1024);
if (!slot) {
    // No room for another sprite sheet chunk
    return;
}

std::memcpy(slot.get(), sprite_sheet, 1024);
const auto tile = gba::tile_index(slot.get());
gba::obj_mem[0] = sprite.obj(tile);

The same pattern works well for BG VRAM, because tile graphics (4 charblocks) and screen entries (32 screenblocks) share the same 64 KiB mem_vram_bg region.

A convenient chunking is “one chunk per screenblock”:

  • 1 screenblock = 0x800 bytes (2 KiB)
  • 1 charblock = 0x4000 bytes (16 KiB) = 8 screenblocks

That makes bitpool a good fit for allocating both tile graphics and tilemaps from one shared pool.

#include <gba/memory>
#include <gba/video>

// BG VRAM is 64 KiB. Using 0x800-byte chunks gives exactly 32 chunks:
// one per screenblock.
gba::bitpool bg_vram{gba::memory_map(gba::mem_vram_bg), 0x800};

auto tiles = bg_vram.allocate_unique<unsigned char>(0x4000); // 1 charblock
auto map   = bg_vram.allocate_unique<unsigned char>(0x800);  // 1 screenblock

const auto cbb = gba::char_map(tiles.get());
const auto sbb = gba::screen_map(map.get());

gba::reg_bgcnt[0] = {
    .charblock = cbb,
    .screenblock = sbb,
};
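
The block sizes used above can be checked with plain arithmetic. In this sketch `screenblock_of` and `charblock_of` are hypothetical helpers that map a byte offset into BG VRAM to a block index; they are not the library's `gba::screen_map` / `gba::char_map`.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t bg_vram_bytes     = 64 * 1024;
constexpr std::size_t screenblock_bytes = 0x800;   // 2 KiB
constexpr std::size_t charblock_bytes   = 0x4000;  // 16 KiB

// Hypothetical helpers: index of the block that contains a byte offset.
constexpr std::size_t screenblock_of(std::size_t offset) { return offset / screenblock_bytes; }
constexpr std::size_t charblock_of(std::size_t offset)   { return offset / charblock_bytes; }

static_assert(bg_vram_bytes / screenblock_bytes == 32);   // one pool chunk per screenblock
static_assert(bg_vram_bytes / charblock_bytes == 4);
static_assert(charblock_bytes / screenblock_bytes == 8);  // a charblock spans 8 chunks
```

A charblock allocation therefore needs a free run of 8 chunks, so allocating charblocks before individual screenblocks reduces the chance of fragmentation.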

This pattern works well for:

  • allocating BG charblocks + screenblocks for background layers
  • staging background tilemap uploads
  • swapping sprite sets between scenes
  • reserving temporary OBJ tiles for effects
  • carving a VRAM upload arena out of EWRAM or VRAM

allocate_unique() - raw bytes with RAII

If you want ownership without placement-new, use allocate_unique<T>().

{
    auto sprite_tiles = obj_vram.allocate_unique<unsigned char>(512);
    if (sprite_tiles) {
        std::memcpy(sprite_tiles.get(), data, 512);
    }
} // returned to the pool here

T only controls pointer type and default alignment. No constructor runs.

make_unique() - construct an object in pool memory

If you want an actual object stored inside the pool, use make_unique().

struct cache_entry {
    unsigned short tile_base;
    unsigned short frame_count;
};

auto entry = obj_vram.make_unique<cache_entry>(12, 4);

On destruction, the object destructor runs first, then the bytes are returned to the pool.

subpool() - reserve one arena inside another

Subpools let you split a parent pool into smaller lifetime domains.

gba::bitpool obj_vram{gba::memory_map(gba::mem_vram_obj), 1024};

auto enemy_bank = obj_vram.subpool(4096, 1024);
auto boss_bank  = obj_vram.subpool(8192, 1024);

This is useful when one group of assets should be freed all at once. For example, a scene can own a subpool and drop the whole reservation when unloading.

Important lifetime rule:

  • the parent pool must outlive every subpool created from it

bitpool_buffer_resource - PMR bridge

If you want STL-like dynamic containers but still want to control exactly where the bytes come from, wrap a pool as a std::pmr::memory_resource.

#include <memory_resource>
#include <vector>

alignas(4) unsigned char arena[4096];
gba::bitpool pool{arena, 128};
gba::bitpool_buffer_resource resource{pool};

std::pmr::vector<int> values{&resource};
values.push_back(1);
values.push_back(2);
values.push_back(3);

This does not magically remove dynamic allocation costs, but it keeps them inside a bounded arena you control.
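
The bounded-arena idea can be tried on a host with only the standard library. The sketch below swaps in `std::pmr::monotonic_buffer_resource` with `std::pmr::null_memory_resource()` as its upstream, as a stand-in for `bitpool_buffer_resource`. It is not the stdgba adapter, and unlike a bitpool it never reuses freed bytes. `make_values` and `points_into` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <vector>

// Fill a PMR vector whose storage comes from whatever resource the caller supplies.
inline std::pmr::vector<int> make_values(std::pmr::memory_resource& resource) {
    std::pmr::vector<int> values{&resource};
    values.push_back(1);
    values.push_back(2);
    values.push_back(3);
    return values;
}

// True when p points into [arena, arena + size).
inline bool points_into(const void* p, const unsigned char* arena, std::size_t size) {
    const auto addr  = reinterpret_cast<std::uintptr_t>(p);
    const auto begin = reinterpret_cast<std::uintptr_t>(arena);
    return addr >= begin && addr < begin + size;
}
```

A possible use: build a `std::pmr::monotonic_buffer_resource` over a fixed `unsigned char` array, pass it to `make_values`, and check with `points_into` that the vector's `data()` lies inside the array. With the null upstream, an allocation that does not fit fails instead of silently falling back to the heap.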

unique<T> and make_unique()

gba::unique<T> is a small owning pointer with a type-erased deleter stored inline. It is useful even outside bitpool, because it lets you attach custom destruction behaviour without dragging in the full standard smart-pointer machinery.

auto owned = gba::make_unique<int>(42);
if (owned) {
    *owned = 100;
}

Use cases:

  • ownership of pool allocations
  • placement-new objects in custom arenas
  • temporary wrappers around manually managed resources

plex<Ts...> - tuple-like data that fits registers

plex<Ts...> is a trivially copyable heterogeneous aggregate that is guaranteed to fit in 32 bits. Unlike std::tuple, it is designed to be safe for hardware-oriented use cases such as register pairs and packed configuration values.

#include <bit>
#include <gba/memory>

gba::plex<unsigned short, unsigned short> pair{0x1234, 0x5678};
auto [lo, hi] = pair;

auto raw = std::bit_cast<unsigned int>(pair);

Typical uses:

  • timer reload + control (gba::timer_config is a plex)
  • paired register writes
  • tiny aggregate values you want to destructure with structured bindings

plex supports:

  • 1 to 4 elements
  • structured bindings via get<I>()
  • comparisons and swap()
  • deduction guides and make_plex(...)
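
The "fits in 32 bits" guarantee can be illustrated with an ordinary aggregate. This is a sketch of the idea only: `timer_pair`, `to_word`, and `from_word` are hypothetical names, and `std::memcpy` is used in place of `std::bit_cast` so the example does not depend on C++20 library support.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a two-element plex: reload value plus control bits.
struct timer_pair {
    std::uint16_t reload;
    std::uint16_t control;
};

// The two properties that make the pair safe to write as one register-sized value.
static_assert(sizeof(timer_pair) == sizeof(std::uint32_t));
static_assert(std::is_trivially_copyable_v<timer_pair>);

// View the pair as the single 32-bit word a register store would write.
inline std::uint32_t to_word(timer_pair pair) {
    std::uint32_t word;
    std::memcpy(&word, &pair, sizeof word);
    return word;
}

// Rebuild the pair from a 32-bit word.
inline timer_pair from_word(std::uint32_t word) {
    timer_pair pair;
    std::memcpy(&pair, &word, sizeof pair);
    return pair;
}
```

On a little-endian target such as the GBA, the first member lands in the low halfword, so `{0x1234, 0x5678}` becomes the word `0x56781234`.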

Optimised memcpy, memmove, and memset

stdgba ships custom wrappers in source/memcpy.cpp, source/memmove.cpp, and source/memset.cpp. They let the compiler inline small constant cases and jump straight to specialized AEABI entry points when alignment is provable.

memcpy

| Specialization | Trigger |
| --- | --- |
| No-op | n == 0 known at compile time |
| Inline word copy | aligned source + dest, n % 4 == 0, 0 < n < 64 |
| Inline byte copy | 1 <= n <= 6 |
| Fast aligned AEABI path | both pointers provably word-aligned |
| Generic AEABI path | everything else |

memmove

| Specialization | Trigger |
| --- | --- |
| No-op | n == 0 known at compile time |
| Inline overlap-safe byte move | 1 <= n <= 6 |
| Fast aligned AEABI path | both pointers provably word-aligned |
| Generic AEABI path | everything else |

memset

| Specialization | Trigger |
| --- | --- |
| No-op | n == 0 known at compile time |
| Inline word stores | aligned destination, n % 4 == 0, 0 < n < 64, constant fill byte |
| Inline byte stores | 1 <= n <= 12 |
| Fast aligned AEABI path | destination provably word-aligned |
| Generic AEABI path | everything else |

These paths matter because the ARM7TDMI is sensitive to call overhead, alignment checks, and instruction fetch bandwidth. Small constant copies and clears are common in sprite/OAM/tile code, so letting the compiler collapse them early saves cycles.

In practice you usually just call std::memcpy, std::memmove, or std::memset as normal. The library provides the tuned implementation underneath.

Choosing the right tool

| Problem | Recommended tool |
| --- | --- |
| Reserve OBJ VRAM tiles for a runtime-loaded sprite sheet | bitpool |
| Keep a pool allocation alive until a sprite/effect is destroyed | allocate_unique() |
| Construct a small object inside a bounded arena | make_unique() |
| Give a PMR container a fixed arena | bitpool_buffer_resource |
| Pack <= 32 bits of heterogeneous register data | plex |
| Copy/fill bytes quickly | memcpy / memmove / memset |

Functional

<gba/functional> provides a lightweight, heap-free type-erased callable wrapper designed for GBA embedded development.

Overview

The standard library’s std::function allocates on the heap when the stored callable is too large for its internal buffer, and its virtual-dispatch overhead is higher than necessary for a single-core embedded target. gba::function avoids both problems:

  • No heap allocation – callables are stored in a 12-byte inline buffer. Oversized callables are rejected at compile time via static_assert.
  • Function-pointer dispatch – avoids virtual-table overhead.
  • Copyable and movable – full value semantics, including assignment from nullptr.

gba::function<Sig>

#include <gba/functional>

gba::function<void(int)> fn = [](int x) { /* ... */ };
fn(42);

The template parameter Sig is a function signature such as void(int) or int(float, float).

Construction

// Default-construct (null / empty)
gba::function<void()> empty;

// Construct from a lambda
int counter = 0;
gba::function<void()> inc = [&counter] { ++counter; };

// Construct from a free function
void on_tick() { /* ... */ }
gba::function<void()> tick = on_tick;

// Assign null
inc = nullptr;

Invocation

if (fn) {
    fn(42);   // only call when non-null
}

Invoking a null gba::function is undefined behaviour – guard with the bool conversion operator before calling.

Null checks and reassignment

gba::function<void(int)> fn;

if (!fn) {
    fn = [](int x) { /* ... */ };
}

fn = nullptr;  // reset to empty

gba::handler<Args...>

handler is a convenience alias for void-returning functions:

// Equivalent to gba::function<void(int)>
gba::handler<int> h = [](int x) { process(x); };
h(42);

It is the idiomatic type for GBA event callbacks (VBlank handler, key-press callback, etc.) where the return value is not needed.

Small-buffer constraint

The inline storage is 12 bytes. Any callable larger than 12 bytes triggers a static_assert at compile time:

int a = 0, b = 0, c = 0, d = 0;   // four ints = 16 bytes - too large
gba::function<void()> fn = [a, b, c, d] { /* ... */ };
// error: Callable too large for small buffer optimization

To capture more state, store it in a struct and capture a pointer or reference to it instead:

struct State {
    int a, b, c, d;
};

State state{1, 2, 3, 4};

// Capture by reference - the closure holds a single pointer (4 bytes on the GBA), so it fits easily
gba::function<void()> fn = [&state] {
    state.a += state.b;
};

Usage with gba::irq_handler

gba::irq_handler (from <gba/interrupt>) stores a gba::handler<gba::irq>, so any callable that accepts a gba::irq can be assigned directly:

#include <gba/interrupt>

gba::irq_handler = [](gba::irq irq) {
    if (irq.vblank) { /* frame logic */ }
};

For the full interrupt setup and irq_handler API (has_value, swap, reset, nullisr), see Interrupts.

Type sizes

| Type | Size |
| --- | --- |
| gba::function<void()> | 20 bytes |
| gba::function<void(int)> | 20 bytes |
| gba::handler<> | 20 bytes (alias) |

The 20-byte total comes from: 4-byte invoke pointer + 4-byte ops-table pointer + 12-byte inline storage.
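
To make the inline-buffer mechanism concrete, here is a minimal sketch of a small-buffer callable with function-pointer dispatch. It is not stdgba's implementation: `tiny_function` is a hypothetical name, it keeps only an invoke pointer (no ops table), and it accepts only trivially destructible callables so that no destructor bookkeeping is needed. On a 64-bit host its size will differ from the GBA figures above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

template <typename Signature>
class tiny_function;  // hypothetical name; primary template left undefined

template <typename R, typename... Args>
class tiny_function<R(Args...)> {
public:
    static constexpr std::size_t capacity = 12;  // inline buffer size from the text

    tiny_function() = default;  // empty state: invoke_ stays null

    template <typename F>
    tiny_function(F callable) {
        static_assert(sizeof(F) <= capacity, "Callable too large for inline buffer");
        static_assert(alignof(F) <= alignof(std::max_align_t));
        static_assert(std::is_trivially_destructible_v<F>,
                      "sketch only: no destructor bookkeeping");
        ::new (static_cast<void*>(storage_)) F(std::move(callable));
        invoke_ = &call<F>;  // remember how to call the erased type
    }

    explicit operator bool() const { return invoke_ != nullptr; }

    // Calling an empty tiny_function is undefined, as with the real type.
    R operator()(Args... args) {
        return invoke_(storage_, std::forward<Args>(args)...);
    }

private:
    template <typename F>
    static R call(void* storage, Args... args) {
        return (*std::launder(static_cast<F*>(storage)))(std::forward<Args>(args)...);
    }

    alignas(std::max_align_t) unsigned char storage_[capacity];
    R (*invoke_)(void*, Args...) = nullptr;
};
```

The `static_assert` on `sizeof(F)` is what turns an oversized capture list into a compile error rather than a heap allocation.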

Summary

| Feature | gba::function | std::function |
| --- | --- | --- |
| Heap allocation | Never | When callable > SBO buffer |
| Inline storage | 12 bytes (fixed) | Implementation-defined |
| Oversized callable | static_assert at compile time | Heap fallback |
| Dispatch mechanism | Function pointer | Virtual dispatch |
| Null / empty state | Yes (nullptr / default) | Yes |
| Copy / move | Yes | Yes |

Compression

stdgba provides consteval compression functions that compress data entirely at compile time. The compressed output is compatible with the GBA BIOS decompression routines, so assets can be stored compressed in ROM and decompressed at runtime with a single BIOS call.

Supported algorithms

| Algorithm | Best for | Header format |
| --- | --- | --- |
| LZ77 | Repeated patterns (tiles, maps) | BIOS-compatible |
| Huffman | Skewed symbol frequencies (text) | BIOS-compatible |
| RLE | Long runs of identical values | BIOS-compatible |
| BitPack | Reducing bit depth (e.g., 32-bit to 4-bit) | BIOS-compatible |

LZ77 compression

#include <gba/compress>
#include <gba/bios>

// Compress tilemap data at compile time
constexpr auto compressed_map = gba::lz77_compress([] {
    return std::array<unsigned short, 1024>{
        0, 0, 0, 1, 1, 1, 2, 2, 2, // ...
    };
});

// Decompress at runtime using BIOS
alignas(4) std::array<unsigned short, 1024> buffer;
gba::LZ77UnCompWram(compressed_map, buffer.data());

Use LZ77UnCompWram for general RAM targets and LZ77UnCompVram for video RAM (which requires halfword writes).

Huffman compression

constexpr auto compressed_text = gba::huffman_compress([] {
    return std::array<unsigned char, 256>{ /* text data */ };
});

alignas(4) std::array<unsigned char, 256> buffer;
gba::HuffUnCompReadNormal(compressed_text, buffer.data());

RLE compression

constexpr auto compressed_fill = gba::rle_compress([] {
    return std::array<unsigned char, 512>{ /* data with runs */ };
});

alignas(4) std::array<unsigned char, 512> buffer;
gba::RLUnCompReadNormalWrite8bit(compressed_fill, buffer.data());

Bit packing

Bit packing reduces the bit depth of data elements. Useful for compacting palette indices or other small values:

constexpr auto packed = gba::bit_pack<4>([] {
    return std::array<unsigned int, 64>{ 0, 1, 2, 3, /* 4-bit values in 32-bit containers */ };
});

Combining with differential filtering

For data with gradual changes (audio waveforms, gradients), apply a differential filter before compression:

#include <gba/filter>
#include <gba/compress>

constexpr auto filtered = gba::diff_filter<1>([] {
    return std::array<unsigned char, 512>{
        128, 130, 132, 134, 136, // ...
    };
});

constexpr auto compressed = gba::lz77_compress([] { return filtered; });
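
To see why filtering helps, the 8-bit delta transform itself can be sketched as follows. These are hypothetical helpers (`delta_encode`, `delta_decode`) operating on raw bytes; they do not reproduce whatever header or layout `gba::diff_filter` actually emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical delta filter: keep the first byte, then store each byte minus its predecessor.
template <std::size_t N>
constexpr std::array<unsigned char, N> delta_encode(const std::array<unsigned char, N>& in) {
    std::array<unsigned char, N> out{};
    unsigned char previous = 0;
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = static_cast<unsigned char>(in[i] - previous);  // wraps modulo 256
        previous = in[i];
    }
    return out;
}

// Inverse transform: a running sum restores the original bytes.
template <std::size_t N>
constexpr std::array<unsigned char, N> delta_decode(const std::array<unsigned char, N>& in) {
    std::array<unsigned char, N> out{};
    unsigned char sum = 0;
    for (std::size_t i = 0; i < N; ++i) {
        sum = static_cast<unsigned char>(sum + in[i]);
        out[i] = sum;
    }
    return out;
}

constexpr std::array<unsigned char, 5> ramp{128, 130, 132, 134, 136};
static_assert(delta_encode(ramp)[1] == 2);
```

The ramp `128, 130, 132, 134, 136` becomes `128, 2, 2, 2, 2`: a steadily changing signal turns into a run of identical values, which LZ77 or RLE can then encode compactly.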

String Formatting

stdgba provides a compile-time string formatting library designed for GBA constraints. Format strings are parsed at compile time, and arguments are bound by name using user-defined literals.

Basic usage

#include <gba/format>
using namespace gba::literals;

// Define a format string (parsed at compile time)
constexpr auto fmt = "HP: {hp}/{max}"_fmt;

// Format into a buffer
char buf[32];
fmt.to(buf, "hp"_arg = 42, "max"_arg = 100);
// buf contains "HP: 42/100"

Without literals

If you prefer not to use literal operators:

constexpr auto fmt = gba::format::make_format<"HP: {hp}/{max}">();
constexpr auto hp = gba::format::make_arg<"hp">();
constexpr auto max_hp = gba::format::make_arg<"max">();

char buf[32];
fmt.to(buf, hp = 42, max_hp = 100);

Placeholder forms

| Form | Meaning |
| --- | --- |
| {name} | Named placeholder with default formatting |
| {name:spec} | Named placeholder with format spec |
| {} | Implicit positional placeholder |
| {:spec} | Implicit positional placeholder with format spec |
| {0} | Explicit positional placeholder |
| {0:spec} | Explicit positional placeholder with format spec |
| {{ / }} | Escaped literal braces |

Format spec grammar

The format spec follows a Python-style mini-language:

[[fill]align][sign][#][0][width][grouping][.precision][type]

| Field | Syntax | Default | Applies to |
| --- | --- | --- | --- |
| fill | any ASCII character before align | ' ' | all aligned outputs |
| align | < left, > right, ^ centre, = sign-aware | type-dependent | all (= is numeric-only) |
| sign | +, -, or space | - behaviour | numeric types |
| # | alternate form | off | integral prefixes, fixed-point decimal point retention |
| 0 | zero-fill (equivalent to fill=0 align==) | off | numeric types |
| width | decimal digits | 0 | all types |
| grouping | , or _ | none | integer, fixed-point, angle decimal output |
| precision | . followed by digits | unset | strings, fixed-point, angle degrees/radians/turns, angle hex |
| type | trailing presentation character | per value category | see tables below |

Integer type codes

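| Code | Meaning | # alternate form |
| --- | --- | --- |
| (default) | decimal | - |
| d | decimal | - |
| b | binary | 0b prefix |
| o | octal | 0o prefix |
| x | hex lowercase | 0x prefix |
| X | hex uppercase | 0X prefix |
| n | grouped decimal | - |
| c | single character from code point | - |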

Integer grouping inserts a separator every 3 digits for decimal/octal, or every 4 digits for binary/hex.
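
The grouping rule can be written down in a few lines. This is a host-side illustration with a hypothetical helper, `group_digits`; it uses `std::string` for brevity, which the library itself avoids because of the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: insert `separator` between groups of `group` digits,
// counting groups from the right-hand (least significant) end.
inline std::string group_digits(std::string_view digits, char separator, std::size_t group) {
    std::string out;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        const std::size_t remaining = digits.size() - i;
        if (i != 0 && remaining % group == 0) {
            out.push_back(separator);  // a full group starts here
        }
        out.push_back(digits[i]);
    }
    return out;
}
```

For example, `group_digits("9999", '_', 3)` yields `9_999`, and a binary string such as `11110000` with a group size of 4 yields `1111_0000`.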

String type codes

| Code | Meaning |
| --- | --- |
| (default) | emit string as-is |
| s | same as default |

Precision truncates the string to at most N characters before width/alignment is applied.

Fixed-point type codes

| Code | Meaning |
| --- | --- |
| (default) | fixed decimal, trailing fractional zeros trimmed |
| f / F | fixed decimal with exactly .N fractional digits |
| e | scientific notation lowercase (1.23e+03) |
| E | scientific notation uppercase (1.23E+03) |
| g | general format – uses fixed for small values, scientific for large |
| G | general format uppercase |
| % | multiply by 100 and append % |

Grouping applies to the integer part only. # with .0f retains the decimal point.

Angle type codes

| Code | Meaning |
| --- | --- |
| (default) | degrees |
| r | radians |
| t | turns (0.0 - 1.0) |
| i | raw integer value of the angle storage |
| x | raw hex lowercase |
| X | raw hex uppercase |

For x/X, precision controls the number of emitted hex digits (most-significant digits are kept). If omitted, the native width is used (8 for gba::angle, Bits/4 for gba::packed_angle<Bits>). # adds a 0x/0X prefix.

Examples

Integers

constexpr auto fmt = "Addr: {a:#010x}"_fmt;
char buf[32];
fmt.to(buf, "a"_arg = 0x2A);
// buf contains "Addr: 0x0000002a"

constexpr auto fmt = "Gold: {gold:_d}"_fmt;
char buf[16];
fmt.to(buf, "gold"_arg = 9999);
// buf contains "Gold: 9_999"

Strings

constexpr auto fmt = "{name:*^7.3}"_fmt;
char buf[16];
fmt.to(buf, "name"_arg = "Hello");
// buf contains "**Hel**"

Fixed-point

#include <gba/fixed_point>
using fix8 = gba::fixed<int, 8>;

constexpr auto fmt = "X: {x:,.2f}"_fmt;
char buf[32];
fmt.to(buf, "x"_arg = fix8(1234.5));
// buf contains "X: 1,234.50"

Scientific notation:

constexpr auto fmt = "X: {x:.2e}"_fmt;
char buf[32];
fmt.to(buf, "x"_arg = fix8(1234.5));
// buf contains "X: 1.23e+03"

Percent formatting:

constexpr auto fmt = "HP: {x:%}"_fmt;
char buf[32];
fmt.to(buf, "x"_arg = fix8(0.5));
// buf contains "HP: 50%"

Angles

#include <gba/angle>
using namespace gba::literals;

constexpr auto fmt = "Angle: {a:.4r}"_fmt;
char buf[32];
fmt.to(buf, "a"_arg = 90_deg);
// buf contains "Angle: 1.5708"

Compact raw hex view of a packed angle:

constexpr auto fmt = "Rot: {a:#.4X}"_fmt;
char buf[16];
fmt.to(buf, "a"_arg = gba::packed_angle16{0x4000});
// buf contains "Rot: 0X4000"

Compile-time formatting

constexpr auto result = "HP: {hp}"_fmt.to_static("hp"_arg = 42);
// result is a compile-time array containing "HP: 42"

to_static also accepts gba::literals::fixed_literal values (e.g. 3.14_fx), which are compile-time-only and cannot be used with runtime output paths.

Typewriter generator

The generator API emits one character at a time, perfect for RPG-style text rendering:

constexpr auto fmt = "You found {item}!"_fmt;

auto gen = fmt.generator("item"_arg = "Sword");
while (auto ch = gen()) {
    draw_char(*ch);
    wait_frames(2);  // Typewriter delay
}

Lazy (lambda) arguments

Arguments can also be bound to a callable (for example, a lambda). The callable is invoked when formatting reaches that placeholder.

This is useful for typewriter-style output: you can defer looking up a value until the moment the generator starts emitting that argument.

constexpr auto fmt = "HP: {hp}/{max}"_fmt;

// player.hp is read when the generator reaches {hp}, not when it is created.
auto gen = fmt.generator(
    "hp"_arg = [&] { return player.hp; },
    "max"_arg = [&] { return player.max_hp; }
);

while (auto ch = gen()) {
    draw_char(*ch);
    wait_frames(2);
}

For string arguments, the supplier should return a stable pointer (for example, a string stored in memory) rather than a temporary buffer.

Word boundary lookahead

The generator provides until_break() to check how many characters remain until the next word boundary. Use this for line wrapping:

auto gen = fmt.generator("hp"_arg = 42);
int col = 0;
while (auto ch = gen()) {
    if (col + gen.until_break() > 30) {
        newline();
        col = 0;
    }
    draw_char(*ch);
    ++col;
}
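
The wrapping decision in that loop can be tried on a host with plain strings. The sketch below is an assumption-laden stand-in: its free function `until_break` is hypothetical and only mimics the idea of the generator's lookahead (distance to the next space), and `wrap` collects lines into a `std::vector` instead of drawing them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical lookahead: characters from the start of `rest` up to the next space (or the end).
inline std::size_t until_break(std::string_view rest) {
    const std::size_t space = rest.find(' ');
    return space == std::string_view::npos ? rest.size() : space;
}

// Split `text` into lines, starting a new line before any word that would run past `width`.
inline std::vector<std::string> wrap(std::string_view text, std::size_t width) {
    std::vector<std::string> lines(1);
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool word_start = text[i] != ' ' && (i == 0 || text[i - 1] == ' ');
        const std::size_t col = lines.back().size();
        if (word_start && col > 0 && col + until_break(text.substr(i)) > width) {
            lines.emplace_back();  // the upcoming word does not fit on this line
        }
        lines.back().push_back(text[i]);
    }
    return lines;
}
```

Checking only at word starts keeps a word from being split in the middle. The space that precedes a wrapped word stays at the end of the previous line, so a renderer may want to skip drawing trailing spaces.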

Output paths

All output paths share the same rendering semantics and produce identical results for the same inputs:

| Path | Description |
| --- | --- |
| generator() | Streaming character-by-character emission |
| to(buf, ...) | Render into a caller-provided buffer |
| to_array(...) | Render into a std::array |
| to_static(...) | Compile-time render into a constexpr array |

Invalid spec rejection

Invalid format spec combinations are rejected at compile time. Examples of rejected specs:

| Spec | Reason |
| --- | --- |
| +s | sign on string type |
| ,s | grouping on string type |
| =s | sign-aware alignment on string type |
| .2i | precision on raw integer angle type |
| #c | alternate form on character type |

Deferred features

The following features are not supported in the current implementation:

  • !s / !r conversion flags
  • Dynamic width / precision ({x:{w}.{p}f})
  • Nested replacement fields inside format specs
  • Runtime-parsed format strings
  • Built-in float / double formatting

Design notes

  • Format strings are parsed entirely at compile time - no runtime parsing overhead
  • Arguments can be bound by name rather than by position, which keeps format strings self-documenting
  • Arguments may be bound to callables (lambdas) for lazy evaluation at placeholder time
  • The generator API emits digits MSB-first, enabling typewriter effects without buffering
  • No heap allocation - all formatting uses caller-provided buffers
  • The generator uses a deterministic phase/state machine with category-specialised emission states

Logging

stdgba provides a logging system with pluggable backends for emulator debug output. It auto-detects whether the game is running under mGBA or no$gba and routes log messages to the appropriate debug console.

Setup

#include <gba/logger>

using namespace gba::literals;

int main() {
    // Auto-detect emulator and initialise
    if (gba::log::init()) {
        gba::log::info("Game started!");
    }
}

init() returns true if a supported emulator was detected, false otherwise (a null backend is installed so logging calls are safe but do nothing).

Log levels

Five severity levels are available:

gba::log::fatal("Critical error");
gba::log::error("Something failed");
gba::log::warn("Potential problem");
gba::log::info("Status update");
gba::log::debug("Verbose trace");

Filtering by level

gba::log::set_level(gba::log::level::warn);
// Only fatal, error, and warn messages are output
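
The filtering rule amounts to a comparison against a threshold before the backend is called. A minimal sketch, assuming the severities are ordered from most to least severe: `level`, `backend`, `logger`, and `counting_backend` below are hypothetical stand-ins defined locally, not the types from `<gba/logger>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical mirror of the five severities, most severe first.
enum class level { fatal, error, warn, info, debug };

// Hypothetical backend interface: receives only messages that pass the filter.
struct backend {
    virtual ~backend() = default;
    virtual std::size_t write(level lvl, const char* msg, std::size_t len) = 0;
};

class logger {
public:
    explicit logger(backend& sink) : sink_(&sink) {}

    void set_level(level threshold) { threshold_ = threshold; }

    // Forward the message only when it is at least as severe as the threshold.
    std::size_t write(level lvl, const char* msg) {
        if (static_cast<int>(lvl) > static_cast<int>(threshold_)) {
            return 0;  // filtered out
        }
        return sink_->write(lvl, msg, std::strlen(msg));
    }

private:
    backend* sink_;
    level threshold_ = level::debug;  // everything passes by default
};

// Test double that counts the messages reaching the backend.
struct counting_backend : backend {
    std::size_t messages = 0;
    std::size_t write(level, const char*, std::size_t len) override {
        ++messages;
        return len;
    }
};
```

With the threshold set to `warn`, messages at `error` and `warn` reach the backend while `info` and `debug` are dropped before any formatting or output work is done.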

Runtime level selection

Use write() when the log level is determined at runtime:

gba::log::level lvl = config.verbose ? gba::log::level::debug : gba::log::level::info;
gba::log::write(lvl, "Message");

Formatted logging

Log messages support the same format string syntax as <gba/format>. For the full syntax ({x}, {x:X}, named args, and generator behaviour), see String Formatting.

using namespace gba::literals;

gba::log::info("HP: {hp}"_fmt, "hp"_arg = 42);
gba::log::warn("Sector {s} failed"_fmt, "s"_arg = 3);

Custom backends

Implement the gba::log::backend interface to route logs anywhere:

struct screen_logger : gba::log::backend {
    int line = 0;
    std::size_t write(gba::log::level lvl, const char* msg, std::size_t len) override {
        draw_text(0, line++, msg);
        return len;
    }
};

screen_logger my_logger;
gba::log::set_backend(&my_logger);

Built-in backends

| Backend | Emulator | Detection |
| --- | --- | --- |
| mgba_backend | mGBA | Writes to 0x4FFF780 debug registers |
| nocash_backend | no$gba | Writes to 0x4FFFA00 signature-based output |
| null_backend | (fallback) | Discards all output |

init() tries mGBA first, then no$gba, then falls back to the null backend.

Testing, Assertions & Benchmarking

stdgba provides lightweight APIs for unit testing, assertions, and cycle-accurate benchmarking on hardware or emulator.

For debugger value rendering, see GDB Pretty Printers.

Test API

The gba::test singleton provides simple assertion and expectation checking. Tests run on real GBA hardware or mGBA emulator, with results reported via log output.

Basic test structure

#include <gba/testing>

int main() {
    gba::test("example test case", [] {
        gba::test.expect.eq(2 + 2, 4);
    });

    return gba::test.finish(); // Must call finish() to exit
}

Every test must:

  1. Call gba::test(name, lambda) to define a test case
  2. Use gba::test.expect.* or gba::test.assert.* inside the lambda
  3. Call gba::test.finish() at the end of main()

The test framework exits automatically via SWI 0x1A, or via a custom exit SWI selected with -DSTDGBA_EXIT_SWI=0x##.

Expectation checks

Expectations continue execution on failure and count failures for the final report:

gba::test("expectations", [] {
    gba::test.expect.eq(2 + 2, 4, "arithmetic");                    // Pass
    gba::test.expect.ne(0, 1, "inequality");                        // Pass
    gba::test.expect.lt(1, 2);                                      // Pass
    gba::test.expect.le(1, 1);                                      // Pass
    gba::test.expect.gt(2, 1);                                      // Pass
    gba::test.expect.ge(1, 1);                                      // Pass
    gba::test.expect.is_true(true);                                 // Pass
    gba::test.expect.is_false(false);                               // Pass
    gba::test.expect.is_zero(0);                                    // Pass
    gba::test.expect.at_least(5, 3);                                // Pass (5 >= 3)
});

Assertion checks

Assertions stop execution on failure immediately:

gba::test("assertions", [] {
    gba::test.assert.eq(5, 5);                                      // Pass, continue
    gba::test.assert.eq(5, 6);                                      // FAIL, stop test
    gba::test.expect.eq(1, 1);                                      // Never reached
});

Range and container checks

Test ranges and containers element-wise:

#include <array>
#include <gba/testing>

int main() {
    gba::test("ranges", [] {
        std::array<int, 3> a = {1, 2, 3};
        std::array<int, 3> b = {1, 2, 3};
        gba::test.expect.range_eq(a, b, "array equality");

        std::array<int, 3> c = {1, 2, 4};
        gba::test.expect.range_ne(a, c, "array inequality");
    });

    return gba::test.finish();
}

Running tests on mGBA

Build your test executable, then run with mgba-headless:

# Build
cmake --build build --target my_test -- -j 8

# Run (exit SWI 0x1A, return exit code in r0, timeout 10 seconds)
timeout 15 mgba-headless -S 0x1A -R r0 -t 10 build/tests/my_test.elf
echo "Exit code: $?"

The test framework writes results to the logger, viewable via:

  • mGBA debug console (Ctrl+D or Tools -> GDB)
  • no$gba debug window
  • Custom logger backend

Benchmark API

The gba::benchmark module provides cycle-accurate timing using cascading hardware timers.

Cycle counter

A cycle_counter wraps two cascading timers to form a 32-bit counter:

#include <gba/benchmark>

gba::benchmark::cycle_counter counter;
counter.start();
// ... code to measure ...
unsigned int cycles = counter.stop();

By default, cycle_counter uses TM2+TM3, leaving TM0+TM1 free for audio or other uses. Override via:

using namespace gba::benchmark;
cycle_counter counter(make_timer_pair(timer_pair_id::tm0_tm1));

Valid pairs: (0,1), (1,2), (2,3).
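
In cascade mode the upper timer increments each time the lower one overflows, so the two 16-bit counts form the halves of one 32-bit value. The arithmetic is sketched below with hypothetical helpers (`combine`, `elapsed`); reading the actual timer registers is left out, and the result counts CPU cycles only under the assumption that the lower timer ticks once per cycle.

```cpp
#include <cassert>
#include <cstdint>

// Combine the two 16-bit halves of a cascaded timer pair into one 32-bit count.
constexpr std::uint32_t combine(std::uint16_t low, std::uint16_t high) {
    return (static_cast<std::uint32_t>(high) << 16) | low;
}

// Elapsed ticks between two readings; unsigned subtraction absorbs a single wrap-around.
constexpr std::uint32_t elapsed(std::uint32_t start, std::uint32_t stop) {
    return stop - start;
}

static_assert(combine(0x5678, 0x1234) == 0x12345678u);
```

Because the subtraction is performed modulo 2^32, a measurement that straddles one counter wrap still yields the correct difference.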

Measuring code

Use measure() to run a function and return its cycle cost:

#include <gba/benchmark>

unsigned int work(unsigned int n) {
    unsigned int sum = 0;
    for (unsigned int i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}

int main() {
    // Measure one run
    auto cycles = gba::benchmark::measure(work, 1024u);
    
    // Measure and average 8 runs
    auto avg = gba::benchmark::measure_avg(8, work, 1024u);

    return 0;
}

measure() returns the cycle count of a single run. measure_avg() runs the function N times and returns the average, which smooths out noise from interrupts and from ROM wait-state or prefetch timing (the GBA's CPU has no cache).

Preventing dead-code elimination

Use do_not_optimize() to wrap code so the compiler cannot eliminate it:

#include <gba/benchmark>

gba::benchmark::cycle_counter counter;
counter.start();

gba::benchmark::do_not_optimize([&] {
    // Compiler cannot dead-code eliminate or reorder this
    volatile unsigned int x = 0;
    for (int i = 0; i < 100; ++i) x += i;
});

auto cycles = counter.stop();

Without do_not_optimize(), the compiler may optimise away unused computations, giving misleading cycle counts.

Combined example

Test a function with both assertions and benchmarks:

#include <gba/benchmark>
#include <gba/testing>

// Function under test
unsigned int sum_of_squares(unsigned int n) {
    unsigned int sum = 0;
    for (unsigned int i = 1; i <= n; ++i) {
        sum += i * i;
    }
    return sum;
}

int main() {
    // Unit test
    gba::test("sum_of_squares", [] {
        gba::test.expect.eq(sum_of_squares(1), 1, "sum(1) = 1");
        gba::test.expect.eq(sum_of_squares(3), 14, "sum(1..3) = 14");
        gba::test.expect.eq(sum_of_squares(5), 55, "sum(1..5) = 55");
    });

    // Benchmark
    gba::test("sum_of_squares benchmark", [] {
        using namespace gba::benchmark;
        auto cycles = measure_avg(4, sum_of_squares, 100u);
        gba::test.expect.lt(cycles, 5000, "reasonable cycle cost");
    });

    return gba::test.finish();
}

Tips & Best Practices

  • Always call gba::test.finish(): It flushes logs and signals the exit SWI to mgba-headless.
  • Use expect.* for non-critical checks: Failures don’t stop the test, so you can gather multiple failures at once.
  • Use assert.* for setup validation: Stop immediately if preconditions fail, preventing cascade failures.
  • Add descriptive messages: The third parameter makes test-failure output readable.
  • Benchmark multiple runs: Use measure_avg() to reduce noise from VBlank interrupts.
  • Isolate what you measure: Wrap only the code under test with do_not_optimize().
  • Test on hardware too: emulator behaviour may differ from real GBA in timing or memory access patterns.

Reference

| Function | Purpose |
| --- | --- |
| gba::test(name, fn) | Run test case |
| gba::test.expect.eq(a, b) | Expect a == b |
| gba::test.expect.ne(a, b) | Expect a != b |
| gba::test.expect.lt(a, b) | Expect a < b |
| gba::test.expect.le(a, b) | Expect a <= b |
| gba::test.expect.gt(a, b) | Expect a > b |
| gba::test.expect.ge(a, b) | Expect a >= b |
| gba::test.expect.is_true(x) | Expect x is true |
| gba::test.expect.is_false(x) | Expect x is false |
| gba::test.expect.is_zero(x) | Expect x == 0 |
| gba::test.expect.range_eq(a, b) | Expect ranges a and b are equal |
| gba::test.expect.range_ne(a, b) | Expect ranges a and b are not equal |
| gba::test.assert.* | Same as expect, but stops on failure |
| gba::test.finish() | Exit the test (required) |
| gba::benchmark::measure(fn, args...) | Measure cycles for one run |
| gba::benchmark::measure_avg(n, fn, args...) | Measure and average N runs |
| gba::benchmark::do_not_optimize(fn) | Prevent dead-code elimination |
| gba::benchmark::cycle_counter | Manual 32-bit timer pair counter |

GDB Pretty Printers

stdgba ships Python pretty-printers under gdb/ so common library types are shown in a readable form while debugging.

Instead of raw storage fields, GDB can show decoded values such as fixed-point numbers, angles in degrees, key masks, timer configuration, and music tokens.

Quick start

Load the aggregate script once per GDB session (adjust the path to match your stdgba checkout):

source D:/CLionProjects/stdgba/gdb/stdgba.py

To load them automatically, add the same source ... line to your .gdbinit.

When loaded successfully, GDB prints status lines including:

  • Loading stdgba pretty printers...
  • stdgba pretty printers loaded successfully

Available printers

The aggregate loader gdb/stdgba.py imports and registers these printer modules:

| Module | Example types |
| --- | --- |
| gdb/fixed_point.py | gba::fixed<Rep, FracBits> |
| gdb/angle.py | gba::angle, gba::packed_angle<Bits> |
| gdb/format.py | gba::format::compiled_format, arg_binder, bound_arg, format_generator |
| gdb/music.py | gba::music::note, bpm_value, token_type, ast_type, token, pattern types |
| gdb/log.py | gba::log::level |
| gdb/video.py | gba::color, gba::object |
| gdb/keyinput.py | gba::keypad |
| gdb/key.py | gba::key |
| gdb/registral.py | gba::registral<T> |
| gdb/memory.py | gba::plex<...>, gba::unique<T>, gba::bitpool |
| gdb/benchmark.py | gba::benchmark::cycle_counter |
| gdb/interrupt.py | gba::irq, gba::irq_handler |
| gdb/timer.py | gba::timer::compiled_timer |

You can also source any individual module directly if you only want one printer.

Practical workflow

tests/debug/test_pretty_printers.cpp constructs representative values for all supported printer categories and includes a dedicated breakpoint marker comment.

Build the manual test target:

cmake --build build --target test_pretty_printers -- -j 8

Start GDB with the produced ELF:

arm-none-eabi-gdb build/tests/test_pretty_printers.elf

Inside GDB:

source D:/CLionProjects/stdgba/gdb/stdgba.py
break main
run
# Step/next until the BREAKPOINT HERE marker in test_pretty_printers.cpp
print fix8_val
print angle_90
print key_combo
print test_pool

Expected output is human-readable (for example fixed-point decimal form and decoded key masks), rather than only raw integer fields.
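
As an illustration of what "fixed-point decimal form" means, the sketch below decodes a raw fixed-point integer the way a printer conceptually would. It assumes the usual convention that the stored value is the real value scaled by 2^FracBits; decode_fixed is a hypothetical helper, and the shipped printers are Python scripts whose exact output format is not shown here.

```cpp
#include <cstdint>

// Hypothetical decode: real value = raw / 2^frac_bits (assumed storage convention).
constexpr double decode_fixed(std::int32_t raw, unsigned frac_bits) {
    return static_cast<double>(raw) / static_cast<double>(1u << frac_bits);
}
```

Under that assumption, a raw value of 384 with 8 fractional bits reads as 1.5 rather than as the opaque integer 384.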

Notes

  • test_pretty_printers is listed in tests/CMakeLists.txt under MANUAL_TESTS, so it is intentionally excluded from CTest automation.
  • Pretty-printers are a debugger convenience only; they do not affect generated ROM code or runtime behaviour.
  • If GDB warns about auto-load restrictions, allow the script path in your local GDB security settings before sourcing the file.

EWRAM & IWRAM Overlays

The GBA has two work RAM regions:

  • EWRAM (256 KB at 0x02000000) - external, 16-bit bus, 2 wait states
  • IWRAM (32 KB at 0x03000000) - internal, 32-bit bus, 0 wait states

Both regions are limited. Overlays let you swap different data or code into the same RAM region at runtime, effectively multiplying the usable space.

How overlays work

The toolchain linker script defines 10 overlay slots for each region (.ewram0-.ewram9 and .iwram0-.iwram9). All overlays of the same type share the same RAM address - only one can be active at a time. The initialisation data for each overlay is stored separately in ROM.

ROM:   [overlay 0 data] [overlay 1 data] [overlay 2 data] ...
         |                  |
         v                  v
RAM:   [ shared region ] - only one at a time

Placing data in overlays

Use the [[gnu::section]] attribute:

// Level 1 map data in EWRAM overlay 0
[[gnu::section(".ewram0")]]
int level1_map[1024] = { /* ... */ };

// Level 2 map data in EWRAM overlay 1
[[gnu::section(".ewram1")]]
int level2_map[1024] = { /* ... */ };

Alternatively, name source files with the overlay pattern (e.g., level1.ewram0.cpp) and the linker will route their .text sections automatically.

Getting overlay metadata

<gba/overlay> provides section descriptors with ROM source, WRAM destination, and byte size - but does not perform the copy. You choose how to load:

#include <gba/overlay>

auto ov = gba::overlay::ewram<0>;
// ov.rom   - pointer to initialisation data in ROM
// ov.wram  - pointer to shared WRAM destination
// ov.bytes - size of the section in bytes

The template parameter provides compile-time bounds checking: ewram<10> is a compile error.
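
One common way to get this kind of compile-time bound is a static_assert inside a template. The sketch below illustrates that general technique only; it is not the actual <gba/overlay> implementation. overlay_descriptor_sketch and ewram_sketch are hypothetical names, and the addresses and sizes are placeholders where a real implementation would read linker symbols.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical descriptor shape, loosely mirroring rom / wram / bytes.
struct overlay_descriptor_sketch {
    std::uintptr_t rom;
    std::uintptr_t wram;
    std::size_t bytes;
};

constexpr std::size_t overlay_slot_count = 10; // slots 0..9

template <std::size_t N>
constexpr overlay_descriptor_sketch ewram_sketch() {
    // Out-of-range indices are rejected when the template is instantiated.
    static_assert(N < overlay_slot_count, "overlay index must be 0-9");
    // Placeholder values for illustration only.
    return {0x08000000u + N * 0x1000u, 0x02000000u, 0x1000u};
}
```

With this shape, ewram_sketch<9>() compiles, while ewram_sketch<10>() fails with the static_assert message instead of producing a bad pointer at runtime.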

Loading overlays

You pick the copy method that suits your situation:

#include <gba/overlay>
#include <gba/bios>
#include <gba/dma>
#include <cstring>

auto ov = gba::overlay::ewram<0>;

// Option 1: memcpy
std::memcpy(ov.wram, ov.rom, ov.bytes);

// Option 2: CpuSet (BIOS)
gba::CpuSet(ov.rom, ov.wram, {.count = ov.bytes / 4, .set_32bit = true});

// Option 3: DMA (fast bulk copy, good for large overlays; the CPU is paused while an immediate transfer runs)
gba::reg_dma[3] = gba::dma::copy(ov.rom, ov.wram, ov.bytes / 4);
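
Options 2 and 3 pass ov.bytes / 4, which truncates if the section size is not a multiple of four. Whether overlay sections are always padded to a word boundary depends on the linker script, which is not shown here. If that is not guaranteed, a rounding helper such as the hypothetical words_for below avoids dropping the last few bytes.

```cpp
#include <cstddef>

// Hypothetical helper: number of 32-bit words needed to cover `bytes`,
// rounding up so a trailing partial word is still copied.
constexpr std::size_t words_for(std::size_t bytes) {
    return (bytes + 3) / 4;
}
```

Rounding up can copy as many as three bytes past the end of the section, so both the ROM source and the RAM destination need room for that padding.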

Switching overlays

Loading a new overlay into the same region simply overwrites the previous one:

// Load level 1 data
auto ov0 = gba::overlay::ewram<0>;
std::memcpy(ov0.wram, ov0.rom, ov0.bytes);
// level1_map is now accessible

// Switch to level 2 (replaces level 1 in RAM)
auto ov1 = gba::overlay::ewram<1>;
std::memcpy(ov1.wram, ov1.rom, ov1.bytes);
// level2_map is now accessible (level1_map is overwritten)

IWRAM code overlays

IWRAM is fast - ARM code runs at full speed with no wait states. Use IWRAM overlays to swap performance-critical code modules:

// In physics.iwram0.cpp - placed in overlay 0 automatically
void physics_update() { /* hot loop */ }

// In render.iwram1.cpp - placed in overlay 1 automatically
void render_scene() { /* hot loop */ }

// Load physics code into IWRAM and run it
auto ov = gba::overlay::iwram<0>;
gba::CpuSet(ov.rom, ov.wram, {.count = ov.bytes / 4, .set_32bit = true});
physics_update();

// Swap in rendering code
auto ov1 = gba::overlay::iwram<1>;
gba::CpuSet(ov1.rom, ov1.wram, {.count = ov1.bytes / 4, .set_32bit = true});
render_scene();

Both functions occupy the same IWRAM addresses but contain different code. Only one can be called at a time.

Warning: calling a function from an overlay that is not currently loaded will execute whatever garbage is in RAM. Always load before calling.
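
One way to make that rule harder to break is to track which overlay currently occupies the region and check it before every call. The following is a host-side model only: overlay_region_sketch is a hypothetical type, a small array stands in for the shared RAM region, and on hardware the copy would be one of the load methods shown earlier.

```cpp
#include <array>
#include <cstddef>

// Host-side model of one shared overlay region (hypothetical names throughout).
struct overlay_region_sketch {
    std::array<int, 4> ram{};  // stands in for the shared WRAM region
    int loaded = -1;           // index of the overlay currently in RAM, -1 = none

    // Copy an overlay image into the shared region and remember which one it is.
    constexpr void load(int index, const std::array<int, 4>& rom_image) {
        for (std::size_t i = 0; i < ram.size(); ++i) ram[i] = rom_image[i];
        loaded = index;
    }

    // Check this before calling into, or reading from, overlay `index`.
    constexpr bool is_loaded(int index) const { return loaded == index; }
};

constexpr bool demo_guard() {
    constexpr std::array<int, 4> image0 = {1, 1, 1, 1};
    constexpr std::array<int, 4> image1 = {2, 2, 2, 2};
    overlay_region_sketch region{};
    region.load(0, image0);
    const bool ok0 = region.is_loaded(0) && region.ram[0] == 1;
    region.load(1, image1);  // overwrites overlay 0 in the shared region
    return ok0 && !region.is_loaded(0) && region.is_loaded(1) && region.ram[0] == 2;
}
```

In game code, the is_loaded check would typically sit in a debug assertion in front of each overlay entry point.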

ARM Codegen

<gba/codegen> compiles ARM instruction sequences at C++ consteval time, installs them into executable RAM at runtime, and provides zero-overhead patching to fill in runtime values without re-copying.

Quick start

The main power of codegen is patching: compile the ARM instruction sequence once, then replace runtime values (like loop counts, thresholds, or offsets) without re-copying.

#include <gba/codegen>
#include <gba/args>
#include <cstdint>
#include <cstring>
using namespace gba::codegen;
using namespace gba::literals;

// 1. Define a template with named patch arguments
static constexpr auto add_const = arm_macro([](auto& b) {
    b.add_imm(arm_reg::r0, arm_reg::r0, "c"_arg)  // r0 = r0 + c
     .bx(arm_reg::lr);
});

// 2. Install into executable RAM (once)
alignas(4) std::uint32_t code[add_const.size()] = {};
std::memcpy(code, add_const.data(), add_const.size_bytes());

// 3. Patch and call - reuse the same code buffer with different constants
constexpr auto patch = add_const.patcher<int(int)>();

auto add_10 = patch(code, "c"_arg = 10u);
int result = add_10(5);  // 15 = 5 + 10

auto add_100 = patch(code, "c"_arg = 100u);
result = add_100(5);  // 105 = 5 + 100

Named placeholders such as "c"_arg are filled at patch time. No re-copy needed - the same code buffer switches from adding 10 to adding 100.
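
At the encoding level, patching an 8-bit immediate amounts to rewriting the low byte of one instruction word. The sketch below shows this for ADD with the standard ARM data-processing encoding, assuming condition AL and a zero rotation field. encode_add_imm and patch_imm8 are hypothetical helpers for illustration, not stdgba internals.

```cpp
#include <cstdint>

// ADD rd, rn, #imm8 with condition AL and no rotation (data-processing, I=1).
constexpr std::uint32_t encode_add_imm(unsigned rd, unsigned rn, std::uint32_t imm8) {
    return 0xE2800000u | (rn << 16) | (rd << 12) | (imm8 & 0xFFu);
}

// Replace only the low 8 bits, leaving condition, opcode, and registers untouched.
constexpr std::uint32_t patch_imm8(std::uint32_t word, std::uint32_t imm8) {
    return (word & ~0xFFu) | (imm8 & 0xFFu);
}
```

Because only one word changes, switching the constant from 10 to 100 is a single store into the installed code rather than a fresh copy of the block.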


Building templates

arm_macro (preferred)

static constexpr auto tpl = arm_macro([](auto& b) {
    b.mov_imm(arm_reg::r0, 42)
     .bx(arm_reg::lr);
});

arm_macro infers the required capacity automatically. All instruction encodings are validated at consteval time - invalid operands are compile errors, not runtime surprises.

arm_macro_builder<N> (explicit capacity)

Use when the capacity must be fixed at the call site, for example inside a constinit variable or a constexpr template:

constexpr auto tpl = [] {
    auto b = arm_macro_builder<4>{};
    b.mov_imm(arm_reg::r0, 42).bx(arm_reg::lr);
    return b.compile();
}();

b.mark() returns the current word index - useful for computing forward branch targets before emitting the branch instruction.

compiled_block<N> accessors

| Member | Type | Description |
| --- | --- | --- |
| data() | const arm_word* | Pointer to first instruction word |
| size() | std::size_t | Number of instruction words |
| size_bytes() | std::size_t | Byte count (size() * 4) |
| operator[] | arm_word | Read a single instruction word |

Patch arguments

Codegen supports two patching styles:

  • named arguments: "name"_arg
  • positional slots: imm_slot(n), s12_slot(n), b_slot(n), instr_slot(n)

Positional slots use an index n (0-31) that maps to a call-site argument.

| Slot | Instruction(s) | Value |
| --- | --- | --- |
| imm_slot(n) | mov_imm, add_imm, sub_imm, orr_imm, and_imm, eor_imm, bic_imm, mvn_imm, rsb_imm, cmp_imm, tst_imm | 0-255 |
| s12_slot(n) | ldr_imm, str_imm | -4095 … +4095 |
| b_slot(n) | b_to, b_if | 24-bit signed word offset |
| instr_slot(n) | instruction(...) / word(...) / literal_word(...) | Any 32-bit word |

word_slot and literal_slot are aliases for instr_slot.

// Named patch args (primary)
static constexpr auto named_tpl = arm_macro([](auto& b) {
    b.mov_imm(arm_reg::r0, "x"_arg)
     .add_imm(arm_reg::r0, arm_reg::r0, "y"_arg)
     .bx(arm_reg::lr);
});

// Positional slots (alternative)
static constexpr auto slot_tpl = arm_macro([](auto& b) {
    b.mov_imm(arm_reg::r0, imm_slot(0))              // arg 0 -> 8-bit immediate
     .ldr_imm(arm_reg::r1, arm_reg::r2, s12_slot(1)) // arg 1 -> +/-4095 byte offset
     .instruction(instr_slot(2))                      // arg 2 -> full 32-bit word
     .bx(arm_reg::lr);
});
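
The -4095 … +4095 range of s12_slot follows from how ARM encodes load/store offsets: a 12-bit magnitude plus a separate add/subtract bit, rather than a two's-complement field. The sketch below encodes LDR with an immediate offset to show this; encode_ldr_imm is a hypothetical helper, and how stdgba applies the patch internally is not shown on this page.

```cpp
#include <cstdint>

// LDR rd, [rn, #offset] with condition AL, pre-indexed, no write-back.
// The offset is stored as a 12-bit magnitude plus the U (add/subtract) bit.
constexpr std::uint32_t encode_ldr_imm(unsigned rd, unsigned rn, int offset) {
    const bool add = offset >= 0;
    const std::uint32_t magnitude = static_cast<std::uint32_t>(add ? offset : -offset);
    return 0xE5100000u                  // cond=AL, single data transfer, P=1, L=1
         | (add ? 0x00800000u : 0u)     // U bit (bit 23): 1 = add, 0 = subtract
         | (rn << 16) | (rd << 12)
         | (magnitude & 0xFFFu);
}
```

A patch of this slot therefore has to update both the magnitude bits and the U bit when the sign of the offset changes.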

Patching

The primary workflow uses compiled_block::patcher() with named arguments. This keeps call sites self-documenting and order-independent.

Preferred: compiled_block::patcher() (named args)

static constexpr auto tpl = arm_macro([](auto& b) {
    b.mov_imm(arm_reg::r0, "value"_arg).bx(arm_reg::lr);
});

constexpr auto patch = tpl.patcher<int()>();

alignas(4) std::uint32_t code[tpl.size()] = {};
std::memcpy(code, tpl.data(), tpl.size_bytes());

auto fn = patch(code, "value"_arg = 42u);  // patch + typed function pointer

Named patch arguments are order-independent and self-documenting.

Zero-overhead variant: block_patcher<tpl> (positional)

Use this when you want fully compile-time patch metadata and positional patch values.

static constexpr auto tpl = arm_macro([](auto& b) {
    b.mov_imm(arm_reg::r0, imm_slot(0)).bx(arm_reg::lr);
});

constexpr auto fn_patch = block_patcher<tpl>{}.typed<int()>();
auto fn = fn_patch(code, 42u);

Generic runtime dispatch: apply_patches<Sig>(...)

Use apply_patches when the block is not available as a constexpr at the call site, or when patch arguments need to be packed into an array before they are applied.

Variadic form - arguments passed directly:

auto fn = apply_patches<int(int)>(tpl, code, tpl.size(), 42u);

Packed array form - pre-assembled argument array:

std::uint32_t args[] = {30u, 12u};
auto fn = apply_patches_packed<int(int)>(tpl, code, tpl.size(), args, 2);

Whole-instruction patching

Reserve an instruction word and replace it entirely at patch time. Use the checked helpers to build valid instruction values:

static constexpr auto op_tpl = arm_macro([](auto& b) {
    b.mov_imm(arm_reg::r2, imm_slot(0))
     .instruction(instr_slot(1))        // replaced at runtime
     .bx(arm_reg::lr);
});

alignas(4) std::uint32_t code[op_tpl.size()] = {};
std::memcpy(code, op_tpl.data(), op_tpl.size_bytes());

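// Pick the operation at runtime. Both calls patch the same buffer in place,
// so the most recent patch determines what the code at `code` does.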
auto add_fn = apply_patches<int(int)>(op_tpl, code, op_tpl.size(),
    5u, add_reg_instr(arm_reg::r0, arm_reg::r0, arm_reg::r2));

auto sub_fn = apply_patches<int(int)>(op_tpl, code, op_tpl.size(),
    5u, sub_reg_instr(arm_reg::r0, arm_reg::r0, arm_reg::r2));

Available checked instruction helpers:

nop_instr()
add_reg_instr(rd, rn, rm)   sub_reg_instr(rd, rn, rm)
orr_reg_instr(rd, rn, rm)   and_reg_instr(rd, rn, rm)   eor_reg_instr(rd, rn, rm)
lsl_imm_instr(rd, rm, shift)   lsr_imm_instr(rd, rm, shift)
mul_instr(rd, rm, rs)

Callback patching: apply_word_patches(...)

When instruction word patches are generated dynamically at runtime, use the callback-based apply_word_patches function instead of apply_patches. This is useful for multi-operation switching or complex patch-value computation:

static constexpr auto op_tpl = arm_macro([](auto& b) {
    b.mov_imm(arm_reg::r2, imm_slot(0))
     .instruction(instr_slot(1))        // replaced at runtime via callback
     .bx(arm_reg::lr);
});

alignas(4) std::uint32_t code[op_tpl.size()] = {};
std::memcpy(code, op_tpl.data(), op_tpl.size_bytes());

// Use a callback to generate instruction words based on patch index
apply_word_patches(op_tpl, code, op_tpl.size(), [](std::size_t patch_idx) -> std::uint32_t {
    // patch_idx == 1 here (the instruction slot)
    // Return the desired instruction word
    if (some_condition) {
        return add_reg_instr(arm_reg::r0, arm_reg::r0, arm_reg::r2);
    } else {
        return sub_reg_instr(arm_reg::r0, arm_reg::r0, arm_reg::r2);
    }
});

Instruction reference

All instructions are available as builder methods on arm_macro_builder<N> and accepted by the arm_macro lambda.

Data movement

| Builder method | Effect |
| --- | --- |
| mov_imm(rd, imm8) | rd = imm8 (0-255) |
| mov_imm(rd, imm_slot(n)) | rd = arg[n] at patch time |
| mov_reg(rd, rm) | rd = rm |

Arithmetic

| Method | Effect | Patch variant |
| --- | --- | --- |
| add_imm(rd, rn, imm8) | rd = rn + imm8 | imm_slot |
| add_reg(rd, rn, rm) | rd = rn + rm | |
| sub_imm(rd, rn, imm8) | rd = rn - imm8 | imm_slot |
| sub_reg(rd, rn, rm) | rd = rn - rm | |
| rsb_imm(rd, rn, imm8) | rd = imm8 - rn | imm_slot |
| rsb_reg(rd, rn, rm) | rd = rm - rn | |
| adc_imm(rd, rn, imm8) | rd = rn + imm8 + C | |
| adc_reg(rd, rn, rm) | rd = rn + rm + C | |
| sbc_imm(rd, rn, imm8) | rd = rn - imm8 - !C | |
| sbc_reg(rd, rn, rm) | rd = rn - rm - !C | |

Bitwise

| Method | Effect | Patch variant |
| --- | --- | --- |
| orr_imm(rd, rn, imm8) | rd = rn \| imm8 | imm_slot |
| orr_reg(rd, rn, rm) | rd = rn \| rm | |
| and_imm(rd, rn, imm8) | rd = rn & imm8 | imm_slot |
| and_reg(rd, rn, rm) | rd = rn & rm | |
| eor_imm(rd, rn, imm8) | rd = rn ^ imm8 | imm_slot |
| eor_reg(rd, rn, rm) | rd = rn ^ rm | |
| bic_imm(rd, rn, imm8) | rd = rn & ~imm8 | imm_slot |
| bic_reg(rd, rn, rm) | rd = rn & ~rm | |
| mvn_imm(rd, imm8) | rd = ~imm8 | imm_slot |
| mvn_reg(rd, rm) | rd = ~rm | |

Shifts and rotates

| Method | Shift amount | Range |
| --- | --- | --- |
| lsl_imm(rd, rm, shift) | Immediate | 0-31 |
| lsr_imm(rd, rm, shift) | Immediate | 1-32 |
| asr_imm(rd, rm, shift) | Immediate | 1-32 |
| ror_imm(rd, rm, shift) | Immediate | 1-31 |
| lsl_reg(rd, rm, rs) | Register rs | |
| lsr_reg(rd, rm, rs) | Register rs | |
| asr_reg(rd, rm, rs) | Register rs | |
| ror_reg(rd, rm, rs) | Register rs | |

Comparison / flag-setting

These set CPSR flags without writing a destination register.

| Method | Flags set on |
| --- | --- |
| cmp_imm(rn, imm8) / cmp_reg(rn, rm) | rn - operand |
| cmn_imm(rn, imm8) / cmn_reg(rn, rm) | rn + operand |
| tst_imm(rn, imm8) / tst_reg(rn, rm) | rn & operand |
| teq_imm(rn, imm8) / teq_reg(rn, rm) | rn ^ operand |

cmp_imm and tst_imm also accept imm_slot(n).

Memory - word and byte

| Method | Access |
| --- | --- |
| ldr_imm(rd, rn, offset) / str_imm(rd, rn, offset) | 32-bit word, offset -4095…+4095; accepts s12_slot |
| ldrb_imm(rd, rn, offset) / strb_imm(rd, rn, offset) | Unsigned byte, immediate offset |
| ldrb_reg(rd, rn, rm) / strb_reg(rd, rn, rm) | Unsigned byte, register offset |

Memory - halfword and signed forms

| Method | Access |
| --- | --- |
| ldrh_imm(rd, rn, offset) / strh_imm(rd, rn, offset) | Unsigned halfword, immediate offset |
| ldrh_reg(rd, rn, rm) / strh_reg(rd, rn, rm) | Unsigned halfword, register offset |
| ldrsb_imm(rd, rn, offset) / ldrsb_reg(rd, rn, rm) | Signed byte |
| ldrsh_imm(rd, rn, offset) / ldrsh_reg(rd, rn, rm) | Signed halfword |

Multi-register and stack

Build a register bitmask with reg_list(r0, r4, lr, ...).

| Method | ARM mnemonic |
| --- | --- |
| push(regs) | STMDB SP!, {regs} |
| pop(regs) | LDMIA SP!, {regs} |
| ldmia(rn, regs [,wb]) | LDMIA rn[!], {regs} |
| stmia(rn, regs [,wb]) | STMIA rn[!], {regs} |
| ldmib(rn, regs [,wb]) | LDMIB rn[!], {regs} |
| stmib(rn, regs [,wb]) | STMIB rn[!], {regs} |
| ldmda(rn, regs [,wb]) | LDMDA rn[!], {regs} |
| stmda(rn, regs [,wb]) | STMDA rn[!], {regs} |
| ldmdb(rn, regs [,wb]) | LDMDB rn[!], {regs} |
| stmdb(rn, regs [,wb]) | STMDB rn[!], {regs} |

b.push(reg_list(arm_reg::r4, arm_reg::r5, arm_reg::lr));
// ... body ...
b.pop(reg_list(arm_reg::r4, arm_reg::r5, arm_reg::pc));
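
For intuition, a register list is a 16-bit mask with one bit per register, and push/pop are fixed STMDB/LDMIA encodings with that mask in the low half-word. The sketch below uses hypothetical names (reg, reg_list_sketch, encode_push, encode_pop) and the standard ARM block-transfer encodings; it does not claim to match stdgba's internal representation.

```cpp
#include <cstdint>

enum class reg : unsigned { r0, r1, r2, r3, r4, r5, r6, r7,
                            r8, r9, r10, r11, r12, sp, lr, pc };

// One bit per register: bit n set means register n is in the list.
template <typename... Regs>
constexpr std::uint16_t reg_list_sketch(Regs... regs) {
    return static_cast<std::uint16_t>((0u | ... | (1u << static_cast<unsigned>(regs))));
}

// STMDB SP!, {mask}
constexpr std::uint32_t encode_push(std::uint16_t mask) { return 0xE92D0000u | mask; }
// LDMIA SP!, {mask}
constexpr std::uint32_t encode_pop(std::uint16_t mask) { return 0xE8BD0000u | mask; }
```

Popping into pc instead of lr, as in the snippet above, restores the saved return address directly into the program counter, so no separate bx lr is needed.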

Multiply

ARM7TDMI constraint: rd must differ from rm.

| Method | Effect |
| --- | --- |
| mul(rd, rm, rs) | rd = rm * rs |
| mla(rd, rm, rs, rn) | rd = rm * rs + rn |

Branches

| Method | Effect |
| --- | --- |
| b_to(target) | Unconditional, by word index |
| b_to(b_slot(n)) | Patchable branch offset |
| b_if(cond, target) | Conditional, by word index |
| b_if(cond, b_slot(n)) | Patchable conditional branch |
| bl_to(target) | Branch with link |
| bx(rm) | Branch exchange - use for function returns |
| blx(rm) | Branch exchange with link |

arm_cond values: eq ne cs/hs cc/lo mi pl vs vc hi ls ge lt gt le al

Branching patterns

b_to and b_if take a target word index - the index of the instruction you want to jump to. Use b.mark() to read the current word index at any point during construction:

// Loop: count down from r0 to zero
const auto loop_top = b.mark();          // remember top of loop
b.sub_imm(arm_reg::r0, arm_reg::r0, 1); // r0--
b.cmp_imm(arm_reg::r0, 0);
b.b_if(arm_cond::ne, loop_top);         // branch back while r0 != 0
b.bx(arm_reg::lr);
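
Behind a target word index, the ARM branch instruction stores a signed 24-bit word offset relative to the branch address plus 8 bytes (two words ahead, because of the pipeline). The sketch below shows that conversion with the standard B encoding. encode_branch, cond_ne, and cond_al are hypothetical names, and it is an assumption that the builder performs an equivalent calculation.

```cpp
#include <cstdint>

constexpr std::uint32_t cond_ne = 0x1;  // ARM condition code NE
constexpr std::uint32_t cond_al = 0xE;  // ARM condition code AL (always)

// B<cond>: condition in bits 28-31, 0b101 in bits 25-27, L=0,
// then a signed 24-bit offset counted in words from (branch index + 2).
constexpr std::uint32_t encode_branch(std::uint32_t cond, int branch_index, int target_index) {
    const int offset = target_index - (branch_index + 2);
    return (cond << 28) | 0x0A000000u
         | (static_cast<std::uint32_t>(offset) & 0x00FFFFFFu);
}
```

If the count-down loop above starts at word 0, its b_if sits at word 2 and targets word 0, which gives an offset of -4 words.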

For forward branches, emit the branch first, then record where the target lands:

b.cmp_imm(arm_reg::r0, 100);
const auto branch_instr = b.mark();      // index of the b_if we're about to emit
b.b_if(arm_cond::ge, 0);                // target unknown yet - placeholder
b.add_imm(arm_reg::r0, arm_reg::r0, 5); // only reached when r0 < 100
// ... forward code goes here ...

Note: Forward branches where the target index is not yet known require arm_macro_builder<N> with explicit capacity, since you need to emit the branch before you know the target. With arm_macro you can structure control flow so that all targets are emitted before the branch (back-branches) or known from b.mark() arithmetic.


AAPCS calling convention

Generated leaf functions receive and return values through the standard ARM AAPCS convention used on GBA. No special setup is needed - just cast the destination pointer to the right type.

| Role | Register |
| --- | --- |
| Argument 0 | r0 |
| Argument 1 | r1 |
| Argument 2 | r2 |
| Argument 3 | r3 |
| Return value | r0 |

Register-form instructions (add_reg, sub_reg, mul, …) operate directly on call-time arguments without any patch slots.


Examples

Patched constant (simplest case)

This is the Quick start pattern expressed with a positional slot instead of a named argument - add a call-time argument to a patched constant:

static constexpr auto add_const = arm_macro([](auto& b) {
    b.add_imm(arm_reg::r0, arm_reg::r0, imm_slot(0))
     .bx(arm_reg::lr);
});

alignas(4) std::uint32_t code[add_const.size()] = {};
std::memcpy(code, add_const.data(), add_const.size_bytes());

constexpr block_patcher<add_const> patch{};
auto fn = patch.entry<int(int)>(code, 42u);
int result = fn(8);  // 50 = 8 + 42

Function with two call-time arguments

Both arguments come through AAPCS registers; no patching needed:

static constexpr auto add_fn = arm_macro([](auto& b) {
    b.add_reg(arm_reg::r0, arm_reg::r0, arm_reg::r1)
     .bx(arm_reg::lr);
});

alignas(4) std::uint32_t code[add_fn.size()] = {};
std::memcpy(code, add_fn.data(), add_fn.size_bytes());
auto fn = reinterpret_cast<int (*)(int, int)>(code);
int result = fn(30, 12);  // 42

Loop with patched step size

Count down from a call-time start value, subtracting a patched step on each iteration:

// int countdown_by_step(int start) - counts down with a patched step size
static constexpr auto countdown_loop = arm_macro([](auto& b) {
    b.mov_imm(arm_reg::r1, 0);                        // count = 0
    const auto loop_start = b.mark();                 // loop top: index 1
    b.sub_imm(arm_reg::r0, arm_reg::r0, imm_slot(0)); // start -= step_size (patched)
    b.add_imm(arm_reg::r1, arm_reg::r1, 1);           // count++
    b.cmp_imm(arm_reg::r0, 0);                        // compare remaining value with 0
    b.b_if(arm_cond::gt, loop_start);                 // if start > 0, loop
    b.mov_reg(arm_reg::r0, arm_reg::r1);              // return count
    b.bx(arm_reg::lr);
});

alignas(4) std::uint32_t code[countdown_loop.size()] = {};
std::memcpy(code, countdown_loop.data(), countdown_loop.size_bytes());

constexpr block_patcher<countdown_loop> patch{};

// Patch step size = 1
auto count_by_1 = patch.entry<int(int)>(code, 1u);
int loops_by_1 = count_by_1(10);  // 10 iterations: 10, 9, 8, ..., 1, 0

// Re-patch: step size = 2 (no re-copy needed!)
auto count_by_2 = patch.entry<int(int)>(code, 2u);
int loops_by_2 = count_by_2(10);  // 5 iterations: 10, 8, 6, 4, 2, 0
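
A plain C++ reference model is a convenient way to check what the generated loop should return for a given start and step. countdown_model is a hypothetical helper that mirrors the instruction sequence above: subtract, count, then loop while the remaining value is greater than zero.

```cpp
// Reference model of the generated loop: the body always runs at least once,
// because the comparison happens after the subtraction.
constexpr int countdown_model(int start, int step) {
    int count = 0;
    do {
        start -= step;  // sub_imm r0, r0, step
        ++count;        // add_imm r1, r1, 1
    } while (start > 0); // cmp_imm r0, 0 ; b_if gt
    return count;
}
```

The model also makes two edge cases visible: a start of zero still reports one iteration, and a step of zero never terminates for a positive start, so the patched step should be at least 1.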

Mixed: call-time arguments and patch-time constant

// x * 4 + c  - x is a call-time argument, c is patched in
static constexpr auto scale_add = arm_macro([](auto& b) {
    b.add_reg(arm_reg::r0, arm_reg::r0, arm_reg::r0) // *2
     .add_reg(arm_reg::r0, arm_reg::r0, arm_reg::r0) // *4
     .add_imm(arm_reg::r0, arm_reg::r0, imm_slot(0)) // + c
     .bx(arm_reg::lr);
});

constexpr block_patcher<scale_add> patch{};

alignas(4) std::uint32_t code[scale_add.size()] = {};
std::memcpy(code, scale_add.data(), scale_add.size_bytes());

auto fn = patch.entry<int(int)>(code, 2u);  // 4x + 2
int r = fn(10);  // 42

Callee-save register pattern

// int compute(int a, int b, int c)  -  (a * b) + (c << 2)
static constexpr auto compute = arm_macro([](auto& b) {
    b.push(reg_list(arm_reg::r4, arm_reg::lr));
    b.mul(arm_reg::r4, arm_reg::r0, arm_reg::r1); // r4 = a * b  (r4 != r0)
    b.lsl_imm(arm_reg::r0, arm_reg::r2, 2);       // r0 = c << 2
    b.add_reg(arm_reg::r0, arm_reg::r4, arm_reg::r0);
    b.pop(reg_list(arm_reg::r4, arm_reg::pc));
});

Conditional loop with comparison

// Count iterations from `start` until value reaches `limit`
static constexpr auto count_loop = arm_macro([](auto& b) {
    b.mov_imm(arm_reg::r2, 0);              // count = 0; index 0
    // loop top: index 1
    b.cmp_reg(arm_reg::r0, arm_reg::r1);
    b.b_if(arm_cond::ge, 6);               // exit if r0 >= limit; index 2
    b.add_imm(arm_reg::r0, arm_reg::r0, 1);// r0++; index 3
    b.add_imm(arm_reg::r2, arm_reg::r2, 1);// count++; index 4
    b.b_to(1);                             // back to loop top; index 5
    b.mov_reg(arm_reg::r0, arm_reg::r2);   // exit: return count; index 6
    b.bx(arm_reg::lr);
});

Patchable threshold

// Returns value * 2 if below threshold, value + 10 otherwise
static constexpr auto threshold_fn = arm_macro([](auto& b) {
    b.cmp_imm(arm_reg::r0, imm_slot(0));          // index 0
    b.b_if(arm_cond::ge, 4);                       // index 1 - skip to else
    b.add_reg(arm_reg::r0, arm_reg::r0, arm_reg::r0); // *2; index 2
    b.b_to(5);                                     // index 3 - skip else
    b.add_imm(arm_reg::r0, arm_reg::r0, 10);       // +10; index 4 (else)
    b.bx(arm_reg::lr);                             // index 5
});

alignas(4) std::uint32_t code[threshold_fn.size()] = {};
std::memcpy(code, threshold_fn.data(), threshold_fn.size_bytes());

// Install with threshold = 50; re-patch any time without re-copying
constexpr block_patcher<threshold_fn> patch{};
auto fn = patch.entry<int(int)>(code, 50u);

Halfword OAM update (GBA sprite system)

// void update_sprite(volatile std::uint16_t* oam, int x, int y)
static constexpr auto update_sprite = arm_macro([](auto& b) {
    // attr0: clear Y field, insert new Y
    b.ldrh_imm(arm_reg::r3, arm_reg::r0, 0);
    b.bic_imm(arm_reg::r3, arm_reg::r3, 0xFF);
    b.orr_reg(arm_reg::r3, arm_reg::r3, arm_reg::r2);
    b.strh_imm(arm_reg::r3, arm_reg::r0, 0);
    // attr1: clear the low 8 bits of the 9-bit X field, insert new X (assumes 0 <= x <= 255)
    b.ldrh_imm(arm_reg::r3, arm_reg::r0, 2);
    b.bic_imm(arm_reg::r3, arm_reg::r3, 0xFF);
    b.orr_reg(arm_reg::r3, arm_reg::r3, arm_reg::r1);
    b.strh_imm(arm_reg::r3, arm_reg::r0, 2);
    b.bx(arm_reg::lr);
});
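
For comparison, the same read-modify-write can be written as a plain C++ model, which is handy for checking the generated code's results on a host. oam_attrs_sketch and update_sprite_model are hypothetical names; like the assembly, the model clears only the low 8 bits of each attribute and does not mask x or y, so both are assumed to be in the range 0-255.

```cpp
#include <cstdint>

struct oam_attrs_sketch {
    std::uint16_t attr0;  // low 8 bits: Y coordinate
    std::uint16_t attr1;  // low bits: X coordinate
};

// Mirrors the ldrh / bic 0xFF / orr / strh sequence for attr0 and attr1.
constexpr oam_attrs_sketch update_sprite_model(oam_attrs_sketch oam, int x, int y) {
    oam.attr0 = static_cast<std::uint16_t>((oam.attr0 & ~0xFF) | y);
    oam.attr1 = static_cast<std::uint16_t>((oam.attr1 & ~0xFF) | x);
    return oam;
}
```

The upper bits of both attributes (shape, size, flip flags, and so on) pass through unchanged, which is the point of the bic/orr pair.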

Safety notes

  • The destination buffer must be word-aligned (alignas(4)) and located in executable RAM (IWRAM or EWRAM on GBA).
  • Encoding errors (immediate out of range, invalid register combination) are compile errors in consteval context.
  • b_to / b_if targets are in instruction words, not bytes.
  • mul / mla: rd ≠ rm (ARM7TDMI hardware constraint).
  • These APIs cover leaf-function patterns (AAPCS r0-r3 arguments, r0 return). Stack-passed arguments, calls to other functions, and floating-point are not abstracted.

Green Low Bit (grn_lo)

The GBA colour word is often described as 15-bit colour (R5G5B5), but bit 15 is not always inert.

What bit 15 is

Bit:  15      14-10  9-5    4-0
      grn_lo  Blue   Green  Red

grn_lo is the low bit of an internal 6-bit green path used by colour special effects.

  • Without blending effects, grn_lo is not visibly distinguishable.
  • With brighten/darken/alpha effects enabled, the hardware pipeline can use that extra green precision.
  • Some emulators still treat bit 15 as unused, so they render colours as if grn_lo does not exist.
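
The bit layout above can be captured in two small helpers. make_color and green6 are hypothetical names for illustration and are not part of <gba/video>; green6 follows the description of grn_lo as the low bit of a 6-bit green value.

```cpp
#include <cstdint>

// Pack 5-bit channels plus the optional grn_lo bit into one colour word.
constexpr std::uint16_t make_color(unsigned r, unsigned g, unsigned b, bool grn_lo = false) {
    return static_cast<std::uint16_t>((r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10)
                                      | (grn_lo ? 0x8000u : 0u));
}

// 6-bit green as seen by the effects path: 5-bit green shifted up, grn_lo as bit 0.
constexpr unsigned green6(std::uint16_t color) {
    return (((color >> 5) & 31u) << 1) | ((color >> 15) & 1u);
}
```

Two colours that differ only in bit 15 have the same 5-bit green but 6-bit green values that differ by one, which is the difference the colour effects can expose.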

Demo: hidden text using grn_lo

This demo draws two colours that differ only by bit 15, then enables brightness increase. On hardware, the hidden text becomes visible; on many emulators, it stays flat/invisible.

#include <gba/video>

static constexpr unsigned char glyphs[][5] = {
    {0b101, 0b101, 0b111, 0b101, 0b101}, // H
    {0b111, 0b100, 0b111, 0b100, 0b111}, // E
    {0b100, 0b100, 0b100, 0b100, 0b111}, // L
    {0b100, 0b100, 0b100, 0b100, 0b111}, // L
    {0b111, 0b101, 0b101, 0b101, 0b111}, // O
};

static void draw_glyph(int g, int px, int py, int scale, unsigned short color) {
    for (int row = 0; row < 5; ++row) {
        for (int col = 0; col < 3; ++col) {
            if (!(glyphs[g][row] & (4 >> col))) continue;
            for (int sy = 0; sy < scale; ++sy)
                for (int sx = 0; sx < scale; ++sx)
                    gba::mem_vram[(px + col * scale + sx) + (py + row * scale + sy) * 240] = color;
        }
    }
}

int main() {
    gba::reg_dispcnt = {.video_mode = 3, .enable_bg2 = true};

    constexpr unsigned short base = 12 << 5;            // green=12
    constexpr unsigned short hidden = base | (1 << 15); // green=12, grn_lo=1

    for (int i = 0; i < 240 * 160; ++i) gba::mem_vram[i] = base;

    constexpr int scale = 6, ox = (240 - 19 * scale) / 2, oy = (160 - 5 * scale) / 2;
    for (int i = 0; i < 5; ++i) draw_glyph(i, ox + i * 4 * scale, oy, scale, hidden);

    // Brightness increase on BG2 - hardware processes the full 6-bit
    // green channel, revealing the hidden text on real hardware
    gba::reg_bldcnt = {.dest_bg2 = true, .blend_op = gba::blend_op_brighten};
    using namespace gba::literals;
    gba::reg_bldy = 0.25_fx;

    for (;;) {}
}

Comparison screenshots

| Platform | Result | Screenshot |
| --- | --- | --- |
| mGBA (0.11-8996-6a99e17f5) | Text is invisible | mGBA screenshot - word is invisible |
| Analogue Pocket (FPGA) | Text is faintly visible | Analogue Pocket screenshot - word is faintly visible |
| Real GBA hardware | Text is visible | Real GBA hardware - word is visible |

Practical guidance

  • For normal palette authoring, treat colours as 15-bit.
  • If you rely on hardware colour effects and exact output parity, test on real hardware (or FPGA implementations that model this behaviour).
  • Keep this behaviour in mind when debugging “looks different on emulator vs hardware” reports.

Undocumented Namespace

stdgba exposes a small set of BIOS calls and hardware registers through gba::undocumented. These are real features of the hardware, but they sit outside the better-travelled part of the public GBA programming model.

Use them when you know exactly why you need them. For everyday game code, prefer the documented BIOS wrappers and peripheral registers first.

What lives in gba::undocumented

Two public headers contribute to the namespace:

  • <gba/peripherals> for undocumented memory-mapped registers
  • <gba/bios> for undocumented BIOS SWIs

Why these APIs are separate

The namespace is a warning label as much as an API grouping:

  • behaviour is less commonly documented in community references
  • emulator support can be uneven
  • some features are useful mostly for diagnostics, boot-state inspection, or hardware experiments
  • some settings can break assumptions if changed casually

BIOS: GetBiosChecksum()

<gba/bios> exposes one undocumented BIOS helper:

#include <gba/bios>

auto checksum = gba::undocumented::GetBiosChecksum();
if (checksum == 0xBAAE187F) {
	// Official GBA BIOS checksum
}

This is mainly useful for:

  • sanity-checking the BIOS on real hardware
  • emulator/debug diagnostics
  • research tools that want to distinguish known BIOS images

Undocumented registers

<gba/peripherals> exposes these registers:

| Address | API | Type | Typical use |
| --- | --- | --- | --- |
| 0x4000002 | reg_stereo_3d | bool | Historical GREENSWAP / stereo-3D experiment |
| 0x4000300 | reg_postflg | bool | Check whether the system has already passed the BIOS boot sequence |
| 0x4000301 | reg_haltcnt | halt_control | Low-power mode control |
| 0x4000410 | reg_obj_center | volatile char | Rare OBJ-centre hardware experiment register |
| 0x4000800 | reg_memcnt | memory_control | BIOS/EWRAM control |

The Undocumented Registers reference page lists the raw addresses. This page focuses on when they are practically useful.

reg_stereo_3d

#include <gba/peripherals>

gba::undocumented::reg_stereo_3d = true;

This register is historically known as GREENSWAP. It is not part of normal rendering workflows, and support can vary across emulators and hardware interpretations.

It is best treated as a curiosity or research feature, not a mainstream graphics tool.

If you are investigating colour-path behaviour, also see Green Low Bit (grn_lo).

reg_postflg

#include <gba/peripherals>

bool booted_via_bios = gba::undocumented::reg_postflg;

POSTFLG is useful when you need to know whether the machine has already passed the BIOS startup path. That mostly comes up in:

  • diagnostics
  • boot-time experiments
  • research around soft reset or alternate loaders

Most games never need to read it.

reg_haltcnt

#include <gba/peripherals>

gba::undocumented::reg_haltcnt = { .low_power_mode = true };

This directly controls low-power behaviour. In normal code, prefer the documented BIOS wrappers from <gba/bios>:

  • gba::Halt() to sleep until interrupt
  • gba::Stop() to enter deeper low-power mode

Those helpers are clearer and easier to read in application code. reg_haltcnt is most useful when you want exact register-level control.

reg_obj_center

#include <gba/peripherals>

gba::undocumented::reg_obj_center = 0;

What this register does is unknown, and no emulator supports it. Further experimentation on real hardware is needed to determine its behaviour, if any.

reg_memcnt

#include <gba/peripherals>

gba::undocumented::reg_memcnt = {
	.ewram = true,
	.ws_ewram = 0xd,
};

MEMCNT is the most practically interesting entry in the namespace. It controls:

  • BIOS swap state
  • whether the CGB BIOS is disabled
  • whether EWRAM is enabled
  • EWRAM wait-state configuration

This makes it relevant for:

  • hardware experiments
  • boot/loader code
  • benchmarking memory timing changes

It is also one of the easiest ways to make the system unstable if you write nonsense values, so treat it carefully.
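
One way to guard the wait-state field is to validate it before it reaches the register. This is a minimal sketch that assumes the encoding described in community hardware references: a 4-bit value n selects 15 - n wait states, and 15 hangs the system. The text above does not state that encoding, and both function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: ws_ewram is a 4-bit field where 15 - n gives the wait-state count
// and n == 15 locks the system up. Verify against your hardware reference.
constexpr bool ws_ewram_is_usable(std::uint8_t ws) noexcept {
    return ws <= 14;
}

constexpr int ewram_wait_states(std::uint8_t ws) noexcept {
    return 15 - static_cast<int>(ws);
}
```

Under that assumption, the 0xd used in the example above corresponds to two wait states, and a helper that writes reg_memcnt could refuse any value for which ws_ewram_is_usable returns false.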

Testing expectations

Because these APIs are outside the mainline path:

  • test on real hardware when possible
  • expect emulator differences
  • isolate undocumented writes behind small helper functions so the rest of the codebase stays understandable

That is the main reason stdgba keeps them behind an explicit namespace instead of mixing them into the everyday API surface.

ECS Overview

gba::ecs is stdgba’s static Entity-Component-System for fixed-capacity Game Boy Advance projects.

It exists for the same reason most of stdgba exists: many modern patterns are nice on desktop, but they only make sense on GBA if they can be made deterministic, fixed-size, and cheap to iterate.

Why GBA needs a different ECS

Classic GBA games organise data in one of two ways:

  1. Array-per-concept: player_positions[], player_velocities[], enemy_states[], etc.

    • Fast to iterate
    • Easy to understand
    • Scales poorly (dozens of arrays become unwieldy)
  2. Object-heavy: C++ objects with pointers holding player/enemy state

    • Natural to write
    • Introduces indirection and unpredictable memory access patterns
    • ARM7TDMI has no data cache or branch predictor, so every pointer hop is a dependent load that eats frame time

gba::ecs takes a third approach: flat dense arrays organised by the ECS, but with compile-time component lists and shift-based addressing tuned for GBA’s constraints.

The result is data-oriented design without sacrificing readability.

Core principles

gba::ecs is designed around:

  • zero heap allocation – all storage is stack-allocated or embedded in EWRAM/IWRAM structs
  • compile-time component lists – types are resolved at compile time, not runtime
  • predictable iteration costs – no sparse sets, no type-erased callbacks
  • flat dense storage – all-of-type component arrays in memory order
  • generation-based entity handles – 16-bit packed handles with stale-handle detection
  • power-of-two component sizes – enables shift-based pool addressing instead of multiplies
  • constexpr safety – invalid operations fail at compile time in constant-evaluation contexts

The mental model

entity_id     -> 16-bit handle (8-bit slot + 8-bit generation)
registry      -> owns all component arrays inline in EWRAM
group         -> compile-time logical grouping of components (zero runtime cost)
view<Cs...>   -> lightweight filtered iterator over entities matching all Cs
match<Cs...>  -> ordered per-entity conditional dispatch by component query cases
system        -> plain function operating on one or more views

Example: physics movement system

void physics_system(world_type& world) {
	world.view<position, velocity>().each_arm([](position& pos, const velocity& vel) {
		pos.x += vel.vx;
		pos.y += vel.vy;
	});
}

Every ECS operation is deterministic and measurable – no hidden allocation, no callback chains.

Quick start

#include <gba/ecs>

struct position { int x, y; };
struct velocity { int vx, vy; };
struct health   { int hp; };

using world_type = gba::ecs::registry<128, position, velocity, health>;

world_type world;

auto player = world.create();
world.emplace<position>(player, 10, 20);
world.emplace<velocity>(player, 1, 0);
world.emplace<health>(player, 100);

for (auto [pos, vel] : world.view<position, velocity>()) {
	pos.x += vel.vx;
	pos.y += vel.vy;
}

Writing a system

The most important mental shift is that systems are just functions over views.

#include <gba/ecs>
#include <gba/fixed_point>

struct position {
	gba::fixed<int, 8> x;
	gba::fixed<int, 8> y;
};

struct velocity {
	gba::fixed<int, 8> vx;
	gba::fixed<int, 8> vy;
};

struct health { int hp; };

struct sprite_id {
	std::uint8_t id;
	gba::ecs::pad<3> _;
};

using world_type = gba::ecs::registry<128, position, velocity, health, sprite_id>;

void movement_system(world_type& world) {
	world.view<position, velocity>().each_arm([](position& pos, const velocity& vel) {
		pos.x += vel.vx;
		pos.y += vel.vy;
	});
}

void damage_system(world_type& world) {
	world.view<health>().each([](health& hp) {
		if (hp.hp > 0) --hp.hp;
	});
}

Use .each() when you want the most portable, straightforward path. Use .each_arm() for hot loops that you have measured and want running from ARM mode + IWRAM.

Complete API Reference

Registry construction

// Simple: list all components
using world = gba::ecs::registry<128, position, velocity, health>;

// With groups: organise components logically
using physics = gba::ecs::group<position, velocity, acceleration>;
using graphics = gba::ecs::group<sprite_id, palette_bank>;
using world = gba::ecs::registry<128, physics, graphics, health>;

Both are equivalent at runtime; groups are flattened to individual components at compile time.

Entity lifecycle

| Operation | Signature | Notes |
|---|---|---|
| create() | -> entity_id | Allocate a new entity slot |
| destroy(e) | (entity_id) -> void | Destroy entity; increment generation |
| valid(e) | (entity_id) -> bool | Check if entity handle is still alive |
| clear() | () -> void | Destroy all entities at once |
| size() | () -> std::size_t | Current count of alive entities |

Component operations

| Operation | Signature | Notes |
|---|---|---|
| emplace<C>(e, args...) | -> C& | Add component C to entity e; construct with args |
| remove<C>(e) | (entity_id) -> void | Remove component C from entity e |
| remove_unchecked<C>(ref) | (C&) -> void | Remove by component reference (faster) |
| get<C>(e) | (entity_id) -> C& | Access component (unchecked) |
| try_get<C>(e) | (entity_id) -> C* | Access component (returns nullptr if absent) |

Queries and predicates

| Operation | Signature | Notes |
|---|---|---|
| all_of<Cs...>(e) | (entity_id) -> bool | Entity has all listed components |
| any_of<Cs...>(e) | (entity_id) -> bool | Entity has any listed component |

Iteration APIs

| API | Best for |
|---|---|
| view<Cs...>() and range-for | Ergonomic gameplay systems with structured bindings |
| .each(fn) | Portable systems; constexpr-friendly |
| .each_arm(fn) | Measured hot loops requiring ARM mode + IWRAM |
| .each(fn), where fn takes entity_id as its first parameter | Systems that need the entity ID alongside components |

Conditional dispatch APIs

| API | Best for |
|---|---|
| with<Query...>(e, fn) | Single guarded callback when all queried components are present |
| match<Cases...>(e, fn1, fn2, ...) | Ordered multi-case dispatch for one entity; all matched cases run |
| match_arm<Cases...>(e, fn1, fn2, ...) | ARM/IWRAM hot-path version of match(...) for measured dispatch loops |

match(...) snapshots case matches before callbacks run, then executes matched cases in the order declared.

// Range-for with structured bindings
for (auto [pos, vel] : world.view<position, velocity>()) {
	pos.x += vel.vx;
}

// Callback style
world.view<position, velocity>().each([](position& pos, velocity& vel) {
	pos.x += vel.vx;
});

// With entity ID
world.view<health>().each([&](gba::ecs::entity_id id, health& hp) {
	if (hp.hp <= 0) world.destroy(id);
});

// ARM-mode hot loop
world.view<position, velocity>().each_arm([](position& pos, velocity& vel) {
	pos.x += vel.vx;  // Runs from ARM mode + IWRAM
});

match(...) example

using physics = gba::ecs::group<position, velocity>;

world.match<physics, health>(player,
	[](position& pos, velocity& vel) {
		pos.x += vel.vx;
		pos.y += vel.vy;
	},
	[](health& hp) {
		if (hp.hp > 0) --hp.hp;
	}
);

For an entity that has both physics and health, both callbacks run in order. For an entity that only has one case, only that callback runs. The return value is true if at least one case matched.

Why the component list is compile-time

gba::ecs asks you to name every component type up front:

using world_type = gba::ecs::registry<128, position, velocity, health>;

That buys the implementation several things:

  • no runtime type registry
  • no sparse-set hash maps
  • direct type-to-bit and type-to-pool lookup
  • compile-time diagnostics when you request a component the world does not own

It is a strong fit for GBA projects, where the total set of gameplay component types is usually small and stable.

Power-of-two component sizes

Each component type must have a power-of-two sizeof(T).

struct sprite_id {
	std::uint8_t id;
	gba::ecs::pad<3> _;
};

static_assert(sizeof(sprite_id) == 4);

This is not just a style rule - it supports the simple shift-based pool addressing the implementation is built around.
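
To show what shift-based addressing means in practice, here is a minimal sketch. It is not the stdgba implementation. It only demonstrates that when sizeof(T) is a power of two, the byte offset of a slot can be computed with a left shift instead of a multiply. The names pool_byte_offset, log2_exact, and sprite_id_demo are made up for this example.

```cpp
#include <cstddef>

constexpr bool is_power_of_two(std::size_t n) noexcept {
    return n != 0 && (n & (n - 1)) == 0;
}

// Exact log2 for a power-of-two input.
constexpr unsigned log2_exact(std::size_t n) noexcept {
    unsigned shift = 0;
    while ((std::size_t{1} << shift) < n) ++shift;
    return shift;
}

// Byte offset of `slot` inside a pool of T, computed with a shift.
template <typename T>
constexpr std::size_t pool_byte_offset(std::size_t slot) noexcept {
    static_assert(is_power_of_two(sizeof(T)), "component size must be a power of two");
    return slot << log2_exact(sizeof(T));  // same value as slot * sizeof(T)
}

// Stand-in for the padded sprite_id above: 1 payload byte + 3 padding bytes.
struct sprite_id_demo {
    unsigned char id;
    unsigned char pad[3];
};
```

The shift amount is a compile-time constant per component type, so the per-access cost does not depend on a multiply instruction.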

Constexpr-friendly behaviour

All core registry operations are constexpr. In constant-evaluation contexts, invalid operations produce compile-time failures instead of silent bad state.

static constexpr auto result = [] {
	gba::ecs::registry<8, int, short> reg;
	auto e = reg.create();
	reg.emplace<int>(e, 42);
	reg.emplace<short>(e, short{7});
	return reg.get<int>(e) * 100 + reg.get<short>(e);
}();

static_assert(result == 4207);

Memory consumption in EWRAM

Registry memory is all inline – no heap allocation or indirection. For a typical game setup:

gba::ecs::registry<128, position, velocity, health> world;

| Category | Size | Notes |
|---|---|---|
| Metadata | ~1,030 bytes | Per-entity tracking + free stack |
| Component pools | 2,560 bytes | 128 × (8 + 8 + 4) bytes |
| Total | ~3.5 KB | ~29% overhead, ~71% actual data |

Key insight: Metadata grows linearly per entity slot (8 bytes/slot: a 4-byte mask plus four 1-byte tracking arrays) and by only one byte per component type. Adding more components adds component-pool storage, with almost no extra metadata overhead.

Scaling examples

  • 64 entities, 3 components: ~1.7 KB
  • 128 entities, 3 components: ~3.5 KB (typical action game)
  • 256 entities, 6 components: ~8.8 KB (large world)

For context: GBA has 256 KB EWRAM and 32 KB IWRAM. A 128-entity registry uses ~1.4% of EWRAM, leaving room for graphics buffers, tilemaps, and multiple registries if needed.

Optimising EWRAM usage

If registry memory is tight:

  1. Reduce capacity: Each entity slot = 8 bytes of metadata overhead

    • 64 entities instead of 128 saves 512 bytes of metadata, on top of the smaller component pools
  2. Combine sparse components: If only 10% of entities need a component, you still allocate space for 100%

    • Consider whether to split into separate registries
  3. Careful padding: Power-of-two sizes are required but not wasteful

    • 1-byte component -> 1 byte (pad to 1, not 4)
    • 3-byte component -> needs padding to 4

Why ECS benefits GBA game architecture

Predictable memory access patterns

Arrays-of-components means systems iterate only the memory regions they need:

View iteration over position + velocity:
  Read sequential position array
  Read sequential velocity array
  
  vs

Array-of-structs (without ECS):
  Read interleaved position/velocity/health data
  Step over unused health values on every entity

Without ECS, a loop that needs only position still has to stride across every other field of each entity. Separate arrays keep access patterns linear and predictable.

No hidden allocations during gameplay

  • Registry is pre-allocated at startup
  • All memory lives in EWRAM or IWRAM
  • Zero dynamic allocation in the game loop
  • Deterministic frame time (no GC pauses, no allocation failures)

Flexible game architecture

  • Physics system operates on <position, velocity>
  • Rendering system operates on <sprite_id, depth>
  • Destruction system operates on <health> (with entity IDs)

Each system only touches the data it needs, keeping working set small and predictable on GBA’s 32 KB IWRAM.

Small learning curve

If you know how to write for (auto& entity : entities), you can write an ECS system. The mental model is straightforward: views are filtered arrays, systems operate on views.

ECS Architecture

gba::ecs uses a static, flat-storage architecture tuned for ARM7TDMI constraints. The design goal is straightforward: make the common operations for a small fixed-capacity game world cheap enough that you can reason about them without a profiler open all day.

File layout and public interface

include/gba/ecs              -> public facade
  +- registry<Capacity, Components...>
  +- group<Components...>
  +- entity_id (handle with generation)
  +- pad<N> (padding utility)

include/gba/bits/ecs/        -> internal implementation
  +- entity.hpp
  +- group.hpp
  +- group_metadata.hpp
  +- registry.hpp

Why this ECS is static

Many desktop ECS libraries optimise for:

  • unlimited entity counts
  • runtime component registration
  • dynamic archetype churn
  • scheduler/tooling integration

gba::ecs optimises for something entirely different:

  • a known maximum entity count (fits in 8 bits; max 255 entities)
  • a small compile-time component set (max 31 components)
  • simple arrays that can live inline inside one registry object
  • predictable loops for handheld game logic

That is why the registry type specifies everything at compile time:

using world_type = gba::ecs::registry<128, position, velocity, health>;

The type itself answers the architectural questions: maximum 128 live entities, exactly three component pools.

Registry storage model

Every registry owns its storage inline – no heap allocation, no indirection.

registry<Capacity, Components...>
|
+- hot metadata (kept at low offsets for cheap Thumb-mode access)
|  +- m_component_count[N]      (1 byte/component)
|  +- m_free_top                (1 byte)
|  +- m_next_slot               (1 byte)
|  +- m_alive                   (1 byte)
|  +- m_dense_prefix            (1 byte)
|
+- per-slot tracking
|  +- m_mask[Capacity]          (4 bytes/slot)
|  +- m_gen[Capacity]           (1 byte/slot)
|  +- m_free_stack[Capacity]    (1 byte/slot)
|  +- m_alive_list[Capacity]    (1 byte/slot)
|  +- m_alive_index[Capacity]   (1 byte/slot)
|
+- component pools
   +- std::array<C1, Capacity>  (Capacity x sizeof(C1))
   +- std::array<C2, Capacity>  (Capacity x sizeof(C2))
   +- ...

No heap allocation, sparse sets, or type-erased component maps are involved.
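
The diagram translates directly into a size estimate. The sketch below sums the field sizes exactly as the diagram lists them and ignores any alignment padding the compiler may insert. The helper names and the *_demo component types are hypothetical and not part of stdgba.

```cpp
#include <cstddef>
#include <cstdint>

// Per-slot tracking from the diagram: m_mask (4 bytes) plus m_gen, m_free_stack,
// m_alive_list, and m_alive_index (1 byte each).
constexpr std::size_t per_slot_bytes = sizeof(std::uint32_t) + 4 * sizeof(std::uint8_t);

// Hot scalars from the diagram: m_free_top, m_next_slot, m_alive, m_dense_prefix.
constexpr std::size_t hot_scalar_bytes = 4;

// Metadata estimate: one count byte per component, the scalars, and per-slot tracking.
template <typename... Components>
constexpr std::size_t metadata_bytes(std::size_t capacity) noexcept {
    return sizeof...(Components) + hot_scalar_bytes + capacity * per_slot_bytes;
}

// Pool estimate: one dense array of Capacity elements per component type.
template <typename... Components>
constexpr std::size_t pool_bytes(std::size_t capacity) noexcept {
    return capacity * (sizeof(Components) + ... + std::size_t{0});
}

// Stand-in components with the sizes used throughout this page.
struct position_demo { std::int32_t x, y; };
struct velocity_demo { std::int32_t vx, vy; };
struct health_demo   { std::int32_t hp; };
```

Calling metadata_bytes<position_demo, velocity_demo, health_demo>(128) and the matching pool_bytes gives a quick estimate to compare against sizeof on the real registry type.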

Memory consumption breakdown

For gba::ecs::registry<128, position, velocity, health>:

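| Item | Size | Notes |
|---|---|---|
| Metadata overhead | | |
| Hot scalars (4 bytes) | 4 B | m_free_top, m_next_slot, m_alive, m_dense_prefix |
| Per-slot tracking (8 × 128) | 1024 B | m_mask + m_gen + stacks + indices |
| Per-component count (3) | 3 B | |
| Metadata subtotal | 1031 B | (29% of total) |
| Component pools | | |
| position (8 × 128) | 1024 B | |
| velocity (8 × 128) | 1024 B | |
| health (4 × 128) | 512 B | |
| Data subtotal | 2560 B | (71% of total) |
| Total | 3591 B | (~3.5 KB, before any alignment padding) |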

General formula

For a registry with Capacity slots and N components:

Metadata = Capacity × 8 + N + 4

Component data = Capacity × Σ(sizeof(Component))

Total = Metadata + Component data

Scaling characteristics

Metadata grows linearly per slot (8 bytes) and by one byte per component type. Adding more components mostly adds to the pool size, not metadata.

| Config | Metadata | Data | Total | % Overhead |
|---|---|---|---|---|
| 64 entities, 3 components | 519 B | 1280 B | 1799 B | 29% |
| 128 entities, 3 components | 1031 B | 2560 B | 3591 B | 29% |
| 255 entities, 3 components | 2047 B | 5100 B | 7147 B | 29% |
| 128 entities, 6 components (36 bytes per entity) | 1034 B | 4608 B | 5642 B | 18% |

Capacity alone barely changes the metadata percentage, because metadata and component data both scale with the slot count. Adding components, or using larger ones, is what lowers it.

Component groups and logical organisation

Component groups provide compile-time organisation without runtime overhead.

// Define conceptual groups
using physics = gba::ecs::group<position, velocity, acceleration>;
using rendering = gba::ecs::group<sprite_id, palette_bank, x_offset>;

// Use groups in registry declaration
gba::ecs::registry<128, physics, rendering, health> world;

// Internally flattened to:
// gba::ecs::registry<128, position, velocity, acceleration,
//                        sprite_id, palette_bank, x_offset, health>

Groups are completely erased at compile time. They exist for code organisation and readability, not runtime behaviour.

Why groups matter

  • Logical namespace: Physics components stay together in the code
  • No runtime cost: Groups are pure templates; zero overhead
  • No ambiguity: The registry type fully specifies what exists
  • Iterating unchanged: Use view<position, velocity>() regardless of groups

// Iteration names components directly, with or without groups in the registry type:
world.view<position, velocity>().each([](position& p, velocity& v) {
	p.x += v.vx;
});

Entity identity

entity_id is a 16-bit handle:

| Bits | Meaning |
|---|---|
| low 8 bits | slot index |
| high 8 bits | generation counter |

15            8 7             0
+---------------+---------------+
|  generation   |     slot      |
+---------------+---------------+

Consequences:

  • maximum slots per registry: 255
  • 0xFFFF is reserved for gba::ecs::null
  • stale handles become invalid after destroy() increments generation
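
The packing and the stale-handle check can be sketched in a few lines. This is an illustrative model of the layout described above, not the contents of entity.hpp. The entity_handle type and the free functions are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical stand-in for gba::ecs::entity_id.
struct entity_handle {
    std::uint16_t bits;
};

// Reserved value, matching the 0xFFFF null handle described above.
constexpr entity_handle null_handle{0xFFFF};

constexpr entity_handle make_handle(std::uint8_t slot, std::uint8_t generation) noexcept {
    return { static_cast<std::uint16_t>((generation << 8) | slot) };
}

constexpr std::uint8_t slot_of(entity_handle h) noexcept {
    return static_cast<std::uint8_t>(h.bits & 0xFF);
}

constexpr std::uint8_t generation_of(entity_handle h) noexcept {
    return static_cast<std::uint8_t>(h.bits >> 8);
}

// A handle is stale once the slot's stored generation has moved past the one it carries.
constexpr bool is_current(entity_handle h, std::uint8_t slot_generation) noexcept {
    return h.bits != null_handle.bits && generation_of(h) == slot_generation;
}
```

After a destroy() bumps the slot's generation, any handle still carrying the old generation fails the comparison in is_current.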

This is a very good match for GBA games, where worlds are usually dozens or low hundreds of entities, not tens of thousands.

Presence tracking with one mask per slot

Each slot has one std::uint32_t mask:

  • bit 31 = entity alive flag
  • bits 0-30 = component presence bits

That supports cheap queries:

  • all_of<Cs...>() -> bitwise AND against a compile-time mask, then compare the result with that mask
  • any_of<Cs...>() -> bitwise AND against a compile-time mask, then test the result for non-zero
  • view filtering -> compare the slot mask with a required mask
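
The three queries above reduce to a few bitwise operations. The following sketch assumes only the mask layout just described (bit 31 alive, bits 0-30 components). The function names are illustrative and do not come from the stdgba sources.

```cpp
#include <cstdint>

constexpr std::uint32_t alive_bit = 1u << 31;

// Presence bit for the component at position `index` (0..30) in the registry's type list.
constexpr std::uint32_t component_bit(unsigned index) noexcept {
    return 1u << index;
}

// all_of: every required bit must be set.
constexpr bool has_all(std::uint32_t slot_mask, std::uint32_t required) noexcept {
    return (slot_mask & required) == required;
}

// any_of: at least one wanted bit must be set.
constexpr bool has_any(std::uint32_t slot_mask, std::uint32_t wanted) noexcept {
    return (slot_mask & wanted) != 0;
}

// View filter: folding the alive bit into the required mask checks both in one compare.
constexpr bool view_accepts(std::uint32_t slot_mask, std::uint32_t required) noexcept {
    return has_all(slot_mask, required | alive_bit);
}
```

Because the required mask is known at compile time for each view<Cs...>, the per-slot test is a load, an AND, and a compare.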

Logical per-entity layout vs physical storage

One of the easiest ways to misunderstand the registry is to imagine each entity as one packed struct. That is not what happens.

Logical view

For example, with these components:

| Component | Size |
|---|---|
| position | 8 bytes |
| velocity | 8 bytes |
| health | 4 bytes |
| sprite_id | 1 byte |

The logical entity data is 21 bytes of component payload.

Physical view

The registry stores them as separate arrays:

position pool: [p0][p1][p2][p3] ...
velocity pool: [v0][v1][v2][v3] ...
health pool:   [h0][h1][h2][h3] ...
sprite pool:   [s0][s1][s2][s3] ...

That is why a view<position, velocity>() can iterate directly over only the pools it needs.

Metadata arrays and what they buy you

| Field | Role |
|---|---|
| m_component_count[] | Count of alive entities owning each component |
| m_free_top | Size of the free-slot stack |
| m_next_slot | Next never-before-used slot |
| m_alive | Current alive entity count |
| m_mask[] | Alive + component presence bits |
| m_gen[] | Per-slot generation counters |
| m_free_stack[] | Recycled slot stack |
| m_alive_list[] | Dense list of alive slots |
| m_alive_index[] | Reverse map for O(1) removal from m_alive_list |

This is the backbone of the ECS. The component pools are simple; the metadata is what makes creation, destruction, and iteration cheap.

View dispatch strategy

view<Cs...>() does not use one always-generic loop. It picks among three runtime paths:

| Path | Condition | Cost profile |
|---|---|---|
| Dense + all-match | every alive entity has every requested component, and alive slots are still dense from 0..N-1 | no alive-list lookup, no mask check |
| All-match with gaps | every alive entity has every requested component, but slots are no longer dense | alive-list lookup, no mask check |
| Mixed | some alive entities are missing requested components | alive-list lookup plus per-slot mask check |

This matters because many gameplay worlds spend most of their time in one of the first two cases.

Iteration styles

| API | Best for |
|---|---|
| range-for over view<Cs...>() | ergonomic gameplay code |
| .each(fn) | explicit callback style, constexpr-friendly code |
| .each_arm(fn) | measured hot loops where ARM-mode + IWRAM placement matters |

Example:

world.view<position, velocity>().each([](position& pos, const velocity& vel) {
    pos.x += vel.vx;
    pos.y += vel.vy;
});

Power-of-two component sizes

Every component type must have a power-of-two sizeof(T).

| Size | Allowed? |
|---|---|
| 1 | yes |
| 2 | yes |
| 4 | yes |
| 8 | yes |
| 3, 5, 6, 7, … | no |

If a type is almost right, pad it:

struct sprite_id {
    std::uint8_t id;
    gba::ecs::pad<3> _;
};

This rule exists to support cheap shift-based addressing in the component pools.

What the architecture intentionally omits

To stay small and predictable, gba::ecs deliberately does not include:

  • runtime component registration
  • dynamic archetype storage
  • event buses or schedulers
  • system graphs or task runners
  • serialisation or reflection

The expectation is that you compose those policies at a higher layer if your project needs them.

See Internal Implementation for the field ordering, alive-list mechanics, and the fast-path details that fall out of this architecture.

Internal Implementation

This page covers the mechanics behind gba::ecs: how entities are recycled, why metadata is ordered the way it is, and how the iteration fast paths are selected.

Field ordering inside registry

registry.hpp places small hot metadata first and large pools later:

| Order | Field | Why it is near the front |
|---|---|---|
| 1 | m_component_count[] | touched by view setup and component attach/remove |
| 2 | m_free_top | touched by create() and destroy() |
| 3 | m_next_slot | touched by create() |
| 4 | m_alive | touched by create/destroy/view setup |
| 5+ | masks, generations, stacks, alive lists | still hot, but larger |
| last | component pools | large bulk storage; offset cost matters less |

The comment in registry.hpp explains the main codegen reason: in Thumb-mode call paths such as create(), destroy(), and emplace(), low offsets make for cheaper loads and stores.

How entity creation works

Creation prefers recycled slots, then falls back to a never-used slot.

if free stack not empty:
	pop slot from m_free_stack
else:
	use m_next_slot and increment it

mark slot alive
append slot to m_alive_list
record reverse index in m_alive_index
increment m_alive
return entity_id(slot, generation)

That makes slot reuse deterministic and cheap.

How destruction works

Destroying an entity performs four distinct jobs:

  1. decrement component counts for every component present on that slot
  2. clear the mask and increment the generation
  3. push the slot onto m_free_stack
  4. remove the slot from m_alive_list with swap-and-pop

The important bit is swap-and-pop:

alive list before: [ 4, 7, 2, 9 ]
destroy slot 7
swap in last slot 9
alive list after:  [ 4, 9, 2 ]

That keeps removal O(1) instead of shifting a long list.

Why there is both m_alive_list and m_alive_index

| Field | Role |
|---|---|
| m_alive_list[Capacity] | dense list of alive slots in iteration order |
| m_alive_index[Capacity] | reverse map from slot -> index in m_alive_list |

You need both to delete from the dense list in O(1). Without the reverse map, destruction would have to scan the list to find the removed slot.
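
The creation and destruction steps described above fit into a small model. The sketch below is a simplified stand-in written for this page, not the code in registry.hpp. It omits masks and component counts, does no capacity or validity checking, and uses field names without the m_ prefix.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the slot bookkeeping only.
template <std::size_t Capacity>
struct slot_table {
    static_assert(Capacity <= 255, "slot indices must fit the 8-bit bookkeeping");

    std::array<std::uint8_t, Capacity> gen{};
    std::array<std::uint8_t, Capacity> free_stack{};
    std::array<std::uint8_t, Capacity> alive_list{};
    std::array<std::uint8_t, Capacity> alive_index{};
    std::uint8_t free_top = 0;
    std::uint8_t next_slot = 0;
    std::uint8_t alive = 0;

    // Prefer a recycled slot, else take a never-used one. Caller ensures alive < Capacity.
    constexpr std::uint8_t create() noexcept {
        const std::uint8_t slot = free_top ? free_stack[--free_top] : next_slot++;
        alive_index[slot] = alive;    // reverse map: slot -> position in alive_list
        alive_list[alive++] = slot;   // append to the dense list
        return slot;
    }

    // Caller ensures `slot` is currently alive.
    constexpr void destroy(std::uint8_t slot) noexcept {
        ++gen[slot];                                  // invalidate outstanding handles
        free_stack[free_top++] = slot;                // make the slot reusable
        const std::uint8_t hole = alive_index[slot];  // O(1) lookup via the reverse map
        const std::uint8_t last = alive_list[--alive];
        alive_list[hole] = last;                      // swap-and-pop
        alive_index[last] = hole;
    }
};

// Usage: four entities, destroy slot 1, then create again and observe the reuse.
constexpr bool swap_and_pop_demo() {
    slot_table<8> t{};
    for (int i = 0; i < 4; ++i) t.create();           // alive_list: 0 1 2 3
    t.destroy(1);                                     // alive_list: 0 3 2
    const bool swapped = t.alive == 3 && t.alive_list[1] == 3 && t.alive_index[3] == 1;
    const std::uint8_t reused = t.create();           // pops slot 1 from the free stack
    return swapped && reused == 1 && t.gen[1] == 1 && t.alive_list[3] == 1;
}
```

Without alive_index, destroy() would need a linear search through alive_list to find the hole.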

Component count tracking and fast-path selection

m_component_count[] stores how many alive entities currently own each component type.

Before iterating, a view checks whether every requested component count equals m_alive.

If true, then every alive entity has every requested component, and the loop can skip per-entity mask checks.

That is the basis of the three iteration paths:

| Path | Condition | Inner-loop work |
|---|---|---|
| Dense + all-match | m_alive == m_next_slot and all requested component counts equal m_alive | direct slot walk |
| All-match with gaps | all requested component counts equal m_alive, but dense-slot condition is false | walk m_alive_list only |
| Mixed | not all alive entities have the requested components | walk m_alive_list and test mask |

This is a simple but effective optimisation. Many game systems operate on worlds where almost every live entity in a layer shares the same core components.
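
The selection logic in the table can be written as one small function. This is a sketch of the decision only, using the conditions from the table. The enum and function names are hypothetical, and the real view may structure the check differently.

```cpp
enum class view_path { dense_all_match, all_match_with_gaps, mixed };

// `requested_counts` are the m_component_count entries for the view's components.
template <typename... Counts>
constexpr view_path select_path(unsigned alive, unsigned next_slot,
                                Counts... requested_counts) noexcept {
    // Every requested component must be owned by every alive entity.
    const bool all_match = ((static_cast<unsigned>(requested_counts) == alive) && ...);
    if (!all_match) return view_path::mixed;
    // No slot has ever been freed and left empty: slots 0..alive-1 are all live.
    return alive == next_slot ? view_path::dense_all_match
                              : view_path::all_match_with_gaps;
}
```

The check runs once per view iteration, so its cost is a handful of byte compares regardless of entity count.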

Iterator vs callback style

Both range-for and .each() are implemented on top of the same storage model, but they serve slightly different goals:

| Style | Best trait |
|---|---|
| range-for | ergonomic syntax with structured bindings |
| .each() | explicit callback, easy to specialise or switch to .each_arm() |
| .each_arm() | hottest runtime path |

The callback path also auto-detects whether your lambda wants an entity_id first:

world.view<health>().each([](gba::ecs::entity_id e, health& hp) {
	// id-aware system
});

match() dispatch semantics

match<Case1, Case2, ...>(entity, fn1, fn2, ...) is implemented in two phases:

  1. Evaluate all case queries and snapshot which cases match.
  2. Invoke callbacks for matched cases in declaration order.

This gives predictable dispatch when one entity can satisfy multiple cases.
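
A two-case model makes the snapshot behaviour concrete. The sketch below works on a raw presence mask rather than on real components, and match_two is a hypothetical name. It only illustrates the two-phase order described above.

```cpp
#include <cstdint>

template <typename F1, typename F2>
constexpr bool match_two(std::uint32_t& slot_mask,
                         std::uint32_t case1, std::uint32_t case2,
                         F1 fn1, F2 fn2) {
    // Phase 1: snapshot which cases match before any callback can change the mask.
    const bool hit1 = (slot_mask & case1) == case1;
    const bool hit2 = (slot_mask & case2) == case2;
    // Phase 2: run matched callbacks in declaration order.
    if (hit1) fn1(slot_mask);
    if (hit2) fn2(slot_mask);
    return hit1 || hit2;
}

// The first callback clears the bit the second case depends on.
constexpr int snapshot_demo() {
    std::uint32_t mask = 0b11;
    int calls = 0;
    match_two(mask, 0b01, 0b10,
        [&](std::uint32_t& m) { m &= ~0b10u; ++calls; },
        [&](std::uint32_t&)   { ++calls; });  // still runs in this model
    return calls;
}
```

In this model both callbacks run even though the first one invalidates the second case, because the decision was made before either was invoked.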

| Property | Behaviour |
|---|---|
| Match timing | snapshotted before callbacks run |
| Callback order | same order as case template arguments |
| Return value | true if at least one case matched |
| Hot-path variant | match_arm(...) in ARM mode + IWRAM |

each_arm() and why it exists

basic_view::each_arm() is annotated to build for ARM mode and live in IWRAM:

  • gnu::target("arm")
  • gnu::section(".iwram._gba_ecs_each")
  • gnu::flatten

That combination is intended for the loops you run every frame on hardware.

Why it can be faster

| Choice | Benefit |
|---|---|
| ARM mode | more registers and richer addressing modes than Thumb |
| IWRAM placement | faster instruction fetch on target hardware |
| flattened callback body | better inlining in tight loops |

In the benchmark suite, this is the path used for runtime movement and full-update loops.

Compile-time safety behaviour

The registry uses if consteval checks for invalid operations such as:

  • capacity overflow in create()
  • destroying an invalid entity
  • double-emplacing the same component
  • removing from an invalid entity

That means a misuse inside a static constexpr setup produces a compiler error instead of a bad runtime state.

The power-of-two size rule, internally

The registry enforces this with:

static_assert(((std::has_single_bit(sizeof(Components))) && ...),
			  "all component sizes must be powers of two");

It is not just stylistic. The implementation is tuned around simple addressing and predictable pool layout. If you have a 3-byte or 12-byte component, pad it to 4 or 16 bytes.

struct sprite_id {
	std::uint8_t id;
	gba::ecs::pad<3> _;
};

A concrete storage example

For this registry:

using world_type = gba::ecs::registry<128, position, velocity, health, sprite_id>;

With the component sizes:

| Component | Size | Pool storage |
|---|---|---|
| position | 8 | 128 × 8 = 1024 bytes |
| velocity | 8 | 128 × 8 = 1024 bytes |
| health | 4 | 128 × 4 = 512 bytes |
| sprite_id | 4 | 128 × 4 = 512 bytes (padded from 1) |

Metadata breakdown:

| Field | Size | Notes |
|---|---|---|
| m_component_count[4] | 4 B | |
| Hot scalars | 4 B | free_top, next_slot, alive, dense_prefix |
| m_mask[128] | 512 B | 4 bytes × 128 slots |
| m_gen[128] | 128 B | 1 byte × 128 slots |
| m_free_stack[128] | 128 B | 1 byte × 128 slots |
| m_alive_list[128] | 128 B | 1 byte × 128 slots |
| m_alive_index[128] | 128 B | 1 byte × 128 slots |
| Metadata subtotal | 1032 B | |
| Component pools | 3072 B | |
| Total | 4104 B | (~4 KB, before any alignment padding) |

Logical payload per entity is 21 bytes (or 24 with padding), but physical storage is split into arrays. That split is what makes selective views iterate only the data they need, keeping memory access patterns linear and predictable.

The implementation is best understood alongside these files:

  • public API: include/gba/ecs
  • implementation: include/gba/bits/ecs/registry.hpp
  • entity ID helpers: include/gba/bits/ecs/entity.hpp
  • tests: tests/ecs/test_ecs.cpp
  • runtime benchmark: benchmarks/bench_ecs.cpp
  • debug benchmark: benchmarks/bench_ecs_debug.cpp

The tests exercise lifecycle, generation invalidation, view filtering, structured bindings, constexpr use, and padding rules. The benchmarks show why the implementation keeps leaning so hard into dense arrays and low-overhead iteration.

Practical examples and patterns

Setting up a game world with groups

#include <gba/ecs>
#include <gba/fixed_point>

// Define components
struct position {
	gba::fixed<int, 8> x, y;
};

struct velocity {
	gba::fixed<int, 8> vx, vy;
};

struct sprite_id {
	std::uint8_t id;
	gba::ecs::pad<3> _;
};

struct health {
	int hp;
};

// Group for physics (reusable organisation)
using physics = gba::ecs::group<position, velocity>;
using rendering = gba::ecs::group<sprite_id>;

// Single registry with multiple groups
using world_type = gba::ecs::registry<128, physics, rendering, health>;

world_type world;

This is readable and scales: you can see exactly what the world contains without searching through code.

Writing systems with different iteration strategies

// Ergonomic: range-based for with structured bindings
void movement_system(world_type& world) {
	for (auto [pos, vel] : world.view<position, velocity>()) {
		pos.x += vel.vx;
		pos.y += vel.vy;
	}
}

// Portable: callback style (works in constexpr contexts)
void render_system(world_type& world) {
	world.view<sprite_id>().each([](sprite_id& sprite) {
		// upload sprite to OAM
	});
}

// Hot-path: ARM mode + IWRAM for every-frame updates
void collision_system(world_type& world) {
	world.view<position, health>().each_arm([](position& pos, health& hp) {
		// tight loop runs from IWRAM in ARM mode
		if (hp.hp <= 0) {
			// destruction handled separately
		}
	});
}

// With entity IDs for selective destruction
void health_system(world_type& world) {
	world.view<health>().each([&world](gba::ecs::entity_id e, health& hp) {
		if (hp.hp <= 0) {
			world.destroy(e);  // safe due to generation
		}
	});
}

Typical frame loop

int main() {
	world_type world;

	// Setup entities
	auto player = world.create();
	world.emplace<position>(player, 0, 0);
	world.emplace<velocity>(player, 0, 0);
	world.emplace<sprite_id>(player, 0);

	while (true) {
		gba::VBlankIntrWait();

		// Update phase
		movement_system(world);    // all physics
		collision_system(world);   // all collisions
		health_system(world);      // remove dead entities

		// Render phase
		render_system(world);      // upload to hardware

		// Handle input, etc.
	}
}

Every system has predictable cost. No hidden allocations, no hidden iteration overhead.

gba::keypad Reference

gba::keypad is the high-level input state tracker from <gba/keyinput>. It wraps active-low keypad hardware semantics and provides frame-based edge detection helpers.

For raw register details (reg_keyinput, reg_keycnt), see Peripheral Registers: Keypad.

Include

#include <gba/keyinput>
#include <gba/peripherals>

Type summary

struct keypad {
    constexpr keypad& operator=(key_control keys) noexcept;

    template<template<typename> typename LogicalOp = std::logical_and, typename... Keys>
    constexpr bool held(Keys... keys) const noexcept;

    template<template<typename> typename LogicalOp = std::logical_and, typename... Keys>
    constexpr bool pressed(Keys... keys) const noexcept;

    template<template<typename> typename LogicalOp = std::logical_and, typename... Keys>
    constexpr bool released(Keys... keys) const noexcept;

    constexpr int xaxis() const noexcept;
    constexpr int i_xaxis() const noexcept;
    constexpr int yaxis() const noexcept;
    constexpr int i_yaxis() const noexcept;
    constexpr int lraxis() const noexcept;
    constexpr int i_lraxis() const noexcept;
};

Frame update contract

keypad stores previous and current state internally. Update it by assigning from gba::reg_keyinput once per game frame:

gba::keypad keys;

for (;;) {
    gba::VBlankIntrWait();
    keys = gba::reg_keyinput;

    // Query after exactly one sample per frame
}

Sampling multiple times in one frame advances history multiple times, which can make pressed()/released() behaviour appear inconsistent.

Query methods

Keys... must be gba::key masks (gba::key_a, gba::key_left, etc.).

held(keys...)

Returns whether keys are currently down.

if (keys.held(gba::key_a)) {
    // A is down this frame
}

pressed(keys...)

Returns whether keys transitioned up -> down on this frame.

if (keys.pressed(gba::key_start)) {
    // Start edge this frame
}

released(keys...)

Returns whether keys transitioned down -> up on this frame.

if (keys.released(gba::key_b)) {
    // B release edge this frame
}

Logical operators

All three query methods default to std::logical_and semantics for multiple keys.

if (keys.held(gba::key_l, gba::key_r)) {
    // L and R both held
}

You can also select std::logical_or or std::logical_not:

if (keys.pressed<std::logical_or>(gba::key_a, gba::key_b)) {
    // A or B was newly pressed
}

Axis helpers

Axis helpers are tri-state (-1, 0, 1) from the current key sample.

  • xaxis(): -1 left, +1 right
  • i_xaxis(): inverted horizontal axis
  • yaxis(): -1 down, +1 up (mathematical convention)
  • i_yaxis(): inverted vertical axis (+1 down for screen-space movement)
  • lraxis(): -1 L, +1 R
  • i_lraxis(): inverted shoulder axis
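A tri-state axis can be sketched as the difference of two booleans. toy_axis is a hypothetical helper, and the choice that opposing directions cancel to 0 is an assumption rather than documented stdgba behaviour.

```cpp
// Hypothetical helper: +1 when only `positive` is down, -1 when only
// `negative` is down, 0 when neither or both are down (assumed behaviour).
constexpr int toy_axis(bool negative, bool positive) noexcept {
    return static_cast<int>(positive) - static_cast<int>(negative);
}

// An inverted axis is the same value with the sign flipped.
constexpr int toy_inverted_axis(bool negative, bool positive) noexcept {
    return -toy_axis(negative, positive);
}
```

For screen-space movement, where y grows downwards, the inverted vertical form is the one that can be added directly to a sprite's y coordinate.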

Key masks and combos

Use operator| on gba::key constants to build combinations:

auto combo = gba::key_a | gba::key_b;
if (keys.held(combo)) {
    // A+B held
}

gba::reset_combo is predefined as A + B + Select + Start.

gba::object Reference

gba::object is the regular (non-affine) OAM object entry type from <gba/video>.

Use it with gba::obj_mem when you want standard sprite placement with optional horizontal/vertical flipping.

For affine objects, see gba::object_affine.

Include

#include <gba/video>

Type summary

struct object {
    // Attribute 0
    unsigned short y : 8;
    bool : 1;
    bool disable : 1;
    gba::mode mode : 2;
    bool mosaic : 1;
    gba::depth depth : 1;
    gba::shape shape : 2;

    // Attribute 1
    unsigned short x : 9;
    short : 3;
    bool flip_x : 1;
    bool flip_y : 1;
    unsigned short size : 2;

    // Attribute 2
    unsigned short tile_index : 10;
    unsigned short background : 2;
    unsigned short palette_index : 4;
};

sizeof(gba::object) == 6 bytes.
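The bitfield layout above corresponds to three 16-bit attribute words. The sketch below packs a few fields by hand so the bit positions are visible. toy_obj_bits and toy_pack_object are hypothetical names, and the code assumes least-significant-bit-first bitfield allocation plus the hardware encodings square = 0 and 4bpp = 0.

```cpp
#include <cstdint>

// Hypothetical raw view of one regular OAM entry.
struct toy_obj_bits {
    std::uint16_t attr0;
    std::uint16_t attr1;
    std::uint16_t attr2;
};

// Packs a subset of the fields listed in the type summary.
constexpr toy_obj_bits toy_pack_object(unsigned y, unsigned x, unsigned shape,
                                       unsigned size, bool depth_8bpp,
                                       unsigned tile_index,
                                       unsigned palette_index) noexcept {
    return {
        // attr0: y in bits 0-7, depth in bit 13, shape in bits 14-15
        static_cast<std::uint16_t>((y & 0xFFu) | (depth_8bpp ? 1u << 13 : 0u) |
                                   ((shape & 3u) << 14)),
        // attr1: x in bits 0-8, size in bits 14-15
        static_cast<std::uint16_t>((x & 0x1FFu) | ((size & 3u) << 14)),
        // attr2: tile index in bits 0-9, palette bank in bits 12-15
        static_cast<std::uint16_t>((tile_index & 0x3FFu) |
                                   ((palette_index & 0xFu) << 12)),
    };
}
```

In normal code the typed struct does this packing for you; the raw form is mainly useful when comparing against an emulator's OAM viewer.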

Typical usage

// Designators follow the declaration order of gba::object
gba::obj_mem[0] = {
    .y = 80,
    .depth = gba::depth_4bpp,
    .shape = gba::shape_square,
    .x = 120,
    .size = 1,          // 16x16 for square sprites
    .tile_index = 0,
    .palette_index = 0,
};

Field notes

  • disable: hide this object without clearing its other fields.
  • mode: object blend/window mode (mode_normal, mode_blend, mode_window).
  • depth: choose depth_4bpp (16-colour banked palette) or depth_8bpp (256-colour OBJ palette).
  • shape + size: together determine dimensions.
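  • flip_x/flip_y: mirror the sprite horizontally/vertically. Regular objects only; gba::object_affine uses the same attribute 1 bits for affine_index.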
  • background: OBJ priority relative to backgrounds (0 highest, 3 lowest).

Regular vs affine comparison

| Aspect | gba::object (regular) | gba::object_affine |
| --- | --- | --- |
| Typed OAM view | gba::obj_mem | gba::obj_aff_mem |
| Attr0 mode bit | disable hide flag | affine enabled, optional double_size |
| Attr1 control bits | flip_x / flip_y | affine_index (0..31) |
| Rotation/scaling | Not supported | Supported via affine matrix |
| Transform source | Flip bits only | mem_obj_affa/b/c/d entry selected by affine_index |
| Shared fields | x, y, shape, size, tile_index, background, palette_index, depth, mode, mosaic | Same shared fields |
| Best fit | Standard sprites, mirroring, UI, low overhead | Rotating/scaling sprites, camera-facing effects |

Shape/size table

| Shape | Size 0 | Size 1 | Size 2 | Size 3 |
| --- | --- | --- | --- | --- |
| Square | 8x8 | 16x16 | 32x32 | 64x64 |
| Wide | 16x8 | 32x8 | 32x16 | 64x32 |
| Tall | 8x16 | 8x32 | 16x32 | 32x64 |

  • gba::obj_mem - typed OAM as object[128]
  • gba::tile_index(ptr) - compute OBJ tile index from an OBJ VRAM pointer
  • gba::mem_vram_obj - raw object VRAM
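The shape/size table above can be written as a constexpr lookup. This is an illustrative sketch: toy_dims and toy_obj_dims are hypothetical names, and the shape argument uses the hardware numbering (0 = square, 1 = wide, 2 = tall).

```cpp
// Hypothetical result type: sprite dimensions in pixels.
struct toy_dims {
    unsigned width;
    unsigned height;
};

// Rows: square, wide, tall. Columns: size 0..3.
// Returns {0, 0} for combinations outside the table.
constexpr toy_dims toy_obj_dims(unsigned shape, unsigned size) noexcept {
    constexpr toy_dims table[3][4] = {
        {{8, 8}, {16, 16}, {32, 32}, {64, 64}},
        {{16, 8}, {32, 8}, {32, 16}, {64, 32}},
        {{8, 16}, {8, 32}, {16, 32}, {32, 64}},
    };
    return (shape < 3 && size < 4) ? table[shape][size] : toy_dims{0, 0};
}
```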

gba::object_affine Reference

gba::object_affine is the affine OAM object entry type from <gba/video>.

Use it with gba::obj_aff_mem when sprite rotation/scaling (OBJ affine transform) is required.

For regular objects with flip bits, see gba::object.

Include

#include <gba/video>

Type summary

struct object_affine {
    // Attribute 0
    unsigned short y : 8;
    bool affine : 1 = true;
    bool double_size : 1;
    gba::mode mode : 2;
    bool mosaic : 1;
    gba::depth depth : 1;
    gba::shape shape : 2;

    // Attribute 1
    unsigned short x : 9;
    unsigned short affine_index : 5;
    unsigned short size : 2;

    // Attribute 2
    unsigned short tile_index : 10;
    unsigned short background : 2;
    unsigned short palette_index : 4;
};

sizeof(gba::object_affine) == 6 bytes.

Typical usage

// Designators follow the declaration order of gba::object_affine
gba::obj_aff_mem[0] = {
    .y = 80,
    .depth = gba::depth_4bpp,
    .shape = gba::shape_square,
    .x = 120,
    .affine_index = 0,
    .size = 1,
    .tile_index = 0,
};

// Configure affine matrix 0 through mem_obj_affa/b/c/d as needed.

Field notes

  • affine: set for affine rendering mode (enabled by default in the struct).
  • double_size: doubles the render box so rotated/scaled sprites are less likely to clip.
  • affine_index: selects one of 32 affine parameter sets (0..31).
  • shape + size: still determine the base dimensions before affine transform.
  • flip_x/flip_y do not exist on affine entries; transform comes from the affine matrix.

Affine parameter memory

<gba/video> provides these typed views over OAM affine parameters:

  • gba::mem_obj_affa (pa)
  • gba::mem_obj_affb (pb)
  • gba::mem_obj_affc (pc)
  • gba::mem_obj_affd (pd)

Embedded Sprite Type Reference

gba::embed::indexed4() and gba::embed::indexed8() expose sprite-facing helpers in slightly different shapes.

Include

#include <gba/embed>

indexed4 result summary

template<unsigned int Width, unsigned int Height, std::size_t PaletteSize, std::size_t TileCount, std::size_t MapSize>
struct indexed4_result {
    std::array<gba::color, PaletteSize> palette;
    gba::sprite4<Width, Height, TileCount> sprite;
    std::array<gba::screen_entry, MapSize> map;
};

Key members

  • palette: indexed palette data
  • sprite: 4bpp tile payload + obj() / obj_aff() OAM helpers
  • map: background-style tilemap (screenblock order)

indexed8 result summary

template<unsigned int Width, unsigned int Height, std::size_t PaletteSize, std::size_t TileCount, std::size_t MapSize>
struct indexed8_result {
    std::array<gba::color, PaletteSize> palette;
    std::array<gba::tile8bpp, TileCount> tiles;
    std::array<gba::screen_entry, MapSize> map;

    static constexpr gba::object obj(unsigned short tile_index = 0);
    static constexpr gba::object_affine obj_aff(unsigned short tile_index = 0);
};

indexed8 exposes OAM helpers directly on the result type instead of through a nested sprite field.

OAM helpers (4bpp)

obj(tile_index)

Returns a regular (non-affine) gba::object entry pre-configured with:

  • sprite dimensions from the source image
  • tile index set to tile_index (default 0)
  • 4bpp/8bpp depth matching the source
  • all other fields zeroed (position, flip, palette bank)

constexpr auto sprite = gba::embed::indexed4<gba::embed::dedup::none>([] {
    return std::to_array<unsigned char>({
#embed "hero.png"
    });
});

// base_tile is the OBJ tile index the tiles were uploaded to (see Upload pattern)
gba::obj_mem[0] = sprite.sprite.obj(base_tile);
gba::obj_mem[0].x = 120;
gba::obj_mem[0].y = 80;

obj_aff(tile_index)

Returns an affine gba::object_affine entry pre-configured the same way as obj(), but with:

  • affine flag always set
  • affine_index zeroed (assign your affine matrix index after)

gba::obj_aff_mem[0] = sprite.sprite.obj_aff(base_tile);
gba::obj_aff_mem[0].affine_index = 0;
gba::obj_aff_mem[0].x = 120;
gba::obj_aff_mem[0].y = 80;

Valid sprite sizes

The sprite type is only created when the source image dimensions match a legal GBA OBJ size:

| Shape | Sizes |
| --- | --- |
| Square | 8x8, 16x16, 32x32, 64x64 |
| Wide | 16x8, 32x8, 32x16, 64x32 |
| Tall | 8x16, 8x32, 16x32, 32x64 |

If the source does not match, the converter rejects it at compile time.
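A compile-time rejection of this kind can be sketched with a constexpr predicate and a static_assert. This is not the converter's actual code: toy_is_legal_obj_size and toy_sprite are hypothetical names that only show the general technique.

```cpp
// Hypothetical predicate: true when (w, h) is one of the twelve OBJ sizes.
constexpr bool toy_is_legal_obj_size(unsigned w, unsigned h) noexcept {
    constexpr unsigned legal[12][2] = {
        {8, 8},  {16, 16}, {32, 32}, {64, 64}, // square
        {16, 8}, {32, 8},  {32, 16}, {64, 32}, // wide
        {8, 16}, {8, 32},  {16, 32}, {32, 64}, // tall
    };
    for (const auto& wh : legal) {
        if (wh[0] == w && wh[1] == h) {
            return true;
        }
    }
    return false;
}

// Hypothetical sprite type: instantiating it with an illegal size fails to compile.
template<unsigned Width, unsigned Height>
struct toy_sprite {
    static_assert(toy_is_legal_obj_size(Width, Height),
                  "image dimensions are not a legal GBA OBJ size");
};
```

Here toy_sprite<16, 16> compiles, while toy_sprite<24, 24> stops the build with the message above.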

Upload pattern

// Copy tile data to OBJ VRAM
const auto base_tile = gba::tile_index(gba::memory_map(gba::mem_vram_obj));
std::memcpy(gba::memory_map(gba::mem_vram_obj), sprite.sprite.data(), sprite.sprite.size());

// Copy palette to OBJ palette RAM
std::copy(sprite.palette.begin(), sprite.palette.end(), gba::pal_obj_bank[0]);

// Create OAM entry
gba::obj_mem[0] = sprite.sprite.obj(base_tile);

Animated Sprite Sheet Type Reference

The result structure returned by gba::embed::indexed4_sheet<FrameW, FrameH>() holds frame-packed tile data and compile-time animation builders.

Include

#include <gba/embed>

Sheet result type summary

template<unsigned int FrameW, unsigned int FrameH, unsigned int Cols, unsigned int Rows, std::size_t PaletteSize>
struct sheet4_result {
    static constexpr unsigned int frame_count = Cols * Rows;
    static constexpr unsigned int tiles_per_frame = (FrameW / 8u) * (FrameH / 8u);
    static constexpr std::size_t total_tiles = frame_count * tiles_per_frame;

    std::array<gba::color, PaletteSize> palette;
    gba::sprite4<FrameW, FrameH, total_tiles> sprite;
    
    // Frame indexing
    static constexpr unsigned int tile_offset(unsigned int frame) noexcept;
    static constexpr gba::object frame_obj(unsigned short base_tile, unsigned int frame, unsigned short palette_index = 0);
    static constexpr gba::object_affine frame_obj_aff(unsigned short base_tile, unsigned int frame, unsigned short palette_index = 0);
    
    // Animation builders (return flipbook types with .frame(tick) methods)
    static consteval auto forward<Start, Count>();
    static consteval auto ping_pong<Start, Count>();
    static consteval auto sequence<"...">();
    static consteval auto row<R>();
};

Members

  • palette - 16-colour OBJ palette shared across all frames
  • sprite - frame-packed 4bpp tile payload ready for OBJ VRAM upload

Frame access

tile_offset(frame)

Returns the tile offset (in tiles, not bytes) for a given frame. Used when manually managing OBJ VRAM layout.

const auto base_tile = gba::tile_index(gba::memory_map(gba::mem_vram_obj));
auto offset = actor.tile_offset(frame_index);
gba::obj_mem[0].tile_index = base_tile + offset;
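Given the tiles_per_frame formula in the type summary, the offset arithmetic is most likely a single multiplication. The sketch below assumes frames are stored back to back in 1D tile order; toy_tiles_per_frame and toy_tile_offset are hypothetical free-function equivalents, not the library's members.

```cpp
// Number of 8x8 tiles in one frame (same formula as the type summary).
constexpr unsigned toy_tiles_per_frame(unsigned frame_w, unsigned frame_h) noexcept {
    return (frame_w / 8u) * (frame_h / 8u);
}

// Offset of `frame` in tiles, assuming frames are packed contiguously.
constexpr unsigned toy_tile_offset(unsigned frame_w, unsigned frame_h,
                                   unsigned frame) noexcept {
    return frame * toy_tiles_per_frame(frame_w, frame_h);
}
```

For a 16x16 sheet each frame occupies 4 tiles, so frame 3 starts 12 tiles after base_tile.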

frame_obj(base_tile, frame, palette_index)

Returns a regular (non-affine) gba::object entry for a specific frame.

gba::obj_mem[0] = actor.frame_obj(base_tile, current_frame, 0);
gba::obj_mem[0].x = 120;
gba::obj_mem[0].y = 80;

frame_obj_aff(base_tile, frame, palette_index)

Returns an affine gba::object_affine entry for a specific frame.

gba::obj_aff_mem[0] = actor.frame_obj_aff(base_tile, current_frame, 0);
gba::obj_aff_mem[0].affine_index = 0;

Animation builders

All animation builders are compile-time helpers that return a flipbook type with a .frame(tick) method.

forward<Start, Count>()

Compile-time sequential flipbook: frames play in order, then wrap back to the first frame.

static constexpr auto idle = actor.forward<0, 4>();

unsigned int frame = idle.frame(tick / 8);  // Cycles: 0, 1, 2, 3, 0, 1, 2, 3, ...

ping_pong<Start, Count>()

Compile-time forward-then-reverse flipbook: frames play forward, then reverse (excluding the endpoints to avoid doubling them).

static constexpr auto walk = actor.ping_pong<0, 4>();

unsigned int frame = walk.frame(tick / 8);  // Cycles: 0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, ...
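The forward and ping-pong cycles shown in the comments above can be reproduced with plain modular arithmetic. The functions below are a hypothetical host-side model (toy_forward, toy_ping_pong) and not the flipbook types themselves; count must be at least 1.

```cpp
#include <cstddef>

// Forward loop: start, start+1, ..., start+count-1, then wrap.
constexpr std::size_t toy_forward(std::size_t start, std::size_t count,
                                  std::size_t tick) noexcept {
    return start + tick % count;
}

// Ping-pong: forward, then backward without repeating either endpoint.
constexpr std::size_t toy_ping_pong(std::size_t start, std::size_t count,
                                    std::size_t tick) noexcept {
    if (count < 2) {
        return start;
    }
    const std::size_t period = 2 * (count - 1); // e.g. 6 steps for 4 frames
    const std::size_t step = tick % period;
    return start + (step < count ? step : period - step);
}
```

Skipping the endpoints on the way back gives a period of 2 * (Count - 1) steps, which is why a 4-frame ping-pong repeats every 6 steps rather than 8.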

sequence<"...">()

Explicit frame sequence via string literal. Characters 0-9 map to frames 0-9; a-z continue from frame 10 upward, and A-Z map the same way as lowercase.

static constexpr auto attack = actor.sequence<"01232100">();

unsigned int frame = attack.frame(tick / 10);  // Cycles through the specified sequence
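The character mapping described above can be sketched as follows. toy_frame_from_char is a hypothetical helper, and returning -1 for other characters is an illustrative choice; the real builder presumably rejects them at compile time.

```cpp
// Maps one sequence character to a frame index:
// '0'-'9' -> 0-9, 'a'-'z' and 'A'-'Z' -> 10-35, anything else -> -1.
constexpr int toy_frame_from_char(char c) noexcept {
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'a' && c <= 'z') {
        return 10 + (c - 'a');
    }
    if (c >= 'A' && c <= 'Z') {
        return 10 + (c - 'A');
    }
    return -1; // not a valid frame character
}
```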

row<R>()

Returns a row-scoped builder for multi-row sprite sheets (e.g., one direction per row).

static constexpr auto down  = actor.row<0>().ping_pong<0, 3>();
static constexpr auto left  = actor.row<1>().ping_pong<0, 3>();
static constexpr auto right = actor.row<2>().ping_pong<0, 3>();
static constexpr auto up    = actor.row<3>().ping_pong<0, 3>();

The result is still a sheet-global frame index, so it plugs directly into frame_obj().
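One plausible way a row-local frame becomes a sheet-global index is row-major numbering. That ordering is an assumption here, not something this reference states, and toy_row_frame is a hypothetical helper.

```cpp
// Assumption: frames are numbered left to right, then top to bottom.
constexpr unsigned toy_row_frame(unsigned cols, unsigned row,
                                 unsigned local_frame) noexcept {
    return row * cols + local_frame;
}
```

Under that assumption, local frame 1 of row 3 on a 3-column sheet is global frame 10.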

Flipbook .frame(tick) method

All animation builders return a flipbook type with:

constexpr std::size_t frame(std::size_t tick) const;

This maps a monotonically-increasing tick value to a frame index within the animation sequence.

unsigned int tick = 0;
const auto walk = actor.ping_pong<0, 4>();

while (true) {
    gba::VBlankIntrWait();
    unsigned int frame = walk.frame(tick / 8);  // Update every 8 ticks
    gba::obj_mem[0] = actor.frame_obj(base_tile, frame, 0);
    ++tick;
}

Sheet layout

Frames are laid out contiguously in OBJ VRAM. The converter ensures:

  • whole sheet uses one shared 15-colour palette + transparent index 0
  • frames are tile-aligned for simple base_tile + tile_offset(frame) indexing
  • no runtime repacking is needed

Upload pattern

#include <algorithm>
#include <array>
#include <cstring>
#include <gba/color>
#include <gba/embed>
#include <gba/video>

static constexpr auto actor = gba::embed::indexed4_sheet<16, 16>([] {
    return std::to_array<unsigned char>({
#embed "actor.png"
    });
});

// Copy tile data and palette to hardware
const auto base_tile = gba::tile_index(gba::memory_map(gba::mem_vram_obj));
std::memcpy(gba::memory_map(gba::mem_vram_obj), actor.sprite.data(), actor.sprite.size());
std::copy(actor.palette.begin(), actor.palette.end(), gba::pal_obj_bank[0]);

// Use frame_obj() to create OAM entries
auto walk = actor.ping_pong<0, 4>();
gba::obj_mem[0] = actor.frame_obj(base_tile, walk.frame(tick / 8), 0);

Constraints

  • all frames must fit within one 15-colour palette (index 0 always transparent)
  • frame dimensions must match a legal GBA OBJ size
  • frame width and height must evenly divide the source image's width and height

Violations are rejected at compile time.

Peripheral Register Reference

This is a complete reference of every memory-mapped I/O register exposed by stdgba. Registers are grouped by subsystem and listed by hardware address.

All registers are declared in <gba/peripherals> unless noted otherwise. DMA registers are in <gba/dma>, palette memory symbols are in <gba/color>, and VRAM/OAM symbols are in <gba/video>.

How to read this reference

Each entry shows:

  • stdgba name - the inline constexpr variable you use in code
  • Address - the memory-mapped hardware address
  • Access - R (read), W (write), or RW (read-write)
  • Type - the bitfield struct or integer type
  • tonclib name - the equivalent #define from tonclib/libtonc

Array registers are written as name[N] with their element stride.

LCD

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000000 | reg_dispcnt | RW | display_control | REG_DISPCNT |
| 0x4000004 | reg_dispstat | RW | display_status | REG_DISPSTAT |
| 0x4000006 | reg_vcount | R | const unsigned short | REG_VCOUNT |

display_control

struct display_control {
    unsigned short video_mode : 3; // Video mode (0-5)
    bool cgb : 1;                  // CGB mode flag (read-only)
    unsigned short page : 1;       // Page select for mode 4/5
    bool hblank_oam_free : 1;      // Allow OAM access during HBlank
    bool linear_obj_tilemap : 1;   // OBJ VRAM 1D mapping
    bool disable : 1;              // Force blank
    bool enable_bg0 : 1;
    bool enable_bg1 : 1;
    bool enable_bg2 : 1;
    bool enable_bg3 : 1;
    bool enable_obj : 1;
    bool enable_win0 : 1;
    bool enable_win1 : 1;
    bool enable_obj_win : 1;
};

gba::reg_dispcnt = { .video_mode = 3, .enable_bg2 = true };
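For readers coming from raw register writes, the sketch below shows the 16-bit value that a subset of these fields packs to. toy_dispcnt is a hypothetical helper, and it assumes the bitfields are allocated least-significant-bit first.

```cpp
#include <cstdint>

// Packs the video mode and layer enable bits of the display control register.
constexpr std::uint16_t toy_dispcnt(unsigned video_mode, bool bg0, bool bg1,
                                    bool bg2, bool bg3, bool obj) noexcept {
    return static_cast<std::uint16_t>(
        (video_mode & 7u)          // bits 0-2: video mode
        | (bg0 ? 1u << 8 : 0u)     // bit 8: BG0
        | (bg1 ? 1u << 9 : 0u)     // bit 9: BG1
        | (bg2 ? 1u << 10 : 0u)    // bit 10: BG2
        | (bg3 ? 1u << 11 : 0u)    // bit 11: BG3
        | (obj ? 1u << 12 : 0u));  // bit 12: OBJ
}
```

Mode 3 with BG2 enabled packs to 0x0403, the value many bitmap-mode tutorials write directly.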

display_status

struct display_status {
    const bool currently_vblank : 1;
    const bool currently_hblank : 1;
    const bool currently_vcount : 1;
    bool enable_irq_vblank : 1;
    bool enable_irq_hblank : 1;
    bool enable_irq_vcount : 1;
    short : 2;
    unsigned short vcount_setting : 8; // VCount trigger value
};

gba::reg_dispstat = { .enable_irq_vblank = true };

Backgrounds

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000008 | reg_bgcnt[4] | RW | background_control[4] | REG_BG0CNT..REG_BG3CNT |
| 0x4000010 | reg_bgofs[4][2] | W | volatile short[4][2] | REG_BG0HOFS etc. |
| 0x4000020 | reg_bgp[2][4] | W | volatile fixed<short>[2][4] | REG_BG2PA etc. |
| 0x4000028 | reg_bgx[2] | W | volatile fixed<int,8>[2] | REG_BG2X, REG_BG3X |
| 0x400002C | reg_bgy[2] | W | volatile fixed<int,8>[2] | REG_BG2Y, REG_BG3Y |
| 0x4000020 | reg_bg_affine[2] | W | volatile background_matrix[2] | REG_BG_AFFINE |

background_control

struct background_control {
    unsigned short priority : 2;    // BG priority (0 = highest)
    unsigned short charblock : 2;   // Character base block (0-3)
    short : 2;
    bool mosaic : 1;                // Enable mosaic effect
    bool bpp8 : 1;                  // 8bpp mode (false = 4bpp)
    unsigned short screenblock : 5; // Screen base block (0-31)
    bool wrap_affine_tiles : 1;     // Wrap for affine BGs
    unsigned short size : 2;        // BG size
};

// Designators follow declaration order: charblock precedes screenblock
gba::reg_bgcnt[0] = { .charblock = 0, .screenblock = 31 };

background_matrix

struct background_matrix {
    fixed<short> p[4]; // pa, pb, pc, pd
    fixed<int, 8> x;   // Reference point X
    fixed<int, 8> y;   // Reference point Y
};

The scroll registers reg_bgofs[bg][axis] are indexed as [bg_index][0=x, 1=y]. The affine registers reg_bgp[bg][coeff] are indexed relative to BG2 (index 0 = BG2, index 1 = BG3).

Windowing

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000040 | reg_winh[2] | W | volatile unsigned char[2] | REG_WIN0H |
| 0x4000044 | reg_winv[2] | W | volatile unsigned char[2] | REG_WIN0V |
| 0x4000048 | reg_winin[2] | RW | window_control[2] | REG_WININ |
| 0x400004A | reg_winout | RW | window_control | REG_WINOUT |
| 0x400004B | reg_winobj | RW | window_control | REG_WINOUT (hi byte) |

window_control

struct window_control {
    bool enable_bg0 : 1;
    bool enable_bg1 : 1;
    bool enable_bg2 : 1;
    bool enable_bg3 : 1;
    bool enable_obj : 1;
    bool enable_color_effect : 1;
};

gba::reg_winin[0] = { .enable_bg0 = true, .enable_obj = true };

Mosaic

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x400004C | reg_mosaicbg | RW | mosaic_control | REG_MOSAIC (lo) |
| 0x400004D | reg_mosaicobj | RW | mosaic_control | REG_MOSAIC (hi) |

mosaic_control

struct mosaic_control {
    unsigned char add_h : 4; // Horizontal stretch (0-15)
    unsigned char add_v : 4; // Vertical stretch (0-15)
};

Colour Effects

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000050 | reg_bldcnt | RW | blend_control | REG_BLDCNT |
| 0x4000052 | reg_bldalpha[2] | RW | fixed<unsigned char>[2] | REG_BLDALPHA |
| 0x4000054 | reg_bldy | RW | fixed<unsigned char> | REG_BLDY |

blend_control

struct blend_control {
    bool dest_bg0 : 1;    // 2nd target layers
    bool dest_bg1 : 1;
    bool dest_bg2 : 1;
    bool dest_bg3 : 1;
    bool dest_obj : 1;
    bool dest_backdrop : 1;
    blend_op blend_op : 2; // none / alpha / brighten / darken
    bool src_bg0 : 1;     // 1st target layers
    bool src_bg1 : 1;
    bool src_bg2 : 1;
    bool src_bg3 : 1;
    bool src_obj : 1;
    bool src_backdrop : 1;
};

// Designators follow declaration order: dest_*, blend_op, then src_*
gba::reg_bldcnt = {
    .dest_bg1 = true,
    .blend_op = gba::blend_op_alpha,
    .src_bg0 = true
};
gba::reg_bldalpha[0] = 0.5_fx; // EVA (source weight)
gba::reg_bldalpha[1] = 0.5_fx; // EVB (target weight)
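To see what the two weights do, the sketch below models the hardware's alpha blend for one 5-bit colour channel. toy_blend_channel is a hypothetical name, and the weights are given in sixteenths, so 8 stands for the 0.5 used above. That mapping assumes the fixed<unsigned char> values carry 4 fractional bits.

```cpp
// Blends one 5-bit channel (0-31). eva and evb are weights in 1/16 units;
// the hardware treats values above 16 as 16 and saturates the result at 31.
constexpr unsigned toy_blend_channel(unsigned top, unsigned bottom,
                                     unsigned eva, unsigned evb) noexcept {
    const unsigned a = eva > 16u ? 16u : eva;
    const unsigned b = evb > 16u ? 16u : evb;
    const unsigned sum = (top * a + bottom * b) >> 4;
    return sum > 31u ? 31u : sum;
}
```

Because the sum saturates instead of wrapping, weights that add up to more than 1.0 brighten the overlap without producing colour artefacts.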

Sound

Channel 1 (Square with Sweep)

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000060 | reg_sound1cnt_l | RW | sound1_sweep | REG_SND1SWEEP |
| 0x4000062 | reg_sound1cnt_h | RW | sound_duty_envelope | REG_SND1CNT |
| 0x4000064 | reg_sound1cnt_x | RW | sound_frequency | REG_SND1FREQ |

sound1_sweep

struct sound1_sweep {
    unsigned short shift : 3;     // Sweep shift (0-7)
    unsigned short direction : 1; // 0 = increase, 1 = decrease
    unsigned short time : 3;      // Sweep time (units of 7.8ms)
};

sound_duty_envelope

Shared by channels 1 and 2.

struct sound_duty_envelope {
    unsigned short length : 6;        // Sound length (0-63)
    unsigned short duty : 2;          // Duty cycle (0=12.5%, 1=25%, 2=50%, 3=75%)
    unsigned short env_step : 3;      // Envelope step time
    unsigned short env_direction : 1; // 0 = decrease, 1 = increase
    unsigned short env_volume : 4;    // Initial volume (0-15)
};

sound_frequency

Shared by channels 1, 2, and 3.

struct sound_frequency {
    unsigned short rate : 11; // Frequency rate (131072/(2048-rate) Hz)
    unsigned short : 3;
    bool timed : 1;           // false = continuous, true = use length
    bool trigger : 1;         // Write true to start/restart
};

gba::reg_sound1cnt_l = { .shift = 2, .time = 3 };
gba::reg_sound1cnt_h = { .duty = 2, .env_volume = 15 };
gba::reg_sound1cnt_x = { .rate = 1750, .trigger = true }; // ~440 Hz
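The rate value follows from the formula in the sound_frequency comment. The helpers below invert it on the host; toy_rate_from_hz and toy_hz_from_rate are hypothetical names and apply to the square channels only.

```cpp
#include <cmath>

// Frequency in Hz produced by an 11-bit rate value (0-2047).
constexpr double toy_hz_from_rate(unsigned rate) noexcept {
    return 131072.0 / (2048.0 - static_cast<double>(rate));
}

// Nearest rate value for a target frequency, clamped to the 11-bit range.
inline unsigned toy_rate_from_hz(double hz) {
    const double rate = 2048.0 - 131072.0 / hz;
    if (rate < 0.0) {
        return 0u;
    }
    if (rate > 2047.0) {
        return 2047u;
    }
    return static_cast<unsigned>(std::lround(rate));
}
```

For 440 Hz this gives 1750, matching the example above; the actual output is about 439.8 Hz because the rate is quantised.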

Channel 2 (Square)

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000068 | reg_sound2cnt_l | RW | sound_duty_envelope | REG_SND2CNT |
| 0x400006C | reg_sound2cnt_h | RW | sound_frequency | REG_SND2FREQ |

Uses the same sound_duty_envelope and sound_frequency types as channel 1.

Channel 3 (Wave)

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000070 | reg_sound3cnt_l | RW | sound3_control | REG_SND3SEL |
| 0x4000072 | reg_sound3cnt_h | RW | sound3_length_volume | REG_SND3CNT |
| 0x4000074 | reg_sound3cnt_x | RW | sound_frequency | REG_SND3FREQ |
| 0x4000090 | reg_wave_ram[4] | RW | unsigned int[4] | REG_WAVE_RAM |

sound3_control

struct sound3_control {
    unsigned short : 5;
    bool bank_mode : 1;   // false = 2x32 samples, true = 1x64
    bool bank_select : 1; // Select bank (0 or 1) for 2x32
    bool enable : 1;
};

sound3_length_volume

struct sound3_length_volume {
    unsigned short length : 8; // Sound length (0-255)
    unsigned short : 5;
    unsigned short volume : 2; // 0=mute, 1=100%, 2=50%, 3=25%
    bool force_75 : 1;         // Force 75% volume
};

Channel 4 (Noise)

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000078 | reg_sound4cnt_l | RW | sound4_envelope | REG_SND4CNT |
| 0x400007C | reg_sound4cnt_h | RW | sound4_frequency | REG_SND4FREQ |

sound4_envelope

struct sound4_envelope {
    unsigned short length : 6;
    unsigned short : 2;
    unsigned short env_step : 3;
    unsigned short env_direction : 1; // 0 = decrease, 1 = increase
    unsigned short env_volume : 4;    // Initial volume (0-15)
};

sound4_frequency

struct sound4_frequency {
    unsigned short div_ratio : 3; // Frequency divider ratio
    bool width : 1;               // Counter width (false=15-bit, true=7-bit)
    unsigned short shift : 4;     // Shift clock frequency
    unsigned short : 6;
    bool timed : 1;
    bool trigger : 1;
};

Master Control

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000080 | reg_soundcnt_l | RW | sound_control_l | REG_SNDDMGCNT |
| 0x4000082 | reg_soundcnt_h | RW | sound_control_h | REG_SNDDSCNT |
| 0x4000084 | reg_soundcnt_x | RW | sound_control_x | REG_SNDSTAT |
| 0x4000088 | reg_soundbias | RW | sound_bias | REG_SNDBIAS |
| 0x40000A0 | reg_fifo_a | W | volatile unsigned int | REG_FIFO_A |
| 0x40000A4 | reg_fifo_b | W | volatile unsigned int | REG_FIFO_B |

sound_control_l - PSG volume and routing

struct sound_control_l {
    unsigned short volume_right : 3; // Right master volume (0-7)
    unsigned short : 1;
    unsigned short volume_left : 3;  // Left master volume (0-7)
    unsigned short : 1;
    bool enable_1_right : 1;
    bool enable_2_right : 1;
    bool enable_3_right : 1;
    bool enable_4_right : 1;
    bool enable_1_left : 1;
    bool enable_2_left : 1;
    bool enable_3_left : 1;
    bool enable_4_left : 1;
};

sound_control_h - DirectSound/mixer

struct sound_control_h {
    unsigned short psg_volume : 2;  // PSG volume (0=25%, 1=50%, 2=100%)
    bool dma_a_volume : 1;         // DMA A volume (0=50%, 1=100%)
    bool dma_b_volume : 1;         // DMA B volume (0=50%, 1=100%)
    unsigned short : 4;
    bool dma_a_right : 1;
    bool dma_a_left : 1;
    bool dma_a_timer : 1;          // 0=timer0, 1=timer1
    bool dma_a_reset : 1;          // Reset FIFO
    bool dma_b_right : 1;
    bool dma_b_left : 1;
    bool dma_b_timer : 1;
    bool dma_b_reset : 1;
};

sound_control_x - Master enable

struct sound_control_x {
    bool sound1_on : 1; // (read-only)
    bool sound2_on : 1; // (read-only)
    bool sound3_on : 1; // (read-only)
    bool sound4_on : 1; // (read-only)
    unsigned short : 3;
    bool master_enable : 1;
};

gba::reg_soundcnt_x = { .master_enable = true };
gba::reg_soundcnt_l = {
    .volume_right = 7, .volume_left = 7,
    .enable_1_right = true, .enable_1_left = true
};

DMA

Declared in <gba/dma>.

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x40000B0 | reg_dmasad[4] | W | const void* volatile[4] | REG_DMA0SAD..REG_DMA3SAD |
| 0x40000B4 | reg_dmadad[4] | W | void* volatile[4] | REG_DMA0DAD..REG_DMA3DAD |
| 0x40000B8 | reg_dmacnt_l[4] | W | volatile unsigned short[4] | REG_DMA0CNT_L..REG_DMA3CNT_L |
| 0x40000BA | reg_dmacnt_h[4] | RW | dma_control[4] | REG_DMA0CNT_H..REG_DMA3CNT_H |
| 0x40000B0 | reg_dma[4] | W | volatile dma[4] | - |

All DMA arrays have a stride of 12 bytes between channels.
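The 12-byte stride means each channel's registers sit at a fixed offset from channel 0. The helpers below compute those addresses from the table; the function names are hypothetical and exist only to make the arithmetic explicit.

```cpp
#include <cstdint>

// Address of the source-address register for DMA channel 0-3.
constexpr std::uint32_t toy_dma_sad_address(unsigned channel) noexcept {
    return 0x040000B0u + 12u * channel;
}

// Address of the control (CNT_H) register for DMA channel 0-3.
constexpr std::uint32_t toy_dma_cnt_h_address(unsigned channel) noexcept {
    return 0x040000BAu + 12u * channel;
}
```

Channel 3, for example, has its source register at 0x40000D4 and its control register at 0x40000DE.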

dma_control

struct dma_control {
    short : 5;
    dest_op dest_op : 2;   // increment / decrement / fixed / increment_reload
    src_op src_op : 2;     // increment / decrement / fixed
    bool repeat : 1;
    dma_type dma_type : 1; // half (16-bit) / word (32-bit)
    bool gamepak_drq : 1;
    dma_cond dma_cond : 2; // now / vblank / hblank / sound_fifo (or video_capture)
    bool irq_on_finish : 1;
    bool enable : 1;
};

dma - high-level descriptor

struct dma {
    const void* source;
    void* destination;
    unsigned short units;
    dma_control control;

    static constexpr dma copy(const void* src, void* dst, std::size_t count);
    static constexpr dma copy16(const void* src, void* dst, std::size_t count);
    static constexpr dma fill(const void* val, void* dst, std::size_t count);
    static constexpr dma fill16(const void* val, void* dst, std::size_t count);
    static constexpr dma on_vblank(const void* src, void* dst, std::size_t count);
    static constexpr dma on_hblank(const void* src, void* dst, std::size_t count);
    static constexpr dma to_fifo_a(const void* samples);
    static constexpr dma to_fifo_b(const void* samples);
};

gba::reg_dma[3] = gba::dma::copy(src, dst, 256);

Timers

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000100 | reg_tmcnt_l[4] | RW | unsigned short[4] | REG_TM0D..REG_TM3D |
| 0x4000100 | reg_tmcnt_l_stat[4] | R | const unsigned short[4] | REG_TM0D (read) |
| 0x4000100 | reg_tmcnt_l_reload[4] | W | volatile unsigned short[4] | REG_TM0D (write) |
| 0x4000102 | reg_tmcnt_h[4] | RW | timer_control[4] | REG_TM0CNT..REG_TM3CNT |
| 0x4000100 | reg_tmcnt[4] | RW | timer_config[4] | - |

All timer arrays have a stride of 4 bytes between channels.

timer_control

struct timer_control {
    cycles cycles : 2; // cycles_1 / cycles_64 / cycles_256 / cycles_1024
    bool cascade : 1;  // Cascade from previous timer
    short : 3;
    bool overflow_irq : 1;
    bool enabled : 1;
};

timer_config is a plex<unsigned short, timer_control> that writes the reload value and control register as a single 32-bit store.

gba::reg_tmcnt_h[0] = { .cycles = gba::cycles_1024, .enabled = true };
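The sketch below shows how a reload value and the control bits could combine into the single 32-bit word that a timer_config store writes. Both helpers are hypothetical. They assume the 16.78 MHz (2^24 Hz) system clock and that cycles_1024 encodes as selector 3, and toy_timer_reload requires the tick count per overflow to fit in 1..65536.

```cpp
#include <cstdint>

// Reload value so the timer overflows `overflows_per_second` times a second.
// prescaler is 1, 64, 256, or 1024.
constexpr std::uint16_t toy_timer_reload(unsigned prescaler,
                                         unsigned overflows_per_second) noexcept {
    return static_cast<std::uint16_t>(
        65536u - (16777216u / prescaler) / overflows_per_second);
}

// Combined 32-bit value: reload in the low half, control in the high half.
constexpr std::uint32_t toy_timer_config(std::uint16_t reload, unsigned cycles,
                                         bool enabled) noexcept {
    const std::uint32_t control = (cycles & 3u) | (enabled ? 1u << 7 : 0u);
    return reload | (control << 16);
}
```

With the 1024 prescaler the timer ticks 16384 times per second, so a reload of 0xC000 overflows once per second.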

Serial Communication

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000120 | reg_siodata32 | RW | unsigned int | REG_SIODATA32 |
| 0x4000120 | reg_siomulti[4] | RW | unsigned short[4] | REG_SIOMULTI0..3 |
| 0x4000128 | reg_siocnt | RW | sio_control | REG_SIOCNT |
| 0x4000128 | reg_siocnt_multi | RW | sio_multi_control | REG_SIOCNT |
| 0x400012A | reg_siodata8 | RW | unsigned char | REG_SIODATA8 |
| 0x400012A | reg_siomlt_send | RW | unsigned short | REG_SIOMLT_SEND |
| 0x4000134 | reg_rcnt | RW | rcnt_control | REG_RCNT |
| 0x4000140 | reg_joycnt | RW | joycnt_control | REG_JOYCNT |
| 0x4000150 | reg_joy_recv | R | const unsigned int | REG_JOY_RECV |
| 0x4000154 | reg_joy_trans | W | volatile unsigned int | REG_JOY_TRANS |
| 0x4000158 | reg_joystat | RW | joystat_status | REG_JOYSTAT |

The serial registers at 0x4000120-0x400012A are aliased for different modes. Use reg_siocnt for Normal mode and reg_siocnt_multi for Multi-Player mode. Likewise reg_siodata32 / reg_siomulti share the same address.

Keypad

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000130 | reg_keyinput | R | const key_control | REG_KEYINPUT |
| 0x4000132 | reg_keycnt | RW | key_control | REG_KEYCNT |

key_control

struct key_control {
    bool a : 1;
    bool b : 1;
    bool select : 1;
    bool start : 1;
    bool right : 1;
    bool left : 1;
    bool up : 1;
    bool down : 1;
    bool r : 1;
    bool l : 1;
    short : 4;
    bool irq_enabled : 1;
    bool irq_all : 1; // IRQ when ALL selected keys pressed
};

reg_keyinput is active low - a button reads false when pressed.

if (!gba::reg_keyinput.a) { /* A is held */ }

For the high-level input helper (gba::keypad) with held()/pressed()/released() and axis helpers, see the gba::keypad Reference.

Interrupts

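| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000200 | reg_ie | RW | irq | REG_IE |
| 0x4000202 | reg_if | RW | irq | REG_IF |
| 0x4000202 | reg_if_stat | R | const irq | REG_IF (read) |
| 0x4000202 | reg_if_ack | W | volatile irq | REG_IF (write) |
| 0x4000208 | reg_ime | RW | bool | REG_IME |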

irq

struct irq {
    bool vblank : 1;
    bool hblank : 1;
    bool vcounter : 1;
    bool timer0 : 1;
    bool timer1 : 1;
    bool timer2 : 1;
    bool timer3 : 1;
    bool serial : 1;
    bool dma0 : 1;
    bool dma1 : 1;
    bool dma2 : 1;
    bool dma3 : 1;
    bool keypad : 1;
    bool gamepak : 1;
};

gba::reg_ie = { .vblank = true };
gba::reg_ime = true;

System

| Address | stdgba | Access | Type | tonclib |
| --- | --- | --- | --- | --- |
| 0x4000204 | reg_waitcnt | RW | waitcnt | REG_WAITCNT |

waitcnt

waitcnt is the GBA wait-control register (WAITCNT), also referred to as waitctl in some documentation.

struct waitcnt {
    unsigned short sram : 2{3};
    unsigned short ws0_first : 2{1};
    unsigned short ws0_second : 1{1};
    unsigned short ws1_first : 2{};
    unsigned short ws1_second : 1{};
    unsigned short ws2_first : 2{3};
    unsigned short ws2_second : 1{};
    unsigned short phi : 2{};
    short : 1;
    bool prefetch : 1{true};
    const bool is_cgb : 1{};
};

Assigning {} applies the default member initialisers shown above (raw value 0x4317), which select the commonly used fast ROM timings (3/1 wait states on WS0) and enable the prefetch buffer:

gba::reg_waitcnt = {};
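As a cross-check against raw register values, the sketch below packs the default member initialisers from the struct by hand. toy_waitcnt_default is a hypothetical helper and assumes least-significant-bit-first bitfield allocation.

```cpp
#include <cstdint>

// Packs the defaults listed in the waitcnt struct into the raw 16-bit value.
constexpr std::uint16_t toy_waitcnt_default() noexcept {
    return static_cast<std::uint16_t>(
        (3u << 0)     // sram = 3
        | (1u << 2)   // ws0_first = 1
        | (1u << 4)   // ws0_second = 1
        | (0u << 5)   // ws1_first = 0
        | (0u << 7)   // ws1_second = 0
        | (3u << 8)   // ws2_first = 3
        | (0u << 10)  // ws2_second = 0
        | (0u << 11)  // phi = 0
        | (1u << 14)  // prefetch = true
    );
}
```

The result is 0x4317, the constant that many C-based GBA projects write to this register at startup.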

Video Memory

Palette memory symbols are declared in <gba/color>. VRAM and OAM symbols are declared in <gba/video>.

| Address | stdgba | Type | tonclib |
| --- | --- | --- | --- |
| 0x5000000 | mem_pal | short[512] | pal_mem |
| 0x5000000 | mem_pal_bg | short[256] | pal_bg_mem |
| 0x5000200 | mem_pal_obj | short[256] | pal_obj_mem |
| 0x5000000 | pal_bg_mem | color[256] | pal_bg_mem |
| 0x5000200 | pal_obj_mem | color[256] | pal_obj_mem |
| 0x5000000 | pal_bg_bank | color[16][16] | pal_bg_bank |
| 0x5000200 | pal_obj_bank | color[16][16] | pal_obj_bank |
| 0x6000000 | mem_vram | short[0xC000] | vid_mem |
| 0x6000000 | mem_vram_bg | short[0x8000] | vid_mem |
| 0x6010000 | mem_vram_obj | short[0x4000] | tile_mem_obj |
| 0x6000000 | mem_tile_4bpp | tile4bpp[4][512] | tile_mem |
| 0x6000000 | mem_tile_8bpp | tile8bpp[4][256] | tile8_mem |
| 0x6000000 | mem_se | screen_entry[32][1024] | se_mem |
| 0x7000000 | mem_oam | short[128][3] | oam_mem |
| 0x7000000 | obj_mem | object[128] | obj_mem |
| 0x7000000 | obj_aff_mem | object_affine[128] | obj_aff_mem |
| 0x7000006 | mem_obj_aff | fixed<short>[128] | - |
| 0x7000006 | mem_obj_affa | fixed<short>[32] | obj_aff_mem[n].pa |
| 0x700000E | mem_obj_affb | fixed<short>[32] | obj_aff_mem[n].pb |
| 0x7000016 | mem_obj_affc | fixed<short>[32] | obj_aff_mem[n].pc |
| 0x700001E | mem_obj_affd | fixed<short>[32] | obj_aff_mem[n].pd |
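The affine parameter addresses in the table follow a regular pattern: each parameter occupies the fourth halfword of an 8-byte OAM slot, and one matrix spans four slots. The helper below is a hypothetical sketch of that arithmetic, based on the hardware OAM layout rather than on stdgba's types.

```cpp
#include <cstdint>

// Address of one affine parameter.
// index: matrix 0-31. param: 0 = pa, 1 = pb, 2 = pc, 3 = pd.
constexpr std::uint32_t toy_obj_aff_address(unsigned index,
                                            unsigned param) noexcept {
    return 0x07000006u + 32u * index + 8u * param;
}
```

This interleaving is why the four mem_obj_aff* views start 8 bytes apart, and why the last parameter of matrix 31 lands on 0x70003FE, the final halfword of OAM.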

Undocumented Registers

These are functional but not part of the community-documented register set. Access via the gba::undocumented namespace.

| Address | stdgba | Access | Type | Common Name |
| --- | --- | --- | --- | --- |
| 0x4000002 | undocumented::reg_stereo_3d | RW | bool | GREENSWAP |
| 0x4000300 | undocumented::reg_postflg | RW | bool | POSTFLG |
| 0x4000301 | undocumented::reg_haltcnt | RW | halt_control | HALTCNT |
| 0x4000410 | undocumented::reg_obj_center | W | volatile char | - |
| 0x4000800 | undocumented::reg_memcnt | RW | memory_control | Internal Memory Control |