Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Internal Implementation

This page covers the mechanics behind gba::ecs: how entities are recycled, why metadata is ordered the way it is, and how the iteration fast paths are selected.

Field ordering inside registry

registry.hpp places small hot metadata first and large pools later:

OrderFieldWhy it is near the front
1m_component_count[]touched by view setup and component attach/remove
2m_free_toptouched by create() and destroy()
3m_next_slottouched by create()
4m_alivetouched by create/destroy/view setup
5+masks, generations, stacks, alive listsstill hot, but larger
lastcomponent poolslarge bulk storage; offset cost matters less

The comment in registry.hpp explains the main codegen reason: in Thumb-mode call paths such as create(), destroy(), and emplace(), low offsets make for cheaper loads and stores.

How entity creation works

Creation prefers recycled slots, then falls back to a never-used slot.

if free stack not empty:
	pop slot from m_free_stack
else:
	use m_next_slot and increment it

mark slot alive
append slot to m_alive_list
record reverse index in m_alive_index
increment m_alive
return entity_id(slot, generation)

That makes slot reuse deterministic and cheap.

How destruction works

Destroying an entity performs four distinct jobs:

  1. decrement component counts for every component present on that slot
  2. clear the mask and increment the generation
  3. push the slot onto m_free_stack
  4. remove the slot from m_alive_list with swap-and-pop

The important bit is swap-and-pop:

alive list before: [ 4, 7, 2, 9 ]
destroy slot 7
swap in last slot 9
alive list after:  [ 4, 9, 2 ]

That keeps removal O(1) instead of shifting a long list.

Why there is both m_alive_list and m_alive_index

FieldRole
m_alive_list[Capacity]dense list of alive slots in iteration order
m_alive_index[Capacity]reverse map from slot -> index in m_alive_list

You need both to delete from the dense list in O(1). Without the reverse map, destruction would have to scan the list to find the removed slot.

Component count tracking and fast-path selection

m_component_count[] stores how many alive entities currently own each component type.

Before iterating, a view checks whether every requested component count equals m_alive.

If true, then every alive entity has every requested component, and the loop can skip per-entity mask checks.

That is the basis of the three iteration paths:

PathConditionInner-loop work
Dense + all-matchm_alive == m_next_slot and all requested component counts equal m_alivedirect slot walk
All-match with gapsall requested component counts equal m_alive, but dense-slot condition is falsewalk m_alive_list only
Mixednot all alive entities have the requested componentswalk m_alive_list and test mask

This is a simple but effective optimisation. Many game systems operate on worlds where almost every live entity in a layer shares the same core components.

Iterator vs callback style

Both range-for and .each() are implemented on top of the same storage model, but they serve slightly different goals:

StyleBest trait
range-forergonomic syntax with structured bindings
.each()explicit callback, easy to specialise or switch to .each_arm()
.each_arm()hottest runtime path

The callback path also auto-detects whether your lambda wants an entity_id first:

world.view<health>().each([](gba::ecs::entity_id e, health& hp) {
	// id-aware system
});

match() dispatch semantics

match<Case1, Case2, ...>(entity, fn1, fn2, ...) is implemented in two phases:

  1. Evaluate all case queries and snapshot which cases match.
  2. Invoke callbacks for matched cases in declaration order.

This gives predictable dispatch when one entity can satisfy multiple cases.

PropertyBehaviour
Match timingsnapshotted before callbacks run
Callback ordersame order as case template arguments
Return valuetrue if at least one case matched
Hot-path variantmatch_arm(...) in ARM mode + IWRAM

each_arm() and why it exists

basic_view::each_arm() is annotated to build for ARM mode and live in IWRAM:

  • gnu::target("arm")
  • gnu::section(".iwram._gba_ecs_each")
  • gnu::flatten

That combination is intended for the loops you run every frame on hardware.

Why it can be faster

ChoiceBenefit
ARM modemore registers and richer addressing modes than Thumb
IWRAM placementfaster instruction fetch on target hardware
flattened callback bodybetter inlining in tight loops

In the benchmark suite, this is the path used for runtime movement and full-update loops.

Compile-time safety behaviour

The registry uses if consteval checks for invalid operations such as:

  • capacity overflow in create()
  • destroying an invalid entity
  • double-emplacing the same component
  • removing from an invalid entity

That means a misuse inside a static constexpr setup produces a compiler error instead of a bad runtime state.

The power-of-two size rule, internally

The registry enforces this with:

static_assert(((std::has_single_bit(sizeof(Components))) && ...),
			  "all component sizes must be powers of two");

It is not just stylistic. The implementation is tuned around simple addressing and predictable pool layout. If you have a 3-byte or 12-byte component, pad it to 4 or 16 bytes.

struct sprite_id {
	std::uint8_t id;
	gba::ecs::pad<3> _;
};

A concrete storage example

For this registry:

using world_type = gba::ecs::registry<128, position, velocity, health, sprite_id>;

With the component sizes:

ComponentSizePool storage
position8128 × 8 = 1024 bytes
velocity8128 × 8 = 1024 bytes
health4128 × 4 = 512 bytes
sprite_id4128 × 4 = 512 bytes (padded from 1)

Metadata breakdown:

FieldSizeNotes
m_component_count[4]4 B
Hot scalars4 Bfree_top, next_slot, alive, dense_prefix
m_mask[128]512 B4 bytes × 128 slots
m_gen[128]128 B1 byte × 128 slots
m_free_stack[128]128 B1 byte × 128 slots
m_alive_list[128]128 B1 byte × 128 slots
m_alive_index[128]128 B1 byte × 128 slots
Metadata subtotal1040 B
Component pools3072 B
Total4112 B(~4 KB)

Logical payload per entity is 21 bytes (or 25 with padding), but physical storage is split into arrays. That split is what makes selective views iterate only the data they need, keeping memory access patterns linear and predictable.

The implementation is best understood alongside these files:

  • public API: include/gba/ecs
  • implementation: include/gba/bits/ecs/registry.hpp
  • entity ID helpers: include/gba/bits/ecs/entity.hpp
  • tests: tests/ecs/test_ecs.cpp
  • runtime benchmark: benchmarks/bench_ecs.cpp
  • debug benchmark: benchmarks/bench_ecs_debug.cpp

The tests exercise lifecycle, generation invalidation, view filtering, structured bindings, constexpr use, and padding rules. The benchmarks show why the implementation keeps leaning so hard into dense arrays and low-overhead iteration.

Practical examples and patterns

Setting up a game world with groups

#include <gba/ecs>
#include <gba/fixed_point>

// Define component groups
struct position {
	gba::fixed<int, 8> x, y;
};

struct velocity {
	gba::fixed<int, 8> vx, vy;
};

struct sprite_id {
	std::uint8_t id;
	gba::ecs::pad<3> _;
};

struct health {
	int hp;
};

// Group for physics (reusable organisation)
using physics = gba::ecs::group<position, velocity>;
using rendering = gba::ecs::group<sprite_id>;

// Single registry with multiple groups
using world_type = gba::ecs::registry<256, physics, rendering, health>;

world_type world;

This is readable and scales: you can see exactly what the world contains without searching through code.

Writing systems with different iteration strategies

// Ergonomic: range-based for with structured bindings
void movement_system(world_type& world) {
	for (auto [pos, vel] : world.view<position, velocity>()) {
		pos.x += vel.vx;
		pos.y += vel.vy;
	}
}

// Portable: callback style (works in constexpr contexts)
void render_system(world_type& world) {
	world.view<sprite_id>().each([](sprite_id& sprite) {
		// upload sprite to OAM
	});
}

// Hot-path: ARM mode + IWRAM for every-frame updates
void collision_system(world_type& world) {
	world.view<position, health>().each_arm([](position& pos, health& hp) {
		// tight loop runs from IWRAM in ARM mode
		if (hp.hp <= 0) {
			// destruction handled separately
		}
	});
}

// With entity IDs for selective destruction
void health_system(world_type& world) {
	world.view<health>().each([](gba::ecs::entity_id e, health& hp) {
		if (hp.hp <= 0) {
			world.destroy(e);  // safe due to generation
		}
	});
}

Typical frame loop

int main() {
	world_type world;

	// Setup entities
	auto player = world.create();
	world.emplace<position>(player, 0, 0);
	world.emplace<velocity>(player, 0, 0);
	world.emplace<sprite_id>(player, 0);

	while (true) {
		gba::VBlankIntrWait();

		// Update phase
		movement_system(world);    // all physics
		collision_system(world);   // all collisions
		health_system(world);      // remove dead entities

		// Render phase
		render_system(world);      // upload to hardware

		// Handle input, etc.
	}
}

Every system has predictable cost. No hidden allocations, no iteration overhead.