ECS Architecture
gba::ecs uses a static, flat-storage architecture tuned for ARM7TDMI constraints. The design goal is straightforward: make the common operations for a small fixed-capacity game world cheap enough that you can reason about them without a profiler open all day.
File layout and public interface
include/gba/ecs -> public facade
+- registry<Capacity, Components...>
+- group<Components...>
+- entity_id (handle with generation)
+- pad<N> (padding utility)
include/gba/bits/ecs/ -> internal implementation
+- entity.hpp
+- group.hpp
+- group_metadata.hpp
+- registry.hpp
Why this ECS is static
Many desktop ECS libraries optimise for:
- unlimited entity counts
- runtime component registration
- dynamic archetype churn
- scheduler/tooling integration
gba::ecs optimises for something entirely different:
- a known maximum entity count (fits in 8 bits; max 255 entities)
- a small compile-time component set (max 31 components)
- simple arrays that can live inline inside one registry object
- predictable loops for handheld game logic
That is why the registry type specifies everything at compile time:
using world_type = gba::ecs::registry<128, position, velocity, health>;
The type itself answers the architectural questions: maximum 128 live entities, exactly three component pools.
Registry storage model
Every registry owns its storage inline – no heap allocation, no indirection.
registry<Capacity, Components...>
|
+- hot metadata (cached in Thumb mode)
| +- m_component_count[N] (1 byte/component)
| +- m_free_top (1 byte)
| +- m_next_slot (1 byte)
| +- m_alive (1 byte)
| +- m_dense_prefix (1 byte)
|
+- per-slot tracking
| +- m_mask[Capacity] (4 bytes/slot)
| +- m_gen[Capacity] (1 byte/slot)
| +- m_free_stack[Capacity] (1 byte/slot)
| +- m_alive_list[Capacity] (1 byte/slot)
| +- m_alive_index[Capacity] (1 byte/slot)
|
+- component pools
  +- std::array<C1, Capacity> (Capacity x sizeof(C1))
  +- std::array<C2, Capacity> (Capacity x sizeof(C2))
  +- ...
No heap allocation, sparse sets, or type-erased component maps are involved.
Memory consumption breakdown
For gba::ecs::registry<128, position, velocity, health>:
| Item | Size | Notes |
|---|---|---|
| Metadata overhead | | |
| Hot scalars (5 bytes) | 5 B | |
| Per-slot tracking (7 × 128) | 896 B | m_mask + m_gen + stacks + indices |
| Per-component count (3) | 3 B | |
| Metadata subtotal | 904 B | (26% of total) |
| Component pools | | |
| position (8 × 128) | 1024 B | |
| velocity (8 × 128) | 1024 B | |
| health (4 × 128) | 512 B | |
| Data subtotal | 2560 B | (74% of total) |
| Total | 3464 B | (~3.4 KB) |
General formula
For a registry with Capacity slots and N components:
Metadata = Capacity × 7 + N + 5
Component data = Capacity × Σ(sizeof(Component))
Total = Metadata + Component data
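The formula can be checked at compile time. The following is a minimal sketch: the helper names (metadata_bytes, pool_bytes, total_bytes) and the stand-in component structs are hypothetical, not part of gba::ecs, and are chosen only to match the sizes in the table above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers mirroring the formulas above; not part of gba::ecs.
constexpr std::size_t metadata_bytes(std::size_t capacity, std::size_t components) {
    // 7 bytes of tracking per slot + 1 counter byte per component + 5 hot scalar bytes
    return capacity * 7 + components + 5;
}

template <typename... Components>
constexpr std::size_t pool_bytes(std::size_t capacity) {
    // Capacity x sum of sizeof(Component); the trailing 0 handles an empty pack.
    return capacity * (sizeof(Components) + ... + 0);
}

template <typename... Components>
constexpr std::size_t total_bytes(std::size_t capacity) {
    return metadata_bytes(capacity, sizeof...(Components)) +
           pool_bytes<Components...>(capacity);
}

// Stand-in component types with the sizes used in the table above.
struct position { std::int32_t x, y; };    // 8 bytes
struct velocity { std::int32_t vx, vy; };  // 8 bytes
struct health   { std::int32_t hp; };      // 4 bytes

static_assert(metadata_bytes(128, 3) == 904);
static_assert(pool_bytes<position, velocity, health>(128) == 2560);
static_assert(total_bytes<position, velocity, health>(128) == 3464);
```

A real registry may be slightly larger than this estimate if the compiler inserts alignment padding between members.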
Scaling characteristics
Metadata grows linearly with capacity (7 bytes per slot) and is almost independent of component count: each additional component adds one byte of metadata for its counter, while its pool adds Capacity × sizeof(Component) bytes of data.
| Config | Metadata | Data | Total | % Overhead |
|---|---|---|---|---|
| 64 entities, 3 components | 456 B | 1280 B | 1736 B | 26% |
| 128 entities, 3 components | 904 B | 2560 B | 3464 B | 26% |
| 255 entities, 3 components | 1793 B | 5100 B | 6893 B | 26% |
| 128 entities, 6 components | 907 B | 4608 B | 5515 B | 16% |
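Capacity has almost no effect on the metadata percentage, because metadata and pool data both scale linearly with it. Adding components (or using larger ones) is what lowers the percentage: pool data grows while the per-slot metadata stays fixed.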
Component groups and logical organisation
Component groups provide compile-time organisation without runtime overhead.
// Define conceptual groups
using physics = gba::ecs::group<position, velocity, acceleration>;
using rendering = gba::ecs::group<sprite_id, palette_bank, x_offset>;
// Use groups in registry declaration
gba::ecs::registry<128, physics, rendering, health> world;
// Internally flattened to:
// gba::ecs::registry<128, position, velocity, acceleration,
// sprite_id, palette_bank, x_offset, health>
Groups are completely erased at compile time. They exist for code organisation and readability, not runtime behaviour.
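This page does not show how the flattening is implemented. One way it could be done with ordinary template metaprogramming is sketched below. Every name here (group, type_list, expand, concat, flatten_t) is a hypothetical stand-in, and the sketch only handles one level of grouping.

```cpp
#include <type_traits>

// Hypothetical stand-ins; the real gba::ecs types may be implemented differently.
template <typename... Cs> struct group {};
template <typename... Cs> struct type_list {};

// A plain component becomes a one-element list; a group expands to its members.
template <typename T> struct expand { using type = type_list<T>; };
template <typename... Cs> struct expand<group<Cs...>> { using type = type_list<Cs...>; };

// Concatenate any number of type_lists into one.
template <typename... Lists> struct concat;
template <> struct concat<> { using type = type_list<>; };
template <typename... As> struct concat<type_list<As...>> { using type = type_list<As...>; };
template <typename... As, typename... Bs, typename... Rest>
struct concat<type_list<As...>, type_list<Bs...>, Rest...> {
    using type = typename concat<type_list<As..., Bs...>, Rest...>::type;
};

// Flatten a mixed list of components and groups into one component list.
template <typename... Ts>
using flatten_t = typename concat<typename expand<Ts>::type...>::type;

struct position {}; struct velocity {}; struct acceleration {}; struct health {};
using physics = group<position, velocity, acceleration>;

static_assert(std::is_same_v<flatten_t<physics, health>,
                             type_list<position, velocity, acceleration, health>>);
```

Because the result is an ordinary type, nothing about the grouping survives into the generated code.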
Why groups matter
- Logical namespace: Physics components stay together in the code
- No runtime cost: Groups are pure templates; zero overhead
- No ambiguity: The registry type fully specifies what exists
- Iterating unchanged: Use view<position, velocity>() regardless of groups
// Views name components directly, whether or not they were declared through a group:
world.view<position, velocity>().each([](position& p, velocity& v) {
p.x += v.vx;
});
Entity identity
entity_id is a 16-bit handle:
| Bits | Meaning |
|---|---|
| low 8 bits | slot index |
| high 8 bits | generation counter |
 15            8 7             0
+---------------+---------------+
|  generation   |     slot      |
+---------------+---------------+
Consequences:
- maximum slots per registry: 255
- 0xFFFF is reserved for gba::ecs::null
- stale handles become invalid after destroy() increments generation
This is a very good match for GBA games, where worlds are usually dozens or low hundreds of entities, not tens of thousands.
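The handle layout above can be illustrated with a few free functions. This is a sketch under assumptions: raw_id, make_id, slot_of, generation_of and is_current are hypothetical names, and the real entity_id type may expose the same information differently.

```cpp
#include <cstdint>

// Hypothetical free functions illustrating the bit layout only.
using raw_id = std::uint16_t;
constexpr raw_id null_id = 0xFFFF;  // the value the text reserves for gba::ecs::null

constexpr raw_id make_id(std::uint8_t slot, std::uint8_t generation) {
    // generation in the high byte, slot index in the low byte
    return static_cast<raw_id>((generation << 8) | slot);
}
constexpr std::uint8_t slot_of(raw_id id) {
    return static_cast<std::uint8_t>(id & 0xFF);
}
constexpr std::uint8_t generation_of(raw_id id) {
    return static_cast<std::uint8_t>(id >> 8);
}

// A handle is current only if its generation matches the one stored for its slot.
constexpr bool is_current(raw_id id, const std::uint8_t* slot_generations) {
    return id != null_id && generation_of(id) == slot_generations[slot_of(id)];
}

static_assert(make_id(5, 2) == 0x0205);
static_assert(slot_of(0x0205) == 5 && generation_of(0x0205) == 2);
```

Bumping the stored generation for a slot, as destroy() is described as doing, makes every handle that still carries the old generation fail the comparison.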
Presence tracking with one mask per slot
Each slot has one std::uint32_t mask:
- bit 31 = entity alive flag
- bits 0-30 = component presence bits
That supports cheap queries:
- all_of<Cs...>() -> bitwise AND against a compile-time mask
- any_of<Cs...>() -> bitwise AND against a compile-time mask
- view filtering -> compare the slot mask with a required mask
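A minimal sketch of such mask queries follows. It assumes each component's bit position is its index in the registry's component list, which this page does not state. The names index_of, masks and required are hypothetical, and the member functions only borrow the all_of/any_of names for illustration.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical: position of T within a type list (compile error if T is absent).
template <typename T, typename... Cs> struct index_of;
template <typename T, typename... Rest>
struct index_of<T, T, Rest...> : std::integral_constant<unsigned, 0> {};
template <typename T, typename First, typename... Rest>
struct index_of<T, First, Rest...>
    : std::integral_constant<unsigned, 1 + index_of<T, Rest...>::value> {};

constexpr std::uint32_t alive_bit = std::uint32_t{1} << 31;

template <typename... Registered>
struct masks {
    // OR together one bit per queried component; evaluated at compile time.
    template <typename... Query>
    static constexpr std::uint32_t required() {
        return ((std::uint32_t{1} << index_of<Query, Registered...>::value) | ... |
                std::uint32_t{0});
    }
    // All requested bits must survive the AND.
    template <typename... Query>
    static constexpr bool all_of(std::uint32_t slot_mask) {
        return (slot_mask & required<Query...>()) == required<Query...>();
    }
    // At least one requested bit must survive the AND.
    template <typename... Query>
    static constexpr bool any_of(std::uint32_t slot_mask) {
        return (slot_mask & required<Query...>()) != 0;
    }
};

struct position {}; struct velocity {}; struct health {};
using m = masks<position, velocity, health>;

// An alive slot that owns position and health (bits 0 and 2).
constexpr std::uint32_t slot = alive_bit | 0b101u;
static_assert(m::all_of<position, health>(slot));
static_assert(!m::all_of<position, velocity>(slot));
static_assert(m::any_of<position, velocity>(slot));
```

Both queries use the same AND. They differ only in whether the result is compared against the full required mask or against zero.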
Logical per-entity layout vs physical storage
One of the easiest ways to misunderstand the registry is to imagine each entity as one packed struct. That is not what happens.
Logical view
For example, with these components:
| Component | Size |
|---|---|
| position | 8 bytes |
| velocity | 8 bytes |
| health | 4 bytes |
| sprite_id | 1 byte |
The logical entity data is 21 bytes of component payload.
Physical view
The registry stores them as separate arrays:
position pool: [p0][p1][p2][p3] ...
velocity pool: [v0][v1][v2][v3] ...
health pool: [h0][h1][h2][h3] ...
sprite pool: [s0][s1][s2][s3] ...
That is why a view<position, velocity>() can iterate directly over only the pools it needs.
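The effect of the separate-array layout can be shown with a small stand-alone model. This is a sketch, not the registry itself: toy_pools and integrate are hypothetical names, and the component structs are stand-ins with the sizes listed above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct position { std::int32_t x, y; };
struct velocity { std::int32_t vx, vy; };
struct health   { std::int32_t hp; };

// Hypothetical miniature world: one array per component, indexed by slot.
template <std::size_t Capacity>
struct toy_pools {
    std::array<position, Capacity> positions{};
    std::array<velocity, Capacity> velocities{};
    std::array<health, Capacity>   healths{};  // never touched by integrate()
};

// Moves the first `count` slots. Only the position and velocity arrays are
// read or written; the health array is not accessed at all.
template <std::size_t Capacity>
void integrate(toy_pools<Capacity>& pools, std::size_t count) {
    for (std::size_t slot = 0; slot < count; ++slot) {
        pools.positions[slot].x += pools.velocities[slot].vx;
        pools.positions[slot].y += pools.velocities[slot].vy;
    }
}
```

With a packed per-entity struct, the same loop would stride over health and sprite bytes it never uses.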
Metadata arrays and what they buy you
| Field | Role |
|---|---|
| m_component_count[] | Count of alive entities owning each component |
| m_free_top | Size of the free-slot stack |
| m_next_slot | Next never-before-used slot |
| m_alive | Current alive entity count |
| m_mask[] | Alive + component presence bits |
| m_gen[] | Per-slot generation counters |
| m_free_stack[] | Recycled slot stack |
| m_alive_list[] | Dense list of alive slots |
| m_alive_index[] | Reverse map for O(1) removal from m_alive_list |
This is the backbone of the ECS. The component pools are simple; the metadata is what makes creation, destruction, and iteration cheap.
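To make the roles in the table concrete, here is a minimal sketch of how a free stack, a dense alive list and its reverse map can cooperate. It is an assumption about the mechanics, not the library's code: toy_slots is a hypothetical type, and masks and component pools are left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical miniature of the bookkeeping described in the table above.
template <std::size_t Capacity>
struct toy_slots {
    std::array<std::uint8_t, Capacity> gen{};
    std::array<std::uint8_t, Capacity> free_stack{};
    std::array<std::uint8_t, Capacity> alive_list{};
    std::array<std::uint8_t, Capacity> alive_index{};
    std::uint8_t free_top = 0, next_slot = 0, alive = 0;

    // Prefers a recycled slot, otherwise takes the next never-used one.
    // The caller must ensure alive < Capacity.
    std::uint8_t create() {
        std::uint8_t slot;
        if (free_top != 0) {
            slot = free_stack[--free_top];
        } else {
            slot = next_slot++;
        }
        alive_index[slot] = alive;   // remember where the slot sits in the dense list
        alive_list[alive++] = slot;  // append to the dense list
        return slot;
    }

    // Swap-and-pop: the last alive slot fills the hole, so removal is O(1).
    void destroy(std::uint8_t slot) {
        const std::uint8_t hole = alive_index[slot];
        const std::uint8_t last = alive_list[--alive];
        alive_list[hole] = last;
        alive_index[last] = hole;
        free_stack[free_top++] = slot;  // make the slot available for reuse
        ++gen[slot];                    // old handles to this slot no longer match
    }
};
```

Iterating alive_list[0..alive) then visits every live slot without scanning dead ones.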
View dispatch strategy
view<Cs...>() does not use one always-generic loop. It picks among three runtime paths:
| Path | Condition | Cost profile |
|---|---|---|
| Dense + all-match | every alive entity has every requested component, and alive slots are still dense from 0..N-1 | no alive-list lookup, no mask check |
| All-match with gaps | every alive entity has every requested component, but slots are no longer dense | alive-list lookup, no mask check |
| Mixed | some alive entities are missing requested components | alive-list lookup plus per-slot mask check |
This matters because many gameplay worlds spend most of their time in one of the first two cases.
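The selection between the three paths could be driven by the counters the registry already maintains. The sketch below is a guess at that logic: view_path and choose_path are hypothetical names, and the meaning given to the dense-prefix input is an assumption, since this page does not define m_dense_prefix.

```cpp
#include <cstdint>

enum class view_path { dense_all_match, all_match_with_gaps, mixed };

// Hypothetical selection logic. Inputs (all assumptions about available state):
//   alive        - number of alive entities (m_alive)
//   min_owned    - smallest m_component_count[] entry among the requested components
//   dense_prefix - how many alive slots still form the run 0..N-1
constexpr view_path choose_path(std::uint8_t alive, std::uint8_t min_owned,
                                std::uint8_t dense_prefix) {
    // A component count can never exceed the alive count, so equality for the
    // least-owned requested component means every alive entity has all of them.
    if (min_owned != alive) return view_path::mixed;
    return dense_prefix == alive ? view_path::dense_all_match
                                 : view_path::all_match_with_gaps;
}

static_assert(choose_path(10, 10, 10) == view_path::dense_all_match);
static_assert(choose_path(10, 10, 4) == view_path::all_match_with_gaps);
static_assert(choose_path(10, 7, 10) == view_path::mixed);
```

The point of the sketch is that the choice costs a couple of byte comparisons per view, made once before the loop rather than per entity.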
Iteration styles
| API | Best for |
|---|---|
| range-for over view<Cs...>() | ergonomic gameplay code |
| .each(fn) | explicit callback style, constexpr-friendly code |
| .each_arm(fn) | measured hot loops where ARM-mode + IWRAM placement matters |
Example:
world.view<position, velocity>().each([](position& pos, const velocity& vel) {
pos.x += vel.vx;
pos.y += vel.vy;
});
Power-of-two component sizes
Every component type must have a power-of-two sizeof(T).
| Size | Allowed? |
|---|---|
| 1 | yes |
| 2 | yes |
| 4 | yes |
| 8 | yes |
| 3, 5, 6, 7, … | no |
If a type is almost right, pad it:
struct sprite_id {
std::uint8_t id;
gba::ecs::pad<3> _;
};
This rule exists to support cheap shift-based addressing in the component pools.
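The link between the size rule and shift-based addressing can be sketched as follows. The helpers (is_power_of_two, log2_exact, pool_offset) and the rgb structs are hypothetical, and gba::ecs may enforce and exploit the rule differently. The explicit pad_ member stands in for what gba::ecs::pad<1> would presumably provide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers illustrating the rule; not part of gba::ecs.
constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Smallest shift such that (1 << shift) >= n; exact when n is a power of two.
constexpr unsigned log2_exact(std::size_t n) {
    unsigned shift = 0;
    while ((std::size_t{1} << shift) < n) ++shift;
    return shift;
}

// Byte offset of element `index` in a pool of T, using a shift instead of a multiply.
template <typename T>
constexpr std::size_t pool_offset(std::size_t index) {
    static_assert(is_power_of_two(sizeof(T)), "component size must be a power of two");
    return index << log2_exact(sizeof(T));
}

struct rgb_raw    { std::uint8_t r, g, b; };        // 3 bytes: would be rejected
struct rgb_padded { std::uint8_t r, g, b, pad_; };  // 4 bytes: accepted

static_assert(!is_power_of_two(sizeof(rgb_raw)));
static_assert(pool_offset<rgb_padded>(5) == 20);
```

A three-byte component would need a multiply (or a shift plus an add) per access; one byte of padding reduces it to a single shift.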
What the architecture intentionally omits
To stay small and predictable, gba::ecs deliberately does not include:
- runtime component registration
- dynamic archetype storage
- event buses or schedulers
- system graphs or task runners
- serialisation or reflection
The expectation is that you compose those policies at a higher layer if your project needs them.
See Internal Implementation for the field ordering, alive-list mechanics, and the fast-path details that fall out of this architecture.