Fixed-Point Math

Fixed-point arithmetic, precomputed lookup tables, and tile decompression for NGPC homebrew.

1. Fixed-Point Conventions

1.1 Formats Used

Fixed-point is the standard technique for sub-pixel positions and smooth velocities on NGPC (no FPU, no float on TLCS-900H).

Common formats in NGPC homebrew:

Format	Precision	Typical use
8.8 (u16, 256ths)	1/256 pixel per unit	Fine-grained position
1.7 (u8, 128ths)	1/128 pixel per unit	Compact position/velocity
9.7 (u16, 128ths)	1/128 pixel per unit	Most common: fits screen+off-screen

Typical velocity range: 64..384 units/frame = 0.5..3 pixels/frame (at 128ths resolution).
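The conversions implied by the 9.7 format can be wrapped in a few helpers. This is a minimal sketch: the names `FX_SHIFT`, `FX_FROM_PX`, `FX_TO_PX` and `fx_step` are illustrative and not part of any NGPC library, and the typedefs stand in for whatever the project's SDK headers provide.

```c
#include <stdint.h>

/* Stand-in typedefs; an NGPC project normally gets these from its SDK headers. */
typedef uint16_t u16;
typedef int16_t  s16;

#define FX_SHIFT 7                         /* 9.7 format: 7 fractional bits */
#define FX_ONE   (1 << FX_SHIFT)           /* 128 units = 1 pixel */

/* Whole pixels -> 9.7 fixed-point units. */
#define FX_FROM_PX(px) ((u16)((px) << FX_SHIFT))

/* 9.7 fixed-point units -> whole pixels (fraction discarded). */
#define FX_TO_PX(fx)   ((u16)((fx) >> FX_SHIFT))

/* Advance a position by a per-frame velocity given in the same 128ths units. */
static u16 fx_step(u16 pos, s16 vel)
{
    return (u16)(pos + vel);               /* wraps modulo 65536, like the u16 itself */
}
```

With a velocity of 64 (0.5 px/frame), a sprite at pixel 10 still draws at pixel 10 after one frame and moves to pixel 11 after the second.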

1.2 Common Operations

/* Convert fixed-point to screen pixel (the shift must match the format) */
s16 screen_x    = fixed_x >> 7;     /* 9.7 or 1.7: 128ths -> pixels */
s16 screen_x_88 = fixed_x_88 >> 8;  /* 8.8: 256ths -> pixels */

/* Move entity */
fixed_x += velocity_x;          /* velocity in same units */

/* Collision: compare in fixed-point units, not pixels */
s16 dx = (s16)(a_x - b_x);      /* both in fixed units */
if (dx < 0) dx = -dx;
if (dx < (HITBOX_W << 7)) { /* hit: hitbox width converted to 128ths */ }

/* Integer divide: a power-of-two divisor is just a shift */
s16 half  = x >> 1;   /* x / 2 */
s16 sixth = x / 6;    /* other divisors: cc900 calls C9H_divlu (software, slow) */

1.3 Overflow and Promotion

u16 * u16 with the same type → hardware MUL (fast).
u16 * u8 mixed → software C9H_mullu (slow-ish).
If the result can exceed u16, cast to u32 before multiplying:

/* Safe: intermediate result in u32 */
u32 r = (u32)a * b;
s16 px = (s16)(r >> 7);

/* WRONG: the 16-bit product wraps if a * b > 65535 */
s16 px_bad = (s16)((a * b) >> 7);

Never use float — there is no FPU. Every floating-point operation would call a software emulation library that does not exist on NGPC.
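The promotion rule can be seen with a small fixed-point multiply in 128ths. This is a sketch with illustrative names (`fx_mul_u`, `fx_mul_wrapped`), not library functions; the second function only emulates what a 16-bit multiply would keep, so the difference is visible on any host compiler.

```c
#include <stdint.h>

/* Stand-in typedefs; an NGPC project normally gets these from its SDK headers. */
typedef uint16_t u16;
typedef uint32_t u32;

/* Multiply two unsigned values in 128ths; the result is also in 128ths.
   The caller must make sure the shifted result still fits in a u16. */
u16 fx_mul_u(u16 a, u16 b)
{
    u32 wide = (u32)a * b;            /* promote before multiplying */
    return (u16)(wide >> 7);          /* drop the extra 7 fractional bits */
}

/* Same operation with the product truncated to 16 bits first,
   which is what happens when the multiply is not promoted. */
u16 fx_mul_wrapped(u16 a, u16 b)
{
    u16 narrow = (u16)((u32)a * b);   /* keep only the low 16 bits */
    return (u16)(narrow >> 7);
}
```

For 10.0 × 2.0 (1280 × 256 in 128ths) the promoted version returns 2560 (20.0), while the truncated version returns 0 because the product 327680 is an exact multiple of 65536.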

2. Math API

2.1 Trigonometry

s8 ngpc_sin(u8 angle);   /* sine lookup, returns -127..127 */
s8 ngpc_cos(u8 angle);   /* cosine lookup, same range */

Angle encoding: 8-bit circle — 0-255 maps to 0°-360° (wraps).

Value	Degrees
0	0°
64	90°
128	180°
192	270°
256	wraps to 0

Usage:

/* Move in direction `angle`: each component is -127..127, so with positions
   in 128ths this is about 1 pixel per frame */
entity_x += ngpc_cos(angle);
entity_y += ngpc_sin(angle);
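To move at a speed other than the raw table amplitude, the trig value can be scaled by a speed in 128ths. The helper below is a sketch: `vel_component` is a hypothetical name, and it takes the table value as a parameter so it does not depend on how `ngpc_sin()` / `ngpc_cos()` are implemented.

```c
#include <stdint.h>

/* Stand-in typedefs; an NGPC project normally gets these from its SDK headers. */
typedef int8_t  s8;
typedef int16_t s16;
typedef int32_t s32;

/* trig:  a sine or cosine table value in -127..127.
   speed: desired speed in 128ths of a pixel per frame.
   Returns the velocity component along that axis, in 128ths. */
s16 vel_component(s8 trig, s16 speed)
{
    s32 wide = (s32)trig * speed;   /* up to 127 * 32767, needs 32 bits */
    return (s16)(wide / 128);       /* treats 127 as roughly 1.0; truncates toward zero */
}
```

Typical use would be `bullet_vx = vel_component(ngpc_cos(angle), speed);` once at spawn time. The widening multiply and the divide are not free on cc900, so computing this per frame is best avoided.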

2.2 Random Number Generation

void ngpc_rng_seed(void);        /* seed from VBlank counter */
u16  ngpc_random(u16 max);       /* LCG; intended range 0..max, but see the cc900 caveat below */
void ngpc_qrandom_init(void);    /* shuffle pre-built table (call after rng_seed) */
u8   ngpc_qrandom(void);         /* ultra-fast table read, returns 0..255 */

ngpc_random() is a u32 LCG. It extracts bits 16-30 of the state and is meant to reduce that value to 0..max with result % ((u32)max + 1). On cc900 that step is broken (the u32 modulo runtime helper miscompiles), so in practice the returned value stays in 0..32767 regardless of max.

Consequences (confirmed on hardware):
- (u8)ngpc_random(6) → value ≥ 5 in ~98% of calls (expected 2/7)
- (u8)ngpc_random(1) → random 0..255 instead of 0..1
- Any gameplay roll using ngpc_random(max) with a small max will ignore max

Rule: never rely on ngpc_random(max) for a bounded gameplay value. Use ngpc_qrandom() + u8 modulo instead — cc900 handles u8 arithmetic cleanly:

/* Bounded roll, correct on cc900 */
u8 roll  = ngpc_qrandom() % 7u;          /* 0..6 */
u8 bonus = ngpc_qrandom() % attack_pow;  /* 0..attack_pow-1; attack_pow must be nonzero */

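ngpc_qrandom() is nearly free (table index increment + read). The pre-shuffle is done with ngpc_random internally, so the permutation itself is biased, but the table is still a permutation of 0..255: every byte value appears exactly once per pass. qrandom() % N is therefore exactly uniform when N divides 256, and off by at most one count per residue otherwise (for N = 7, residues 0..3 occur 37 times per pass and residues 4..6 occur 36 times).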

For wide ranges where a single qrandom() call isn't enough, combine two:

u16 big_rand = (u16)ngpc_qrandom() | ((u16)ngpc_qrandom() << 8); /* 0..65535 */
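When a roll has to be exactly uniform for an N that does not divide 256, the leftover top values can be rejected before taking the modulo. The sketch below assumes nothing about the library beyond "returns a byte": `demo_qrandom` is a hypothetical stand-in that walks 0..255 in order so the code runs off-target, and `roll_unbiased` is an illustrative name.

```c
#include <stdint.h>

/* Stand-in typedefs; an NGPC project normally gets these from its SDK headers. */
typedef uint8_t  u8;
typedef uint16_t u16;

/* Hypothetical stand-in for ngpc_qrandom(): walks 0..255 in order, then wraps. */
static u8 demo_index;
static u8 demo_qrandom(void) { return demo_index++; }

/* Returns 0..n-1 with every value equally likely over a full pass of the
   table. n must be 1..255. */
u8 roll_unbiased(u8 n)
{
    u16 limit = (u16)(256u - (256u % n));   /* largest multiple of n that is <= 256 */
    u8 r;
    do {
        r = demo_qrandom();                 /* swap in ngpc_qrandom() on hardware */
    } while (r >= limit);                   /* reject the few leftover top values */
    return (u8)(r % n);
}
```

For n = 7 the limit is 252, so only the byte values 252..255 are ever rejected. The price is an occasional extra table read and a u16 comparison per roll.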

2.3 32-bit Multiply

s32 ngpc_mul32(s32 a, s32 b);   /* 32-bit signed multiply */

Useful when both operands may be large and the result needs full 32-bit range.

Note: on cc900, s16 * s16 calls C9H_mulls (software). u16 * u16 (same width) uses the hardware MUL opcode. For best performance, keep operands the same width.

3. Lookup Table Math

u8  ngpc_lut_atan2(s8 dx, s8 dy);   /* angle in 0-255 format */
u8  ngpc_lut_sqrt16(u16 n);          /* integer sqrt, returns 0-255 */
u16 ngpc_lut_dist(s8 dx, s8 dy);     /* approx distance, ~4% error */
u16 ngpc_lut_div(u16 n, u8 divisor); /* fast division via reciprocal multiply */

All use fixed-point or integer arithmetic. Zero FPU. Minimal CPU cost.

Usage example — aim a bullet at a target:

s8 dx = (s8)(target_x - bullet_x);   /* delta must fit in -128..127 */
s8 dy = (s8)(target_y - bullet_y);   /* (scale larger deltas down by the same factor first) */
u8 angle = ngpc_lut_atan2(dx, dy);   /* 0-255 angle toward target */
bullet_vx = ngpc_cos(angle);
bullet_vy = ngpc_sin(angle);

Use ngpc_lut_dist() for range checks instead of sqrt(). For AABB overlap, avoid distance entirely — compare rect edges directly (see Collision).
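When a range check only needs a yes/no answer, comparing squared distances is an exact alternative that needs neither a square root nor a lookup table. This is a different technique from ngpc_lut_dist(), shown as a sketch; `within_radius` and `abs_s8` are illustrative names, not library functions.

```c
#include <stdint.h>

/* Stand-in typedefs; an NGPC project normally gets these from its SDK headers. */
typedef int8_t   s8;
typedef uint8_t  u8;
typedef uint16_t u16;

/* Absolute value of an s8 as an unsigned byte (-128 maps to 128). */
static u8 abs_s8(s8 v) { return (u8)(v < 0 ? -v : v); }

/* Returns 1 when (dx, dy) lies within `radius` of the origin, else 0. */
u8 within_radius(s8 dx, s8 dy, u8 radius)
{
    u16 ax = abs_s8(dx);
    u16 ay = abs_s8(dy);
    u16 d2 = (u16)(ax * ax + ay * ay);      /* max 128*128*2 = 32768, fits u16 */
    u16 r2 = (u16)((u16)radius * radius);   /* max 255*255 = 65025, fits u16 */
    return (u8)(d2 <= r2);
}
```

Both products use unsigned 16-bit operands of the same width, which matches the multiply guidance given earlier for cc900.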

4. BCD Conversion

Binary-coded decimal is the cheap way to render scores, timers, and counters as digits without per-frame division (which calls the slow C9H_divlu helper).

4.1 Binary to Packed BCD (no division)

Convert a 32-bit binary value (0..99999999) to 8-digit packed BCD using a power-of-10 table and repeated subtract-and-count per digit; no division is required. For each decimal place, subtract the corresponding power of 10 while it fits, counting the subtractions to get that digit, then shift the accumulated result left one nibble (sla 4,XWA) and OR the digit in. Each digit costs at most 9 subtractions, so the loop as written below performs at most 72 (8 × 9) for 99999999.

for each power-of-10 P (10^7 down to 10^0):
    digit = 0
    while value >= P:  value -= P;  digit++   ; count subtractions
    result = (result << 4) | digit            ; sla 4,XWA then OR in the nibble
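The pseudocode above translates to C almost line for line. This is a sketch with illustrative names (`bin_to_bcd8`, `pow10_table`); a hand-written assembly version would use the same structure with the nibble shift done by sla.

```c
#include <stdint.h>

/* Stand-in typedefs; an NGPC project normally gets these from its SDK headers. */
typedef uint8_t  u8;
typedef uint32_t u32;

/* Powers of ten from 10^7 down to 10^0, one per output digit. */
static const u32 pow10_table[8] = {
    10000000UL, 1000000UL, 100000UL, 10000UL, 1000UL, 100UL, 10UL, 1UL
};

/* Binary -> 8-digit packed BCD by repeated subtraction.
   value must be <= 99999999, otherwise the top digit would exceed 9. */
u32 bin_to_bcd8(u32 value)
{
    u32 result = 0;
    u8 i;
    for (i = 0; i < 8; i++) {
        u8 digit = 0;
        while (value >= pow10_table[i]) {   /* count how many times P fits */
            value -= pow10_table[i];
            digit++;
        }
        result = (result << 4) | digit;     /* append the digit as a nibble */
    }
    return result;
}
```

For example, a score of 12345678 becomes 0x12345678, so each hex nibble of the result is one decimal digit.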

4.2 Byte-Level BCD Helpers

For single packed-BCD bytes (two digits):

/* packed BCD byte -> binary 0..99 */
u8 bcd_to_bin(u8 bcd) { return (u8)((bcd >> 4) * 10 + (bcd & 0x0F)); }

/* binary 0..99 -> packed BCD byte */
u8 bin_to_packed(u8 v) { return (u8)(((v / 10) << 4) | (v % 10)); }  /* tens in the high nibble */
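Once a value is in packed BCD, drawing it only takes shifts and masks. The sketch below unpacks an 8-digit packed BCD value into one digit per byte, most significant first; `bcd8_to_digits` is an illustrative name, and how the digits map to tile numbers is left to the caller.

```c
#include <stdint.h>

/* Stand-in typedefs; an NGPC project normally gets these from its SDK headers. */
typedef uint8_t  u8;
typedef uint32_t u32;

/* Unpack 8 packed-BCD digits into out[0..7], most significant digit first. */
void bcd8_to_digits(u32 bcd, u8 out[8])
{
    u8 i;
    for (i = 8; i > 0; i--) {
        out[i - 1] = (u8)(bcd & 0x0F);   /* lowest nibble is the last digit */
        bcd >>= 4;                       /* move the next digit into place */
    }
}
```

Adding each digit to the tile index of a "0" glyph (a project-specific value, not defined here) would give the tile to draw for that column.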

5. Tile Decompression

5.1 Runtime Decompression API

/* RLE — simple, fast, ~2:1 ratio */
u16  ngpc_rle_decompress(void *dst, const void NGP_FAR *src, u16 src_len);
void ngpc_rle_to_tiles(const void NGP_FAR *src, u16 src_len, u16 tile_offset);

/* LZ77/LZSS — better ratio, ~3:1 to 4:1 */
u16  ngpc_lz_decompress(void *dst, const void NGP_FAR *src, u16 src_len);
void ngpc_lz_to_tiles(const void NGP_FAR *src, u16 src_len, u16 tile_offset);

The _to_tiles functions use a 2 KB internal buffer (~128 tiles maximum per call). For larger tilesets, call in chunks with increasing tile_offset.

Usage example:

/* Load compressed tileset at tile slot 96 */
extern const u8 NGP_FAR level1_tiles_lz[];     /* compressed, in ROM */
extern const u16 level1_tiles_lz_len;
ngpc_lz_to_tiles(level1_tiles_lz, level1_tiles_lz_len, 96);
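For tilesets larger than the per-call limit, one way to organize the chunked loading is a small table of separately compressed chunks. This is a sketch under stated assumptions: `TileChunk`, `TileLoader`, `load_tile_chunks` and `demo_loader` are hypothetical names, each chunk is assumed to have been compressed on its own, and the loader is passed as a function pointer so the code also builds off-target.

```c
#include <stdint.h>

/* Stand-in typedefs; an NGPC project normally gets these from its SDK headers. */
typedef uint8_t  u8;
typedef uint16_t u16;

#ifndef NGP_FAR
#define NGP_FAR   /* far-pointer qualifier from the SDK; empty when building off-target */
#endif

/* Hypothetical descriptor: one independently compressed chunk of <= 128 tiles. */
typedef struct {
    const void NGP_FAR *data;   /* compressed bytes in ROM */
    u16 len;                    /* compressed size in bytes */
    u16 tile_count;             /* tiles this chunk decompresses to */
} TileChunk;

/* Same parameter list as the _to_tiles functions documented above. */
typedef void (*TileLoader)(const void NGP_FAR *src, u16 src_len, u16 tile_offset);

/* Load each chunk at an increasing tile offset; returns the next free slot. */
u16 load_tile_chunks(const TileChunk *chunks, u8 count, u16 first_tile, TileLoader loader)
{
    u16 slot = first_tile;
    u8 i;
    for (i = 0; i < count; i++) {
        loader(chunks[i].data, chunks[i].len, slot);
        slot = (u16)(slot + chunks[i].tile_count);
    }
    return slot;
}

/* Off-target stand-in that only records where each chunk would be loaded. */
static u16 demo_offsets[4];
static u8  demo_calls;
static void demo_loader(const void NGP_FAR *src, u16 src_len, u16 tile_offset)
{
    (void)src;
    (void)src_len;
    if (demo_calls < 4) demo_offsets[demo_calls] = tile_offset;
    demo_calls++;
}
```

On hardware the last argument would be `ngpc_lz_to_tiles` or `ngpc_rle_to_tiles` instead of the recording stand-in.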

5.2 Offline Compression Tool

The ngpc_compress tool compresses raw tile binary data:

# LZ77 (default) — emits <name>_lz[] + <name>_lz_len
ngpc_compress tiles.bin -o tiles_lz.c -n level1_tiles --header

# RLE — emits <name>_rle[] + <name>_rle_len
ngpc_compress tiles.bin -o tiles_rle.c -m rle -n level1_tiles --header

# Auto-pick smallest
ngpc_compress tiles.bin -o tiles_best.c -m both -n level1_tiles --header

Generated symbols:

Symbol	Meaning
`<name>_lz[]` / `<name>_rle[]`	Compressed data array
`<name>_lz_len` / `<name>_rle_len`	Compressed size in bytes; pass to the runtime functions
`<name>_raw_len`	Decompressed size (informational)

5.3 Compression Constraints and Notes

Input must be a raw binary file (.bin or any byte stream — not a PNG).
The tool verifies roundtrip integrity by default (compress → decompress → compare).
Naming convention: _lz (not _lz77), _rle.
Use the tilemap tool's --tiles-bin flag to generate the raw binary input:

ngpc_tilemap level.png -o level.c -n level --header --tiles-bin level_tiles.bin
# then compress:
ngpc_compress level_tiles.bin -o level_tiles_lz.c -n level_tiles --header

_to_tiles functions use a 2 KB internal buffer — max ~128 tiles per call. For tilesets > 128 tiles, decompress in two or more calls with offset stepping.

Quick Reference

Item	Value / Pattern
Fixed-point 128ths	`>> 7` to get pixels
Fixed-point 256ths	`>> 8` to get pixels
Typical velocity	64..384 units/frame (128ths) = 0.5..3 px/frame
Safe u16×u16	Same-width cast: `(u16)a * (u16)b` → HW MUL
Safe large multiply	`(u32)a * b` before shifting down
Angle range	0-255 → 0-360° (64=90°, 128=180°, 192=270°)
`ngpc_sin/cos` range	Returns -127..127
`ngpc_random` range	Broken on cc900 — ignores `max`, returns 0..32767. Use `ngpc_qrandom() % N` for bounded rolls
`ngpc_qrandom`	Near-zero cost: table read, 0-255. Use `% N` for any nonzero N (exactly uniform when N divides 256)
`ngpc_lut_atan2`	Returns 0-255 angle toward target
`ngpc_lut_dist`	~4% error, no sqrt
`ngpc_lut_sqrt16`	Integer sqrt of u16
LZ77 ratio	~3:1 to 4:1
RLE ratio	~2:1
`_to_tiles` buffer	2 KB = ~128 tiles per call max
`<name>_lz_len`	Pass to `ngpc_lz_to_tiles()` — compressed size
No float	Zero FPU on TLCS-900H — never use `float` / `double`
Division by 2^n	Use right-shift (`>> n`) — much faster than `/`