King's Bounty for Amiga file formats

2021-01-21

King's Bounty for Amiga stores images and sounds in formats specific to it.

Sounds

The resource files that begin with a capital letter (Walk, Horns, Tele, and a few others) are sound files representing music and sound effects within the game. The sound files are in a relatively straightforward format. They are single-channel, 8-bit signed PCM, with a small 6-byte header. The header includes the number of samples (size in bytes), and the playback frequency (in Hz).

uint16 always_zero
uint16be buffer_size
uint16be frequency
int8[buffer_size] pcm_samples

The first two bytes are zeroes in all the files, so it is hard to tell whether they have meaning on their own or if they are perhaps the leading zeroes for the buffer size (making buffer size a 32-bit integer instead). The frequency bytes are always 0x2a60, suggesting a playback frequency of 10848 Hz. The buffer size is always 6 less than the size of the whole file, because the header is 6 bytes.
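As a minimal sketch of the layout above, the header can be read with Python's struct module. The function name parse_sound is hypothetical, and the sketch assumes the first two bytes really are a separate always-zero field rather than the top half of a 32-bit size.

```python
import struct

def parse_sound(data: bytes):
    """Split a sound file into (frequency_hz, signed_samples)."""
    # Three big-endian 16-bit fields: always_zero, buffer_size, frequency.
    always_zero, buffer_size, frequency = struct.unpack_from(">HHH", data, 0)
    body = data[6:6 + buffer_size]
    if len(body) != buffer_size:
        raise ValueError("file is shorter than the header claims")
    # Interpret each byte as a signed 8-bit PCM sample.
    samples = struct.unpack(f"{buffer_size}b", body)
    return frequency, samples
```

For a real file, the returned frequency should be 10848 and the number of samples should be the file size minus 6.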

Images

The image files are a bit more complicated than the sounds, and much further from a widespread common format.

Each image file can contain multiple images. Many image files contain exactly 4 images, each representing a frame of a short animation. Some files have as many as 36 different images inside (each representing a tile) and some only have a single image.

Each file also has a palette of 32 colors, and possibly a nominated "transparent" color.

The actual image data is in compressed form, in a "bitplane" arrangement that was common on the Amiga.

The first two bytes of the header represent the number of images in the file, stored as a 16-bit integer. The next part of the header is variable length because it includes information for each of the images.

For a four-image animation, the width, height and transparency information are all the same for each image. In this case, the first 22 bytes are laid out like so:

4, ?, x, y, ?, x, y, ?, x, y, ?, x, y, ?

That is, the width (x) and height (y) are repeated 4 times (once for each frame), and there is also a value both before and after each pair of width and height. This value is 31 for images where color 0 is transparent (such as troop animations) and 0 where color 0 is merely black.

(This ambiguity means that it is unclear whether the transparency value comes before each dimension pair or after, as well as what the additional value might represent.)

Palette

Each image file includes a selection of 32 colors. This selection is identical for almost all the image files, but a couple of files do have their own variants.

The palette is stored as a simple list of 16-bit integers (big-endian). The list is always 32 ints (that is, 64 bytes) long.

The first integer represents color index 0, the second integer represents color index 1, and so on up to the final integer which represents color index 31.

The specific colors can be seen as 4-bit RGB values packed into a 16 bit integer in the following pattern:

MSB           LSB
0000RRRR-GGGGBBBB

Thus pure white is represented as 0x0fff, pure black is represented as 0x0000, and bright red could be 0x0f00.
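A short sketch of unpacking these values in Python follows. The names decode_color and decode_palette are hypothetical, and scaling each 4-bit channel to 8 bits by multiplying by 17 (so 0xF becomes 255) is a common convention assumed here, not something taken from the game.

```python
import struct

def decode_color(value: int):
    """Convert a packed 0000RRRR-GGGGBBBB value to an 8-bit (r, g, b) tuple."""
    r = (value >> 8) & 0x0F
    g = (value >> 4) & 0x0F
    b = value & 0x0F
    # Assumed scaling: 0x0..0xF maps to 0..255 in steps of 17.
    return (r * 17, g * 17, b * 17)

def decode_palette(data: bytes, offset: int = 0):
    """Read 32 big-endian 16-bit colors starting at the given offset."""
    values = struct.unpack_from(">32H", data, offset)
    return [decode_color(v) for v in values]
```

The offset argument is needed because the palette follows the variable-length image header rather than sitting at a fixed position.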

After the palette is a pair of zero bytes (int16 0) that I take to be an end marker of sorts.

Compressed data

After the end of the palette, there are six bytes (three big-endian uint16 values) describing the compressed bitmap data. The first integer is the length of the compressed stream in bytes; the remainder of the file will be that many bytes. The second integer is always zero. The third integer is the decompressed size of the bitmap data. (Perhaps the decompressed size is actually a 32-bit integer whose upper half is that zero, but none of the files are that big.)
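The three integers can be read like this. This is only a sketch: read_stream_header is a hypothetical name, and the caller is assumed to already know the offset at which the six bytes start, since the header before them is variable length.

```python
import struct

def read_stream_header(data: bytes, offset: int):
    """Return (compressed_length, decompressed_size, compressed_bytes)."""
    compressed_length, always_zero, decompressed_size = struct.unpack_from(
        ">HHH", data, offset
    )
    start = offset + 6
    # The compressed stream is expected to run to the end of the file.
    compressed = data[start:start + compressed_length]
    return compressed_length, decompressed_size, compressed
```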

A single compressed stream accounts for all the bitmap pixel data for all the images in the file. There is no division at a per-image or even per-row basis.

Decompressing the data

The image data is compressed with a technique called LZSS. The exact format varies from compressor to compressor, but the one used in this game is a pretty close match for the one called LZDD (as used in ms-compress).

Specifically, it uses a 4096-byte window initialized with space characters (0x20), a minimum back-reference length of 3 bytes, and a maximum back-reference length of 18 bytes.

It bears some resemblance to other algorithms including the RefPack used in some EA games and the VRAM compression seen in the Nintendo Gameboy Advance. A similar format was used in The Lost Vikings for DOS.

Overview of the compression

There are three kinds of value in the stream: literal bytes (copied from input to the output), back-references (copied from earlier in the output), and flags (to distinguish between literals and back-references). Flags are one byte, literals are one byte, and back-references are two bytes.

The eight bits of the first byte of the compressed stream represent flags for the following eight items, starting with the least-significant bit. The least-significant bit (value 1) of the flags byte tells the decoder whether to look for a literal or a back-reference next. If the bit is set, the following byte is a literal and can be copied straight to the output stream. If it is clear, then the following byte is instead the first half of a back-reference.

Example: Literals only

In the simplest case, there are no back-references. All eight values are literals, so all eight bits are set in the flags byte: the flags byte has the value 0b11111111 (0xFF).

Thus a compressed stream that began with these nine values:

FF 11 22 33 44 55 66 77 88

... would "decompress" to start with these eight values:

11 22 33 44 55 66 77 88
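The order in which the flag bits are consumed can be sketched in Python as follows. The helper name flag_bits is hypothetical; it only illustrates the least-significant-bit-first ordering described above.

```python
def flag_bits(flags: int):
    """Return the eight flags in the order the decoder uses them.

    True means the next item is a literal byte,
    False means it is a two-byte back-reference.
    """
    return [bool((flags >> bit) & 1) for bit in range(8)]
```

For the flags byte 0xFF every entry is True, which is the literals-only case shown above. A flags byte of 0x07 would mean three literals followed by five back-references.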

Back-references

The back-references are more complicated than the literal bytes. They contain two pieces of information: how big the output is, and where to get it from.

Back-references are 2 bytes long, but their two values are not spread evenly across the 16 bits. The least-significant 4 bits of the second byte give the length of the output: 0b0000 for the minimum reference length of 3 bytes, 0b0001 for 4 bytes, and so on up to 0b1111 for the maximum of 18 bytes.

For example, the two byte back-reference BC A5 would write 5+3 = 8 bytes of output data.

The "where" is encoded as a twelve-bit number. Its least-significant 8 bits come from the first byte of the back-reference, and its most-significant 4 bits come from the most-significant 4 bits of the second byte.

In our example BC A5, the "where" value is 0xABC.

Explained in code

min_length = 3
max_length = 18
window_size = 4096

byte2hi = byte2 >> 4
byte2lo = byte2 & 0x0f

length_code = byte2lo
distance_code = (byte2hi << 8) + byte1

Making use of the distance code

This part is surprisingly complicated. I did find a single expression that fits almost all of the real cases, but it doesn't quite make sense.

length = length_code + min_length

distance = (output_so_far - distance_code - max_length) % window_size

Occasionally a code will also refer to output from before the output buffer started. In this case you can treat every byte before the start of the buffer as 0x20, matching the window's initial contents.
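Putting the flags, literals, back-references and the distance expression together gives a decoder along these lines. It is a sketch under stated assumptions rather than a verified implementation: the function name decompress and the optional expected_size argument are hypothetical, and treating a computed distance of 0 as a full window (4096 bytes back) is my own assumption to keep the copy well defined.

```python
MIN_LENGTH = 3
MAX_LENGTH = 18
WINDOW_SIZE = 4096

def decompress(stream: bytes, expected_size=None) -> bytes:
    """Decode the LZSS-style stream described above."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        flags = stream[pos]
        pos += 1
        for bit in range(8):  # least-significant flag bit first
            if pos >= len(stream):
                return bytes(out)
            if expected_size is not None and len(out) >= expected_size:
                return bytes(out)
            if flags & (1 << bit):
                # Literal: copy one byte straight to the output.
                out.append(stream[pos])
                pos += 1
                continue
            if pos + 1 >= len(stream):
                return bytes(out)  # truncated back-reference
            byte1, byte2 = stream[pos], stream[pos + 1]
            pos += 2
            length = (byte2 & 0x0F) + MIN_LENGTH
            distance_code = ((byte2 >> 4) << 8) + byte1
            distance = (len(out) - distance_code - MAX_LENGTH) % WINDOW_SIZE
            if distance == 0:
                distance = WINDOW_SIZE  # assumption: 0 means a full window back
            for _ in range(length):
                src = len(out) - distance
                # Anything before the start of the output counts as a space.
                out.append(out[src] if src >= 0 else 0x20)
    return bytes(out)
```

Read another way, the expression treats distance_code as an absolute position in a 4096-byte ring buffer whose write position starts at 4096 - 18 = 4078, which is how many LZSS decoders are written. That reading is an interpretation of the formula, not something confirmed from the game itself.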

The decompressed bitplane image data

The Amiga graphics hardware made extensive use of the "bitplane" storage method, and these images rely on that format. The decompressed data likely requires further transformation to be useful in a modern environment.

Each image in the file is represented by a contiguous series of bytes. The 32-color palette requires 5 bits per pixel, so each pixel occupies 5/8 of a byte, spread across five bitplanes. The bitplanes are essentially 1-bit-per-pixel images, which can be overlaid on each other to find the final color index for each pixel. Within a bitplane, the bits proceed from left to right, top to bottom, with no special marker for the end of a row beyond padding to a whole byte boundary for unusual widths. Likewise, there is no marker between bitplanes: the second bitplane (value 2) starts at the byte immediately following the first bitplane's final byte.

The most significant bit (value 128) of the first byte of the decompressed input stream represents the least significant bit (value 1) of the color index of the upper-left pixel of the first image. The remaining 7 bits in that byte represent the least-significant bit of the color indexes of the 7 pixels to the right of that pixel.
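The overlay of bitplanes into color indexes can be sketched as below. The function name planar_to_indices and its parameters are hypothetical, and the sketch assumes exactly the layout described above: five whole-image bitplanes stored one after another, with each row padded to a whole byte.

```python
def planar_to_indices(data: bytes, width: int, height: int,
                      planes: int = 5, offset: int = 0):
    """Convert one planar image into rows of palette indexes."""
    row_bytes = (width + 7) // 8      # rows are padded to a whole byte
    plane_size = row_bytes * height   # bytes in one 1-bit-per-pixel plane
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            byte_in_plane = y * row_bytes + x // 8
            mask = 0x80 >> (x % 8)    # leftmost pixel is the top bit
            index = 0
            for plane in range(planes):
                if data[offset + plane * plane_size + byte_in_plane] & mask:
                    index |= 1 << plane  # plane 0 is value 1, plane 1 is value 2
            row.append(index)
        rows.append(row)
    return rows
```

If the images in a file are stored back to back, the next image would start at offset + planes * plane_size, using that image's own width and height from the header.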

Transparency and animation

Animations do not appear to be marked in the file. You'll have to decide some other way whether the images are frames in an animation or simply different pictures.

Most images use color 0 for black. Some images (notably troops) use color index 0 for "transparent" and color index 31 for "black". These files have the value 31 before and after the image dimensions (31, width, height, 31, width, height, 31, width, height, 31, width, height, 31).