Writing a Forth for the SNES's 65816


I wanted to share some details on the Forth cross-compiler I developed as part of creating Super Sokonyan, so I decided to write a quick post on my experience! I'm worried this might be a little long-winded, so please let me know if you have questions or suggestions on making this kind of post more useful :)


Writing the Forth itself definitely took up most of my development time during the jam. It's a subroutine-threaded Forth with its core written in Lua[^1] that cross-compiles (transliterates?) to 65816 ASM. I initially wrote the host in a direct-threaded style even though the target was subroutine-threaded, but as the implementation went on I found life was much easier when I made the host and target as similar as possible; before that, all sorts of things differed between the two. There were quite a few of these large design shifts (really, corrected mistakes) throughout development:

Host addressing: Initially, every data type (integers, addresses, characters, native functions, strings, etc.) took up one unit of address space on the host. This made it a nightmare to properly calculate addresses for the target. In the "finished" version, the Lua host tries to mimic the memory layout of the SNES. Mirroring the layout made branching (both compiling and executing) _much_ easier, among other things.
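To illustrate why a byte-accurate host image helps with branching, here is a minimal sketch in Python (not the actual Lua host). The class name, the `0x8000` start address, and the helper functions are all made up for illustration; the only hardware facts assumed are that the 65816 is little-endian and that `$4C` is the opcode for an absolute `JMP`.

```python
JMP_ABS = 0x4C  # 65816 opcode for JMP with a 16-bit absolute operand


class TargetMemory:
    """Byte-addressed image of one bank, modelled on the host (hypothetical)."""

    def __init__(self, size=0x10000, start=0x8000):
        self.bytes = bytearray(size)
        self.here = start  # next free target address (illustrative value)

    def emit_byte(self, value):
        self.bytes[self.here] = value & 0xFF
        self.here += 1

    def emit_cell(self, value):
        # A cell is 16 bits, stored low byte first.
        self.emit_byte(value)
        self.emit_byte(value >> 8)

    def read_cell(self, addr):
        return self.bytes[addr] | (self.bytes[addr + 1] << 8)

    def patch_cell(self, addr, value):
        self.bytes[addr] = value & 0xFF
        self.bytes[addr + 1] = (value >> 8) & 0xFF


def begin_forward_branch(mem):
    """Emit a JMP with a placeholder operand; return the operand's address."""
    mem.emit_byte(JMP_ABS)
    hole = mem.here
    mem.emit_cell(0)
    return hole


def resolve_forward_branch(mem, hole):
    """Point the earlier JMP at the current compile address."""
    mem.patch_cell(hole, mem.here)
```

Because every host address is already a real target address, the value patched into the hole needs no translation step, which is the kind of bookkeeping the one-unit-per-item layout made painful.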

Host stack width: Similar to the above, the host data stack initially used one stack element for all data types. Both of these mistakes were largely just me being lazy and hoping I could play it fast-and-loose, get a compiler off the ground quickly, and get onto the game dev in earnest, but it was not to be.

Target data stack width: I also made some awkward decisions in the initial design of the target's data stack. At first, you could push non-cell data onto the data stack, rather than only cell-sized data. This made stack manipulation a complete pain; I'd written words such as A.SWAP to swap two three-byte addresses, AW.SWAP to swap a word underneath an address, WA.SWAP to swap an address underneath a word, etc. It was terrible xD A cell-width data stack was definitely the right decision.
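A rough Python sketch of the cell-width idea (again, not the real implementation; `DataStack` and `push_addr24` are hypothetical names). Every element is one 16-bit cell, so a single `swap` covers every case. Splitting a wider value across two cells, as shown in `push_addr24`, follows the usual Forth double-cell convention and is an assumption here rather than something the compiler necessarily does.

```python
CELL_MASK = 0xFFFF  # one cell is 16 bits wide


class DataStack:
    """Data stack where every element is exactly one 16-bit cell."""

    def __init__(self):
        self.cells = []

    def push(self, value):
        # Anything wider than a cell is truncated, never stored as-is.
        self.cells.append(value & CELL_MASK)

    def pop(self):
        return self.cells.pop()

    def swap(self):
        # One SWAP is enough, because all elements have the same width.
        top = self.pop()
        below = self.pop()
        self.push(top)
        self.push(below)

    def push_addr24(self, addr):
        # Hypothetical helper: a 24-bit address becomes two ordinary cells,
        # low 16 bits first, then the bank byte on top.
        self.push(addr & 0xFFFF)
        self.push((addr >> 16) & 0xFF)
```

With mixed widths, each combination of operand sizes needs its own shuffling word; with uniform cells, wider data is just "two cells" and the generic words keep working.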

Subroutine addresses on the 65816: I initially used 24-bit addresses for all subroutine calls on the target. I wanted to be able to use the full address space of the 65816, so it seemed necessary, but dealing with 24-bit addresses on a 16-bit data stack was unwieldy, slower, and, at least for the scope of the game jam (4 * 32k banks of code and data), not really necessary. I eventually did add support for loading game data (tiles, tilemaps, etc.) from other banks, and that didn't wind up being too heavy of a lift[^2]. I spent some time thinking through how I'd add support for storing Forth words in other banks, whether by using trampolines, copying common Forth words to each code bank, or both, but in the end I didn't wind up needing more code space.
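For a sense of how a bank byte and a 16-bit address relate, here is a small Python sketch. It assumes the common LoROM cartridge mapping, where each 32 KB slice of the ROM appears at `$8000-$FFFF` of consecutive banks; the post doesn't say which mapping the game uses, so treat that as an assumption. The function names are made up.

```python
BANK_SIZE = 0x8000  # 32 KB of ROM visible per bank under LoROM (assumed mapping)


def rom_offset_to_snes(offset):
    """Map a ROM file offset to a (bank, 16-bit address) pair."""
    bank = offset // BANK_SIZE
    addr = 0x8000 + (offset % BANK_SIZE)
    return bank, addr


def snes_to_rom_offset(bank, addr):
    """Inverse mapping; only valid for addresses in $8000-$FFFF."""
    if not 0x8000 <= addr <= 0xFFFF:
        raise ValueError("address is outside the ROM window of the bank")
    return bank * BANK_SIZE + (addr - 0x8000)


def split_addr24(addr24):
    """Split a 24-bit address into its bank byte and 16-bit in-bank address."""
    return (addr24 >> 16) & 0xFF, addr24 & 0xFFFF
```

Under these assumptions a 128 KB ROM spans banks 0 through 3, and any call that stays inside one bank only needs the 16-bit part, which fits in a single cell.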


Writing the Forth compiler was fun, but by the time I'd gotten it to a point where I felt I could actually make a game with it I only had about two weeks left in the jam ^^; I kept putting off creating more puzzles and instead focused on adding audio/visuals/etc, so the levels are super easy and mainly just demonstrate the concept for the jam. Regardless, I'm still happy with how the game turned out!

This is really my first project of any significance written in Forth. Prior to this I'd followed the common trend of writing multiple Forth implementations and then never using them, so it was nice to finally use the language for something interesting. That said, I don't feel that this was written in an especially "Forth-y" style; it often felt like macro assembler, but I think that's probably in large part due to the low complexity of the game itself. Puzzling through Forth definitions was still fun, though! It felt like good mental exercise, though I think if I weren't doing this for fun's sake some of the stack balancing bugs would have been pretty frustrating to work through.

The compiled code is quite slow relative to assembly, but there are definitely a lot more optimizations that could be made to get closer:

  • Adding more native definitions for high-traffic words. Writing `I`, `DOLOOP`, etc. in Forth itself is cute, but since those are so often used in tight loops they would really benefit from being native words.
  • Inlining commonly used functions.
  • Peephole optimizations, like turning the Forth words of
    $addr @
    into the 65816 ASM of
    dex
    dex
    lda $addr
    sta 1,x
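The peephole idea could be sketched along these lines in Python. This is not the real compiler: the token format, the `LABELS` table, and the unoptimized code for a literal push are assumptions made for illustration. Only the fused four-instruction sequence for a literal followed by `@` comes from the example above.

```python
# Hypothetical mapping from Forth word names to assembler labels.
LABELS = {"@": "fetch", "DUP": "dup", "+": "plus"}


def is_literal(token):
    # Assumption: literals are written as $-prefixed hex, e.g. "$2100".
    return token.startswith("$")


def compile_tokens(tokens, peephole=True):
    """Translate a list of Forth tokens into a list of 65816 ASM lines."""
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if peephole and is_literal(token) and following == "@":
            # Fused form: load straight from the address, skip the call.
            out += ["dex", "dex", f"lda {token}", "sta 1,x"]
            i += 2
        elif is_literal(token):
            # Assumed unoptimized literal push: the value itself goes on the stack.
            out += ["dex", "dex", f"lda #{token}", "sta 1,x"]
            i += 1
        else:
            # Subroutine threading: every other word becomes a call.
            out.append(f"jsr {LABELS[token]}")
            i += 1
    return out
```

For example, `compile_tokens(["$2100", "@", "DUP"])` produces the fused load followed by `jsr dup`, while passing `peephole=False` emits the literal push and a separate `jsr fetch`.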


That's about it! I'm hoping to clean up the codebase some more in the coming weeks, as well as add some more levels that make use of the color-merging mechanic. In the meantime, I hope this is helpful or serves as impetus for more folks to try something similar :)

[^1]: Usually I jam in Pico-8, so this also served as a Lua exercise :)

[^2]: Though that lift did, of course, only become necessary a few hours before the end of the jam when I first ran out of data space in bank zero T^T
