Return of the Beast

Stewart Orchard

If you were at the 2018 Cambridge Dragon Meetup, then you would have experienced a number of cool things, including seeing some ultra-rare Dragon prototypes, getting your copy of 'Inside the Dragon' signed by one of the original authors, and having a random conversation concerning pickled fish with a Spanish guy who lives in Sweden.

You would also have been lucky enough to score an olde style cassette tape edition of Ciaran Anscomb's Dunjunz, and got the author to scribble on it, though I was too chicken to ask after an earlier name/face mix up, and had to get someone else to ask for me... But I have to say, everything about this game including the tape and inlay has been produced to a beautifully high standard. I believe some copies were still available at the time of writing so get 'em while you can. (Some info here)

Another great benefit of attending a show is that it renews your enthusiasm for your own projects. I'd lost focus on my game and started messing around with a couple of side projects to the extent that the trail had gone cold and I didn't really know what to do next.

So while I was intently listening to Steve Bamford's explanation of how the sprite engine and behavioural scripting works in his awesome Kiruke no Shima project, I started to remember I still had a lot of work to do to get my own sprite engine fit for purpose.

The Smell of the Beast

The first thing to do then was to get reacquainted with my own code. Now, as soon as I fired up the editor I was immediately hit by the unpleasant smell of "get it working now, tidy up later". Except without the "tidy up later" part. And I wasn't entirely sure about the "working" either. Yes, there was a sprite engine, but it didn't really have any organising principles, making it hard to work with, which in turn was stalling development.

As things stood, a sprite was a fairly passive object that relied on external code to make interesting things happen, such as moving in something other than a straight line. What I really wanted was for the sprite to be in charge of its own behaviour from initialisation to destruction. I wanted the game engine to trigger the creation of sprites and the sprites themselves should look after the details.

I made a fairly simple change to start with. Instead of passing the sprite directly to the drawing routine, each sprite now contains a vector pointing to code that is responsible for updating the sprite. This vector could simply point to the drawing routine, or it could point to a routine that implements more complex behaviour. When it's done, the update routine can choose to jump to the drawing routine, or skip the drawing entirely and move on to the next sprite in the list.

So what use is a sprite that doesn't always get drawn? Very useful as it turns out: If each chunk of update code changes the update vector to point to a new chunk of code, we can chain together a sequence of actions and suddenly we have ourselves some cooperative multi-tasking, with each sprite effectively running in its own thread.

Uh oh, story time

This is certainly not a new idea and is pretty much a form of round-robin scheduling, which has been around forever. When I started in my current employment, they were still using 6800-based control systems to run large pieces of automated test equipment. The software would have had something like ten separate threads of execution to run the various subsystems in parallel with a main loop that boiled down to a list of instructions of the form JSR [subsystem_n_vector], for subsystems 1 to n.

The thread for one of these subsystems would be a set of short routines. When one routine had completed, it would store the address of the next routine in the vector and then RTS to yield control back to the main loop and allow the next subsystem to do some work. The next time the subsystem got called, it would jump to the new routine thanks to the updated vector. If the subsystem needed to wait for a condition, such as a sensor turning on, or a measurement becoming available, then it would just leave the vector unchanged until the condition was met. The same routine would get called repeatedly until it decided it was time to change the vector. The key point is that every subroutine yields control without hogging too many CPU cycles to allow other threads to get some attention

One of the strangest jobs I recall was for a client who wanted us to write the software for their machine in BASIC. And not just any BASIC, but VBA, inside an Excel workbook, for the sole reason that they had some in-house experience of VBA. Mad as a box of frogs. So I had to write a multi-threaded application in BASIC, and I did it in a vaguely similar way to the old 6800-based control programs. Each thread had a SELECT...CASE statement that chose a subroutine to call based on the value of a state variable, and the subroutines would update the state variables to control the sequence.

It worked surprisingly well. There was the illusion of several things happening at the same time while still having a responsive user interface, though we were somewhat amused when the same client came back later and asked us to help them demonstrate that the system followed GAMP guidelines. (Good Automated Manufacturing Practice). Well, using a properly documented language that conforms to international standards would have been a good start...

Back to the plot

But I digress. I mentioned that not drawing sprites was a useful thing. When not drawing a sprite, we can use that time to perform initialisation tasks, spreading the work over several video frames if necessary. Only when everything is ready do we start drawing the sprite. This means we have greater control over the amount of work that occurs during each video frame, reducing the chance of doing too much and missing the video sync. (aka slowdown and juddering)

This in turn unlocked something I'd been wanting to do for a while but wasn't sure how best to go about it: Pre-shifting the sprite graphics on demand.

Just to recap, the sprites are 12 x 12 pixels. As we're in a four colour mode, this means 3 bytes x 12 rows = 36 bytes per sprite, plus another 36 bytes for the mask. 72 bytes total.

That's fine if we want to draw the sprite at whole byte offsets, but I need pixel precision, meaning that the sprite image needs to be shifted to suit. As this is quite a slow operation, I preshift the data, so that we have the same sprite image repeated in memory four times, each with a different degree of shift.

That takes up a lot of memory. The shifted images are now four bytes wide, so we have 4 bytes x 12 rows x 4 shifts = 192 bytes per sprite, plus the mask making 384 bytes total. This quickly adds up. For example, the end of level boss is built from four different sprites, and therefore requires 1536 bytes.

Late shift

All of this preshifting happens before the game loop starts, which is wasteful. Some sprite types don't appear on the screen together, and therefore don't need the preshifted graphics together in memory at the same time. It would be a more efficient use of memory if sprites could do their own preshifting just in time.

Now that I have this threading model, the task is much easier. I've broken down the preshifting task into chunks of work that each take no more time than it takes to draw a sprite. When certain types of sprite are triggered, they first preshift their graphics data into a common area, then perform any other necessary initialisation, then start running the normal update code, finally appearing on screen. This occurs over several video frames but is still a fast process in human terms.

As I write this, it suddenly occurs to me that the source data for the masks only requires one bit per pixel and could be represented by 18 bytes instead of the current 36 bytes. The preshift routines would just need an extra step to perform the decompression. I could maybe even run length encode the source data. Now there's a thought...

Pythonic png to packed pixel processing

Since moving from Windows to Linux, I had put development of graphics on the back burner, as I had become too reliant on Windows software, and the front burner already had some peas and noodles on the go.

In particular I was using Tile Studio to draw the sprites. This has a neat output scripting feature that generates source files that can be directly included in the game build, but working in Linux I was a bit stuck. I could draw stuff in Gimp, but lacked a way of getting the data into the game.

I ended up creating a conversion script in Python. This takes a source image in the form of a png file and converts it to the packed pixel format required by the game. It also generates masks from any colours that have out of range indices. In the images below, cyan and black would be interpreted as mask rather than image:

The conversion program can slice the image up into tiles, so we can create a whole sheet of sprites in Gimp and then have the conversion program extract the individual sprites. This works nicely in conjunction with a Gimp plugin that converts layers into sprite sheets.

I've still got a lot of learning to do in Python, but I was very impressed with the language features that make this kind of task easier than you might at first imagine. For example, I was able to perform the tile slicing without any calculation or array indexing. Instead, the image data is fed through a series of iterators that return progressively smaller pieces of image, with a simple matrix transposition (matrix_t = zip(*matrix)) in the right place to put the data in the right sequence for output. It might be useful for other projects so I'll publish it once I've made it a bit more user-friendly.

There are definitely fewer excuses available for not making progress in the game...

Stewart Orchard

To make things more interesting, I was going to attempt something special for this post, and write using the third person once removed. However, when I spoke to the blog author about it, he said that Stew doesn't like the idea; that at best it would only be "vaguely amusing" (his words apparently), and then wear thin with "tedious rapidity". Oh well, I'll just resort to those twin pillars of factual reporting: Exaggeration and Lies.

Bee in a jam jar

Here's an annoying thing: Programming for joystick control in a Dragon/CoCo game. It's annoying because some of the audio hardware is shared with the joystick hardware, requiring care to get the two working together.

It's possible to demonstrate the problem in Dragon BASIC by enabling audio before reading joysticks:

   10 AUDIOON
   20 PRINT JOYSTK(0);
   30 GOTO20

That makes a loud buzzing sound. The joystick read routine needs to mess with the DAC to figure out the joystick positions, and modifies the sound source select bits in order to select the joystick axis. If audio is enabled, you can hear all this happening. (I should mention that the above code snippet only has an audible effect on Dragons and possibly older CoCos. Newer CoCos appear to be smart enough to disable audio on joystick read.)

So the main thing is to make sure audio is disabled while reading the joystick. It is also important to restore the DAC to its original value and to reselect the DAC before enabling audio again. This is the list of steps:

Save the current DAC value
Disable audio
Read joystick
Ensure DAC is selected
Restore DAC value
Enable audio

Some effort can be saved by reading the right-hand joystick x-axis last, as selecting this axis also selects the DAC. (Assuming you want the DAC selected of course. If you were instead using cartridge audio, then it would be more efficient to read the left-hand x-axis last)

It is helpful to make the joystick read routine as fast as possible, to minimise the time that the audio is off. My routine is relatively fast as I'm not interested in converting a precise joystick position, just left-centre-right on the x-axis and up-centre-down on the y-axis.

Without going into too much detail, this is done with four writes to the DAC at $FF20 and four reads from the comparator bit in $FF00. One pair of writes for the y-axis and the other pair for the x-axis. After each DAC write the comparator bit is shifted into the B register, resulting in a number representing the joystick position that can be further processed when the audio has been turned back on. I use a lookup table to convert the value into an angle, but it could just as easily be a jump table.

I've had that in place for quite some time and had been happy with it until fairly recently when I recommended the same method to someone else and it didn't work so well: They were getting a buzzing sound.

Umptioned again

In my mind I imagined a capacitor somewhere in the audio circuit that would hold the voltage while the audio was disabled. The assumption was that if the audio was off for a short enough time, then the voltage wouldn't change enough to make a noticeable click when the audio was enabled again. However, after a fair bit of digging, taking measurements and running circuit simulations it turns out that is only part of the story.

The first thing I discovered was that connecting different things into the audio output each gave very different results with my joystick code:

Audio via the RF modulator gave the cleanest sound
A composite monitor gave a buzz that decayed to silence in about a second on the tail end of each sound effect. Strangely, I hadn't noticed it before until I started listening for it and now it seems obvious.
A set of powered speakers gave a continuous buzzing sound

It's starting to look like there's a complicated interplay of factors, but it's surprisingly simple to understand once you learn what is happening. I hinted that capacitors are only part of the story. What was missing is the fact that a capacitor is connected to a circuit via some resistance and this introduces a property called a 'time constant'. This defines how long the circuit takes to reach a steady state after something changes, for example after a new value is written to the DAC.

Use dumb luck. It's super effective.

In the case of the RF modulator, there is a combination of capacitance and resistance giving a time constant such that it takes about 5 milliseconds for the circuit to settle down after the DAC is updated. If the audio is disabled during this time, then you get a click because you're interrupting the current flowing in the circuit and that causes a jump in voltage. The shorter the gap between DAC update and audio off, the louder the click.

It just so happens that my joystick routine was so far away from the last DAC update that I couldn't detect any disturbance. Dumb luck in other words. When I get round to improving the sound engine, the DAC update will be getting closer and closer to the joystick read, so I may start hearing interference.

In the case of the composite monitor, this has a much larger audio input capacitance. This gives a time constant that means it takes a whole second for the circuit to stabilise. After each sound effect, the circuit is out of balance for approximately 25 game loops, each with a joystick read producing a click slightly quieter than the previous one.

The powered speakers are a bit different. There is no visible input capacitance on these. Instead there is a relatively low resistance to ground through the volume control. What this means is that the voltage on the audio output drops to a low level within microseconds of disabling audio. This results in a click on every joystick read giving a continuous buzzing.

Taking all scenarios into consideration, it would seem that the best defence against joystick buzzing is to leave audio disabled after reading the joystick, and to only enable sound at the start of a sound effect. But even then, clicking may still be audible, particularly for quiet or fading sound effects. If possible leave a few milliseconds after a DAC update before reading joysticks and that should at least keep things quiet as far as audio via the RF modulator is concerned.

Of course, this all results from investigation on a Dragon. Things may be a little different for the CoCo as the various models have custom ICs which may not be the same as the heap of discrete parts in the Dragon. Your kilometrage may vary.

And now for something completely similar

Previously I discussed my initial attempts at producing music for the SN76489, as used in the CoCo Game Master Cartridge. In particular I speculated about modulating the attenuation registers as a way of superimposing lower frequency tones that are below the sound chip's normal range.

To test that idea I would need some real hardware. I ended up making a small sound card that could be piggy-backed onto another cartridge and plugged into my Dragon. In the process I found out just how bad my eyesight had become and wished that I had the luxury of magnifiers and bright lights that I'm used to using at work. (And that I hadn't stupidly left my reading glasses at work)

Fortunately, I still ended up with a working card without too much trouble. The SN76489 audio output appears to be highly prone to unwanted oscillation, making it hard to see the audio waveform on an oscilloscope, so I added a high frequency bypass to ground on the audio output pin. The bypass consists of 100nF + 10R in series, values suggested on the datasheet for a similar part, the SN76496.

I don't always make my own cartridges, but when I do I attach them to other cartridges with tiny bits of wire

Some quick tests proved that the modulation idea worked in principle, so I extended the music player to modulate two of the SN76489 tone channels with software generated waveforms. These have a programmable pitch offset from the hardware tone and also have variable duty cycle. This results in the following sound:

It's not quite the same kind of sound as the variable duty pulse waves that I was playing with before, largely because the phase relationship between the soft and hard waveforms jumps every video frame while the next sound fragment is set up, but it does add a great edge to the sound. Definitely a keeper :)

Stewart Orchard

Ooh shiny

Well if there was any kind of schedule, I'm definitely behind it. Interesting things keep appearing and distracting me. Like that thing over there. Back in a minute...

The latest shiny is that development snapshots of xroar now have Game Master Cartridge support. This is exciting stuff: I've been thinking about the GMC for some time as a vehicle for a 'deluxe' version, and this was just the excuse I needed to have a play. The banked ROM would allow for a greater variety of sprites and a larger background map, plus the SN76489 sound chip could give more options for in-game sound. More on that in a bit.

Lost in music

I felt I was on a roll with the music, and it seems to be a hot topic at the moment, so I stuck with it and have now recorded my changes to CyD as a fork on GitHub.

I'd previously written about how I tweaked CyD to play variable duty cycle square waves and added simple support for drum samples. Since then I've improved the duty cycle control a little and added a couple more example tunes.

I originally had a bit of code like this to control the pulse duty over time:

lda duty
rate equ *+1
adda #0
cond bmi skip
sta duty
skip

The bmi instruction prevents the duty cycle being negative, so we could start with a low value and a positive rate to give a waveform that changes from narrow pulses to square, or go the other way by starting with a large positive value and have a negative rate.

As an added bonus, the bmi instruction can be modified into a brn to make the duty keep wrapping round and cycle endlessly. That sounds really good, but I wondered how much better it would be if I could have the duty ping-pong between two arbitrary limits.

The difficulty with that is there are then two conditions to check for, and which one we check depends on which direction the rate is heading. For quite a while I didn't bother because it sounded like too much complexity for the gain. I didn't want to add too much code because the time taken reduces the quality of the sound.

Then I thought of a way. Not a pretty way, but a way nonetheless. The following only adds one immediate instruction to the critical path. The other instructions only come into play briefly when one of the limits is reached:

lda duty

rate equ *+1

adda #0

cond1 equ *+1

cmpa #$e0

bls store

neg rate

cond2 equ *+1

ldd #$2024 ; bhs=$24 bls=$23

ldx cond1

stx cond2

std cond1

bra skip

store sta duty

skip

This exploits the fact that the limit value and the branch opcode are sitting right next to each other in memory and can both be written at the same time with a 16-bit store. Each time the limit is hit, the limit value and branch condition are updated and the sign of the rate is reversed. Ugly, but worth it: This gives much more variety to the sound.

Bend it like Beckham

To help with manually entered music I've added a macro to set up portamento effects. This is where one note transitions into another by smoothly sliding the frequency between two values and is a familiar element in many chiptunes. The way CyD does it is by adding a constant value to the frequency value each 'tick', or 1/50th of a second. So to slide from one note to another we need to calculate how much to add per tick by dividing the desired frequency change by the number of ticks over which it should happen. This could be done by the assembler if the frequency values are defined in the source code.

Among the CyD source files there was a nice Perl script that calculated the note frequency lookup table. I say 'was' because I've butchered this script most 'orridly (Perl just doesn't understand me) to enable the assembly source to calculate its own timings based on how the channels have been configured.

This script now also outputs the note frequency lookup values as symbols which in turn allowed me to add the helper macro to set up a portamento based on the start and end notes and the time interval. This makes it really easy to add guitar-like bends.

After much experimentation I've discovered that a snare drum sample can be lengthened and enhanced by switching to white noise near the end of the sample. The white noise then just continues until cut off by another sound. This sounds a bit odd on its own, but when mixed with other channels, gives the impression of a longer sample with reverb. I think it sounds pretty effective anyway:

All your bass are belong to us

As I've invested a lot of time in CyD and am familiar with the way it works, I hacked together a GMC version to get a feel for how things might sound. What quickly became apparent is that the lowest notes are not particularly low. It goes down to about a B2, which is nowhere near the lowest note on a guitar, let alone a bass.

One way to get lower notes is to use a lower clock frequency than the standard 4MHz. In fact Time Pilot '84, the arcade game that my game is at least trying to be loosely based on, runs three similar chips at half the NTSC subcarrier frequency, or approximately 1.79MHz. That would get you a mains-hum-like A1, the second lowest string on a standard bass, and a bit closer to what we like.

Another way is to make use of the 'periodic noise' capability. This is a function of the noise channel, but isn't really noise, just a low duty square wave with a frequency that is (I think) 1/15 of the usual frequency. It has a thin, buzzy kind of sound.

Yet another route to lower notes is to modulate the attenuation registers to superimpose lower frequencies onto the normal output. This requires CPU time so wouldn't be very suitable for in-game music, but might sound interesting enough to be good for title music, especially if the duty cycle varies over time. I'm very tempted to give it a go.

Play it again, GMC

I reworked my example tune above to play more nicely on the SN76489. I had to transpose it up 4 semitones and throw away the really low notes, which makes it sound completely different. I also ended up mashing in some other ideas just to hear what it would sound like and this is the unholy result:

The echo effect at the start is easily achieved by playing the same thing on two channels with a moderate delay between the two. In this case the delay is something like 300ms. When the notes bend it really comes alive. I've also snuck in some cheeky vibrato as well to spice things up a bit. This is just a rapid series of bends above and below the start note to make the frequency wobble around in a pleasing way.

The drum sounds are simply bursts of noise with the volume and frequency changing over time. The kick drum has a really rapid drop in frequency to make it sound like a thud while the snare has a short period of rapid frequency drop followed by a longer period with more gentle change. It works quite well but it leaves me wondering how good a result could be achieved with a more sophisticated shape to the frequency envelope.

I've put my GMC version of CyD on github, just in case anyone wants to have a play...

Stewart Orchard

I decided to take a long break from coding the game mechanics and instead have a closer look at generating some music, or "mooozik", as they say round these parts.

Because my living involves a certain amount of programming I don't need much in the way of persuasion to run as far away from coding as I can. On the other hand, playing around with music is much more appealing to me and I've been having a lot of fun over the last few weeks getting the Dragon to produce some audio that is - dare I say it? - musical.

The fruit of someone else's labour

I would like to have as much music in the game as possible: Theme music on the title screen, story music, get ready music, game over music, high score music etc. Whatever I can fit in to fill the silence. Maybe even in-game music if a sound chip is present. I'm going to aim high and see how far I get.

The biggest problem seemed to be the lack of a suitable music engine. When I started work on the game I wasn't aware of anything off the shelf that I could use and it looked like I was going to have to write something from scratch.

I learned quite a bit about music playback while working on a musical tape loader with Simon Jonassen and started to get a feel for the kind of things that might be possible. Then, as luck would have it, Ciaran Anscomb brought his marvellous CyD project to my attention.

To fill in a bit of background this is a software synth based on the CoCoSID project by Rémi Veilleux, which as the title suggests, is designed to play C64 SID tunes on the CoCo. It works by mixing three channels of synthesised waveforms and updating the synthesis parameters 50 times per second. The waveforms are synthesised from a selection of reference waveforms stored in memory. The playback parameters are determined by an offline conversion of a SID register dump obtained from a C64 music player into a pattern-based format that CoCoSID can read in real time. The results are really impressive.

What Ciaran has done with CyD is to produce a rewritten, optimised version with a new music processor allowing you to enter music data in a more memory-efficient programmatic form. It has a variety of features such as envelopes, portamento, subroutine calls, transposition and arpeggios - i.e. he's made it more like a music driver that back-in-the-day purveyors of fine tunery such as Martin Galway or Rob Hubbard might have used. "That looks handy," I remember thinking to myself, "I'm having that."

Some of you may have heard the title music from the 32K demo version of the game. This is being played using an unmodified version of CyD. It has a very distinctive and dynamic sound, far removed from the sounds that most of us Dragon owners are used to hearing. However, after the novelty wore off, I wondered if I could bend things a bit to add even more complexity to the sound.

Pushing the envelope

Generating a tone in software will very often be based around a phase accumulator implemented with something like the following piece of code:

phase equ *+1
      ldd #0
freq equ *+1
      addd #0
      std <phase

That generates a sawtooth waveform in the A register. Self-modifying code is used for performance, and direct page addressing helps too. Synthesising an arbitrary waveform is just an indexed instruction away:

      lda a,y  ; y points to a 256 byte waveform

The same lookup could be done by self-modifying the lower address byte of an extended address instruction with the upper byte pointing to the current waveform. In fact CyD does a bit of both to mix three channels together in an optimal way.

Envelopes are simulated by using separate versions of the same waveform differing only in volume level. Having just two or three volume levels works surprisingly well.

So what can we do that's different? Funnily enough, Ciaran came up with something new for his Dunjunz game, this time generating square waves directly. Instead of looking up a reference waveform stored in memory it converts the sawtooth in the A register into a square wave using something like this:

    lsla       ; put msb of A into B
    rorb       ;
    sex        ; extend sign of B into A
    anda #vol ; set volume

If A is greater than or equal to 128, it becomes set to #vol, else it gets set to zero. Nice. It's the sort of branchless conditional construct that you might see a compiler generate to avoid stalling a pipelined cpu and here it is making itself useful on the 6809.

That got me thinking about how cool it would be to be able to vary the duty cycle of the square wave i.e. generate rectangular pulses of varying width. That would make for some very interesting noises. It turns out that duty can be added for free by replacing the lsla instruction with an adda:

    adda #duty ; use duty to generate carry
    rorb       ; ...and set sign of B
    sex        ; extend sign of B into A
    anda #vol ; set volume

Varying the duty value from 0 to 255 will generate rectangular waveforms with corresponding duty ratio of 0 to nearly 100%. A duty value of 128 will generate a square wave as before.

To try it out I added a command to CyD to set the duty. To demonstrate what it sounds like, first here's a simple bass riff played using a square wave:

And now the same notes but with different duty cycles:

Things get more interesting when the duty is varied while a note is playing so I added a tiny bit of code to do just that:

Let's go mad and play two voices an octave apart with different duty settings:

Now you're talking. If you enjoy upsetting C64 fans, try calling this sound 'SID sound'. That riles them up something chronic.

Kicks like a concrete donkey

CyD gives us three channels to play with, which immediately makes me think of great power trios such as Clapton/Bruce/Baker, Lifeson/Lee/Peart and, err, Rod/Jane/Freddy. I'm thinking one channel for lead, another for bass and another for drums.

Drums can be synthesised from combinations of noise, square, triangle waveforms etc., and messing with the frequency. In fact I did this for the title music in the 32K demo, but the kick drum, while adding cool accents to the bass, did sound a bit feeble to me.

A kick drum is pretty much a noisy click followed immediately by a thud. The frequency is fairly low and drops with time with the volume swelling in the middle. The whole thing might last 50 - 300ms. Bearing this in mind I drew a 256 sample waveform freehand in Milky Tracker:

Crude, but effective

I boosted the volume somewhat, resulting in a bit of clipping, which is helpful here and makes the kick louder. This was then exported to an 8 bit wav file and converted using sox into a 256 byte raw binary file that could be included into the assembly source. To save you the pain of trying to decipher the sox manual, this is the command I used:

sox -D in.wav out.raw vol 0.33 dcshift -0.66

The -D option disables dithering. When dithering is enabled, it can make the waveform one bit too loud, potentially resulting in overflows and nasty clicking during playback.

There are still some hoops to jump through to get CyD to play the sample properly. First we need to reset the channel phase counter to the beginning of the waveform when we trigger the sample, otherwise it will play from some random point. This required a new sample playback command. We next need to make an envelope that is about as long as the sample, in this case 4/50ths of a second, and ending with a silent waveform to stop playback. Then the frequency needs to be fine tuned so that the sample ends when the envelope ends. Otherwise the sample will either be cut short, or it will start again from the beginning. I manually adjusted the first note in the frequency table to suit the sample, though I think a frequency set command will be required to allow a variety of samples to be played.

This sounds like the sort of fiddly exercise that should be automated in software, so that's yet another thing to add to the ever-growing todo list. Anyway, this is what a 256 byte kick drum sample sounds like on a Dragon:

Music puns are not my forte

To actually compose music I play around with ideas in Milky Tracker and then hand convert to CyD. (Yeah, yeah, I know I should write a conversion script.) I've also been playing with MikroWave on my phone. It's a bit of a toy but you can throw together new loops and rhythms really quickly.

Time for a quick demo tune. Now it's fair to warn you that I quite like droning repetitive music. This allows me to listen to Moby without having to gnaw off my own arm to escape, though the urge is never far away. Like all the other recordings on this page this was captured from xroar. Roll credits...

Stewart Orchard

Now there's a title: A bold running start, a title fit for an epic post. A title full of optimism and hope. A title spoiled only by the quality of what follows. You see, I've spent much of the last few weeks engaged in one form of development related navel gazing or another. Please allow me to explain...

Filthy OSes

I do all my hobby stuff on an ageing Tecra laptop. It's now over 10 years old, but I've no reason to change it as it works fine and has respectable performance. The only problem is Windows XP, which is way past its end of support. It ran off the top of that particular cliff three years ago, and very much like Wile E. Coyote, is kind of just hanging there waiting for gravity to notice. Updates for various bits of software are drying up, it's increasingly likely I have to search for old versions of drivers, and the nagware anti-virus software is a big resource hog. We hates it.

I briefly looked into upgrading to Windows 7, but Mister "None Shall Pass" Upgrade Advisor Wizard flagged up a bunch of issues with the hardware. Scratch that idea: Time to look at a free OS.

I've played with a Linux distro before. I must have been feeling particularly wealthy circa year 2000, because I walked into a PC shop and bought a boxed SuSE Linux 6.2 or 6.3 to see what all the fuss was about. I've still got the box in the roof somewhere. Anyway, I installed it on a 486 PC and it worked well enough, but I didn't really do anything with it due to lack of software that could work with the stuff I'd been doing on Windows. I lost interest and stopped using it. That was a mistake with a capital M. I've basically missed out on over 15 years of useful education.

Like I GNU you would

I guess I've got a bit of catching up to do then. Now the thing that rapidly becomes apparent is the amount of choice available for a free OS: There's a bewildering selection of distributions, kernels, and desktop managers to choose from. If your worst nightmare is based around having to make a decision then this might not be your night. To help me decide I came up with the following sketchy criteria: 'Popular', 'Lightweight' and 'Googled evidence that it works on an old Tecra'. To cut a long story short I eventually settled on Debian/Linux with an Xfce desktop.

It almost worked first time: I had to download a legacy driver to get the wifi working and the trackpad settings needed a tweak but I was very impressed at how smoothly it all went. So now game development is proceeding on *nix. If I can stop playing 2048, that is...

So, Anyway

In spite of all that I did manage get some work done on the game.

In Time Pilot '84, a boss appears after a certain number of enemy craft have been destroyed. The boss chases after the player while continuously firing homing missiles. To defeat the boss and proceed to the next level, the player must wait to acquire a missile lock and then successfully fire a missile while avoiding collision with the boss or its missiles.

The boss is quite large compared to the player, and rather than write a special large sprite routine, I chose to construct it out of four standard sprites effectively flying in tight formation.

I see five sprites

To get the boss to chase the player we need to figure out how to determine the direction of the player relative to the boss. Those who know their vectors will know that subtracting the boss coords from the player coords will give the precise direction of the player, though it would be hard to make use of it as we would need to convert that direction into a unit vector which means square roots and stuff.

More useful would be a function that returned an angle. That would allow us to steer the boss towards the player. One way to get an angle from vector components would be to use the arctangent function, but that's no better than having to calculate square roots, which leaves us wondering if there is some approximation we can use.

Mmm pie

It turns out there is something relatively simple we can do. If you imagine a cherry pie centred on the boss, and that pie is divided into four delicious slices, aka quadrants, then it's easy to figure out which quadrant the player is in just by looking at whether the difference in coord components is positive or negative. Even better, it's possible to refine it more and divide the area into eight octants by comparing the sizes of the x and y components. (And a thinner pie slice means you can have a bigger blob of double cream. Ooh you would)

The following piece of code returns an octant number in the range 0-7 based on the difference in coords. It works by progressively rotating the coords until they lie in the first quadrant then finally comparing the x and y components to figure out which octant:

      clrb       ;
      lda dy       ;
      bpl ypos     ; dy >= 0
      neg dx       ; rotate 180 cw
        neg dy       ;
      addb #4    ;
ypos lda dx       ;
      bpl xpos     ; dx >= 0
        nega         ; rotate 90 cw
      ldx dy       ; don't care about low byte
        stx dx       ; just need to swap dx & dy
      sta dy       ;
        lda dx       ;
      addb #2    ;
xpos    cmpa dy    ;
      bhs done     ; dx >= dy
      addb #1    ;
done ;

The coords have previously been scaled and subtracted and stored as 8 bit values in dx & dy. In case you're wondering what the strange business with ldx/stx is about, it's because at that point I need to keep the contents of both the A and B registers and rather than save and restore one of them, I used the X register and saved a few bytes and cycles instead. stx dx overwrites dy, but it doesn't matter because we then write to dy in the next instruction.

The number returned in B represents the direction in which we want the boss to move, and that can be used to look up a velocity from a table. Now, rather than suddenly change to the new direction like Automan's car, I increment or decrement the current direction once per update so that the boss steers more gently towards the player. But even that is too aggressive: The boss homes in on the player far too efficiently and there is no way to avoid the collision. (It does work great for homing missiles however.)

That left me wondering how the original arcade game worked as I remember it being pretty easy to avoid a collision. I tried slowing the boss down as it approached the player and slowing down the update rate but nothing seemed to give a satisfactory result.

In the end, I fed a fraction of the player steering input into the homing routine. What now happens is the boss approaches the player while the player is flying in a straight line, but if the player turns, the boss steers behind the player. Rather surprisingly this ad hoc solution feels more true to the original.

Out with a bang

For a while I wasn't sure what should happen when the player defeats the boss. I didn't think I would be able to pull off a convincing explosion, so I decided to cheat instead. When the player successfully hits the boss, a mode change occurs and the boss starts running away from the player while pouring out flames, which are just re-used enemy explosion graphics. Here's how it looks so far:

The battle is a bit one-sided at the moment. The boss still needs to fire missiles at the player, and the player should have to sweat it out for a while before getting a missile lock, but it's a good start.

I'm in two minds what to do next: I could finish the boss battle or I could try out some musical ideas while they're still fresh in my mind. Also December is almost upon us. So much to do, so little time...

Stewart Orchard

Sounds like a pain?

I was nervous about the sound effects, having found it in the past to be hard work to get good results. The Dragon and CoCo machines have a 6 bit DAC that is capable of creating some fantastic sounds, but in a rather cruel twist, things are not easy if you want sound at the same time as animation.

The problem is that sound generation requires a fairly regular rate of writes to the DAC, whereas updating graphics often involves a variety of routines of different lengths. The two things don't mesh together very well, so what usually happens is that the sound ends up being produced in short bursts in between screen updates, creating a distinctive thin and choppy kind of sound that can be heard in many Dragon and CoCo games.

Trip down memory lane

In my first game, Balldozer, there wasn't much work to do to keep the graphics updated: Just a ball, a bat and a power-up, along with the occasional brick that needed erasing. That allowed me to have relatively long bursts of sound in between the updates. Low frequencies with decaying volume were generated when the ball bounced off the bat or the sides of the screen, giving a 'boing' sound, and higher frequencies were generated on hitting the bricks, giving a 'ching' sound. This worked out pretty well:

For my second game, ROTABB, I didn't put any sound in until the last minute and it really shows. Because the sound in Balldozer worked out OK, it didn't occur to me that I would have a problem. What I discovered too late was there were no spare cycles left over, and the best sound I could achieve during normal play were some feeble explosions for destroyed enemies. The only way I could find to generate a reasonable sound effect was by having longer bursts of sound and letting the frame rate drop. So to avoid messing up the gameplay, I only did this for destroying the level boss and for the player death, where the lower frame rate gives a nice slow-motion effect. The lesson here is to allow for sound early on in the design:

For the abandoned third game, I had started experimenting with producing sound and graphics at the same time i.e. embedding the DAC update code within the graphics update code. This game spent a lot of time drawing background tiles, so I inserted a DAC write for every second tile drawn. The values to write to the DAC were read from tables, with the volume of the sound being set by a MUL instruction. This gave a much more full and satisfying sound:

Back to the Future

For the new game I would like the sounds to have something of an arcade feel. That generally means lots of noise going on, with a variety of sounds that are rich in low to mid-range frequencies. I felt that the sound in the third game was a step in the right direction, so I decided to build on those ideas:

Embed more DAC updates in the game code
Make the DAC updates more efficient by not using MUL for volume
Instead of a set of fixed wave tables, have a sound buffer that is updated by the sound engine at the start of each game loop
Sound engine that generates sounds from parameters
Choose a sound effect for playback by pointing to a parameter entry in a table

This list is saying I need a sound engine that is much more sophisticated than anything I've done before. Fortunately the task can be broken down into more manageable pieces. First of all I need a small piece of code that takes one sample from the buffer, writes it to the DAC and updates the buffer pointer using a minimum of registers so that it can be embedded in existing code without causing too much upheaval:

    lda [sndbufptr] ; Get next sample
    sta $ff20       ; Write it to the DAC
    inc sndbufptr+1 ; Update buffer pointer

That takes 20 cycles. It can be reduced slightly by using self-modifying code, but that's only worth doing in a loop with a fair number of iterations because of the overhead of copying the buffer pointer at the start and end. Note that I'm only incrementing the bottom byte of the buffer pointer. This is fine providing the buffer is completely contained within one 256 byte page.

So how many DAC updates do we need to embed in the game loop? Say we wanted an update rate of 1KHz. Assuming normal CPU speed and a frame rate of 25fps, that means an update approximately every 895 cycles, or 40 updates per game loop.

Those updates need to be sprinkled as evenly as possible throughout the code. This is not an easy thing to do, but it doesn't need to be perfect and it might be easier to change the update rate to suit the code. For example I've put a DAC update inside the loop that copies 128 byte blocks of background to the screen. That performs 21 updates spaced at 523 cycle intervals, forcing me to aim for a higher update rate of 1.7KHz

In the tile drawing routines, the loops are slightly too short, so the DAC update is preceded by a flag toggle to perform the update every second loop.

If a loop is really short, then it might be necessary to break the loop up into two or more chunks by having an inner loop doing the work and an outer loop containing the DAC updates. It's a similar story for the loop that waits for the video frame sync. I have an inner loop checking for the sync flag in the PIA, and the outer loop performs the DAC updates at the target rate.

Where's that noise coming from?

Next we need to generate some data to put into the sound buffer. To keep things simple I decided to have a single sound channel. This means only one sound effect plays at a time, and a priority system is required to prevent a less important sound from interrupting a more important sound.

There are two types of sound commonly generated by sound chips, namely square wave and noise, so we may as well start with those. The following code writes a square wave* to the buffer with adjustable frequency and volume:

snd_square
      ldx #soundbuf
      ldb snd_phase
    lda snd_vol
ssqlp sta ,x+
     addb snd_freq
    bcc ssq1
      eora snd_vol
ssq1 cmpx #soundbuf_end
      blo ssqlp
      bra donesound

The frequency and volume are values in the range 0-255. The higher they are, the higher the frequency and volume respectively. snd_phase stores the value of the B register between successive calls to the routine, so that the square wave continues where it left off on the previous game loop. (Though the placement of DAC updates would have to be very accurate for this to make much difference, and if I did get that accurate, I would have to save the volume of the square wave between calls as well)

The noise routine is a little more complex. It reads values from a table of random numbers, sets the volume by ANDing with a mask and then outputs the values to the buffer. The values are repeated in a similar way to the square waves to give some control over the frequency content of the sound. I've used self-modifying code so that the routine is not too slow compared to the square wave routine:

snd_noise
      ldx #soundbuf
      ldu #rndtable
      ldd snd_vol   ; snd_vol & snd_freq
     sta sns2+1 ; volume
     stb sns3+1 ; frequency
     ldb snd_phase
     anda ,u+      ; first sample
snslp sta ,x+
sns3 addb #0       ; frequency
     bcc sns1
      lda ,u+       ; get next random value
sns2 anda #0       ; set volume
sns1 cmpx #soundbuf_end
      blo snslp
      bra donesound

The random number table is a 64 byte table of pseudo-random values, refreshed at a rate of one new byte per game loop such that the table contents are continuously changing.

Now we need a table to define the different sound effects. Each table entry defines the type of sound, how long it lasts and how it changes over time. That suggests these parameters:

Sound duration
Start volume
Change in volume per game loop
Start frequency
Change in frequency per game loop
Address of sound generator (square or noise)

The good news is those parameters fit into seven bytes, meaning we can pack a lot of different sound effects into a small space.

When a sound effect starts, the volume and frequency are copied to variables for the sound generator to use. Then for each game loop, the sound generator is called and then the volume and frequency have the 'change' parameters added to them. The duration is used to initialise a counter that is decremented once per game loop to set the length of the sound.

The sound priority is defined very simply by the address of the parameters. Sounds nearer to the start of the table have lower priorities than sounds nearer to the end of the table. When a sound effect request is made, the address of the new sound is compared to the address of the current sound. If the new address is lower, then the sound is not started, otherwise the new sound will replace the existing sound.

If the current sound has address zero, then it means no sound is playing, which happens when the sound duration counter has expired, for example. Having the lowest possible address also means the silent state has the lowest priority, ensuring the next sound request will be granted.

Put it all together and what do you get?

I've implemented the ideas discussed but haven't to date put a lot of effort into spreading out the DAC updates evenly. There are about 44 DAC writes per loop, some bunched up, some spread out and a big gap when the sprites are being drawn. Even so, the result is surprisingly good: (Please excuse the jumpy video. The game runs much more smoothly than this)

Back to the Future II

At some point I would like to add more parameters to make the sound effects more complex, such as periodic frequency variation (vibrato) and variable duty cycle for the square waves. I also need to do a better job of spreading out the DAC updates plus there is the issue of supporting both PAL and NTSC machines, as the different frame rates mean different sizes of sound buffer. But on the whole I am extremely pleased with the new sound engine. I just wish I had thought of it back in the day :)

As a final thought it occurred to me that the same sound engine could be used to drive a variety of different sound hardware (such as sound carts) just by replacing the lowest levels of the driver. Definitely worth considering for a future 'deluxe' version.

I'd better get on and write some more code before I run out of things to write about...

*Caution: Under no circumstances attempt to generate square waves in real life. The infinite accelerations and energy densities required would rapidly bring on the end of the universe and possibly void the warranty on your speakers.

Stewart Orchard

Ooh it's a good one this time: They're doing collision detection.

Call me old fashioned...

...but I like stuff to blow up when I shoot at it. There are some enemy sprites flying around, some player bullets headed in their general direction and I'd like to know when a bullet hits a sprite so I can make it asplode.

One way is by comparing the coordinates. If you imagine subtracting the coords of one object from another, then the result will say something about how close together those objects are. Small numbers = close together, big numbers = far apart.

In fact we can tell if two rectangular regions are overlapping by subtracting the coordinates and comparing the resulting numbers against limits:

ldd XORD,y
subd XORD,u
cmpd #high_limit_x
bgt miss
cmpd #low_limit_x
blt miss
ldd YORD,y
subd YORD,u
cmpd #high_limit_y
bgt miss
cmpd #low_limit_y
blt miss
; Hit. You sunk my battleship.

That takes about 56 cycles. It can be reduced to 52 cycles by noting that each cmpd # can be replaced with a faster subd # and modifying the limits. (Also handy to know is that cmp #value can be replaced with add #-value, which allows you, for example, to compare against positive and negative versions of the same variable)

More cycles than Halfords

Now, here's the thing: We need to check each object against every other object with which we wish to detect a collision.

Taking the case of eight player bullets and eight enemy sprites, that makes 8 x 8 = 64 collision checks. At 52 cycles per check that amounts to well over 3ms. Ouch. That's 10% of the total time budget, or a very large chunk of the time remaining after the background and sprites have been drawn. And I still have to detect collisions with the player and implement sound.

I did some research on optimising collision detection, using space partitioning methods such as quadtrees, but soon began to realise that these techniques are better suited for more powerful systems dealing with larger numbers of objects. Not only that, the worst case scenario would still require the maximum number of checks, resulting in dropped frames and slow downs. Suddenly the performance of my game wasn't looking so good. Reality had invited itself round for dinner, like some kind of unwelcome freeloading walrus, and there was nothing I could do about it, except perhaps to hide in a cupboard until the problem went away.

But then I started to think about how collision detection was often done in games written in BASIC. For example in text mode games, before an object is drawn, the destination address can be checked to see what's already there. Or in graphics modes, PPOINT was often used to check the colour of a target pixel. Both amount to the same thing: Using the data in the screen buffer to detect collisions.

One of these pixels is not like the others

So the new strategy could work like this: The routine that draws the bullets records data for each bullet location. Then, after each sprite is drawn, the data is compared against the new screen contents. If something is different then we know that the sprite just drawn has been hit by the bullet just checked.

Because the bullets are only one pixel in size, we just need to record one pixel of information for each. The information is comprised of address, mask and pixel value. The address tells us where in the buffer the bullet is drawn, the mask selects the pixel of interest (out of the four pixels making up a byte), and the pixel value is the bullet colour ANDed with the mask, giving us something we can use to detect changes.

After each sprite is drawn, it will need to check for all eight bullets, using something like the following bit of code repeated eight times:

lda addr1 ; grab data from screen
anda mask1 ; select pixel of interest

cmpa pix1 ; compare with bullet colour

bne hit ; collision if not equal

As this code fragment could be run 64 times, every single cycle matters, so I've opted to use self-modifying code, with the bullet update routine storing the data directly into the collision detect code:

addr1 lda >0 ; 5 cycles

mask1 anda #0 ; 2

pix1 cmpa #0 ; 2

bne hit ; 3

We're now down to 12 cycles per check, which when added to the additional code in the bullet update routine, takes less than a third of the time taken by the coord comparison method. Much better!

Now I should mention that this method does have a flaw: If the sprite pixel is the same colour as the bullet pixel then the collision won't be detected. It's not as bad as it sounds, because there is a good chance that a missed collision will be picked up on the next frame when the bullet moves to a different part of the sprite. Consider also that the coord comparison method is not perfect either, and requires the collision region to be a rectangular approximation of the true shape of the sprite image. On balance I think the faster method easily wins.

Incoming

That leaves us with detecting collisions with the player. The coord comparison method could be viable here, because we only need one check per enemy sprite. The downside is that the accuracy will be a compromise. To avoid unfair situations, the bounding rectangles will have to be completely contained within the sprite images, which could mean large parts of sprites overlapping without a collision being detected. Instead, a more accurate screen buffer method could be used:

First draw the player
Then draw all the sprites that are dangerous to the player
Then check the screen buffer to see if anything has been drawn on top of the player

To determine if anything has been drawn on top of the player, we can use the player mask data to extract a player-shaped region of the screen. This should be identical to the player image, otherwise something must have been drawn on top. The following bit of code does just that:

loop pulu d     ; get two mask bytes
     coma       ; invert mask bytes
     comb       ;
      anda ,x    ; collect data from mask-
      andb 1,x   ; -shaped region of screen
      subd ofs,u ; compare with image data
      bne hit    ; difference means collision
      pulu d     ; get next two mask bytes
      coma       ; and do same again
      comb       ;
      anda 2,x   ;
      andb 3,x   ;
      subd ofs,u ;
      bne hit    ;
      leax 64,x ; check every other row
      leau 4,u   ;
      dec count ;
      bne loop   ; repeat until done

u is pointing to the player mask and image data, with ofs being the size of the mask data minus two, to allow for pulu d adding two. x is pointing to the screen buffer. Note that I'm testing every other row to save time, the assumption being that this is sufficient to detect a collision with a large object. If I needed to detect collisions with single pixels then I may have to check every row.

This routine takes about 500 cycles to check six rows of player sprite, which is actually a little slower than comparing coords, but has the advantage of greater accuracy. It could be accelerated by using a compiled sprite technique, unrolling and converting instructions to immediate mode, but it would consume a lot of memory, requiring one routine for each of the 16 player directions. Worth keeping in mind for a high-performance 64K version.

Another advantage over coord comparisons is that the run time is independent of the number of objects. I could decide to add a load of enemy bullets for example and the player collision check would take the same amount of time.

The downside is that we don't know which object collided with the player. This is OK for my purposes, but might not be enough for some games. One way to work out which object was in collision would be additional code after the screen buffer test to determine which object was nearest to the player. (Though be careful of situations where the nearest object may not actually be the one colliding with the player)

He's making a list

These collision detection methods require the sprites to be handled in different ways. There are collidable sprites, such as enemies, that need to be checked for collisions, and there are non-collidable sprites, such as explosions, which skip the checks. These are allocated from a pool of eight free sprites.

And checking it twice

I have implemented three linked lists, to keep track of the collidable, non-collidable and free sprites. The non-collidable sprites are drawn before the player and skip the player bullet checks. The collidable sprites are drawn after the player and before the player collision check and have the bullet checks enabled. (Drawing the non-collidable sprites before the player avoids drawing the same sprite twice in one frame as would happen when an enemy sprite is turned into an explosion.)

Oh crap it's almost Christmas

I'm using one-way linked lists to keep the overhead down (each list item having a pointer to the next list item), and this is working OK, but I'm not sure if that will be the solution used in the final version of the game. Moving an item from the collidable to the non-collidable list takes six load/stores to modify the pointers leaving me wondering if there's a better way. Perhaps a job for another time...

Thursday 2 August 2018

Picking up the Thread