Friday 24 March 2017

Tiles

At the moment I have a means of scrolling but nothing to actually scroll, other than some colourful stripes. Now I do like playing with pretty patterns, probably a little bit more than would be considered 'normal', but that's not getting the job done.

I mentioned tiles before as a way of drawing a background but didn't go into any detail. What are tiles and why do we need them?

Insert groutuitous tile pun here


A tile is simply a small graphic image that can be drawn on screen with other tiles to build up a larger image. There's a brief discussion of tile-based graphics on Wikipedia.

The advantage of using tiles is that it can dramatically reduce the memory requirement of a large image such as a background. This works best for images that contain many repeated elements.

To build the image, we need a map that specifies which tiles go where. This could be a simple two-dimensional array of numbers, with each location in the array corresponding to a location in the image, and the content of each location in the array specifying a particular tile from the tileset.
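As a rough illustration of that lookup, here is a minimal sketch in Python rather than 6809 assembly. The miniature 2 x 2 tiles and the names TILESET, TILEMAP and render are made up for the example and are not taken from the project.

```python
# Hypothetical miniature tileset: 2 x 2 pixel tiles, one character per pixel.
TILESET = [
    ["..", ".."],   # tile 0: water
    ["##", "##"],   # tile 1: land
    ["#.", ".#"],   # tile 2: shoreline
]

# The map is a two-dimensional array of tile numbers.
TILEMAP = [
    [0, 2, 1],
    [0, 0, 2],
]

def render(tilemap, tileset):
    """Expand a tile map into rows of pixels by looking up each tile."""
    tile_height = len(tileset[0])
    image = []
    for map_row in tilemap:
        for y in range(tile_height):
            # Join row y of every tile named in this row of the map.
            image.append("".join(tileset[t][y] for t in map_row))
    return image

for line in render(TILEMAP, TILESET):
    print(line)
```

Six map entries plus three tiles describe a 6 x 4 pixel image here; the saving only becomes worthwhile when the same tiles are reused many times across a large map.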

How big should the tiles be? There is a trade-off to be made between image size and memory use. What I'm trying to achieve is a background that's something like the one from Time Pilot '84, big enough to look interesting, and without eating up too much memory.

The Time Pilot '84 background is pretty big, 2048 x 2048 pixels judging by images I've found online. At an arcade speed of 60fps, it takes over 30 seconds to scroll all the way across, one pixel at a time. If the tiles were 8 x 8 pixels then the tile map would need to be 64K in size. 16 x 16 pixel tiles would bring that down to 16K, still too big as I would like to fit at least a demo into a 32K machine.

How much background could I fit into, say, 4K?

An 8 x 8 tile requires 16 bytes in PMODE1. We need a good selection of tiles to make an interesting image, so let's say 64 tiles for now. That would require 64 x 16 = 1K, leaving 3K for the tilemap, which lets us have an array of 64 x 48 bytes.

So the final background image would be 8 x 64 = 512 pixels wide and 8 x 48 = 384 pixels high. At the intended speed of 25fps, it would take about 20 seconds to scroll all the way across. That sounds like a reasonable size so I will go with that. If later in the project I find I have more memory available I will have the option of making the map bigger.
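The sums above are easy to get wrong, so here is a small Python sketch that repeats them. The function names are invented for the example; the only inputs are figures quoted in the text (four pixels per PMODE1 byte, one map byte per tile).

```python
PIXELS_PER_BYTE = 4   # PMODE1 packs four 2-bit pixels into each byte

def tile_bytes(width=8, height=8):
    """Storage for one tile of the given pixel size."""
    return (width // PIXELS_PER_BYTE) * height

def map_bytes(image_width, image_height, tile_size):
    """Size of a tile map using one byte per tile."""
    return (image_width // tile_size) * (image_height // tile_size)

def budget(num_tiles, map_width, map_height):
    """Return (tileset bytes, tilemap bytes, total bytes)."""
    tileset = num_tiles * tile_bytes()
    tilemap = map_width * map_height
    return tileset, tilemap, tileset + tilemap

def scroll_seconds(pixels, fps):
    """Time to scroll across the image one pixel per frame."""
    return pixels / fps

print(map_bytes(2048, 2048, 8), map_bytes(2048, 2048, 16))
print(budget(64, 64, 48))
print(scroll_seconds(8 * 64, 25))
```

With 64 tiles and a 64 x 48 map the total comes to exactly 4096 bytes, and a 512 pixel wide image takes a little over 20 seconds to cross at 25fps.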

For initial testing, I generated the tileset and tilemap by hand, just by typing in fcb directives containing very simple patterns. Clearly this is not going to be very practical for anything complicated, so I would need to get hold of some tile editing software.

I looked around, not really sure what I was looking for. Eventually I settled on Tile Studio by Mike Wiering. For me, the killer feature is the scriptable output, allowing auto-generation of source files for any language. The visuals are a bit on the small side, making it less suitable for designing small tiles on HD displays, but other than that, I find it gets the job done and doesn't get in the way of what I want to do.

Of course, it doesn't matter what tools you use as long as you like using them. The best tools are the ones you know how to use. (And the ones you borrowed from work. They're the best ones as well.)

My artistic skills are somewhat limited but I was able to create a nice-looking background fairly quickly:

Cool. But what is it?


I've used around 90 different tiles, meaning I've overshot the budget a little, using up about 4.5K in total. Still pretty impressive though, considering that the uncompressed image would take up 48K of memory.

As the map is intended to wrap around both horizontally and vertically, the features at the edges have to line up with each other, so I've saved myself a bit of work by placing water across the left & right edges.

One problem with tiled landscapes is trying to avoid repeating patterns. For example, if I had used the same 'dirt' tile throughout the cratered landscape, it wouldn't look very convincing, as the eye easily picks out repeating features. The trick here is to have five or six different dirt tiles and randomly fill the area with them. That breaks up the repetition enough to hide any patterns. Tile Studio has a random fill function for this very purpose and I would imagine other tile editors have it too.
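For anyone without an editor that offers this, a random fill is simple to sketch. The Python below is only an illustration of the idea and makes no claim to be how Tile Studio does it; the tile numbers and the function name random_fill are hypothetical.

```python
import random

def random_fill(width, height, tile_ids, seed=None):
    """Return a width x height block of map entries picked at random."""
    rng = random.Random(seed)   # a fixed seed makes the fill repeatable
    return [[rng.choice(tile_ids) for _ in range(width)]
            for _ in range(height)]

DIRT_TILES = [10, 11, 12, 13, 14]   # hypothetical tile numbers for five dirt tiles

area = random_fill(16, 8, DIRT_TILES, seed=1)
for row in area:
    print(" ".join(f"{tile:2d}" for tile in row))
```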

I should probably confess that I didn't draw the craters from scratch. These are simply colour-reduced-messed-about-with versions of something I liked the look of, just to get a quick result. It could all change when I get to spend more time on it.

Tile be back


OK, we now have several boxes of squeaky new tiles piled up in the hallway and there's just the small matter of sticking those bad boys to the wall. You will need adhesive, a spreader, tile cutter and beer. Three of those are metaphors...

Saturday 18 March 2017

Copy that: The conclusion

If it ain't broke, fix it anyway


Sometimes I just can't leave a piece of code alone. I'm thinking there must always be another optimisation just waiting to be found. Quite often I wouldn't find anything new, but on this occasion I would find one of my favourite optimisations ever.

The copy routine that I had developed so far transferred any number of bytes by splitting the task into two parts: Copy big chunks for speed, then copy byte by byte to hit the target count. It was pretty fast, about 10% slower than a hypothetical unrolled version but I wondered if I could get a bit more out of it. If I could grab a few hundred more cycles then that would buy time for an extra sprite or better sound effects.

My attention turned to the byte by byte copy because it was so slow. If it could be sped up enough then the optimal chunk size could be increased and it would go faster still. The code looks like this:

loop
  lda ,u+  ; 6 cycles
  sta ,s+  ; 6
  decb     ; 2
  bne loop ; 3

Copying 32 bytes with this code takes 32*17 = 544 cycles. That's a long time in a game. The 6809 has 16-bit loads and stores, so why not take advantage? The problem is we need an even number of bytes to copy two bytes at a time. When the problem is stated like that, the solution almost suggests itself. The next version of the code deals with the odd byte first, then continues by copying two bytes at a time:

  lsrb       ; 2 cycles
  bcc stage2 ; 3
  lda ,u+    ; 6
  sta ,s+    ; 6
  tstb       ; 2
stage2
  beq done   ; 3
loop
  pulu x     ; 7
  stx ,s++   ; 8
  decb       ; 2
  bne loop   ; 3

It's not quite as easy to figure out the cycle times for this version because the path through the code depends on the number of bytes to be copied. Time for another spreadsheet:




The two-stage code is slower for zero or one byte copied, but after that it's faster, and much faster for higher byte counts. That's cool, I thought to myself, but there's always another optimisation...
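The comparison can be redone without a spreadsheet. This Python sketch simply adds up the cycle counts given in the comments of the two listings; it assumes the byte loop is skipped altogether for a count of zero, which the listing itself does not show.

```python
def byte_loop_cycles(n):
    """lda ,u+ (6) + sta ,s+ (6) + decb (2) + bne (3) = 17 cycles per byte.

    Assumes the loop is not entered at all when n is 0.
    """
    return 17 * n

def two_stage_cycles(n):
    """Cycle count for the odd-byte-then-words version."""
    cycles = 2 + 3                           # lsrb, bcc stage2
    if n & 1:
        cycles += 6 + 6 + 2                  # lda, sta, tstb for the odd byte
    cycles += 3                              # beq done
    cycles += (n >> 1) * (7 + 8 + 2 + 3)     # pulu x, stx ,s++, decb, bne
    return cycles

for n in (0, 1, 2, 3, 8, 32):
    print(n, byte_loop_cycles(n), two_stage_cycles(n))
```

By this count, copying 32 bytes drops from 544 cycles to 328.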

It suddenly occurred to me that copying an odd number of words is pretty much the same scenario as copying an odd number of bytes. If that could be evened out, then the way is clear for copying four bytes at a time. I added a third stage to the code:

  lsrb       ; 2 cycles
  bcc stage2 ; 3
  lda ,u+    ; 6
  sta ,s+    ; 6
stage2
  lsrb       ; 2
  bcc stage3 ; 3
  pulu x     ; 7
  stx ,s++   ; 8
  tstb       ; 2
stage3
  beq done   ; 3
loop
  pulu x,y   ; 9
  stx ,s++   ; 8
  sty ,s++   ; 9
  decb       ; 2
  bne loop   ; 3


What does that do to the spreadsheet?




Another nice speed increase. It was at this point I realised a couple of things. One was that adding more stages would copy larger blocks of data ever faster. The other was that I had turned the byte by byte copy problem back into one of copying n bytes, meaning the copy n bytes routine could be implemented more quickly with this multi-stage approach, without even thinking about optimal chunk sizes.
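The three-stage listing can be costed the same way. This Python sketch adds up the cycle counts from the listing's comments and is only as accurate as those comments.

```python
def three_stage_cycles(n):
    """Cycle count for the byte, word, then four-bytes-per-loop version."""
    cycles = 2 + 3                               # lsrb, bcc stage2
    if n & 1:
        cycles += 6 + 6                          # lda ,u+ / sta ,s+
    cycles += 2 + 3                              # lsrb, bcc stage3
    if n & 2:
        cycles += 7 + 8 + 2                      # pulu x / stx ,s++ / tstb
    cycles += 3                                  # beq done
    cycles += (n >> 2) * (9 + 8 + 9 + 2 + 3)     # pulu x,y / stx / sty / decb / bne
    return cycles

for n in (0, 1, 2, 3, 4, 32):
    print(n, three_stage_cycles(n))
```

For 32 bytes this gives 261 cycles, against 544 for the plain byte loop. The price is a higher fixed cost: 13 cycles are spent even when there is nothing to copy.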

I ended up with an eight stage routine, that copied 128 bytes per loop in the final stage, and was going to need a serious bit of number crunching to count the cycles. This time I created a spreadsheet that ran to over 3000 rows. The first few rows are shown below:




What I was trying to find out was the worst case run time for any value of buffer pointer. Recall that the copy routine has to copy the buffer in two goes to work around the end of the buffer, so I've arranged the spreadsheet as matching pairs of rows with the right hand total column showing the cycle count for a complete buffer copy.

The overhead value is the number of cycles taken by the routine even when there is no work to be done. It's the time spent manipulating and testing the byte count plus adjusting the s register when reaching the stage where octet-sized operations can be used for copying. Because the copy routine is run twice, the overhead has to be added twice. (Again, I haven't included any setup time for the copy, to allow a fair comparison with previous cycle counts)

A quick application of the MAX function revealed the worst case time to be 12342 cycles. That is more than 700 cycles faster than the bytes & chunks approach. I was really happy with that. It was hard to believe that such a weird looking bit of code could be so efficient but the numbers proved it. And sure enough, it looked fast too when running.

Interestingly, the best case run time was 12308 cycles, a difference of only 34 cycles. This means that the copy routine has a very consistent run time, which is a good thing to have in a game.

But what is it?


In summary: For an algorithm where the number of loops is not fixed, there is an efficient method of reducing loop overhead by doing an amount of work equal to the weight of each bit in the count variable. The best number of stages to implement is dependent on the average or maximum count required. (No point having the overhead of eight stages if you only want to copy at most 15 bytes)
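Stripped of the 6809 details, the shape of the technique might look like the Python sketch below. It is an illustration only: the name staged_copy and the stages parameter are invented here, and slice assignment stands in for the load and store instructions.

```python
def staged_copy(src, dst, count, stages=3):
    """Copy count bytes from src to dst.

    Each of the first stages - 1 stages tests one bit of the count and, if
    it is set, copies a block of that bit's weight (1, 2, 4, ... bytes).
    Whatever is left of the count then drives a loop of full-sized blocks.
    """
    pos = 0
    block = 1
    for _ in range(stages - 1):
        if count & 1:                       # this bit of the count is set
            dst[pos:pos + block] = src[pos:pos + block]
            pos += block
        count >>= 1                         # move on to the next bit
        block <<= 1                         # which is worth twice as much
    for _ in range(count):                  # final stage: the loop
        dst[pos:pos + block] = src[pos:pos + block]
        pos += block
    return pos

source = bytes(range(256)) * 2
target = bytearray(len(source))
print(staged_copy(source, target, 300, stages=8))
```

With stages=3 this mirrors the byte, word, four-byte listing; with stages=8 the final loop moves 128 bytes per pass, as in the eight stage routine.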

I've tried to find out what this technique is called but have not been able to find any references. In my mind I call it 'binary loop decomposition' which sounds dangerously like technobabble. Has anyone else encountered anything like this?

Friday 17 March 2017

Copy that

Some scenes have been created for entertainment purposes


First, I need to set something up. It would appear to be the fashion on 'reality' television nowadays, that if something unexpected happens or goes wrong in an amusing way, a comedy vinyl record scratch sound will be heard. Not sure how to write that down, so I'm just going to go with "verrrrp".

Anyway, having figured out the basics of the scrolling engine, I thought I would get the easy part out of the way first. All I needed to do was quickly copy a large block of data from the buffer to the display. Easy peasy lemon squeezy...

The fastest method I know of for copying on the 6809 makes use of stack instructions:

  pulu cc,dp,d,x,y  ; read 8 bytes from u
  pshs cc,dp,d,x,y  ; write 8 bytes to s-8
  leas 16,s         ; adjust s for next write

Using the dp register means we have to be careful to not use direct page addressing during the copy routine and using the cc register means we have to switch interrupts off in the PIAs. (Assuming you don't want to get interrupted that is; the outcome won't be good while that leas instruction is there)

The leas instruction could be removed by making the buffer write routines more complex, but I decided to leave that as an optimisation for the future.

Right, we know what we're doing, let's wrap this thing up, go home early and snack out before dinner. I don't mean the mint Oreos, they taste like toothpaste. I'm talking about the Hobnobs at the back of the cupboard that my wife thinks I don't know about.*

loop
  copy stuff
  dec count  ; 7 cycles
  bne loop   ; 3

We need to run those instructions 3072/8 = 384 times. Dec and his unwelcome friend Benny need 10 cycles per loop, that's 3840 cycles just spent looping, ouch, so we'll need to unroll the loop a bit, and, oh wait, we need to wrap the read address round the circular buffer!

verrrrp

No Hobnobs


OK, so it wasn't going to be as easy as I thought. I would like to copy large chunks of data to reduce the loop overhead, but the buffer boundary could be anywhere within one of those chunks, meaning I could overrun the end of the buffer and mess things up.

If the pointer is at the very start of the buffer, we can copy the entire buffer without hitting the end. If the pointer is at the end of the buffer, we can only copy one byte before hitting the end. After hitting the end of the buffer, we need to reset the read address to the start of the buffer and then copy the balance.

The key observation here is that we will only hit the end of the buffer once during the copy and we can calculate in advance when that will happen. The steps could be broken down as follows:

  • Set destination pointer to start of screen
  • Set source pointer equal to buffer pointer
  • Copy n bytes where n = buffer end - buffer pointer
  • Set source pointer equal to buffer start
  • Destination pointer continues from where previous copy ended
  • Copy n bytes where n = buffer pointer - buffer start
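The steps above can be sketched in a few lines of Python. Here the pointer is treated as an offset from the start of the buffer rather than an absolute address, and the name copy_circular is made up for the example.

```python
def copy_circular(buffer, pointer, screen):
    """Copy a circular buffer to the screen in two straight-line copies."""
    size = len(buffer)
    first = size - pointer                    # bytes from the pointer to the buffer end
    screen[0:first] = buffer[pointer:size]    # first copy: pointer up to the end
    screen[first:size] = buffer[0:pointer]    # second copy: buffer start up to pointer
    return screen

buffer = bytes(range(64))
screen = copy_circular(buffer, 9, bytearray(64))
print(screen[0], screen[54], screen[55], screen[63])
```

Neither copy has to check for the end of the buffer while it runs, which is what allows a fast routine to be used for both.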

So our new problem to solve is this: Find a way of quickly copying any number of bytes.

We would like to combine the speed of copying big chunks of data using stack instructions with the precision of byte by byte copying. So one approach might be to copy big chunks until there is less than one chunk left, then complete the operation byte by byte.

How big should the chunks be? Bigger chunks make the fast part of the copy faster at the cost of making the byte by byte part slower, due to there being more bytes left over after the fast copy. Is there an optimal chunk size that minimises the total time? Only one way to find out...

Spreadsheet!


This looks way too tidy to be one of my spreadsheets...

Just to clarify some terminology: I'm calling a group of eight bytes an octet, the inspiration coming from the word septet describing seven bytes in this great write-up, with a chunk formed from a number of octets. (i.e. a multiple of eight bytes)

The idea behind this spreadsheet is the two copy operations will in effect copy the whole buffer chunk by chunk, except for the chunk containing the end of the buffer. This will have to be copied byte by byte. The spreadsheet calculates the number of cycles to copy the buffer for various sizes of chunk. Here it looks like a chunk size of 64 bytes is optimal.

The great thing about spreadsheets is that you can quickly see the effect of a change. For example, suppose I change my chunk loop code from dec/bne to cmps #/bne, reducing the overhead from 10 to 8 cycles:



Now a chunk size of 32 bytes looks optimal. It can be surprising how much effect a small change makes.

(I should point out that I'm not including any time for setting up the copy, just the time spent copying. This shouldn't affect the results significantly as it would be small, and be a similar amount for each case)

If I set the chunk loop cycles to zero, then the optimal chunk size would be 8, with a total cycle count of 12009. What use is that? We can't have a zero loop overhead. True, but it demonstrates the speed of an unrolled version of the code that has 383 consecutive octet copy operations, a calculated jmp into the middle of them to do just the right amount of copying, followed by 0-7 bytes copied individually. The instructions would take up more than 2K of memory, but it's 1000 cycles faster. That's a trade-off that might be worth making on a 64K machine.
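The spreadsheet itself isn't reproduced here, so the Python below is a guess at its formula rather than a copy of it. It assumes 31 cycles per octet (pulu, pshs, leas), 17 cycles per byte for the slow part (the lda/sta/decb/bne loop shown in the follow-up post), and that every chunk except the one containing the buffer end is copied the fast way.

```python
BUFFER_BYTES = 3072   # one PMODE1 screen
OCTET_CYCLES = 31     # pulu (13) + pshs (13) + leas (5)
BYTE_CYCLES = 17      # assumed cost of the byte by byte loop

def total_cycles(chunk, loop_overhead):
    """Cycles to copy the whole buffer for a given chunk size in bytes."""
    fast_chunks = BUFFER_BYTES // chunk - 1          # all but the chunk at the buffer end
    per_chunk = (chunk // 8) * OCTET_CYCLES + loop_overhead
    slow = chunk * BYTE_CYCLES                       # that one chunk, byte by byte
    return fast_chunks * per_chunk + slow

def best_chunk(loop_overhead, sizes=(8, 16, 32, 64, 128, 256)):
    """Chunk size with the lowest total for this loop overhead."""
    return min(sizes, key=lambda chunk: total_cycles(chunk, loop_overhead))

for overhead in (10, 8, 0):
    chunk = best_chunk(overhead)
    print(overhead, chunk, total_cycles(chunk, overhead))
```

Under those assumptions the model picks 64 bytes for a 10 cycle overhead, 32 bytes for 8 cycles, and 8 bytes with a total of 12009 for zero overhead, which agrees with the figures quoted above.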

But what if I said there's another way of copying that's a lot closer in speed to the unrolled, zero overhead code but instead of taking up over 2K, it takes around 300 bytes? That sounds too good to be true. I will have the pleasure of attempting to explain how it works in my next post...


*It turns out my wife knew all along I was on to the Hobnobs. Like all good zoo keepers, she hid them to enrich my environment.

Sunday 12 March 2017

Scrolling 101

The scrolling idea I had developed so far could be broken down as follows:

  • Four screen-sized buffers, each containing the same image, but pixel-shifted different amounts.
  • A pointer that indicates the position of the top-left corner of the display within the buffers.
  • A tile-based map that provides the graphics.
  • Routines that read the map and draw new data into the buffers in the right places to appear at the screen edges as they scroll into view.
  • A fast copy routine that moves the data from one of the buffers to the screen.

I was fairly sure this would work, but I would need to start coding to test the idea. I decided to develop it one small step at a time and start with a single buffer, scrolling whole bytes instead of pixels. This would test that I was drawing into the right places in the buffer.

One problem with talking about scrolling is that the direction of scrolling is not clear. If I say 'scroll left', which way should the screen move? There's a sort of consensus on 'scrolling down' meaning the screen contents moving up, so that is the system I shall use, i.e. whatever direction we scroll in, the screen contents move in the opposite direction.

For this sort of thing I find pictures are helpful in getting the numbers right. A thousand words might be a bit of a stretch, so here's a picture that paints 64 bytes: (oh dear...)


This represents a buffer. I have arranged it as an 8 x 8 square to correspond with a hypothetical 8 x 8 display. The real thing would be much larger, but this will serve to illustrate.

The numbers represent the memory addresses of the 64 locations making up the buffer and the highlighted square represents a pointer to one of those addresses. When we copy the buffer to the display, we use the pointer as the start position for the copy. In this case the display would end up looking exactly like the buffer:




Things get more interesting when the pointer starts moving around as this will make the screen contents appear to move. What happens if we add one to the pointer before copying?




The numbers illustrate where in the buffer the data came from. Compared to the previous display, everything has moved one byte to the left i.e. we've scrolled right. I've highlighted the right hand column to indicate where we expect new graphics to appear. These are the locations in the buffer where we should have written new information before copying the buffer to the screen, the start address given by pointer + width - 1.

Note the behaviour is that of a circular buffer. When the copy operation reaches the end of the buffer it continues from the start of the buffer. It will be the same with writing into the buffer. The addresses must be 'wrapped' to always stay in the buffer.

Let's now add the buffer width of eight to the pointer and copy to the display:



This time the display has moved up compared to the previous display and the new data needed to be written starting at pointer - width. Or, as I have just noticed, the old value of the pointer. (Which saves a calculation and a boundary check)

To scroll left we need to subtract one from the pointer and the new data needs to be written starting at the pointer:




That leaves scrolling up by subtracting eight from the pointer. Again the new data will start at the pointer:




That covers the four main directions. Moving diagonally can be lazily achieved by performing separate horizontal and vertical scroll operations before copying. It's lazy because the data in a corner would get written twice where the horizontal and vertical sections overlap, but it should be possible to selectively draw in the corners depending on the situation.
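The four cases can be checked with a small Python model of the 8 x 8 example. The function names are invented for the sketch; the arithmetic is the pointer arithmetic described above, with every address wrapped to stay inside the buffer.

```python
WIDTH, HEIGHT = 8, 8
SIZE = WIDTH * HEIGHT

def display(pointer):
    """Buffer addresses that end up on screen, row by row."""
    return [[(pointer + row * WIDTH + col) % SIZE for col in range(WIDTH)]
            for row in range(HEIGHT)]

def scroll(pointer, direction):
    """Return (new pointer, start address of the stripe of new data)."""
    if direction == "right":
        new = (pointer + 1) % SIZE
        return new, (new + WIDTH - 1) % SIZE   # right hand column
    if direction == "left":
        new = (pointer - 1) % SIZE
        return new, new                        # left hand column
    if direction == "down":
        new = (pointer + WIDTH) % SIZE
        return new, pointer                    # bottom row starts at the old pointer
    if direction == "up":
        new = (pointer - WIDTH) % SIZE
        return new, new                        # top row
    raise ValueError(direction)

pointer, stripe = scroll(0, "right")
print(pointer, stripe, [row[-1] for row in display(pointer)])
```

Starting from a pointer of zero, scrolling right gives a pointer of 1 and a new column starting at address 8, which is the top right corner of the resulting display.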

These concepts can be tested fairly easily with some simple code. Four routines are required, one for each of the four scroll directions. These could be called in response to key presses, and each routine could draw its own colour or pattern to verify that data is being written to the right places.

The vertical scroll routines need to draw a horizontal stripe, with data at consecutive addresses. If the stripe reaches the end of the buffer before the full width has been drawn then it needs to continue from the start of the buffer.

The horizontal scroll routines need to draw a vertical stripe, with the address advancing by an amount equal to the buffer width. If the address falls outside of the buffer, then the buffer size needs to be subtracted to put the address back inside the buffer.
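The two kinds of stripe might be sketched like this in Python, using the same small 8 x 8 buffer as the diagrams. The routine names are hypothetical and a plain list stands in for the buffer memory.

```python
WIDTH, HEIGHT = 8, 8
SIZE = WIDTH * HEIGHT

def draw_row(buffer, start, values):
    """Horizontal stripe: consecutive addresses, wrapping at the buffer end."""
    addr = start
    for value in values:
        buffer[addr] = value
        addr += 1
        if addr >= SIZE:
            addr = 0                 # continue from the start of the buffer

def draw_column(buffer, start, values):
    """Vertical stripe: advance by the buffer width, wrapping when needed."""
    addr = start
    for value in values:
        buffer[addr] = value
        addr += WIDTH
        if addr >= SIZE:
            addr -= SIZE             # put the address back inside the buffer

buffer = [0] * SIZE
draw_row(buffer, 60, [1] * WIDTH)        # wraps after address 63
draw_column(buffer, 15, [2] * HEIGHT)    # wraps after address 63
print([addr for addr, value in enumerate(buffer) if value])
```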

The copy routine can be a simple loop that transfers one byte per loop, checking for the end of the buffer and wrapping if necessary. It will be slow but it will work, or at least be easy to debug.

It's a fairly gentle start, but I find all of this useful in understanding the problem properly, which will hopefully help with debugging later!

Monday 6 March 2017

Game On

Solving puzzles is rewarding. That's why the weekend newspapers have a big pullout section chock full of them. They want those endorphins to flow freely and influence your future newspaper buying habits.

Previously I wrote about finding a faster way of scrolling by using pre-shifted data. I was trying to solve a puzzle, one that might not have a solution, but I had made a little bit of progress, got a little bit of reward, and kept thinking about it.

One thing making it slow was having to OR the tile edges together. A way to mitigate this is to have larger tiles. Less ORing and more SToring. (If I can find a few more rhymes like that then this blog is pretty much going to write itself!)

How about tiles that are four bytes wide? What does that do to the single row code?

  puls a,x,u   ; 10 cycles
  ora ,y       ; 4
  sta ,y       ; 4
  stx 1,y      ; 6
  stu 3,y      ; 6

We're now pointing to the tile data with s so that we can use u as a temporary register. The first tile row now takes 30 cycles, or 32 cycles for subsequent rows. For a PMODE3 or 4 screen, this needs to be repeated 6144 / 4 = 1536 times, taking 32 x 1536 = 49152 cycles per screen. That's about 18 fps and 50% faster than two byte wide tiles. A decent gain just by making the tiles larger. The thing is, I'm looking for a big speed increase. Time for a compromise.

Does the graphics mode really need to be PMODE3 or 4? It's almost ingrained that you use the highest graphics resolution available. Good money was paid for all those pixels so surely the higher the resolution, the better things will look?

A PMODE1 screen has a resolution of 128 x 96. Half the vertical resolution of PMODE3, but the pixels are actually square. At 3072 bytes the display now requires half the memory, and can be updated in half the time. Good Things have been done using that graphics mode, such as Glove, Flagon Bird and sixxie's vertical scroll demo.

So I've quickly talked myself into using PMODE1 and traded detail for speed. That doubles the rate to 36 fps. Things are starting to get interesting. I wonder how much memory all these pre-shifted tiles are going to take up?

A PMODE1 four byte wide square tile requires 64 bytes of storage. Shifted versions will be five bytes wide and require 80 bytes. There are four pixels in a PMODE1 byte, so to be able to scroll to any pixel we would need three shifted versions of each tile in addition to the unshifted version. That's 304 bytes per tile. It sounds excessive but you could fit 32 tiles in under 10K and have a chance at producing an interesting background.
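The storage figure follows from a couple of multiplications, repeated here as a short Python sketch (the function name is invented for the example):

```python
def preshifted_tile_bytes(width_bytes=4, height=16, pixels_per_byte=4):
    """Storage for one tile plus its pixel-shifted copies."""
    unshifted = width_bytes * height            # 4 x 16 = 64 bytes
    shifted = (width_bytes + 1) * height        # one byte wider: 5 x 16 = 80 bytes
    return unshifted + (pixels_per_byte - 1) * shifted

per_tile = preshifted_tile_bytes()
print(per_tile, 32 * per_tile)
```

Thirty-two tiles come to 9728 bytes, just under 10K.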

I nearly decided that was the way to go, and I think that approach could find good uses, but I still wasn't convinced it would be fast enough for what I was trying to do. To run at 25 fps, there would only be around 10,000 cycles left over for the rest of the game and I haven't accounted for fetching the tile addresses or for drawing partial tiles at the screen edges.

So I started thinking along the lines of having ever larger tiles to reduce the tile overhead but somehow slowly building them on demand out of smaller tiles. It was that line of thinking that led to the solution I settled on: Just have four buffers, each containing the screen image with a different amount of horizontal shift. Drawing the background is reduced to a large copy operation from one of the buffers. Scrolling would be achieved by moving the buffer start position and by drawing a stripe of new image data into the buffer in the right place to appear at the edge of the screen.

This sounded pretty simple. How fast could it run? A large block copy can be done with a set of instructions like this:

  pulu cc,dp,d,x,y   ; 13 cycles
  pshs cc,dp,d,x,y   ; 13
  leas 16,s          ; 5

The leas 16,s instruction is there to compensate for pul & psh working in opposite directions. It can actually be removed if adjustments are made to the code writing into the buffer, which gives a nice improvement in speed for a pure horizontal scroll, but there are complications with combined horizontal and vertical scrolling that offset some of the gain. Maybe a topic for the future.
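To see why the adjustment is 16 and not 8, it may help to model the three instructions in Python. This is a sketch of the pointer movement only: a bytearray stands in for memory, and it assumes, as I understand the 6809 register ordering, that a block pushed with pshs lands in the same byte order it was pulled in with pulu.

```python
def blast_copy(memory, src, dst, octets):
    """Model of repeated pulu / pshs / leas 16,s on a bytearray."""
    u = src          # u reads upwards through the source
    s = dst + 8      # s must start 8 bytes above the first destination byte
    for _ in range(octets):
        block = memory[u:u + 8]      # pulu: read 8 bytes, u moves up by 8
        u += 8
        s -= 8                       # pshs: s moves down by 8 ...
        memory[s:s + 8] = block      # ... and the 8 bytes are stored from s
        s += 16                      # leas 16,s: undo the push, then step on by 8
    return u, s

memory = bytearray(range(32)) + bytearray(32)
print(blast_copy(memory, 0, 32, 4))
print(memory[32:64] == memory[0:32])
```

Each push leaves s eight bytes below where it started, so adding 16 gives a net step of eight bytes forward, ready for the next block.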

Using the cc register seems a bit dangerous because it contains the interrupt masks but it's fine as long as interrupts are disabled at the hardware level by programming the PIAs.

Anyway, that's 8 bytes copied every 31 cycles or less than 12,000 cycles for a PMODE1 screen. The data that needs to be written into the buffers amounts to a byte-wide column (or equivalent row) of tile fragments per game loop. This is sounding much faster.

Memory usage is heavy, as we need 12K of shift buffers, but the simplicity of the method is very appealing. Admittedly this is better suited to 64K machines but I still feel I can do something for 32K machines as the buffer requirements can be reduced by not using the whole screen. Just reserving 8 lines of display for score and status reduces the buffers by 1K.
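Both the speed and the memory figures can be checked with a few lines of Python. The names below are made up for the sketch; the inputs are the 31 cycles per octet from the listing and the 32 bytes per line of a 128 pixel wide PMODE1 screen.

```python
BYTES_PER_LINE = 32      # 128 pixels at four pixels per byte
OCTET_CYCLES = 31        # pulu (13) + pshs (13) + leas (5)

def copy_cycles(lines=96):
    """Cycles to copy the given number of display lines, eight bytes at a time."""
    return lines * BYTES_PER_LINE // 8 * OCTET_CYCLES

def shift_buffer_bytes(lines=96, buffers=4):
    """Memory needed for the pixel-shifted buffers."""
    return lines * BYTES_PER_LINE * buffers

print(copy_cycles())                                   # full PMODE1 screen
print(shift_buffer_bytes())                            # four full-screen buffers
print(shift_buffer_bytes(96) - shift_buffer_bytes(88)) # saving from reserving 8 lines
```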

Game On!

Saturday 4 March 2017

Initial thoughts

It takes me around 45 minutes to drive to work in the morning, and perhaps a little longer to drive home again, so I get plenty of time to think. Usually I would be thinking about projects at work, but lately I was thinking about what to do with my new found interest in the Dragon. (It's also a great opportunity to listen to the CoCo Crew Podcast, but you can't yet as it's only 2014 and the first episode doesn't come out until next year.)

Maybe I could write a game. I had written two before and knew what was involved. There was even a third partially written game that I could pick up again if I could resurrect the source code. This was a vertically-scrolling-bullet-hell type game that I abandoned after the decline of the Dragon scene. And yet there was still that desire to see a game like Time Pilot '84 on the Dragon, even though I had decided it impossible all those years before.

My son will often ask "why?" at the end of even the most carefully crafted explanations. Most kids do. I like to think it's because they're good engineers and want to challenge assumptions. If you make assumptions then you're making an ass out of u and mptions or something.

Anyway, I'm sat in traffic asking myself "why?" for a variety of reasons but in particular I'm thinking about scrolling:

It's impossible to have an arcade style multidirectional scrolling image on the Dragon.
Why?
Because horizontal scrolling is too slow.
Why?
Because you have to shift the pixels into position before storing them on the screen.
Why?
Because they are not already shifted into the right position.
Why?
Good question...

We could for example have pre-shifted tile data. Several versions of each tile making up the background could be stored in memory, each version pixel-shifted to the required amount and ready for storing directly on the screen.

Well, not exactly directly. A shifted tile will have some empty pixels in the bytes making up the left and right edges. That means those bytes belonging to neighbouring tiles need to be OR'd together because they share the same on-screen byte. Say our original tiles are two bytes wide, then the instructions to put the first row of a tile on screen might look like this:

  pulu a,x   ; 8 cycles
  ora ,y     ; 4
  sta ,y     ; 4
  stx 1,y    ; 6

where u is pointing to the shifted version of the tile and y is pointing to the screen. That takes 22 CPU cycles for the first row of the tile, though in general it would be 24 cycles per row because we would need offsets for ora/sta ,y.
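The reason for the OR is perhaps easier to see in a model. The Python sketch below shifts one row of a two byte wide tile into three bytes and then draws two neighbouring tiles so that their edge bytes share a screen byte. The names and the sample byte values are invented for the illustration.

```python
BITS_PER_PIXEL = 2   # PMODE1

def preshift_row(b0, b1, pixels):
    """Shift a two-byte tile row right by 0-3 pixels, giving three bytes."""
    value = (((b0 << 8) | b1) << 8) >> (pixels * BITS_PER_PIXEL)
    return [(value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF]

def draw_row(screen, offset, shifted):
    """OR the shared left edge byte into place, then store the other two."""
    screen[offset] |= shifted[0]        # like ora/sta: keep the neighbour's pixels
    screen[offset + 1] = shifted[1]     # like stx: two plain stores
    screen[offset + 2] = shifted[2]

screen = [0] * 5
draw_row(screen, 0, preshift_row(0xAA, 0xBB, 1))   # left tile
draw_row(screen, 2, preshift_row(0xCC, 0xDD, 1))   # right tile overlaps at byte 2
print([f"{byte:02X}" for byte in screen])
```

The result is the same as shifting the whole four byte row by one pixel in one go, but all the shifting was done in advance.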

So if we have a PMODE 3 or 4 hi-res screen then we will need to run those instructions 6144/2 = 3072 times to fill the screen. That's 3072 x 24 = 73728 cycles, which is about 12 fps (frames per second). Not terrible, but that doesn't include getting the tile addresses, any form of looping or indeed the rest of the game.

How fast do we actually need? From my small amount of experience of writing games, the faster the better, but 25 fps or higher starts to look like an arcade game. For points of reference, traditional cartoon animation was done at 12 fps or 24 fps to smooth out faster movement. Arcade machines were often running at the NTSC frame rate of 60 fps to give really smooth animation.

A PAL Dragon has a video refresh rate of 50 fps, so we could aim to update the game display once every two video frames, or 25 fps. That would give us a little over 35000 CPU cycles per game loop, so the pre-shifted tile method of scrolling is still looking way too slow. We need an improved scrolling algorithm, or to make compromises, or both...
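The frame rate figures can be reproduced with a couple of divisions. The CPU clock is not stated anywhere above, so the value used in this Python sketch is an assumption: roughly 0.89 MHz, which is consistent with the 'little over 35000 cycles' at 25 fps.

```python
CPU_HZ = 888_625   # assumed PAL Dragon CPU clock (about 0.89 MHz), not given in the text

def frames_per_second(cycles_per_frame):
    """Frame rate if each frame costs this many CPU cycles and nothing else runs."""
    return CPU_HZ / cycles_per_frame

def cycles_per_game_loop(fps):
    """CPU cycles available in one game loop at the given frame rate."""
    return CPU_HZ // fps

print(round(frames_per_second(3072 * 24), 1))   # two byte wide pre-shifted tiles
print(cycles_per_game_loop(25))
```

At 73728 cycles per screen, drawing alone would overrun a 25 fps game loop by a factor of two.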

Thursday 2 March 2017

Not so ancient History

I would come back to the Dragon from time to time, particularly in the late 90s, when I hoarded, er, collected a lot of gear. This ended up being stored at my sister's house and then more or less forgotten about. Years passed, I moved house a couple of times, I got married, my son was born and my caffeine addiction became ever deeper.

A little over three years ago, my sister suggested to me that I really should think about moving all the stuff that I'd been hoarding, er, storing in her house, to, for example, somewhere that was not in her house.

That was a bit of a problem. There wasn't room in my house for all my stuff. It's a mountain of old computers and associated junk that you can't just sneak into the spare room. It will be noticed and questions will be asked. Questions that don't have easy answers. I prefer to avoid those kinds of questions so that I can stay in my happy place.

Then I had an idea. I would board up the loft to make storage space. We could tidy up the spare room and make it into a guest room, and, you know, put one or two things up the loft that my sister had kindly been storing for me. Everyone's a winner.

So that's what I did, and then I made several car trips to collect all my forgotten treasure. There was a lot of stuff. There were examples of every generation of PC from XT to Pentium, monitors, printers, disk drives, hard drives, BBC micros, Amiga, Atari ST, Enterprise 64, disks, cassettes, magazines, books, sausages, hash browns, eggs, beans, mushrooms, bacon. Sorry, it's nearly the weekend and I've had muesli every morning this week.

Where was I? Oh yes: What really grabbed my interest was the Dragon collection. As I checked out the contents of each box, I began to remember how much enjoyment I used to get out of this peculiar machine. I couldn't wait to get some more free time to power up these old computers and find out if they were still working.

In the meantime I would have a look at the interwebs to see if I could turn up anything Dragon related. I was pleasantly surprised to find The Dragon Archive. This is a site that has amassed a tremendous amount of Dragon software, magazines, ROM images etc., has a wiki for everything Dragon and a forum where there were actual people, with actual Dragons, talking about actual Dragon stuff, like it was a normal thing to do. I wanted in!

Wednesday 1 March 2017

ROTABB

I enjoyed writing Ball Dozer very much and was soon thinking about what I could produce next. John Foster of Kouga Software was also keen to see what else I might come up with and he kindly gave me his ALLDREAM cartridge to help the process along.

This was so much better than using the cassette version of the assembler. Sure, you still had to load in your source code, but the system was ready to do work as soon as the power was on. And because the assembler was no longer resident in RAM, there was more room for the project. It was a more productive environment.

As Time Pilot '84 was my favourite arcade game, I decided to have a go at making my own version. It was ambitious but I knew all the machine code instructions, I could program anything. Right? Wrong...

I ran into trouble straight away with the scrolling background. It was really slow. Vertical scrolling is relatively easy: It boils down to copying data from one place to another as quickly as possible. Horizontal scrolling at a pixel level required multiple shift operations on two source bytes just to produce one screen byte. I couldn't figure out how to make it anywhere near fast enough for a game.

I didn't want to abandon the game though, and came up with the idea of having a scrolling grid to give the impression of movement. This is something that can be drawn very quickly and the Criss Crossy Lines Dimension was born. This lent itself nicely to the cliché ridden B movie plot provided by my talented friend Simon Harrison.

Title Screen. Colour had still not been invented.

Simon and I decided that the story and level names would be completely absurd, partly for fun, but also to send up games that had arbitrary story lines with little relevance. What can I say? - we were young and subversive. In your face, games industry!

Oh I'm afraid the rhubarb will be quite
operational when your friends arrive.

I was really pleased with the end result. The controls were a bit clunky and the sound effects could have been better but I felt that I had produced a good game to the best of my ability.

Game Over


The game was finished in time for a show, and it attracted favourable reviews, but the Dragon scene was looking very unhealthy at this point. Dragon User magazine had stopped publication due to declining readership and software publishers were calling it a day.

It was fun while it lasted. I drifted away from the Dragon and moved on to other things, thinking that was that. I would never have guessed the Dragon would still have an enthusiastic following nearly 30 years in the future...


Just another day in the Criss Crossy Lines Dimension.