Friday, 16 June 2017

Every which way

It's been a while, lots of distractions, such as decorating, family time, Season three of Better Call Saul, and preparing for the Cambridge Dragon Meetup, which was a lot of fun and hopefully will become a regular event. It was really good to meet the people I'd been chatting to and following these last few years.

It was also great to see other work in progress games at the show. In fact showing me how it's done properly. There was Steve Bamford's Circe's Island, with stunning Nintendo-esque graphics and sophisticated gameplay, and Ciaran Anscomb's Dunjunz, a brilliant reconstruction of an 80's game dripping with 8-bit goodness.

Why I oughta


In other news, I got trolled by my seven year old son. He got me good...

I was recently trying to diagnose a problem with one of my Dragons, where it would intermittently fail on startup, or crash after a short time. After poking it with a stick for a while, I decided the problem was due to the CPU not being connected to the rest of the computer, or "dry joints" as the condition is sometimes known.

A little bit of soldering later, the machine started looking like it was going to behave itself, so I let my son Ed have a play to give it a test. I showed him how to assign values to numeric and string variables, perform some simple operations, and print the results.

After a while, a voice calls down the stairs: "What does 'FC error' mean?"

Me: "It's short for 'function call error'. It means it doesn't like the number you're giving it"

Ed: "But it happens when I try to print F$"

Me: "That's weird. Let me have a look."

I typed in '?F$' and sure enough I saw this:

?F$
?FC ERROR
OK

Stuff like that happens when memory gets corrupted so I thought perhaps the computer still had issues. I typed a few other things and it seemed OK but ?F$ kept giving "?FC Error".

After watching with amusement for some time, Ed started laughing and told me he had previously typed in F$="?FC ERROR" and then cleared the screen. He used my own knowledge against me like some kind of mental Judo. I've never felt so simultaneously proud and stupid.

The treacherous ratbag also prefers Flagon Bird to my game.


Anyhoo, back to the plot:

How to scroll in any direction, using just horizontal and vertical scrolling


The game design has the player ship fixed in the centre of the screen, with the player controls rotating the ship to point in some desired direction. The illusion of flying is created by making the background move in the opposite direction. I decided on having a total of 16 possible directions, mainly limited by my ability to draw convincing player graphics. After much frustration and cursing, I managed to create three base images that could be rotated or flipped to give nine images covering 180° of rotation. The other 180° could be covered by drawing seven of the existing images upside down:

Just taking the ship out for a spin

We can go up, down, left or right easily enough by calling the appropriate scroll routine, but what about moving at other angles? Scrolling at 45° can be achieved by scrolling both horizontally and vertically at the same time, but there's a problem: The movement looks too fast. About 41% too fast, thanks to Pythagoras.

To scroll or not to scroll?


To move at the right speed it would seem that we need to scroll a bit less than one pixel per frame. Of course, the only options available are scroll or not scroll on any given frame, so the best we can do is to achieve an average speed as measured over a number of frames. That can be done with simple counters.

We need two counters, one for horizontal and one for vertical, and each counter will have some value added to it every frame. When a counter overflows, that means we scroll on that frame. So for the specific case of moving at 45°, we need both counters to overflow approximately 71% of the time and that will give us the desired average speed. (Pythagoras again: We want to move 0.5*√2 pixels in each axis)

It's tempting to make the counters eight bits in size, and use the carry flag to detect the counter overflows. That would work OK, except it's not possible to get an overflow rate of exactly 100%, the maximum rate actually being 255/256. To be honest it's not a huge deal; you would have to fly horizontally or vertically for several seconds to see the glitch where one frame didn't scroll, but I decided to avoid the issue and use seven-bit counters instead.

Using seven-bit counters, we can detect overflows using the sign bit, plus update both counters with 16-bit operations with something like this:

 ldd scroll_ctrs      ; load both x and y counters
 anda #$7f            ; clear x counter sign bit
 andb #$7f            ; clear y counter sign bit
 addd scroll_ctrs_inc ; add x and y increments
  std scroll_ctrs      ; store both counters
  bpl noscroll_x       ; no overflow on x counter
  jsr do_x_scroll      ; scroll one pixel horizontally
no_scroll_x
  lda scroll_ctrs+1    ; check y counter
  bpl noscroll_y       ; no overflow on y counter
  jsr do_y_scroll      ; scroll one pixel vertically
no_scroll_y


Give me a sine


All we need to do is work out the counter rates. Hey you know what we haven't had for a while? That's right: Spreadsheet!

It's got rows and columns and titles and everything

There are 16 directions, giving us angles that are multiples of 22.5°. The horizontal and vertical components can be found by taking the cosine and sine of the angle, then multiplying by 128 gives us the rate values we need to add to the seven-bit counters. (Bit of trigonometry going on there, sorry about that. I should also point out that I'm using the mathematical convention for angles i.e. zero degrees is at the 3 o'clock position and increases in the anticlockwise direction, rather than something normal like starting at 12 o'clock and going clockwise)

Note that I'm using the absolute values of the rates and specifying the direction separately. That's because we need to know which scroll routine to call when the counter overflows. Strictly speaking I should be using signed rates, but I didn't see a quick and easy way of doing it, and this method seems to work well enough in practice. The difference is fairly subtle and affects the size of the turning circle when changing direction.

I've implemented this as a lookup table containing the 16 rows of counter rates and scroll routine addresses. The direction tells us where to look in the table, and the rates and scroll routines are accessed using indexed addressing. (Indirect in the case of the scroll routines)

Know when to stop


So that's how I do that. I was disappointed I couldn't find any good Pythagoras jokes. I was forced to try and make one up: The straw in the orange juice is equal to some of the straws in the other two pints. Needs work. A lot of work.

Sunday, 14 May 2017

Time for a change

One of the many neat things in Time Pilot '84 is that the background palette is different for each level. The map stays the same, just the colours change, presumably to give an impression of time travel between levels. It adds variety and helps give each level an identity.

I wanted to do something like this in my game but the 6847 VDG doesn't exactly spoil you with choice. In four colour modes, the palette can be one of two colour sets: Green-Yellow-Blue-Red or Buff-Cyan-Magenta-Orange.

The word you are looking for is 'hideous'


Simply switching between these two colour sets doesn't look very good. The alternate colours don't make much visual sense:

Colour set 0

Colour set 1. Yuck.

The problem is that the bright parts of the image are no longer bright, resulting in a very unnatural looking image. However, if the colours are swapped around, it can be made to look a lot better:

Not so ugly
Also usable
My God, it's full of ice cream.


All the colours in half a rainbow


As I was thinking about moving colours around, I wondered how many usable palettes could be made by rearranging the original set 0 colours. It turns out there are quite a few. Each image caption below describes how the colours have been remapped from the original Green-Yellow-Blue-Red palette:

Red-Yellow-Blue-Green
Green-Yellow-Red-Blue
Blue-Yellow-Green-Red
Blue-Yellow-Red-Green
Red-Yellow-Green-Blue
Yellow-Green-Red-Blue
Yellow-Green-Blue-Red

That's a pretty good result. Some of the permutations look more alien than others, but I think that fits this type of game, and they still make visual sense because the brighter colours are where they should be.

So I've now got all these variations on the background map just by swapping the colours around. Awesomeness. But how do I go about making all these palettes actually happen?

One way would be to have a different set of tiles per palette, but that seems very wasteful of memory. It would be more efficient to manipulate the image data in memory to achieve the colour change between levels. A function is required to map the colours from one palette permutation to another.

Green is the new red


Let's say the existing palette is the usual Green-Yellow-Blue-Red and we want to change it to Red-Yellow-Blue-Green. This means that the green pixels need to be changed to red, the yellow and blue pixels stay the same, and the red pixels need to be changed to green.

Let's now change the palette to Green-Yellow-Red-Blue. This changes red to green, yellow stays the same again, blue changes to red and green changes to blue.

The colour changes are defined by comparing the new palette to the current palette. The new palette defines what we want the colours to be, but we need to refer to the current palette to define the colours we are changing from. That means we need to store the current palette for future reference, so let's have four palette registers numbered 0-3, and initialise them by storing the values 0-3 to represent the standard palette. (Green=0, Yellow=1, Blue=2, Red=3)

Updating the palette registers is easy, we just need to store the new palette values in them, but we also need to generate information to define the mapping of current colours into new colours.

To give an example, if we imagine changing the colours of a target image pixel by pixel, what we want to know is if the target pixel is green, what does it need to change to? To find out we need to look at the current palette registers, determine which one contains green, and then get the new colour from the same position in the new palette. It sounds like a lot of work.

It would be more efficient if we first laid out the new colours in a logical order: New green, new yellow, new blue, new red. Then we would know where to look to find the new colour for green. The following piece of code does exactly that. It looks at the current and new palettes, updates the current palette and creates a nicely ordered four byte mapping table:

      ldx #current_palette
      ldy #new_palette
      ldu #palette_map
      lda #4
      sta count
loop  lda ,x    ; current palette entry
      ldb ,y+   ; new palette entry
      stb ,x+   ; update current entry
      stb a,u   ; store in new mapping
      dec count
      bne loop



We can now change the colours of all four pixels in one byte using some code like the following. It recolours the pixels in A using the mapping table at U:

      ldx #4     ; 4 pixels per byte
loop  clrb       ;
      lsla       ; get one pixel in B
      rolb       ;
      lsla       ;
      rolb       ;
      ora b,u    ; look up and combine entry from mapping table
      leax -1,x  ;
      bne loop   ; next pixel


It's a little on the slow side. If I use it to directly recolour the contents of the four shift buffers as well as the tile graphics, it takes over a second. The way I've made it faster is to build a table containing the results of converting each possible value from 0 to 255. This table can then be used to quickly lookup converted colours a whole byte at a time. Much faster.

Rewriting history


I've made it sound like I arrived at a neat and tidy solution fairly quickly. The reality is very different. My first solution involved a 16 byte lookup table to convert the high and low nybbles of each byte separately. Then I came up with a faster and more simple piece of code that generated a 256 byte lookup table using recursion.

At the time, recursion seemed like a natural solution to the problem of generating a table containing every permutation of colours. It was only while writing this post that I revisited the code to remind myself how it works and settled on the even more simple current solution. As I prefer the current solution so much more than the previous ones, I've decided to pretend that was how I did it in the first place.

If we've learned anything from Bill & Ted, it's that only the winners get to go back in time and set things up. That is, as long as you don't pay much attention to the git history...

Sunday, 7 May 2017

Shift work

Would you believe it?!

All this talk about tiles has real world repercussions: The new reality is that when someone has a shower upstairs, water comes out of the cooker hood downstairs.

Now, I'm fairly sure that isn't one of the advertised functions of the cooker hood. I couldn't find any mention of it in the manual. Extract steam? Check. Provide light? Check. Dribble minging second hand shower water into my glorious and most sacred Saturday Morning Fry Up? No. I didn't think so.

So according to 'professional' opinion, it would appear that the tiles around the shower weren't done 'properly', by a person or persons 'unknown', or possibly me and my father-in-law, and now after six years it has started leaking. And I can't just regrout it. Oh no, that would be way too easy. Those tiles have got to come off, and the wall fixed. And new tiles put back, because when my Wife gets involved, stuff 'has' to change colour.

None of this could possibly be my fault. Obviously greater forces are at work and it was talked up as a result of this blog. So please, be more careful in future. Careless talk costs £31 per square metre (inc. VAT).

True story.


Back to my happy place


So far I've described methods for scrolling in any direction, by drawing a relatively small amount of graphics into a buffer, and then copying the contents of the buffer to the display. This has given us pixel by pixel vertical scrolling, but the horizontal scrolling is still byte by byte. i.e. four pixels at a time in the chosen graphics mode. So what do we need to do to make the horizontal scroll work at the pixel level?

Early on in this sorry saga, many hump days ago, I spoke of The Idea. This involved four buffers, each containing the same screen image shifted by different amounts. The only time bit-shifting would be required is when new graphics are drawn into a buffer: Just a thin strip to fill in new background as it appears at the edge of the screen. Once in the buffer, there it will stay until it scrolls out of view again, making room for new graphics as it does so.

I call these four buffers shift buffers, numbered from zero to three. Shift buffer zero contains unshifted graphics. It works in the same way as the single buffer in the byte by byte scroll. The other shift buffers contain the same graphics shifted between one and three pixels to the left.

Each time we want to scroll horizontally, we need to choose one of the buffers into which we will draw new graphics. To keep track of where we are, we need a horizontal pixel counter. This is incremented each time we scroll right one pixel. (i.e. the background moves left one pixel)

The bottom two bits of the horizontal counter can therefore represent the number of the current shift buffer and we can use those two bits to look up the address of one of four drawing routines, each dedicated to drawing into a specific shift buffer.

Scrolling right, the shift buffers will be accessed in the sequence 0-1-2-3-0-1-2-3, and the buffer and map pointers will be incremented by one byte each time we land on buffer zero. (Because we have scrolled a new byte into view)

It's a similar story for scrolling left, except the sequence this time is 3-2-1-0-3-2-1-0, and the pointers need to be decremented each time we land on buffer three.

The buffer pointer now works slightly differently, as it represents the same position in all four buffers. So instead of pointing to an absolute address within one buffer, it is now an offset to be added to the base address of the current buffer.

Fab Four


So what do the four drawing routines look like? The buffer zero routine is the same as the one we already have for byte-by-byte horizontal scrolling. The others have more work to do. They have to shift the graphics data left by an amount that suits each buffer.

Now, when the pixels are shifted left, one or more of them will fall off the left hand edge of each byte. This is no problem because these will already be in the buffer. They were drawn during a previous frame. The problem is what happens on the right hand side of each byte. Where do the new pixels come from to fill up the space left by the other pixels moving left?

We Can Work it Out


The answer is the new pixels come from the half of the tile immediately to the right of the one being pointed to by the map pointer. This might be the right hand half of the same tile, or the left hand half of the next tile, or occasionally the left half of the tile from the left edge of the map thanks to wrapping. In each case, it means we need to load two bytes of tile data to create one byte of shifted data. The first byte is found via the map pointer, the second byte via the (wrapped) map pointer plus one.

If we get the first byte in the A register and the second byte in the B register then we are ready to do some shifting:

Source bytes before shifting

Remembering that each pixel is made up of two bits, the following sequence of instructions will shift left by one pixel and form the basis of the shift buffer one code:

    lda ,u++  ; get 1st byte
    ldb ,s++  ; get 2nd byte
    lslb      ; shift left one pixel..
    rola      ; ..by shifting two bits..
    lslb      ; ..from B..
    rola      ; ..into A
    sta ,x    ; store in buffer
    leax 32,x ; next row in buffer
    ; insert code to check for buffer boundary here
    ; loop above instructions eight times for complete tile

The U and S registers have been set up to point to tile graphics data. The X register is pointing into the shift buffer. The A and B registers look like this after shifting:

Shifted one pixel left

The shift instructions can be doubled up to create the shift buffer two routine:

Shifted two pixels left

And tripled to create the shift buffer three routine:

Shifted three pixels left

Back to the LSR


But shifting three pixels left is not very efficient. We can achieve the same result by shifting one pixel right instead. It just means the result ends up in the B register instead of the A register:

Shifted one pixel right

The four drawing routines all have the same general structure, the same logic behind the pointers, and the same technique for faster drawing using unrolled loops. They are so similar in fact that I have defined them in a macro, reducing the amount of effort in development and debugging. The macro just needs to be given the number of pixels to shift left as a parameter.

Now, finally, I was able to scroll pixel by pixel horizontally, and at a very high speed. That day was a good day. Well, it was a good day until I started scrolling vertically. The graphics were not updating correctly when scrolling horizontally after scrolling vertically. It soon became obvious that the vertical scroll routine needed to draw in all four shift buffers. I added a routine to take the newly drawn vertical scroll data from shift buffer zero, and write suitably shifted versions to the other buffers. Then it worked properly.

Shameless handwaving


I made one major improvement to the horizontal scrolling after that. It comes from the observation that the shift buffers all use the same source data, and that the shift buffer zero routine is relatively fast. The improvement is to calculate tile graphics addresses during the shift buffer zero routine only, and store those same addresses for re-use by the other shift routines. There are a couple of subtleties: Addresses need to be calculated for both left and right hand edges of the screen, so that the correct addresses are available when we change direction. It also becomes necessary to scroll the contents of the address buffers up and down during vertical scrolling, to keep the addresses in sync with the display.

I think that's quite enough rambling on about the scroll engine. Next time, something else!

Tuesday, 18 April 2017

More details

At some point during development, it struck me how complicated the scroll engine was becoming. The initial concept could be summarised in a few words and was supposed to be a simple matter of moving pointers around circular buffers, copying data from one place to another. But now there seemed to be an endless list of details to consider.

In some ways, the scrolling really is simple. It does pretty much boil down to pointers and copying. Where it gets complicated is figuring out which pieces of tile to draw, and a lot of that complication comes from the interaction between horizontal and vertical scrolling. One direction messes up the other.

But I do like a challenge, and I really wanted to see this thing working...

For horizontal scrolling we need to draw a one byte wide vertical stripe of tile pieces. Right away there are a couple of problems. The first is caused by the tiles being two bytes wide and the second is caused by vertical scrolling.

The first problem is easily solved by modifying the 'tile zero' base address we use to calculate the tile image addresses. By adding one to the base address, we automatically access the right half of each tile. If the tile image data starts at an even address, then the least significant bit of the address can be set to achieve the same effect.

The information to decide if we are accessing the left or right half of a tile comes from the map column pointer. This pointer is incremented or decremented at the same time as the buffer pointer, meaning the least significant bit indicates which tile half is being pointed to.

The second problem is a bit more involved. It means we have to deal with partial tiles at the top and bottom of the display. This divides the vertical stripe into three sections: A partial tile at the top, a run of complete tiles (well, half tiles), and a partial tile at the bottom.

Sometimes the tiles will line up neatly with the top and bottom edges of the display, meaning no partial tiles, and fewer visible tiles. How does this work exactly? The screen is 12 tiles high, but if partially scrolled, 13 will be visible. We can define things a bit more precisely to help shape the algorithm:

  • The top tile will have some or all pixel rows visible.
  • The run of complete tiles will always be the same number of tiles (11 if part of a full height display).
  • The bottom tile will have some pixel rows visible, or none.

That puts the bottom tile in The Occasionally Disappearing 13th Row*. It simply gets skipped on those frames where the tiles have perfect vertical alignment with the screen.

Take it from the top


Let's look at the top tile first. When it is partially visible, the bit that is missing is the top part. We can use the vertical pixel counter to figure out the parameters. This counter is incremented each time we need to move the screen contents up one pixel. The tiles are eight pixels high, so we're interested in the bottom three bits of the counter, which gives us a number in the range zero to seven. When it is zero, all of the tile rows are visible, and when it seven, just the bottom row of the tile is visible. So what we need is to subtract this number from eight to give us the number of rows of pixels to draw.

The other thing we need to determine is an offset into the tile image data so that we draw the correct part of the tile. Just like we did for vertical scrolling, we can take the bottom three bits of the vertical counter and multiply by two to create an offset for the tile image data.

The list of hoops to jump through before we can start drawing looks like this:

  • Determine where in the buffer to start drawing using the logic discussed in Scrolling 101
  • Determine where in the map we will start reading using the map row and column pointers discussed in Details
  • Modify the tile image base address to select the left half or right half of each tile. (i.e. add one if the bottom bit of the map column pointer is set)
  • Determine the parameters for the partial top tile using the vertical counter

That sets us up for the top tile. We use the map pointer to give us the tile ID which in turn allows us to calculate the address of the image data. We can then copy tile image bytes to the buffer.

After each tile image byte is drawn into the buffer, we need to advance the buffer destination pointer to the next row and check that it hasn't crossed the end of the buffer. If it has, then the buffer size should be subtracted from the pointer so that drawing continues from the top of the buffer.

Take it to the bridge


After the top tile, we need to draw the run of full height tiles. These are relatively easy as they are all fixed height and can be drawn with two loops: An inner loop to output the fixed number of bytes per tile, and an outer loop to output the fixed number of tiles. We continue advancing and wrapping the buffer destination pointer for each byte written, and similarly advance and wrap the map pointer for each tile produced.

Having to check the buffer destination address for every byte written consumes a lot of cycles. It looks like this piece of code:

    cmpx #buffer_end
    blo no_adjust
    leax -buffer_size,x
no_adjust

As the end of the buffer can only be crossed once, this code does very little useful work. It nearly always executes just the cmpx and the blo, but that's still 96 x 7 = 672 cycles for a full height draw.

It would be nice to avoid as many of those checks as possible. The approach I've used is to check the buffer pointer before drawing each full tile. If there is room to draw a tile without reaching the end of the buffer then it draws the tile using an unrolled loop with no pointer check. Otherwise the tile is drawn byte by byte in a loop with the pointer check. That trims out a lot of cycles without adding a lot of complexity.

Throw it in the river


Finally we reach the partial tile at the bottom. This is easier to deal with than the top tile. Firstly, the part of the tile that is missing is the lower part of the tile, so there's no need to offset the image address. Secondly, the number of pixel rows we need to draw is simply the bottom three bits of the vertical pixel counter. If it's zero, we don't need to draw the tile at all as we've already reached the bottom of the screen.


What we have so far, is pixel-by-pixel vertical scrolling, but the horizontal scroll is still only byte-by-byte. To get fast horizontal scrolling working at the pixel level, we need to bring in additional buffers and expand the drawing routines to include pixel shifting. Another layer of complexity. But at least the scroll engine will then be complete. It couldn't get any more complicated than that. Could it? To be continued...

(Spoiler alert: Yeah, it could)



* The Occasionally Disappearing 13th Row is possibly a British movie of the "I say, that's inconvenient!" disaster movie sub-genre, starring Timothy Dalton as a cheesy airline boss; Bill Nighy, apparently legally required to be in every British movie; and Martin Freeman as Tim from The Office. Again.

Sunday, 2 April 2017

Details

It's fair to warn you that this one could be heavy going. Time to figure out some details. It's just drawing tiles, so you'd think it would be easy, but there are complications. Oh the horror...

Problems, problems


For the moment, I'll focus on scrolling whole bytes rather than shifting pixels, to avoid getting too complicated, too quickly. I'm not smart enough to deal with the whole problem in one go and I find it's helpful to understand it in small byte-sized pieces, if you'll forgive the terrible pun. (And you've forgiven so much to have got this far). Here's a list to start thinking about:

  • We need to know which tiles need drawing. That implies some kind of pointer into the tile map to track movement on the screen.

  • We need to convert tile numbers from the map into addresses of graphics data for drawing.

  • For reasons that used to make sense, the tiles are two bytes wide. That's a complication. Sometimes we will be drawing the left half of a tile, and sometimes we will be drawing the right half.

  • We could reach the end of the buffer before we've finished drawing a strip of tile fragments. That means we need to keep control of the drawing address and wrap it when required.

  • We could fall off the edge of the world. I mean the map. We could fall off the edge of the map while drawing tiles. That means the map pointer may need adjustment between tiles.

  • The multidirectional scrolling means we will have variable sized pieces of tile on all sides of the screen. Oh joy. It didn't have to be this way. I could have done a nice side-scroller. That would have been much easier to program and document. But not me, nooOOOooo. Scrolling in one direction just wasn't good enough for Lady-La-De-Da-Fancy-Multiway-Scroller. Now everyone has to stay late and clear up the mess. Well, I hope I'm happy with myself. Hey where'd everyone go?

Solutions, solutions


The map is a two-dimensional structure, which suggests the use of separate row and column pointers. It's tempting to use a single pointer to access the map, just like we're doing with the buffer pointer, but there is a subtle difference: When we increment the address past the left or right hand edge of the map, we find ourselves on the other side of the map, but one row up or down. Not exactly what we want. We would like to appear on the other side of the map on the same row. That seems easier to handle (to me anyway) with separate pointers.

The row pointer could be the address of the first tile of the current map row, and it would need to be updated each time we've scrolled enough to bring a new row of tiles into view. i.e. every eight pixels. If the pointer goes off the top of the map then the map size can be added to get us back into bounds and a similar adjustment done for the other direction.

That leaves the column pointer to specify an offset for a particular tile in the row. If it is incremented or decremented at the same time as the buffer pointer then it represents a half tile. (Recall that the tiles are two bytes wide). This sounds pretty useful. The least significant bit can be shifted into the carry and used to select the left or right half of the tile, and the remaining bits can be added to the row pointer to give us the map location. As the map is a power of two wide, wrapping can be achieved by ANDing the pointer with a mask. (Otherwise it would need a compare and branch)

The tile that the map pointer is pointing to is the one that should appear in the top left corner of the screen. This is fine for scrolling left or up, because the new strip of graphics will include that tile. For scrolling right or down there's an extra step: An offset will need to be added to the column or row pointer to address the tiles on the other side of the screen. It's a similar concept to the one used to decide where to draw in the buffer for each scroll direction.

Each tile image takes up 16 bytes of memory, so all we need to do to find the address of  a tile image is to load the tile number from the map, multiply it by 16 and add the address of tile number zero. Using a lookup table would be a little faster, but I'll keep it simple and stick with a MUL instruction for now.

A closer look at vertical scrolling


Vertical scrolling involves drawing a horizontal strip one pixel high, running the width of the screen. A tile is eight pixels high, so we need to pick one of the eight pixel rows to draw. This can be handled by adding an offset to the tile zero address. The offset is simply two times the number of pixels of displacement required, because each row of the tile is two bytes.

That means we need to keep track of the vertical pixel offset. I'm using a vertical counter variable that is decremented or incremented for each pixel scrolled up or down. By ANDing the counter with #7, we can detect when we cross a tile boundary and therefore need to move the map pointer. After ANDing with #7, the value can be left-shifted (i.e. multiplied by two) to create the pixel row offset.

That works neatly for scrolling up, but there is a modification required for scrolling down. Firstly, we need a different offset. If you imagine we've scrolled into a position where the tiles are neatly lined up with the top and bottom edges of the screen, the counter ANDed with #7 will be zero. If we scrolled up into this position then we needed to draw the top edge of the tiles as the zero offset suggests. On the other hand, if we had scrolled down, then the bottom edge of the tiles has just appeared. That's a different offset and we get it by ADDing #7 to the counter value before we AND with #7.

When I first implemented that, something weird happened when scrolling down. Seven out of every eight lines drawn were correct, but the ones that appeared for vertical offset zero were wrong. This didn't make any intuitive sense and it took me a while to figure out: A screen full of tiles would be 12 tiles high, but in general you would be able to see 13 rows of tiles, thanks to the partial tiles at the top and bottom. So the bottom row is usually 12 map rows below the map pointer, but only 11 rows when the vertical offset is zero. I ended up doing a clunky little correction just for offset zero. There might be a better way but I haven't thought of one so far.

As if the vertical scroll wasn't complicated enough already, there are another two cases to consider. The tiles are two bytes wide, meaning the drawing starts on either the left or right hand half of a tile. When it starts with the left half of a tile, there will be 16 tiles across the screen, all drawn in the same kind of way. I call this even drawing. But if we start with the right half of a tile, then there will be three sections: The right half of the first tile, then 15 complete tiles, then the left half of a 17th tile. No prizes for guessing I call this odd drawing.

I handled this by having two drawing routines, one even and one odd, and the one called is determined by the least significant bit of the map column pointer.

Before I get into the drawing, here's a summary of the steps so far:

Scroll up:
  • Move buffer pointer up one line, wrapping if necessary
  • Decrement vertical pixel counter
  • If pixel counter AND #7 = 7 then move map row pointer up one row (and wrap)
  • Calculate the tile image base address: tile zero address + 2 x (pixel counter AND #7)
  • Buffer drawing start address = buffer pointer
  • Map source data = Map row pointer
  • Draw even or odd depending on map column offset LSB

Scroll down:
  • Move buffer pointer down one line, wrapping if necessary
  • Increment vertical pixel counter
  • If pixel counter AND #7 = 0 then move map row pointer down one row (and wrap)
  • Calculate the tile image base address: tile zero address + 2 x ((pixel counter + 7) AND #7)
  • Buffer drawing start address = old buffer pointer (the line above the current pointer)
  • Map source data = Map row pointer + number of visible rows - 1 row
  • Draw even or odd depending on map column offset LSB

Draw some tiles already


The even drawing routine looks something like the code below. On entry, u is the tile image base address and y points to the map row.

        lda map_col      ; map column pointer
        lsra             ; get the offset
        sta coffset      ; store offset directly in instruction

        lda #16          ; draw 16 tile pieces
        sta count        ;

loop
coffset equ *+2          ; variable is part of instruction
        lda <0,y         ; get tile number from map
        ldb #16          ; calculate address of image data
        mul              ;
        ldd d,u          ; load the tile image bytes
        std ,x++         ; store tile bytes in buffer
        cmpx #BUF_END    ; check for end of buffer
        blo nowrap       ; not end of buffer
        leax -BUF_SIZE,x ; adjust back to start of buffer
nowrap lda coffset      ; move to next column in map
        inca             ;
        anda #(MAPWID-1) ; wrap map column
        sta coffset      ;

        dec count        ; do next tile
        bne loop         ;


There isn't all that much to it. I've used some self-modifying code to make life easier. The coffset variable is actually the 8 bit index mode offset used to load the tile number from the map. The < symbol tells the assembler to use an 8-bit offset so the instruction bytes are in the expected layout.

It's assumed that the two bytes of the tile image will never fall on either side of the end of the buffer. This is a safe assumption providing that the buffer size is a multiple of the tile width, and that the first tile is aligned with the start of the buffer.

Odd drawing is a bit more complicated. First there is the right half of the first tile:

        lda map_col      ; map column pointer
        lsra             ; get the offset
        lda a,y          ; get tile number
        ldb #16          ; calculate address of image data
        mul              ;
        ldd d,u          ; load the tile image bytes
        stb ,x+          ; store right half in buffer

It might seem strange to load both bytes of the tile to then throw away the left byte, but this is an easy way of accessing the right byte. If I wanted to load just one byte, then I would first have to add one to the address which would take longer.

After this we need to check and wrap the buffer address and then we can draw 15 tile pieces using the even drawing algorithm above. The only differences are we need to draw 15 pieces instead of 16, and we need to add one to the map column offset before we start.

The left half of the tile at the end is really easy. The registers already have the values we need thanks to the code that drew the 15 tiles:

        lda a,y
        ldb #16
        mul
        lda d,u
        sta ,x

Simples!

Hey, what about horizontal scrolling?


Yeah, I know, we're having so much fun and it's hard to stop, but my fingers are all wore out, what with all that typing and everything. So next time we'll have a closer look at the horizontal scrolling...

Friday, 24 March 2017

Tiles

At the moment I have a means of scrolling but nothing to actually scroll, other than some colourful stripes. Now I do like playing with pretty patterns, probably a little bit more than would be considered 'normal', but that's not getting the job done.

I mentioned tiles before as a way of drawing a background but didn't go into any detail. What are tiles and why do we need them?

Insert groutuitous tile pun here


A tile is simply a small graphic image, that can be drawn on screen with other tiles to build up a larger image. There's a brief discussion on Wikipedia here.

The advantage of using tiles is that it can dramatically reduce the memory requirement of a large image such as a background. This works best for images that contain many repeated elements.

To build the image, we need a map that specifies which tiles go where. This could be a simple two-dimensional array of numbers, with each location in the array corresponding to a location in the image, and the content of each location in the array specifying a particular tile from the tileset.

How big should the tiles be? There is a trade-off to be made between image size and memory use. What I'm trying to achieve is a background that's something like the one from Time Pilot '84, big enough to look interesting, and without eating up too much memory.

The Time Pilot '84 background is pretty big, 2048 x 2048 pixels judging by images I've found online. At an arcade speed of 60fps, it takes over 30 seconds to scroll all the way across, one pixel at a time. If the tiles were 8 x 8 pixels then the tile map would need to be 64K in size. 16 x 16 pixel tiles would bring that down to 16K, still too big as I would like to fit at least a demo into a 32K machine.

How much background could I fit into, say, 4K?

An 8 x 8 tile requires 16 bytes in PMODE1. We need a good selection of tiles to make an interesting image, so let's say 64 tiles for now. That would require 64 x 8 = 1K, leaving 3K for the tilemap, which lets us have an array of 64 x 48 bytes.

So the final background image would be 8 x 64 = 512 pixels wide and 8 x 48 = 384 pixels high. At the intended speed of 25fps, it would take about 20 seconds to scroll all the way across. That sounds like a reasonable size so I will go with that. If later in the project I find I have more memory available I will have the option of making the map bigger.

For initial testing, I generated the tileset and tilemap by hand, just by typing in fcb directives containing very simple patterns. Clearly this is not going to be very practical for anything complicated, so I would need to get hold of some tile editing software.

I looked around, not really sure what I was looking for. Eventually I settled on Tile Studio by Mike Wiering. For me, the killer feature is the scriptable output, allowing auto-generation of source files for any language. The visuals are a bit on the small size, making it less suitable for designing small tiles on HD displays, but other than that, I find it gets the job done and doesn't get in the way of what I want to do.

Of course, it doesn't matter what tools you use as long as you like using them. The best tools are the ones you know how to use. (And the ones you borrowed from work. They're the best ones as well.)

My artistic skills are somewhat limited but I was able to create a nice-looking background fairly quickly:

Cool. But what is it?


I've used around 90 different tiles, meaning I've overshot the budget a little, using up about 4.5K in total. Still pretty impressive though, considering that the uncompressed image would take up 48K of memory.

As the map is intended to wrap around both horizontally and vertically, the features at the edges have to line up with each other, so I've saved myself a bit of work by placing water across the left & right edges.

One problem with tiled landscapes is trying to avoid repeating patterns. For example if I had used the same 'dirt' tile throughout the cratered landscape, it wouldn't look very convincing as the eye easily picks out repeating features. The trick here is to have five or six different dirt tiles and randomly fill the area with them. That breaks up the repetition enough to hide any patterns. Tile Studio has a random fill function for this very purpose and I would imagine other tile editors have it too.

I should probably confess that I didn't draw the craters from scratch. These are simply colour-reduced-messed-about-with versions of something I liked the look of, just to get a quick result. It could all change when I get to spend more time on it.

Tile be back


OK, we now have several boxes of squeaky new tiles piled up in the hallway and there's just the small matter of sticking those bad boys to the wall. You will need adhesive, a spreader, tile cutter and beer. Three of those are metaphors...

Saturday, 18 March 2017

Copy that: The conclusion

If it aint broke, fix it anyway


Sometimes I just can't leave a piece of code alone. I'm thinking there must always be another optimisation just waiting to be found. Quite often I wouldn't find anything new, but on this occasion I would find one of my favourite optimisations ever.

The copy routine that I had developed so far transferred any number of bytes by splitting the task into two parts: Copy big chunks for speed, then copy byte by byte to hit the target count. It was pretty fast, about 10% slower than a hypothetical unrolled version but I wondered if I could get a bit more out of it. If I could grab a few hundred more cycles then that would buy time for an extra sprite or better sound effects.

My attention turned to the byte by byte copy because it was so slow. If it could be sped up enough then the optimal chunk size could be increased and it would go faster still. The code looks like this:

loop
  lda ,u+  ; 6 cycles
  sta ,s+  ; 6
  decb     ; 2
  bne loop ; 3

Copying 32 bytes with this code takes 32*17 = 544 cycles. That's a long time in a game. The 6809 has 16-bit loads and stores, so why not take advantage? The problem is we need an even number of bytes to copy two bytes at a time. When the problem is stated like that, the solution almost suggests itself. The next version of the code deals with the odd byte first, then continues by copying two bytes at a time:

  lsrb       ; 2 cycles
  bcc stage2 ; 3
  lda ,u+    ; 6
  sta ,s+    ; 6
  tstb       ; 2
stage2
  beq done   ; 3
loop
  pulu x     ; 7
  stx ,s++   ; 8
  decb       ; 2
  bne loop   ; 3

It's not quite as easy to figure out the cycle times for this version because the path through the code depends on the number of bytes to be copied. Time for another spreadsheet:




The two-stage code is slower for zero or one byte copied, but after that it's faster, and much faster for higher byte counts. That's cool, I thought to myself, but there's always another optimisation...

It suddenly occurred to me that copying an odd number of words is pretty much the same scenario as copying an odd number of bytes. If that could be evened out, then the way is clear for copying four bytes at a time. I added a third stage to the code:

  lsrb       ; 2 cycles
  bcc stage2 ; 3
  lda ,u+    ; 6
  sta ,s+    ; 6
stage2
  lsrb       ; 2
  bcc stage3 ; 3
  pulu x     ; 7
  stx ,s++   ; 8
  tstb       ; 2
stage3
  beq done   ; 3
loop
  pulu x,y   ; 9
  stx ,s++   ; 8
  sty ,s++   ; 9
  decb       ; 2
  bne loop   ; 3


What does that do to the spreadsheet?




Another nice speed increase. It was at this point I realised a couple of things: One was that adding more stages would copy larger blocks of data ever faster, and the other thing was that I had turned the byte by byte copy problem into one of copying n bytes again, meaning the copy n bytes routine could be implemented more quickly with this multi-stage approach without even thinking about optimal chunk sizes.

I ended up with an eight stage routine, that copied 128 bytes per loop in the final stage, and was going to need a serious bit of number crunching to count the cycles. This time I created a spreadsheet that ran to over 3000 rows. The first few rows are shown below:




What I was trying to find out was the worst case run time for any value of buffer pointer. Recall that the copy routine has to copy the buffer in two goes to work around the end of the buffer, so I've arranged the spreadsheet as matching pairs of rows with the right hand total column showing the cycle count for a complete buffer copy.

The overhead value is the number of cycles taken by the routine even when there is no work to be done. It's the time spent manipulating and testing the byte count plus adjusting the s register when reaching the stage where octet-sized operations can be used for copying. Because the copy routine is run twice, the overhead has to be added twice. (Again, I haven't included any setup time for the copy, to allow a fair comparison with previous cycle counts)

A quick application of the MAX function revealed the worst case time to be 12342 cycles. That is more than 700 cycles faster than the bytes & chunks approach. I was really happy with that. It was hard to believe that such a weird looking bit of code could be so efficient but the numbers proved it. And sure enough, it looked fast too when running.

Interestingly, the best case run time was 12308 cycles, a difference of only 34 cycles. This means that the copy routine has a very consistent run time, which is a good thing to have in a game.

But what is it?


In summary: For an algorithm where the number of loops is not fixed, there is an efficient method of reducing loop overhead by doing an amount of work equal to the weight of each bit in the count variable. The best number of stages to implement is dependent on the average or maximum count required. (No point having the overhead of eight stages if you only want to copy at most 15 bytes)

I've tried to find out what this technique is called but have not been able to find any references. In my mind I call it 'binary loop decomposition' which sounds dangerously like technobabble. Has anyone else encountered anything like this?