PC relative coding

Post by **tomaitheous** » Tue Mar 30, 2010 6:00 am

That's right. PC relative coding on the PCE.

For those who are unfamiliar or have forgotten: PC relative code is code in which every branch, jump, and JSR is relative to the PC. Why? So the code can be placed anywhere in memory and execute the same way regardless of where it sits in the address range. Normally this matters on computers: large and small, they dynamically load code or runtimes into free areas of memory. If that area isn't known ahead of time, you can't use absolute addresses for branches, jumps, and calls.

The HuC6280 kinda lacks this. I say kinda, because it actually does have the two instructions needed for exactly this type of environment: BSR and BRA. Both take a signed offset that is added to the current PC. The problem is that the offset is only 8 bits, and an 8-bit signed offset only reaches -128 to +127 bytes (counted from the instruction after the branch). That's not terribly useful.
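To put numbers on that reach, here is a small Python model of how a 2-byte branch (BRA or BSR) resolves its target. It assumes the usual 6502-family rule that the signed offset byte is added to the address of the instruction following the branch; the function names are just for illustration.

```python
def branch_target(instr_addr: int, offset_byte: int) -> int:
    """Target of a 2-byte BRA/BSR at instr_addr whose operand byte is offset_byte."""
    # Two's complement: $00-$7F branch forward, $80-$FF branch backward.
    signed = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    # The offset is relative to the next instruction (instr_addr + 2).
    return (instr_addr + 2 + signed) & 0xFFFF


def in_reach(instr_addr: int, target: int) -> bool:
    """True if a 2-byte relative branch at instr_addr can reach target."""
    return -128 <= target - (instr_addr + 2) <= 127
```

So a branch at $8000 can reach anything from $7F82 to $8081 and nothing else, which is why BSR/BRA alone can't cover an 8k bank.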

Secondly, you might be asking why this would even be a concern on the PCE. Easy: memory is mapped in and out in 8k pages, similar to loading and unloading code on a PC (note: I said similar, not exact). So the concept, or rather the end result, is desirable for one reason: being able to map an 8k bank into any free MPR and start executing code. Normally you have to know which MPR to map the bank to, because the code is assembled with a hard offset in the CPU's logical address range, and that offset corresponds to one particular MPR.
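A quick model of the HuC6280 address translation shows why the MPR matters. Assumed here: the top 3 bits of a 16-bit logical address select one of the eight MPRs, and the 8-bit bank number in that MPR supplies the upper bits of the 21-bit physical address. Function names and the MPR contents used below are made up for illustration.

```python
def physical_address(mpr: list[int], logical: int) -> int:
    """Translate a 16-bit logical address through the eight MPRs to a 21-bit physical address."""
    segment = (logical >> 13) & 7                       # top 3 bits pick the MPR
    return (mpr[segment] << 13) | (logical & 0x1FFF)    # bank number, then 13-bit offset in bank


def logical_address(segment: int, bank_offset: int) -> int:
    """Logical address of the byte at bank_offset when its bank sits in MPR `segment`."""
    return (segment << 13) | (bank_offset & 0x1FFF)
```

With bank $05 in MPR3, offset $0123 of that bank is at logical $6123; move the bank to MPR4 and the same byte is at $8123. An absolute `jmp $6123` assembled for MPR3 lands in whatever MPR3 holds once the bank moves.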

That got me thinking. What if there were a way to write PC relative code on the PCE? So, here are a few ideas I played around with...

a very generic method:

pseudo:

Code:

 Jmp offset+PC

real:

Code:

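 bsr $+2   ; <- $ is the current PC. BSR is relative, so it pushes (this address + 1) wherever the bank is mapped; jsr $+3 would assemble to an absolute address.
 pla       ; low byte of the pushed address
 clc
 adc #low(offset)   ; offset is measured from the pushed address
 sta <zp
 pla       ; high byte of the pushed address
 adc #high(offset)
 sta <zp+1
 jmp [zp]  ; <- the full address of ZP. You could substitute ABS for ZP at the cost of +2 cycles.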
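The arithmetic in that sequence can be checked with a small Python model (hypothetical helper names). `pushed` stands for the return address the PC-capture call leaves on the stack, i.e. the address of that call instruction's last byte (A+2 for a 3-byte JSR at A, A+1 for a 2-byte BSR at A), so `offset` has to be measured from that address.

```python
def offset_for(target: int, pushed: int) -> int:
    """16-bit offset to encode so that pushed + offset lands on target."""
    return (target - pushed) & 0xFFFF


def relative_target(pushed: int, offset: int) -> int:
    """Model: pla / clc / adc #low(offset) / sta <zp / pla / adc #high(offset) / sta <zp+1."""
    lo = pushed & 0xFF                                   # first PLA pulls the low byte
    hi = (pushed >> 8) & 0xFF                            # second PLA pulls the high byte
    s = lo + (offset & 0xFF)                             # CLC, then ADC #low(offset)
    carry = s >> 8                                       # carry out of the low-byte add
    zp_lo = s & 0xFF
    zp_hi = (hi + ((offset >> 8) & 0xFF) + carry) & 0xFF  # ADC #high(offset) uses that carry
    return (zp_hi << 8) | zp_lo                          # what JMP [zp] jumps to
```

The same encoded offset lands on $6100 when the pushed address is $6002 and on $8100 when the bank is moved so it is $8002, which is the whole point. Negative offsets work too, since the adds wrap at 16 bits.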

Another method, depending on how many long jumps you plan to have, is to store the addresses in 13-bit format (the offset inside the 8k bank) in a table in RAM. When you map the bank into its page, manually OR each table entry with the upper 3 bits of that page's address to form the correct address. You're basically calculating the effective address for wherever the bank is mapped.

You'd do this once for each change of the bank's MPR. So if, for a certain level, you want the bank mapped to MPR3, you alter the table once at the very beginning, i.e. initialize it and be done with it. If you ever remap the bank to another MPR, you'd have to redo the table.
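The table setup can be modeled in a few lines of Python (used only because it is easy to check; on the PCE it would be a short init loop). `build_jump_table` is a hypothetical helper. It assumes each entry is a little-endian 16-bit word whose low 13 bits are the target's offset inside the bank, and that MPR n covers logical $n×$2000, so ORing in `segment << 13` is the same as ORing the high byte with `segment << 5`.

```python
def build_jump_table(offsets: list[int], segment: int) -> bytes:
    """Build the RAM jump table for a bank mapped through MPR `segment` (0-7)."""
    out = bytearray()
    for off in offsets:
        if not 0 <= off < 0x2000:
            raise ValueError(f"{off:#x} is not a 13-bit in-bank offset")
        addr = off | (segment << 13)              # OR in the upper 3 bits for this segment
        out += bytes((addr & 0xFF, addr >> 8))    # little-endian, as JMP [table,x] reads it
    return bytes(out)
```

For offsets $0123 and $1FFE with the bank in MPR3, the table becomes $6123, $7FFE.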

It'd look something like this

pseudo:

Code:

 jmp $offset+PC

real:

Code:

 ldx #$nn ; <- byte index of this jump's entry in the table (entries are 2 bytes, so $nn is even)
 jmp [table,x]

Also note, you're not limited to 128 "offsets" (X is 8 bits and each entry is 2 bytes). You can have more than one table. But to be honest, how many jmp instructions do you think you'd have in a single 8k bank of relative code?
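The 128 figure falls out of X being 8 bits while each entry takes 2 bytes. A Python model of the lookup, with hypothetical names, assuming JMP [table,x] reads a little-endian address starting at table + X:

```python
def table_index(entry: int) -> int:
    """X value for the entry-th 2-byte slot; with an 8-bit X only 128 slots fit."""
    if not 0 <= entry < 128:
        raise ValueError("one table holds at most 128 two-byte entries")
    return entry * 2


def jmp_indexed(table: bytes, x: int) -> int:
    """Model JMP [table,x]: low byte at table + X, high byte at table + X + 1."""
    return table[x] | (table[x + 1] << 8)
```

With the two-entry table $6123, $7FFE, entry 1 uses X = 2 and jumps to $7FFE.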

One optimization is to keep multiple precalculated tables (one per MPR) and simply copy the right one into RAM when reconfiguring for a different MPR range. That assumes a ROM setup. With CD-RAM, of course, it could be optimized somewhat: since the code itself is writable, JMP [$nnnn,x] could be replaced with a normal JMP, assuming you have a record of which opcodes to change. Again, that means preprocessing the code before use.
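For the CD-RAM variant, the preprocessing pass amounts to relocation fix-ups. This Python sketch assumes a hypothetical fix-up list (byte positions of each `JMP abs`, opcode $4C, recorded at assembly time) and operands stored as 13-bit in-bank offsets. It clears and re-sets the top 3 bits of each operand's high byte, so the same image can be re-patched when the bank moves to another MPR.

```python
JMP_ABS = 0x4C  # 6502-family opcode for JMP absolute


def patch_jumps(image: bytes, fixups: list[int], segment: int) -> bytes:
    """Rewrite each listed JMP abs so its operand points into MPR `segment` (0-7)."""
    code = bytearray(image)
    for pos in fixups:
        if code[pos] != JMP_ABS:
            raise ValueError(f"expected JMP at {pos:#x}")
        hi = pos + 2                                      # operand is little-endian: low, then high
        code[hi] = (code[hi] & 0x1F) | (segment << 5)     # keep the 13-bit offset, replace top 3 bits
    return bytes(code)
```

Patching `JMP $1234` for MPR4 gives `JMP $9234`; patching that result for MPR2 gives `JMP $5234`.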

So far, these are just relative jumps. What about relative JSRs?

Like the original method, here would be the JSR version:

pseudo:

Code:

 Jsr offset+PC

real:

Code:

 bsr .skip  ;<- branches to the .skip label (BSR is relative; jsr .skip would be an absolute address)
 bra .end  ;<- but this is the return address
.skip
 bsr $+2    ; <- $ is the current PC; pushes (this address + 1).
 pla
 clc
 adc #low(offset)
 sta <zp
 pla 
 adc #high(offset)
 sta <zp+1
 jmp [zp]  ; <- the full address of ZP. You could substitute ABS for ZP at the cost of +2 cycles.
.end
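Why the wrapper returns to `bra .end` is easiest to see by following the stack. The toy Python model below uses hypothetical names and assumes standard 6502-family behavior: a call pushes the address of its own last byte, high byte first, and RTS pulls low then high and adds 1. The test traces the wrapper with 3-byte calls at illustrative addresses: outer call at $6000, `bra .end` at $6003, inner PC-capture call at $6005. The PLA pair consumes the inner call's two bytes, leaving the outer return address for the target routine's RTS.

```python
class Stack:
    """Toy 256-byte hardware stack (normally logical $2100-$21FF on the HuC6280)."""

    def __init__(self) -> None:
        self.mem = bytearray(0x100)
        self.s = 0xFF  # stack pointer: store at S, then decrement

    def push(self, value: int) -> None:
        self.mem[self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pull(self) -> int:
        self.s = (self.s + 1) & 0xFF
        return self.mem[self.s]


def call(stack: Stack, addr: int, length: int) -> None:
    """JSR (length 3) or BSR (length 2) at addr: push the address of its last byte, high first."""
    ret = (addr + length - 1) & 0xFFFF
    stack.push(ret >> 8)
    stack.push(ret & 0xFF)


def rts(stack: Stack) -> int:
    """RTS: pull low, then high, and continue at that address + 1."""
    lo = stack.pull()
    hi = stack.pull()
    return (((hi << 8) | lo) + 1) & 0xFFFF
```

A 2-byte BSR follows the same rule with `length=2`, returning to the instruction right after it.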

Now a version based on the second method:

pseudo:

Code:

 jsr $offset+PC

real:

Code:

 bsr .skip  ;<- relative, so the return address is right wherever the bank is mapped
 bra .end
.skip
 ldx #$nn ; <- byte index of this jump's entry in the table (entries are 2 bytes, so $nn is even)
 jmp [table,x]
.end

Obviously these would be in the form of macros (I sure wouldn't want to type that out).

The first method has the advantage that it requires no preprocessing. It's slower, though, so you'd want to use BSR and BRA wherever they reach, to cut back on macro usage. But for code where flexibility matters more than speed, it's definitely a solution. The second method is a bit more complex because it requires a preprocessing pass each time you change where the bank of code is mapped (a different location, that is; the same location requires no preprocessing). The benefit, as you can see, is faster pseudo-instructions.
Also keep in mind that if you need to preserve A or X across these relative branches or calls, you'll need additional overhead for that. Most of the time I don't.

And now for the really bizarre. Since this code lives on the "cart" side of the address range, you could have an external device monitor which opcodes are being fetched and modify them on the fly (basically effective addressing by external means). But that's extreme, and it's highly unlikely anyone would build such a device.

Anyway, this doesn't have a wide range of application. The need for something like this would be pretty specific, but I find that the more I optimize, and the more I reserve sections of the logical address range, the more I need something along these lines. And for the record, the external hardware method is the best, so it's only obvious and fair that no one will ever use it on the PCE. Another missed opportunity by NEC for the System Card 3.0 upgrade.

So, with the above described... what are some of your ideas/implementations for doing this (or something very similar)? Let the voodoo code pasting begin.

Just a few more ideas/methods:

pseudo:

Code:

 jmp $offset+pc

real:

Code:

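 lda #low(offset)   ; offset = 13-bit offset of the target inside the bank
 sta <zp
 lda #high(offset)
 ora <ea  ;<- a special EA byte (the MPR's upper 3 address bits; "ea" is just an illustrative name) you could load once in the preprocessor. It must not be <zp, which now holds low(offset). Super tiny overhead on the preprocessor side, a little more overhead on the operation side.
 sta <zp+1
 jmp [zp]  ;<- again, you use the full absolute address of the ZP address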

Not too bad of a process. Kind of a middle ground between the two above.
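The runtime half of this method is just an OR, which a small Python model makes explicit. Names are hypothetical; the EA byte (called `ea` here purely for illustration) has to live in a ZP byte other than `zp` itself, since `zp` has just been overwritten with the low byte. The OR only works because a 13-bit offset's high byte never exceeds $1F, so it never overlaps the 3 segment bits.

```python
def ea_byte(segment: int) -> int:
    """EA byte for a bank in MPR `segment` (0-7): its upper 3 address bits, aligned for the high byte."""
    return (segment & 7) << 5


def ora_target(offset: int, ea: int) -> int:
    """Model: lda #low(offset) / sta <zp / lda #high(offset) / ora <ea / sta <zp+1."""
    if not 0 <= offset < 0x2000:
        raise ValueError("offset must be a 13-bit in-bank offset")
    zp_lo = offset & 0xFF
    zp_hi = (offset >> 8) | ea   # OR is safe because high(offset) is at most $1F
    return (zp_hi << 8) | zp_lo
```

With the bank in MPR6 (EA byte $C0), offset $0ABC resolves to $CABC.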

pseudo:

Code:

 jsr $offset+pc

real:

Code:

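 bsr .skip  ;<- relative, so the return address is right wherever the bank is mapped
 bra .end
.skip
 lda #low(offset)   ; offset = 13-bit offset of the target inside the bank
 sta <zp
 lda #high(offset)
 ora <ea  ;<- a special EA byte (the MPR's upper 3 address bits; "ea" is just an illustrative name) you could load once in the preprocessor. It must not be <zp, which now holds low(offset). Super tiny overhead on the preprocessor side, a little more overhead on the operation side.
 sta <zp+1
 jmp [zp]  ;<- again, you use the full absolute address of the ZP address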
.end

Edit: I'll post some macros for this in a bit. But it just dawned on me that since you're doing this method with JSR, you now have JSR.r [indirect] and JSR.r [indirect,x] pseudo-instructions with just a little extra detection in the macro logic (".r" meaning relative addressing; leftover mannerisms from other assemblers >_> ).
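That bit of macro detection can be sketched as a tiny Python generator that expands a hypothetical `jsr.r` pseudo-instruction into assembly text. It is not PCEAS macro syntax; the label scheme, the `zp`/`ea` names, and the choice of a relative BSR for the wrapper are all assumptions for illustration. Bracketed operands reuse the indirect JMP as-is; anything else is treated as a 13-bit in-bank offset and gets the ORA sequence.

```python
def expand_jsr_r(operand: str, label: str) -> list[str]:
    """Expand a hypothetical `jsr.r operand` into assembly lines (label must be unique per use)."""
    op = operand.replace(" ", "")
    lines = [f" bsr {label}_skip", f" bra {label}_end", f"{label}_skip"]
    if op.startswith("[") and op.endswith("]"):
        # jsr.r [addr] or jsr.r [addr,x]: wrap the existing indirect JMP so it acts as a call
        lines.append(f" jmp {op}")
    else:
        # plain operand: a 13-bit offset inside the bank, combined with the EA byte
        lines += [
            f" lda #low({op})",
            " sta <zp",
            f" lda #high({op})",
            " ora <ea",
            " sta <zp+1",
            " jmp [zp]",
        ]
    lines.append(f"{label}_end")
    return lines
```

For example, `print("\n".join(expand_jsr_r("[handlers,x]", ".c1")))` emits a five-line call wrapper around `jmp [handlers,x]`.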