[Translation] Marchen Maze
Posted: Wed Nov 04, 2009 9:45 am
by MooZ
This is a placeholder for Marchen Maze translation.
Thanks to Dave's awesome script extractor I managed to locate the text output routine. It starts at $87D6. The string "parsing" is at $89E5 ($5C,$5D hold the pointer to the string) and it ends at $8CD8. This routine is called from $E377. The return address is pushed by hand onto the stack and the subroutine is called via a wonderful jmp ($202C) at $E38D. The jump table is at $E390.
Here's the MPR snapshot when the PC is at $89E5:
- MPR0: FF
- MPR1: F8
- MPR2: 08
- MPR3: 01
- MPR4: 02
- MPR5: 03
- MPR6: 04
- MPR7: 00
Next step: locate the string table and the font.
Re: Marchen Maze translation
Posted: Wed Nov 04, 2009 8:45 pm
by peperocket
Good job, hope to hear more good news about it soon !
Re: Marchen Maze translation
Posted: Wed Nov 04, 2009 9:50 pm
by MooZ
So $5C:5D is set at $8C4B and from this you can find where the pointer table is.
It is stored at $11614 ($5614) and ends at $11849 (remember to add $200 to get the file offset in the pce file, due to the bloody ROM header).
I made a quick test by making the first entry point to the second text and it works.
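A minimal sketch of that address arithmetic, for anyone following along. It assumes the usual 8 KB banking ($2000 bytes per bank), that bank numbers count from the start of the ROM image, and a $200-byte header. With MPR2 = 08 from the snapshot in the first post, that matches the $5614 / $11614 pair quoted here. The function name is made up for illustration.

```python
HEADER_SIZE = 0x200   # ROM header in the .pce file, as mentioned in the post
BANK_SIZE = 0x2000    # assumed: one 8 KB bank per MPR slot


def file_offset(bank, cpu_addr):
    """Convert (bank number, CPU address) to an offset in the headered file."""
    offset_in_bank = cpu_addr & (BANK_SIZE - 1)
    return bank * BANK_SIZE + offset_in_bank + HEADER_SIZE


# Pointer table: CPU address $5614 with bank $08 mapped in (MPR2 = 08).
print(hex(file_offset(0x08, 0x5614)))
```

The ROM offset without the header is simply the result minus $200.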
Now I'll have to find where the font is stored and how it's encoded...
edit: The font already has capital letters. So you can insert translated text by editing the string.
Here's the result with 81 7e 85 85 88 9f 90 88 8b 85 7d 9f 91 91 91 ff
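A rough sketch of how such a string could be built, under assumptions that are not confirmed anywhere in this thread: capital letters are contiguous with A at $7A, $9F is a space and $FF ends the string. Under that guess the first eleven bytes above would read HELLO WORLD. The meaning of $91 is left open, and `encode` is a hypothetical helper.

```python
A_CODE = 0x7A   # assumed code of 'A' (inferred from the bytes, not confirmed)
SPACE = 0x9F    # assumed space character
END = 0xFF      # assumed end-of-string marker


def encode(text):
    """Encode capitals and spaces with the assumed table, then terminate."""
    out = bytearray()
    for ch in text:
        if ch == " ":
            out.append(SPACE)
        elif "A" <= ch <= "Z":
            out.append(A_CODE + ord(ch) - ord("A"))
        else:
            raise ValueError("no assumed code for %r" % ch)
    out.append(END)
    return bytes(out)


print(encode("HELLO WORLD").hex(" "))
```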
edit: I found the first character of the font but the second one seems to be compressed
Re: Marchen Maze translation
Posted: Mon Nov 09, 2009 8:49 pm
by MooZ
There are 4 tile compression methods:
- Direct transfer to VRAM via the tia instruction
- RLE compression
- RLE + XOR
- Copy to VRAM using a loop
Code:
f796: LDA <$7d
ORA <$7e
BEQ f7e1
JSR f888
CMP #$00
BNE f7ac
; #1
TIA $3700, $0002, $0020
BRA f7d2
f7ac: CMP #$02
BNE f7bc
; #2
JSR f817
TIA $3700, $0002, $0020
BRA f7d2
f7bc: CMP #$03
BNE f7cf
; #3
JSR f817
JSR f857
TIA $3700, $0002, $0020
BRA f7d2
f7cf: ; #4
JSR f803
f7d2: LDA <$7d
SBC #$01
STA <$7d
LDA <$7e
SBC #$00
STA <$7e
BRA f796
f7e1:
The code above is more or less a simple switch/case.
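Read as a lookup table, the dispatch comes down to the mapping sketched here. The values are taken from the CMP instructions in the listing (anything that is not 0, 2 or 3 falls through to f7cf); the Python names are only for illustration.

```python
# Method names follow the "; #1" .. "; #4" comments in the listing.
METHODS = {
    0: "direct TIA transfer",  # CMP #$00
    2: "RLE",                  # CMP #$02 -> JSR f817
    3: "RLE + XOR",            # CMP #$03 -> JSR f817, JSR f857
    1: "copy loop",            # everything else falls through to f7cf
}


def method_for(code):
    """Return the method selected by a 2-bit type code."""
    return METHODS[code & 3]


for code in range(4):
    print(code, method_for(code))
```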
The encoding type is stored in 2 bits, so four of them are packed in each byte. They are stored starting at $B0A3 (file offset $2f2a3). f888 extracts the next one and returns it in A.
Code:
f888: LDX <$7f
CPX #$04
BNE f897
INC <$7a
BNE f894
INC <$7b
f894: STZ <$7f
CLX
f897: LDA [$7a]
f899: DEX
BMI f8a1
ROR A
ROR A
JMP f899
f8a1: AND #$03
INC <$7f
RTS
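The same extraction, sketched in Python as a reading aid. It follows the listing: the counter at $7f selects one of four 2-bit fields, lowest bits first, and the pointer at $7a/$7b moves on after the fourth field. `unpack_types` is a made-up name.

```python
def unpack_types(packed, count):
    """Return `count` 2-bit type codes, lowest two bits of each byte first."""
    codes = []
    for i in range(count):
        byte = packed[i // 4]          # four codes per byte
        shift = 2 * (i % 4)            # field 0 is bits 1-0, field 3 is bits 7-6
        codes.append((byte >> shift) & 0x03)
    return codes


# 0xE4 = %11100100 holds the codes 0, 1, 2, 3 in that order.
print(unpack_types(bytes([0xE4]), 4))
```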
The "RLE" routine :
Code:
f817: LDA [$78]
STA <$83
LDY #$04
LDA #$09
STA <$81
STZ <$80
CLX
LDA #$20
STA <$82
f828: DEC <$81
BNE f83a
PHY
LDA #$08
STA <$81
INC <$80
LDY <$80
LDA [$78], Y
STA <$83
PLY
f83a: ROR <$83 ; the rle counter
BCS f849
STZ $3740, X
INX
DEC <$82
BNE f828
JMP f7f6
f849: LDA [$78], Y
STA $3740, X
INY
INX
DEC <$82
BNE f828
JMP f7f6
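As far as the listing shows, this is closer to a zero-skipping bitmask than to classic run-length coding: the first four bytes are 32 flag bits (read lowest bit first), a set bit means "take the next data byte" and a clear bit means "write a zero". A sketch under that reading, with hypothetical names:

```python
def decode_tile(data, pos=0):
    """Decode one 32-byte tile; return (tile, position after the tile data)."""
    flags = data[pos:pos + 4]      # 4 flag bytes = 32 flag bits
    src = pos + 4                  # literal bytes start right after them (LDY #$04)
    tile = bytearray(32)           # clear bits leave a zero (STZ $3740, X)
    for i in range(32):
        if (flags[i // 8] >> (i % 8)) & 1:   # ROR <$83 shifts out bit 0 first
            tile[i] = data[src]
            src += 1
    return bytes(tile), src


packed = bytes([0x01, 0x00, 0x00, 0x80, 0xAA, 0xBB])
tile, end = decode_tile(packed)
print(tile.hex(), end)
```

The returned position plays the role of the pointer update done in f7f6.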
The "XOR" routine:
Code:
f857: PHX
PHY
LDY #$07
CLX
f85c: LDA $3740, X
EOR $3742, X
STA $3742, X
LDA $3741, X
EOR $3743, X
STA $3743, X
LDA $3750, X
EOR $3752, X
STA $3752, X
LDA $3751, X
EOR $3753, X
STA $3753, X
INX
INX
DEY
BNE f85c
PLY
PLX
RTS
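In other words, each 2-byte row is XORed with the row just before it, seven times per 16-byte half, so every row except the first is stored as a difference from its predecessor. A sketch of that step on a 32-byte buffer (the function name is invented, and the buffer stands in for the one at $3740):

```python
def xor_rows(tile):
    """Undo the row-to-row XOR on a 32-byte tile buffer."""
    out = bytearray(tile)
    for half in (0x00, 0x10):            # the $3740 and $3750 halves
        for x in range(0, 14, 2):        # 7 iterations, like LDY #$07
            out[half + x + 2] ^= out[half + x]
            out[half + x + 3] ^= out[half + x + 1]
    return bytes(out)


# A first row of FF 0F followed by "no change" rows expands to 8 equal rows.
print(xor_rows(bytes([0xFF, 0x0F]) + bytes(30)).hex())
```

The routine interleaves the two halves inside one loop; they do not overlap, so handling them one after the other gives the same result.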
And last but not least, the copy to VRAM loop:
Code:
f803: CLY
f804: LDA [$78], Y
STA $0002
INY
LDA [$78], Y
STA $0003
INY
CPY #$20
BNE f804
JMP f7f6
For the curious ones here's f7f6:
Code:
f7f6: TYA
CLC
ADC <$78
STA <$78
CLA
ADC <$79
STA <$79
CLY
RTS
You can find an example of RLE+XOR data at file offset $2ec0b.
More in the next post.
Re: Marchen Maze translation
Posted: Tue Nov 10, 2009 4:47 pm
by Gravis
what wonderful label names.
Re: Marchen Maze translation
Posted: Tue Nov 10, 2009 9:53 pm
by MooZ
Isn't it?
I managed to insert lowercase characters. The trick is to avoid any compressed character. They are easy to identify: there's usually a four-byte header and the size of the "pattern" is less than 0x20 bytes. Maybe that's enough to start working on the insertion script?
Re: Marchen Maze translation
Posted: Wed Nov 11, 2009 2:24 pm
by MooZ
Oops, I forgot to attach the script (best viewed as Shift-JIS).
Re: Marchen Maze translation
Posted: Sun Nov 22, 2009 10:08 pm
by MooZ
Here's a short update.
The first 10 strings are used in the introduction. I wrote a small program that creates an IPS patch with the lowercase font, the strings and the table.
Unfortunately the final translated script may be too big, so I think I'm in for some DTE stuff.
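For reference, a minimal sketch of what writing an IPS file involves. This is not the program mentioned above, just the plain record layout of the format: a "PATCH" magic, then records made of a 3-byte big-endian offset, a 2-byte big-endian length and the data, then "EOF". RLE records are left out, and the sample offset and bytes are placeholders.

```python
def make_ips(records):
    """Build an IPS patch from (file_offset, data) pairs."""
    out = bytearray(b"PATCH")
    for offset, data in records:
        if not 0 <= offset <= 0xFFFFFF or not 0 < len(data) <= 0xFFFF:
            raise ValueError("record does not fit the IPS format")
        out += offset.to_bytes(3, "big")       # where to write in the target file
        out += len(data).to_bytes(2, "big")    # how many bytes follow
        out += data
    out += b"EOF"
    return bytes(out)


# Placeholder record: two made-up bytes at the start of the pointer table.
patch = make_ips([(0x11814, b"\x12\x34")])
print(patch.hex(" "))
```

Offsets in an IPS patch are file offsets, so they already include the $200 header.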
Tomaitheous discovered some free space. 899 bytes free starting at $1e5d (that includes the rom header offset). The whole dual char text compression may go there.
I attached a dummy script for the intro to give you an idea of the screen space occupancy.
Re: Marchen Maze translation
Posted: Sun Nov 29, 2009 9:43 pm
by MooZ
Ok people, I have the bpe encoder and decoder ready. The bpe encoder still needs to be integrated into the insertion software.
If you want an academic reference for byte pair encoding (bpe) you could check this paper:
Byte Pair Encoding: A Text Compression Scheme That Accelerates Pattern Matching (1999)
Or wikipedia.
If you compile bpe.c with BPE_DEBUG, it will compress the string
ABCDEFBECEDFABBEAFDACEBFDABABEDFCFEDCAFBEBDBCABAB and output the dictionary, the decompressed string and various infos on the standard output (I know I'm lazy).
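For readers who just want the idea, here is a generic byte pair encoding sketch. It is not bpe.c: the choice of unused byte values as pair codes, the `max_pairs` limit and the function names are all assumptions made for the example.

```python
from collections import Counter


def bpe_compress(data, max_pairs=32):
    """Repeatedly replace the most frequent byte pair with an unused byte."""
    seq = list(data)
    used = set(seq)
    free_codes = [b for b in range(256) if b not in used]
    table = {}                                  # code -> (left, right)
    while free_codes and len(table) < max_pairs:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, n = counts.most_common(1)[0]
        if n < 2:                               # nothing worth replacing
            break
        code = free_codes.pop(0)
        table[code] = pair
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                merged.append(code)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return bytes(seq), table


def bpe_decompress(packed, table):
    """Expand pair codes with a stack until only plain bytes remain."""
    stack = list(reversed(packed))
    out = bytearray()
    while stack:
        b = stack.pop()
        if b in table:
            left, right = table[b]
            stack.append(right)
            stack.append(left)
        else:
            out.append(b)
    return bytes(out)


text = b"ABCDEFBECEDFABBEAFDACEBFDABABEDFCFEDCAFBEBDBCABAB"
packed, table = bpe_compress(text)
print(len(text), len(packed), bpe_decompress(packed, table) == text)
```

The stack-based decoder is the part that has to be small on the console side; the encoder only runs on the PC.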
And bpe_asm is a simple program that decompresses the text into the BSS ($2000). Nothing is displayed on screen. Use the mednafen debugger/memory viewer for more fun.
I added the infamous Lorem ipsum in bpe.c but commented it out. It's 4046 bytes long. So beware.
Re: Marchen Maze translation
Posted: Mon Nov 30, 2009 10:21 pm
by MooZ
Here's the commented font display routine. It's approximately 210 bytes long. As the bpe decoding is only 48 bytes long, I think it can fit there, since we'll remove the accentuation code (#$fe and #$fb) and rewrite the rest.
But that's tomorrow's job.
[edit] Ok here it is... I still need to test it.