PC Engine dev forum

Posted: **Wed Nov 04, 2009 9:45 am**

This is a placeholder thread for the Marchen Maze translation.
Thanks to Dave's awesome script extractor, I managed to locate the text output routine. It starts at $87D6. The string "parsing" starts at $89E5 ($5C/$5D hold the pointer to the string) and ends at $8CD8. This routine is called from $E377: the return address is pushed onto the stack by hand, and the subroutine is then called via a wonderful JMP ($202C) at $E38D. The jump table is at $E390.
Here's the MPR (memory mapping register) snapshot when the PC is at $89E5:

MPR0: FF
MPR1: F8
MPR2: 08
MPR3: 01
MPR4: 02
MPR5: 03
MPR6: 04
MPR7: 00
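For anyone following along with the addresses, here is a minimal C sketch of how an MPR snapshot like the one above turns a logical address into a ROM offset. It assumes a plain HuCard image (banks $00-$7F are ROM, each bank is 8 KB, the top 3 bits of the address pick the MPR) and an optional 512-byte header; `rom_offset` is a hypothetical helper, not part of any tool mentioned in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: map a logical (CPU) address to a HuCard ROM offset.
   The top 3 bits of the address select one of the eight MPRs; each MPR
   holds an 8 KB bank number, and banks $00-$7F are HuCard ROM. */
long rom_offset(const uint8_t mpr[8], uint16_t logical, int has_header)
{
    uint8_t bank = mpr[logical >> 13];      /* MPR0..MPR7 */
    if (bank >= 0x80)                       /* e.g. $F8 = work RAM, $FF = I/O: not ROM */
        return -1;
    long physical = ((long)bank << 13) | (logical & 0x1FFF);
    return has_header ? physical + 0x200 : physical;  /* 512-byte .pce header */
}
```

With this snapshot, logical $5614 (MPR2 = $08) gives $11614, or $11814 in a .pce file that carries the header.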

Next step: locate the string table and the font.

Posted: **Wed Nov 04, 2009 8:45 pm**

Good job, hope to hear more good news about it soon!

Posted: **Wed Nov 04, 2009 9:50 pm**

So $5C/$5D is set at $8C4B, and from there you can find where the pointer table is.
It is stored at $11614 (logical $5614) and ends at $11849 (remember to add $200 to get the file offset in the .pce file, due to the bloody ROM header).
I made a quick test by making the first entry point to the second string, and it works.
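The repointing test can be sketched in C as below. This assumes, without confirmation from the post, that the table is an array of 2-byte little-endian logical pointers (the usual layout on 6502-family CPUs). The file offset $11814 is the $11614 from the post plus the $200 header; the constant and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 16-bit little-endian logical pointers, one per string. */
#define PTR_TABLE_FILE_OFFSET 0x11814u   /* $11614 + $200 header */

static uint16_t read_ptr(const uint8_t *rom, size_t index)
{
    const uint8_t *p = rom + PTR_TABLE_FILE_OFFSET + 2 * index;
    return (uint16_t)(p[0] | (p[1] << 8));   /* low byte first */
}

static void write_ptr(uint8_t *rom, size_t index, uint16_t value)
{
    uint8_t *p = rom + PTR_TABLE_FILE_OFFSET + 2 * index;
    p[0] = (uint8_t)(value & 0xFF);
    p[1] = (uint8_t)(value >> 8);
}

/* The quick test from the post: make entry 0 point at string 1. */
void repoint_first_to_second(uint8_t *rom)
{
    write_ptr(rom, 0, read_ptr(rom, 1));
}
```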

Now I'll have to find where the font is stored and how it's encoded...

edit: The font already has capital letters, so you can insert translated text by editing the strings.
Here's the result with 81 7e 85 85 88 9f 90 88 8b 85 7d 9f 91 91 91 ff
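Decoding the test bytes gives HELLO WORLD XXX, which suggests a contiguous alphabet with 'A' = $7A, space = $9F and $FF as the string terminator. The C sketch below encodes upper-case text under that inferred mapping; only the letters that actually appear in the test string are confirmed, and `encode_text` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Inferred mapping (from the test bytes only): 'A' = $7A, ' ' = $9F,
   $FF ends the string. Any other character is rejected. Returns the
   number of bytes written, or -1 on error. */
int encode_text(const char *s, uint8_t *out, size_t cap)
{
    size_t n = 0;
    for (; *s; s++) {
        uint8_t code;
        if (*s >= 'A' && *s <= 'Z')
            code = (uint8_t)(0x7A + (*s - 'A'));
        else if (*s == ' ')
            code = 0x9F;
        else
            return -1;                  /* character not known from the post */
        if (n + 1 >= cap)
            return -1;                  /* keep room for the terminator */
        out[n++] = code;
    }
    if (n >= cap)
        return -1;
    out[n++] = 0xFF;                    /* end-of-string marker */
    return (int)n;
}
```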

edit: I found the first character of the font, but the second one seems to be compressed.

Posted: **Mon Nov 09, 2009 8:49 pm**

There are 4 tile compression methods:

Direct transfer to VRAM via the TIA instruction
RLE compression
RLE + XOR
Copy to VRAM using a loop

Code:

f796:   LDA <$7d
        ORA <$7e
        BEQ f7e1
        JSR f888
        CMP #$00
        BNE f7ac
                ; #1
                TIA $3700, $0002, $0020
                BRA f7d2
f7ac:   CMP #$02
        BNE f7bc
                ; #2
                JSR f817
                TIA $3700, $0002, $0020
                BRA f7d2
f7bc:   CMP #$03
        BNE f7cf
                ; #3
                JSR f817
                JSR f857
                TIA $3700, $0002, $0020
                BRA f7d2
f7cf: ; #4
        JSR f803

f7d2:   LDA <$7d
        SBC #$01
        STA <$7d
        LDA <$7e
        SBC #$00
        STA <$7e
        BRA f796

f7e1:

The code above is more or less a simple switch/case.
The encoding type is stored in 2 bits, and four of them are packed in each byte. They are stored starting at $B0A3 (file offset $2f2a3). f888 extracts the current one and returns it in A.

Code:

f888:   LDX <$7f
        CPX #$04
        BNE f897
                INC <$7a
                BNE f894
                        INC <$7b
f894:           STZ <$7f
                CLX
f897:   LDA [$7a]
f899:   DEX
        BMI f8a1
                ROR A
                ROR A
                JMP f899
f8a1:   AND #$03
        INC <$7f
        RTS
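Reading f888 together with the switch at f796, a tile's type can be fetched in C as sketched below. The enum names are made up for illustration; the value-to-routine mapping follows the branches (0 takes the TIA case, 1 falls through to f803, 2 calls f817, 3 calls f817 then f857), and the bit order follows the ROR A pairs, so the first tile of a byte sits in bits 1-0.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative names for the four cases of the f796 switch. */
enum tile_kind { TILE_TIA = 0, TILE_COPY = 1, TILE_RLE = 2, TILE_RLE_XOR = 3 };

/* Type of tile i in the packed table starting at $B0A3: four 2-bit codes
   per byte, lowest bits first (f888 shifts right 2 bits per used slot). */
unsigned tile_type(const uint8_t *types, size_t i)
{
    return (types[i / 4] >> (2 * (i % 4))) & 0x03;
}
```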

The "RLE" routine:

Code:

f817:   LDA [$78]
        STA <$83
        LDY #$04
        LDA #$09
        STA <$81
        STZ <$80
        CLX 
        LDA #$20
        STA <$82
        
f828:   DEC <$81
        BNE f83a
        PHY 
        LDA #$08
        STA <$81
        INC <$80
        LDY <$80
        LDA [$78], Y
        STA <$83
        PLY 

f83a:   ROR <$83 ; next flag bit -> carry (set: literal byte, clear: $00)
        BCS f849
        STZ $3740, X
        INX 
        DEC <$82
        BNE f828
        JMP f7f6

f849:   LDA [$78], Y
        STA $3740, X
        INY 
        INX 
        DEC <$82
        BNE f828
        JMP f7f6
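Judging from the listing, the "RLE" is really a zero-byte bitmask: the four header bytes give 32 flag bits, consumed LSB first through ROR <$83. Each set bit copies the next literal and each clear bit writes $00 into the 32-byte buffer at $3740. A C sketch of that reading (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Rebuild one 32-byte tile from a 4-byte flag mask plus literals.
   Returns the number of source bytes consumed (what f7f6 adds to $78/$79). */
size_t unpack_tile(const uint8_t *src, uint8_t tile[32])
{
    size_t in = 4;                                   /* literals follow the mask (LDY #$04) */
    for (int i = 0; i < 32; i++) {
        int literal = (src[i / 8] >> (i % 8)) & 1;   /* LSB first, like ROR */
        tile[i] = literal ? src[in++] : 0x00;
    }
    return in;
}
```

This also explains the "four bytes header" that makes compressed characters easy to spot.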

The "XOR" routine:

Code:

f857:   PHX 
        PHY 
        LDY #$07
        CLX 
        
f85c:   LDA $3740, X
        EOR $3742, X
        STA $3742, X
        LDA $3741, X
        EOR $3743, X
        STA $3743, X
        LDA $3750, X
        EOR $3752, X
        STA $3752, X
        LDA $3751, X
        EOR $3753, X
        STA $3753, X
        
        INX 
        INX 
        
        DEY 
        BNE f85c
        
        PLY 
        PLX 
        RTS
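The XOR pass walks rows 0 to 6 and XORs each row into the one below it, separately for every bitplane, using the standard PC Engine tile layout (bytes 2r/2r+1 are planes 0/1 of row r, bytes 16+2r/17+2r are planes 2/3). Because the rows are processed top-down in place, row r ends up as the XOR of stored rows 0..r. A C sketch under that reading (function name hypothetical):

```c
#include <stdint.h>

/* Undo the row-delta filter on a 32-byte tile, mirroring f857. */
void unxor_tile(uint8_t tile[32])
{
    for (int x = 0; x < 14; x += 2) {      /* 7 passes: rows 1..7 */
        tile[x + 2]  ^= tile[x];           /* plane 0 */
        tile[x + 3]  ^= tile[x + 1];       /* plane 1 */
        tile[x + 18] ^= tile[x + 16];      /* plane 2 */
        tile[x + 19] ^= tile[x + 17];      /* plane 3 */
    }
}
```

Storing rows as deltas turns vertically repeated lines into $00 bytes, which the bitmask pass then drops.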

And last but not least, the copy to VRAM loop:

Code:

f803:   CLY     
f804:   LDA [$78], Y
        STA $0002
        INY 
        LDA [$78], Y
        STA $0003
        INY 
        CPY #$20
        BNE f804
        JMP f7f6
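f803 copies an uncompressed tile straight from ROM, alternating between the VDC data port low byte ($0002) and high byte ($0003), so the 32 source bytes become 16 VRAM words. The C sketch below shows that packing plus a pixel lookup, assuming the standard PC Engine 4-bitplane tile format with bit 7 as the leftmost pixel; it can be handy for dumping decoded font tiles. Function names are hypothetical.

```c
#include <stdint.h>

/* Even bytes go to $0002 (low), odd bytes to $0003 (high). */
void tile_to_words(const uint8_t tile[32], uint16_t words[16])
{
    for (int w = 0; w < 16; w++)
        words[w] = (uint16_t)(tile[2 * w] | (tile[2 * w + 1] << 8));
}

/* Colour index (0-15) of pixel (x, y): planes 0/1 live in words 0-7,
   planes 2/3 in words 8-15, bit 7 of each plane byte is the leftmost pixel. */
int tile_pixel(const uint8_t tile[32], int x, int y)
{
    int bit = 7 - x;
    return ((tile[2 * y] >> bit) & 1)
         | (((tile[2 * y + 1] >> bit) & 1) << 1)
         | (((tile[16 + 2 * y] >> bit) & 1) << 2)
         | (((tile[17 + 2 * y] >> bit) & 1) << 3);
}
```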

For the curious, here's f7f6:

Code:

f7f6:   TYA     
        CLC 
        ADC <$78
        STA <$78
        CLA 
        ADC <$79
        STA <$79
        CLY 
        RTS
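Since f7f6 advances $78/$79 by whatever Y holds when a decoder finishes, the number of source bytes each tile consumes follows from the listings: 0 for type 0 (nothing is read from [$78]), 32 for type 1, and 4 plus the number of set mask bits for types 2 and 3 (f857 saves and restores Y). The C sketch below walks a stream that way without decoding it, which may help find where a compressed block ends; the separate type/data arrays and the names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

static int popcount8(uint8_t b)
{
    int n = 0;
    for (; b; b >>= 1)
        n += b & 1;
    return n;
}

/* Bytes of tile data consumed by `tiles` tiles, given the packed type table. */
size_t stream_size(const uint8_t *types, const uint8_t *data, size_t tiles)
{
    size_t pos = 0;
    for (size_t i = 0; i < tiles; i++) {
        unsigned t = (types[i / 4] >> (2 * (i % 4))) & 3;  /* same bit order as f888 */
        if (t == 1) {
            pos += 32;                                     /* f803: raw tile */
        } else if (t >= 2) {                               /* f817 (+ f857 for type 3) */
            size_t literals = 0;
            for (int k = 0; k < 4; k++)
                literals += (size_t)popcount8(data[pos + k]);
            pos += 4 + literals;
        }
        /* t == 0: TIA case, the source pointer does not move */
    }
    return pos;
}
```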

You can find an example of RLE+XOR data at file offset $2ec0b.

More in the next post.

Posted: **Tue Nov 10, 2009 4:47 pm**

What wonderful label names.

Posted: **Tue Nov 10, 2009 9:53 pm**

Isn't it?

I managed to insert lowercase characters. The trick is to avoid overwriting any compressed character. They are easy to identify: there's usually a four-byte header, and the size of the "pattern" is less than 0x20 bytes. Maybe that's enough to start working on the insertion script?

Posted: **Wed Nov 11, 2009 2:24 pm**

Oops, I forgot to attach the script (best viewed with Shift-JIS).

Posted: **Sun Nov 22, 2009 10:08 pm**

Here's a short update.
The first 10 strings are used in the introduction. I wrote a small program that creates an IPS patch with the lowercase font, the strings and the table.
Unfortunately, the final translated script may be too big, so I think I'm good for some DTE stuff.
Tomaitheous discovered some free space: 899 bytes starting at $1e5d (that includes the ROM header offset). The whole dual-character text compression may go there.
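The patch program itself isn't shown. For reference, here is a minimal C sketch of the standard IPS layout such a tool emits: the ASCII header PATCH, records made of a 3-byte big-endian offset, a 2-byte big-endian size and the data, then EOF. IPS offsets are file offsets, so they include the $200 header, just like the $1e5d free-space address. The `ips_*` names are hypothetical, and RLE records (size 0) are deliberately not supported.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t *buf; size_t len, cap; } ips_t;

static int put(ips_t *p, const void *data, size_t n)
{
    if (p->len + n > p->cap)
        return -1;                      /* output buffer too small */
    memcpy(p->buf + p->len, data, n);
    p->len += n;
    return 0;
}

int ips_begin(ips_t *p) { p->len = 0; return put(p, "PATCH", 5); }

int ips_record(ips_t *p, uint32_t offset, const uint8_t *data, uint16_t size)
{
    /* offset $454F46 would read as "EOF"; size 0 would mean an RLE record */
    if (offset > 0xFFFFFF || offset == 0x454F46 || size == 0)
        return -1;
    uint8_t hdr[5] = { (uint8_t)(offset >> 16), (uint8_t)(offset >> 8),
                       (uint8_t)offset, (uint8_t)(size >> 8), (uint8_t)size };
    if (put(p, hdr, 5))
        return -1;
    return put(p, data, size);
}

int ips_end(ips_t *p) { return put(p, "EOF", 3); }
```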

I attached a dummy script for the intro to give you an idea of the screen space occupancy.

Posted: **Sun Nov 29, 2009 9:43 pm**

OK people, I have the BPE encoder and decoder ready. The encoder still needs to be integrated into the insertion software.
If you want an academic reference for byte pair encoding (BPE), you could check this paper:
Byte Pair Encoding: A Text Compression Scheme That Accelerates Pattern Matching (1999)
Or Wikipedia.

If you compile bpe.c with BPE_DEBUG, it will compress the string ABCDEFBECEDFABBEAFDACEBFDABABEDFCFEDCAFBEBDBCABAB and print the dictionary, the decompressed string and various info on the standard output (I know, I'm lazy).
And bpe_asm is a simple program that decompresses the text into the BSS ($2000). Nothing is displayed on screen; use the Mednafen debugger/memory viewer for more fun.
I added the infamous Lorem ipsum to bpe.c but commented it out. It's 4046 bytes long, so beware.
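As a rough picture of what byte pair encoding does, here is a minimal single-block C sketch: repeatedly replace the most frequent adjacent pair with a byte value that never occurs in the text, then expand codes back with a small explicit stack, the kind of loop that can fit in a few dozen bytes of assembly. It is not bpe.c; the dictionary layout (`bpe_dict` with code/left/right arrays) and the stop rule are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int n;                                    /* number of dictionary entries */
    uint8_t code[256], left[256], right[256];
} bpe_dict;

/* Compress buf in place; returns the compressed length. */
size_t bpe_compress(uint8_t *buf, size_t len, bpe_dict *d)
{
    static unsigned count[256][256];
    int used[256] = {0};
    for (size_t i = 0; i < len; i++)
        used[buf[i]] = 1;                     /* literals can never become codes */
    d->n = 0;
    while (len >= 2) {
        memset(count, 0, sizeof count);
        for (size_t i = 0; i + 1 < len; i++)
            count[buf[i]][buf[i + 1]]++;
        unsigned best = 0;
        int bl = 0, br = 0;
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++)
                if (count[a][b] > best) { best = count[a][b]; bl = a; br = b; }
        if (best < 2)
            break;                            /* a pair seen once cannot pay for its entry */
        int c = 0;
        while (c < 256 && used[c])
            c++;
        if (c == 256)
            break;                            /* no free byte value left */
        used[c] = 1;
        d->code[d->n] = (uint8_t)c;
        d->left[d->n] = (uint8_t)bl;
        d->right[d->n] = (uint8_t)br;
        d->n++;
        size_t out = 0;
        for (size_t i = 0; i < len; ) {       /* non-overlapping, left to right */
            if (i + 1 < len && buf[i] == bl && buf[i + 1] == br) {
                buf[out++] = (uint8_t)c;
                i += 2;
            } else {
                buf[out++] = buf[i++];
            }
        }
        len = out;
    }
    return len;
}

/* Expand with an explicit stack; returns the output length, 0 if cap is too small. */
size_t bpe_expand(const uint8_t *in, size_t len, const bpe_dict *d,
                  uint8_t *out, size_t cap)
{
    int entry[256];
    uint8_t stack[258];                       /* depth is at most d->n + 1 */
    size_t o = 0;
    for (int i = 0; i < 256; i++)
        entry[i] = -1;
    for (int k = 0; k < d->n; k++)
        entry[d->code[k]] = k;
    for (size_t i = 0; i < len; i++) {
        int sp = 0;
        stack[sp++] = in[i];
        while (sp > 0) {
            uint8_t c = stack[--sp];
            int k = entry[c];
            if (k >= 0) {                     /* pair code: push right first so left pops next */
                stack[sp++] = d->right[k];
                stack[sp++] = d->left[k];
            } else {                          /* literal byte */
                if (o == cap)
                    return 0;
                out[o++] = c;
            }
        }
    }
    return o;
}
```

Running `bpe_compress` on the debug string above and then `bpe_expand` on the result should give the original string back.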

Posted: **Mon Nov 30, 2009 10:21 pm**

Here's the commented font display routine. It's approximately 210 bytes long. As the BPE decoder is only 48 bytes long, I think it can fit there, since we'll remove the accent handling code (#$fe and #$fb) and rewrite the rest.
But that's tomorrow's job.

[edit] OK, here it is... I still need to test it.

[Translation] Marchen Maze
