VRAM Transfer Speed (and scd ram under HuC)

Orion_
Posts: 5
Joined: Mon Jun 15, 2009 9:06 am

VRAM Transfer Speed (and scd ram under HuC)

Post by Orion_ »

Hi there,

I'm fairly new to PC Engine programming, but I'm having a lot of fun with it.
I made a little HiColor picture slideshow using HuC and CD, and I'm loading the pictures from the CD with the "cd_loadvram" function.
I load about 59.5k into VRAM each time, and that's really slow on real hardware (about 8 seconds!).
I tried doing the same in ASM from scratch with my own lib, first loading as much as I can straight into RAM and then transferring it to VRAM using "tia", but it seems as slow as "cd_loadvram" (using the Ootake emulator with the "real CD speed" option).
So I guess this is a VRAM speed problem?
Can the VDC DMA be used to transfer from RAM to VRAM, or only VRAM to VRAM? Is it a lot faster?
I started writing a routine to load CD data into SCD RAM, because HuC doesn't really let you access the SCD RAM (I found out that HuC limits code to 8k, const data to 8k, and RAM data to 5k located in the PCE main RAM, not the SCD RAM).
Now I just need to load the SCD RAM data into VRAM using DMA.
tomaitheous
Posts: 88
Joined: Mon Jun 23, 2008 1:58 pm

Re: VRAM Transfer Speed (and scd ram under HuC)

Post by tomaitheous »

1) *Any* pointer or array access with HuC is *dead* slow. REALLY slow. It treats near data in scratchpad RAM or the constant bank as far data :x
2) Yeah, that *code* limit of 8k is lame. It originates from PCEAS, so it extends to HuC.

VDC DMA doesn't touch the CPU address range, only VRAM. So it's VRAM to VRAM only. I'm not sure what functions you're using, but natively you can write to VRAM at any point of the display, vblank or active. TIA takes 7 cycles a byte (when $0000-$07ff is the destination), so it shouldn't have any problems copying to VRAM in a timely manner (about 4 frames, or roughly 60ms, to do it).
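
As a rough back-of-the-envelope check of the TIA cost, here is a small C sketch. The 7 cycles per byte is the figure quoted above; the ~7.16 MHz CPU clock and the 60 frames per second are assumptions, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed figures: 7 cycles per byte is the number quoted in the post;
   the ~7.16 MHz CPU clock and 60 frames per second are assumptions. */
#define CPU_HZ          7159090u
#define CYCLES_PER_BYTE 7u
#define FRAMES_PER_SEC  60u

/* Total CPU cycles for a block transfer of `bytes` bytes
   (ignores the small fixed setup cost of the instruction itself). */
uint32_t tia_cycles(uint32_t bytes)
{
    return bytes * CYCLES_PER_BYTE;
}

/* Transfer time in microseconds, rounded down. */
uint32_t tia_time_us(uint32_t bytes)
{
    return (uint32_t)(((uint64_t)tia_cycles(bytes) * 1000000u) / CPU_HZ);
}

/* Transfer time in whole frames, rounded up. */
uint32_t tia_time_frames(uint32_t bytes)
{
    uint32_t cycles_per_frame = CPU_HZ / FRAMES_PER_SEC;
    return (tia_cycles(bytes) + cycles_per_frame - 1u) / cycles_per_frame;
}
```

For 59.5k (60928 bytes) that works out to 426,496 cycles, a bit under 60 ms, or 4 frames once rounded up. Either way it is nowhere near 8 seconds, so the raw copy to VRAM is unlikely to be the bottleneck.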

I know HuC passes arguments for *some/most* CD routines directly to the system card, but even then the system card read/write routines are very optimized. Later games from developers started using their own CD read routines for faster access. IIRC, I clocked the original CD_READ function of the system card at about ~90k a second. If it's taking 8 seconds to load ~60k in sequential segments to VRAM, then it might be something with HuC reading in 2k at a time and transferring (i.e. multiple calls to CD_READ), stacking up the seek time with each call. You should check out the CD_READ parameters and try calling it manually to see if that speeds it up (or trace through your HuC call in mednafen and see what it's doing).

Either way, HuC needs some *serious* work done to it.
Orion_
Posts: 5
Joined: Mon Jun 15, 2009 9:06 am

Re: VRAM Transfer Speed (and scd ram under HuC)

Post by Orion_ »

The HuC code for the "cd_loadvram" function is pretty small: only one call to cd_read, with parameters to read and write to VRAM directly. So I guess it's the BIOS that is slow, at least for this function.
I will try loading stuff using my routine, which will do 3 CD seeks max, and burn a CD again to test that.
Orion_
Posts: 5
Joined: Mon Jun 15, 2009 9:06 am

Re: VRAM Transfer Speed (and scd ram under HuC)

Post by Orion_ »

OK, I finally finished the routine and it now loads pretty fast on real hardware!! Like 4 times faster! :)
Here is the routine if you are interested. It just replaces the cd_loadvram function by loading chunks of 24k of data into SCD RAM and then transferring them to VRAM.
I think the BIOS was loading very small chunks of data, causing lots of CD seeks and slowing down the whole process.
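
To put numbers on the chunking idea, here is a small C sketch that counts how many read requests a load needs for a given chunk size. It is only an illustration: the function names are made up, and the 2k-per-request case is the guess made earlier in the thread, not something measured from the BIOS.

```c
#include <stdint.h>

/* Number of read requests needed to load `total` bytes when each
   request fetches at most `chunk` bytes (ceiling division). */
uint32_t reads_needed(uint32_t total, uint32_t chunk)
{
    return (total + chunk - 1u) / chunk;
}

/* Size of read number `index` (0-based); the last one may be short,
   and anything past the end of the data is 0. */
uint32_t read_size(uint32_t total, uint32_t chunk, uint32_t index)
{
    uint32_t offset = index * chunk;
    uint32_t left;

    if (offset >= total)
        return 0u;
    left = total - offset;
    return left < chunk ? left : chunk;
}
```

For a 59.5k picture (60928 bytes), 24k chunks mean 3 reads (24k, 24k, 11.5k), while 2k chunks would mean 30 reads. If each request costs a seek, that difference alone could account for a large part of the speedup.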

http://onorisoft.info/pce/FastVRamCDLoad.txt

Also, I made a little HiColor picture slideshow with 2 hidden boot programs: a BRAM viewer (files and raw)
and a RAM manager that lets you view/edit RAM and run programs! It can be useful for testing little programs by entering them into RAM with the pad and then running them on real hardware. I hope it's useful to someone. (I did that because I don't have a flashcard and I don't want to burn a CD each time I want to test something ^^)

http://onorisoft.info/pce/reunion.zip
tomaitheous
Posts: 88
Joined: Mon Jun 23, 2008 1:58 pm

Re: VRAM Transfer Speed (and scd ram under HuC)

Post by tomaitheous »

Wow, those are some awesome pic conversions :)

Curious though, since you seem pretty comfortable with ASM (and self modifying code), why use HuC?
Orion_
Posts: 5
Joined: Mon Jun 15, 2009 9:06 am

Re: VRAM Transfer Speed (and scd ram under HuC)

Post by Orion_ »

Well, asm doesn't scare me anymore. I did about 3 years of pure 68k and GPU asm making games on the Atari Jaguar console, but ... 8-bit asm is really scary ... what can you do with only three 8-bit regs ... I mean, to do some little routines it's ok, but to make a complete game, it's too hard for me. I prefer C: even if HuC is very slow, I can code very fast using it, and I don't want to have headaches trying to do things using the little 8-bit regs or using tons of macros to overcome the 8-bit limitation :)
I tried to make my own libs from scratch and it was fun, but the thought of making a complete thing with them scares me before I even start.
tomaitheous
Posts: 88
Joined: Mon Jun 23, 2008 1:58 pm

Re: VRAM Transfer Speed (and scd ram under HuC)

Post by tomaitheous »

Hey - you could try this out. Later SCD games started using their own libs to speed up CD read functions. This one bypasses the system card and directly accesses the CD base ports/hardware. I ripped it from Seiya Monogatari. Just map the bank to MPR #2 and call CD_VDC_DMA. It'll read and copy directly to the video port, and it's fast too. Like the comments at the beginning say, $10/$11 need to point to your argument string. The first two bytes are the LBA address and the second two bytes are the number of bytes to copy. $fa/$fb is the VRAM address.
Orion_ wrote: I don't want to have headaches trying to do things using the little 8-bit regs or using tons of macros to overcome the 8-bit limitation :)
Ehh? 8-bit limitation? It's just logic, really. I have no problem doing 16-bit or higher arithmetic with the processor, though I tend to keep things 8-bit if possible for speed reasons. Sure, you only have three 8-bit regs, but it's an accumulator-based processor, and what about all those address regs? 128 16-bit address registers (the zero page) is nothing to sneeze at. And free indexing with the X/Y regs makes for some really nice/fast optimizations. The 68k is cake to code for, but the 65x series is pretty easy too.
Orion_ wrote: I mean, to do some little routines it's ok, but to make a complete game, it's too hard for me
But.. it's the same design/layout regardless of the processor. A complete game, like any other, is just a bunch of subroutines. No different on the 65x than on the z80 or 68k or x86, etc. The only real difference I can see is that you'll have "far data" and "far code" calls vs. the linear code/data address range of the 68k.
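
To show what the 16-bit arithmetic point boils down to, here is a minimal C sketch that adds two 16-bit values using only 8-bit pieces and a carry. The type and function names are made up for illustration; on a 65x CPU the same idea is a CLC followed by an ADC on the low bytes and another ADC on the high bytes.

```c
#include <stdint.h>

/* A 16-bit value held as two separate bytes, the way it would sit in
   two adjacent memory locations on an 8-bit CPU. */
typedef struct {
    uint8_t lo;
    uint8_t hi;
} word16;

/* Add two 16-bit values using only 8-bit additions: add the low bytes,
   then add the high bytes plus the carry out of the low addition. */
word16 add16(word16 a, word16 b)
{
    word16 r;
    uint16_t low_sum = (uint16_t)(a.lo + b.lo);   /* 0..510 */
    uint8_t carry = (uint8_t)(low_sum >> 8);      /* 0 or 1 */

    r.lo = (uint8_t)low_sum;
    r.hi = (uint8_t)(a.hi + b.hi + carry);        /* wraps at 16 bits */
    return r;
}
```

For example, $12FF + $0001 carries out of the low byte and gives $1300. Wider values work the same way, with one more byte and one more carry per step.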

Anyway, if you plan on using HuC, I'd highly recommend you find a way around the super slow array/pointer problem. It's not just very slow, it's cripplingly slow. Or maybe modify HuC's source. There's no reason why accessing a byte/word from an array in static/constant mapped RAM should take like 150+ cycles. Even a BASIC interpreter is faster than that. Not that far data should be that slow either, but scratchpad RAM is more important.
Orion_
Posts: 5
Joined: Mon Jun 15, 2009 9:06 am

Re: VRAM Transfer Speed (and scd ram under HuC)

Post by Orion_ »

Well, after understanding the macros in macro.inc of mkit, I did find coding a little easier, for 16-bit operations for example.
Now, the cool thing about HuC is that integrating asm inside it is damn easy, so writing the structure in C and the things that need to be fast in asm can do the trick.
And I think I'm going to do that, just like I did with my fast CD load function, which takes advantage of the SCD RAM that HuC doesn't allow access to.
When I saw that HuC actually converted my "while (1) { }" instruction into a test of 1 == 1, I just replaced it with an asm "jmp loop" ^^
I think it's like xav, who is making Sonic in C and taking your asm routines for special effects that need speed ;)
tomaitheous
Posts: 88
Joined: Mon Jun 23, 2008 1:58 pm

Re: VRAM Transfer Speed (and scd ram under HuC)

Post by tomaitheous »

Orion_ wrote: Well, after understanding the macros in macro.inc of mkit, I did find coding a little easier, for 16-bit operations for example.
Now, the cool thing about HuC is that integrating asm inside it is damn easy, so writing the structure in C and the things that need to be fast in asm can do the trick.
And I think I'm going to do that, just like I did with my fast CD load function, which takes advantage of the SCD RAM that HuC doesn't allow access to.
When I saw that HuC actually converted my "while (1) { }" instruction into a test of 1 == 1, I just replaced it with an asm "jmp loop" ^^
I think it's like xav, who is making Sonic in C and taking your asm routines for special effects that need speed ;)
Oh, I'm not opposed to using C. C+ASM would be fairly decent, actually. We just need a new C compiler, or to really fix up HuC. Make the library more dynamic, something where you can easily remove lots of the default video/sound/etc. functions. Anyway, you should definitely pop in over at #utopiasoft on IRC (efnet server). It's a PCE coders channel.

Also, those awesome hicolor/highres pics would benefit nicely from a blur routine. Use an Hsync interrupt to offset even/odd scanlines by 1 pixel every other frame: even scanlines on one frame, odd scanlines on the next. The scanlines not being shifted by one pixel should be kept at 0 offset, i.e. unaltered. Looks really nice, especially on a real TV and in high res mode.
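
The per-scanline decision can be sketched in C like this. It only shows the selection logic: the function name is made up, and which frame parity shifts the even lines is an arbitrary assumption. In a real program the returned value would be added to the horizontal scroll from the Hsync handler.

```c
/* Horizontal offset (in pixels) for one scanline of the blur effect.
   Assumption: on even frames the even scanlines are shifted by one
   pixel, on odd frames the odd scanlines are. All other lines stay
   at offset 0. */
unsigned hblur_offset(unsigned frame, unsigned scanline)
{
    return ((frame & 1u) == (scanline & 1u)) ? 1u : 0u;
}
```

Over any two consecutive frames every scanline is shown once shifted and once unshifted, which is what produces the blend.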
tomaitheous
Posts: 88
Joined: Mon Jun 23, 2008 1:58 pm

Re: VRAM Transfer Speed (and scd ram under HuC)

Post by tomaitheous »

Here's an example of h-int hblur:
http://pcedev.net/cat/cat.pce <- without blur
http://pcedev.net/cat/cat2.pce <- with blur

So far, the only emus that run the hblur demo are mednafen and ME. If the emu drops a frame, you'll definitely notice it. Good thing the real system doesn't ;)