Double buffer in mode 13H

Started by popcalent, February 07, 2024, 04:46:28 PM

popcalent

Hello, all!

I'm programming in assembly for MS-DOS, using TASM, and I'm trying to write a couple of subroutines that put a pixel in a buffer and then dump the buffer to video memory. But it's not working. I can put pixels directly in video memory, but this is very slow when I have to render a big sprite or a picture. If I use the buffer, no pixel shows up on screen.

I declared the buffer after the data segment and before the code segment like this:
SBUFFER SEGMENT
  SCREEN_BUFFER DB 64000 DUP(?)
  SBUFFER ENDS

This is the subroutine that puts a pixel in the buffer:
;########################################################
;# SUBROUTINE: PUT_PIXEL_IN_BUFFER
;#
;# EXAMPLE: YELLOW PIXEL AT X=40, Y=20
;#
;#      MOV    MODE13H_X, 40
;#      MOV    MODE13H_Y, 20
;#      MOV    MODE13H_COLOR, 12
;#
;########################################################
PUT_PIXEL_IN_BUFFER PROC NEAR     
        PUSH    AX
        PUSH    BX
        PUSH    ES
        PUSH    DI
        MOV    AX, SEG SBUFFER
        MOV    ES, AX
        ASSUME  ES:SBUFFER
        MOV    DI, OFFSET SCREEN_BUFFER
        MOV    AX, MODE13H_Y
        MOV    BX, MODE13H_Y
        SHL    AX, 8
        SHL    BX, 6
        ADD    AX, BX
        ADD    AX, MODE13H_X
        ADD    DI, AX
        MOV    AL, MODE13H_COLOR
        MOV    ES:[DI], AL
        POP    DI
        POP    ES
        POP    BX
        POP    AX
        RET
        PUT_PIXEL_IN_BUFFER ENDP

Next is the subroutine that dumps the buffer on video memory:

;########################################################
;# SUBROUTINE: DUMP_VIDEO_BUFFER
;########################################################
DUMP_VIDEO_BUFFER PROC NEAR
        PUSH    AX
        PUSH    BX
        PUSH    DS
        PUSH    ES
        PUSH    SI
        PUSH    DI
        ;=============
        MOV    AX, 0A000H
        MOV    DS, AX
        MOV    AX, SEG SBUFFER
        MOV    ES, AX
        ASSUME  ES:SBUFFER
        MOV    DI, OFFSET SCREEN_BUFFER
        MOV    SI, 0
        ADD    DI, 9600
        ADD    SI, 9600
        ;=============
        DUMP_VIDEO_BUFFER_:
                MOV    AL, ES:[DI]
                CMP    AL, 16
                JE      NO_DUMP
                MOV    DS:[SI], AL
        NO_DUMP:
                INC    SI
                INC    DI
                CMP    SI, 0FA00H
                JL      DUMP_VIDEO_BUFFER_
        ;=============
        POP    DI
        POP    SI
        POP    ES
        POP    DS
        POP    BX
        POP    AX
        RET
        DUMP_VIDEO_BUFFER ENDP

Finally, I have a subroutine that waits for the start of a new vertical retrace:

;#################################################
;# WAIT FOR NEW VERTICAL RETRACE
;#################################################
WAIT_VR PROC    NEAR
        PUSH    AX
        PUSH    DX
        MOV    DX, 3DAH
        ;WAIT FOR BIT 3 TO BE ZERO
        WAIT_END_OLDVR:
                IN      AL, DX
                TEST    AL, 08H
                JNZ    WAIT_END_OLDVR
        ;WAIT FOR BIT 3 TO BE ONE
        WAIT_BEGIN_NEWVR:
                IN      AL, DX
                TEST    AL, 08H
                JZ      WAIT_BEGIN_NEWVR
        POP    DX
        POP    AX
        RET
        WAIT_VR ENDP

And this is how I call the subroutines:

        MOV    MODE13H_X, 40
        MOV    MODE13H_Y, 20
        MOV    MODE13H_COLOR, 12
        CALL    PUT_PIXEL_IN_BUFFER
        CALL    WAIT_VR
        CALL    DUMP_VIDEO_BUFFER

The result is that nothing shows up on screen. I've been troubleshooting this for hours now, and can't find what I'm doing wrong. I'm hoping a good samaritan will spot the mistake in my code... Thanks a lot!

NoCforMe

Heh; it just so happens I was looking at some of my old DOS code when I came across this. Ah, the good/bad old days of writing directly to screen memory ...

Your code looks fine. That is, until I did some arithmetic on your values. First of all, I'm wondering why you're offsetting your reads and writes to memory (both source and dest) by 9600. Also mystified by your left-shifting the same y-value by 8 and 6 and then adding them. I can only assume you know what you're doing there.

Taking your x- and y-values:

20 << 8 =  5120
20 << 6 =  1280
      x =    40 +
-----------------
 result:   6440


That might explain it; you're skipping right over the data you wrote into your buffer by starting at offset 9600.

I can't see anything else that looks wrong. Not sure about your vertical-retrace code; have you tried it without that? I assume that's to avoid flicker, yes?
Assembly language programming should be fun. That's why I do it.

_japheth

One thing that strikes pretty instantly is this piece of code:

        NO_DUMP:
                INC    SI
                INC    DI
                CMP    SI, 0FA00H
                JL      DUMP_VIDEO_BUFFER_

JL is used for signed compares, and 0FA00h in 16 bits is a negative number. So your little loop will probably end just after one byte has been transferred. It's better to use JB.
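
To make the signed/unsigned difference concrete, here is a small fragment. It is only a sketch: the labels are made up for illustration, and 9600 is the starting offset used in the original routine.

```asm
        MOV     SI, 9600        ; 2580h: positive whether read as signed or unsigned
        CMP     SI, 0FA00H      ; 0FA00h is 64000 unsigned, but -1536 signed
        JL      SIGNED_LESS     ; not taken: 9600 < -1536 is false
        JB      UNSIGNED_BELOW  ; taken:     9600 < 64000 is true
SIGNED_LESS:                    ; (hypothetical label) never reached through JL here
UNSIGNED_BELOW:                 ; (hypothetical label) reached through JB
```

With JL the loop branch falls through on the first pass; with JB it keeps looping until SI reaches 0FA00h.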
Stupidity paired with audacity - a terrible power.

popcalent

Quote from: NoCforMe on February 07, 2024, 06:36:11 PM
First of all, I'm wondering why you're offsetting your reads and writes to memory (both source and dest) by 9600. Also mystified by your left-shifting the same y-value by 8 and 6 and then adding them. I can only assume you know what you're doing there.
I wrote this code 20 years ago, and perhaps it worked with other macros or code I don't have any more... I don't remember what the 9600 is... The left shifting is just to multiply the y coordinate times 320.
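
Since 320 = 256 + 64, y*320 equals (y << 8) + (y << 6). Below is a minimal sketch of the same offset calculation, assuming MODE13H_X and MODE13H_Y are word variables as in the routine above and that y is below 256. Shift counts written as an immediate larger than 1 (such as SHL AX, 8) only exist on the 186 and later, so this variant uses CL and a byte move instead; note that it clobbers CL.

```asm
        MOV     AX, MODE13H_Y   ; AX = y (0..199, so it fits in AL)
        MOV     BX, AX
        MOV     CL, 6
        SHL     BX, CL          ; BX = y * 64
        MOV     AH, AL          ; move y into the high byte ...
        XOR     AL, AL          ; ... and clear the low byte: AX = y * 256
        ADD     AX, BX          ; AX = y * 320
        ADD     AX, MODE13H_X   ; AX = y * 320 + x
```

For x = 40 and y = 20 this gives 5120 + 1280 + 40 = 6440, the same value as in the arithmetic earlier in the thread.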

Quote from: _japheth on February 07, 2024, 07:35:33 PM
One thing that strikes pretty instantly is this piece of code:
JL is used for signed compares, and 0FA00h in 16 bits is a negative number. So your little loop will probably end just after one byte has been transferred. It's better to use JB.
Corrected. Thanks.

OK. So I took away the two instructions that added 9600 to both SI and DI, and it works now. It's still slow, though. It takes a little less than a second to dump the buffer (plus the time before that to put the sprites in the buffer). It's better than before, when every individual sprite was drawn on screen one after the other; now the whole screen is done at once. But it's still slow...

I'm blown away by very old msdos games that, when you moved to the next room, the screen changed instantly...

popcalent

I fixed the code using the instruction MOVS, instead of moving pixel by pixel. In the original code, the CMP AL, 16 was meant to skip black pixels, but I got rid of it. I also disabled interrupts while the buffer is being copied. This is the new code (I skipped the pushes and the pops):
        CLI
        CLD                            ; Make MOVS increment SI and DI
        MOV     AX, 0A000H
        MOV     ES, AX                 ; Set ES to the video memory segment
        MOV     AX, SEG SBUFFER
        MOV     DS, AX                 ; Set DS to the segment of SBUFFER
        MOV     SI, OFFSET SCREEN_BUFFER  ; Set SI to the source offset in the buffer
        MOV     DI, 0                  ; Set DI to the destination offset in video memory

        DUMP_VIDEO_BUFFER_:
                MOVS    BYTE PTR ES:[DI], BYTE PTR DS:[SI]  ;This also increments SI and DI
                CMP     SI, 0FA00H
                JB      DUMP_VIDEO_BUFFER_

        STI

Now there's no lag or flickering. The only problem is that my subroutine to put a sprite in the buffer goes pixel by pixel, so there's a wait before I can see the image while the sprite is being transferred to the buffer.

This is the code of the subroutine that puts a sprite in the buffer:

;#################################################
;# SUBROUTINE: PRINT_256SPRITE
;#
;# EXAMPLE:
;#      MOV     MODE13H_DELETE, 1 (TO DELETE)
;#      MOV     MODE13H_X, 100
;#      MOV     MODE13H_Y, 105
;#      LEA     DI, SPRITE_ADDRESS
;#      CALL    WAIT_VR
;#      CALL    PRINT_256SPRITE
;#################################################
PRINT_256SPRITE PROC NEAR
        PUSH    AX
        PUSH    BX
        PUSH    CX
        PUSH    DX
        PUSH    DI

        MOV     CX, MODE13H_X  ;XCOORD
        MOV     DX, CX
        MOV     BX, MODE13H_Y ;YCOORD

        PRINT_256SPRITE_LOOP:
                CMP     BYTE PTR [DI],'$'       ;CHECK END OF SPRITE
                JE      PRINT_256SPRITE_END

                MOV     AL, BYTE PTR [DI]       ;GET PIXEL COLOR
                MOV     MODE13H_COLOR, AL
                CALL    PUT_PIXEL_IN_BUFFER     ;PUT PIXEL IN BUFFER

                INC     DI
                INC     MODE13H_X
                CMP     BYTE PTR [DI], '#'      ;CHECK END OF LINE
                JNE     PRINT_256SPRITE_LOOP
                MOV     MODE13H_X, CX           ;GO TO NEXT LINE
                INC     MODE13H_Y
                INC     DI
                JMP     PRINT_256SPRITE_LOOP

        PRINT_256SPRITE_END:
                MOV     MODE13H_DELETE, 0
                POP     DI
                POP     DX
                POP     CX
                POP     BX
                POP     AX
        RET
PRINT_256SPRITE ENDP

Perhaps there's a better way than transferring pixel by pixel, but then I'd have to make sure that the dimensions of the sprite are multiples of 8 or something. The only problem is that I won't be able to handle transparent pixels.
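
Transparent pixels can still be handled when copying a row at a time, at the cost of one compare per pixel. The following is a rough sketch, not code from the thread: TRANSPARENT and COPY_ROW_KEYED are hypothetical names, and the key colour 0 is an assumption (the original dump routine skipped colour 16 in a similar way).

```asm
TRANSPARENT     EQU     0               ; hypothetical key colour, never drawn

; In:  DS:SI -> one row of sprite pixels
;      ES:DI -> destination of that row in the buffer
;      CX    =  row width in pixels (must be non-zero)
; Clobbers AL, CX, SI and DI.
COPY_ROW_KEYED PROC NEAR
        CLD                             ; LODSB must count upwards
ROW_LOOP:
        LODSB                           ; AL = DS:[SI], SI = SI + 1
        CMP     AL, TRANSPARENT
        JE      SKIP_PIXEL              ; leave the buffer byte untouched
        MOV     ES:[DI], AL
SKIP_PIXEL:
        INC     DI                      ; advance in the buffer either way
        LOOP    ROW_LOOP
        RET
COPY_ROW_KEYED ENDP
```

Rows with no transparent pixels at all could use REP MOVSB instead, so the compare is only paid where it is needed.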

NoCforMe

You can do better than that for your byte-moving code. Take advantage of the x86's more advanced instructions. No need for a comparison to end the loop. In fact, no need for a loop at all:
    MOV    AX, 0A000h
    MOV    ES, AX
    MOV    AX, SEG Sbuffer
    MOV    DS, AX
    MOV    SI, OFFSET SCREEN_BUFFER
    XOR    DI, DI
    MOV    CX, <# of bytes to move>

; This moves DS:SI--> ES:DI, increments SI & DI, does it CX times:
    REP    MOVSB

NoCforMe

Another opportunity for a speed-up: in your PRINT_256SPRITE routine, instead of calling PUT_PIXEL_IN_BUFFER for every single pixel, put that code in-line in your loop (PRINT_256SPRITE_LOOP), but minus the setup. Do your setup (setting your buffer pointer and the segment register) outside of the loop. Since you're using DI as your sprite-data pointer, use either SI or BX as a buffer pointer. That should reduce a lot of overhead, at the expense of slightly more code.
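
As a sketch of that suggestion (not tested code from the thread), the sprite loop could look roughly like this. It assumes DS addresses both the sprite data and the MODE13H_* word variables, DI points at the sprite on entry as in PRINT_256SPRITE, the sprite lies fully inside the 320x200 buffer, and '#' and '$' never occur as pixel colours. The procedure and label names are made up, and the MODE13H_DELETE handling is left out.

```asm
PRINT_256SPRITE_INLINE PROC NEAR
        PUSH    AX
        PUSH    BX
        PUSH    DX
        PUSH    DI
        PUSH    ES

        MOV     AX, SEG SBUFFER
        MOV     ES, AX                  ; ES -> buffer segment, loaded once

        MOV     AX, 320
        MUL     WORD PTR MODE13H_Y      ; DX:AX = y * 320 (fits in AX for y < 200)
        ADD     AX, MODE13H_X
        ADD     AX, OFFSET SCREEN_BUFFER
        MOV     BX, AX                  ; BX = buffer pointer for the current pixel
        MOV     DX, AX                  ; DX = buffer offset where this row starts

INLINE_LOOP:
        MOV     AL, [DI]                ; next sprite byte (DS:DI)
        INC     DI
        CMP     AL, '$'                 ; end of sprite?
        JE      INLINE_END
        CMP     AL, '#'                 ; end of line?
        JE      INLINE_NEXT_ROW
        MOV     ES:[BX], AL             ; store the pixel in the buffer
        INC     BX
        JMP     INLINE_LOOP

INLINE_NEXT_ROW:
        ADD     DX, 320                 ; same x, one row further down
        MOV     BX, DX
        JMP     INLINE_LOOP

INLINE_END:
        POP     ES
        POP     DI
        POP     DX
        POP     BX
        POP     AX
        RET
PRINT_256SPRITE_INLINE ENDP
```

The multiplication and the segment load now happen once per sprite instead of once per pixel, which is where most of the saving would come from.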

sinsi

Quote from: NoCforMe on February 08, 2024, 05:43:03 AM
You can do better than that for your byte-moving code. Take advantage of the x86's more advanced instructions. No need for a comparison to end the loop. In fact, no need for a loop at all:
    MOV    AX, 0A000h
    MOV    ES, AX
    MOV    AX, SEG Sbuffer
    MOV    DS, AX
    MOV    SI, OFFSET SCREEN_BUFFER
    XOR    DI, DI
    MOV    CX, <# of bytes to move>

; This moves DS:SI--> ES:DI, increments SI & DI, does it CX times:
    REP    MOVSB

I would fill CX with 320*200/4 and change MOVSB to MOVSD, moving aligned DWORDs will speed things up.
No need to wait for a vertical refresh if you're writing to the buffer, only when writing to the screen.
Also, as it is your code doesn't check for a sprite that is partly off the screen, is this deliberate?
🍺🍺🍺

jj2007

Does movsd work in 16-bit code?


sinsi

Quote from: jj2007 on February 09, 2024, 02:57:19 AM
Does movsd work in 16-bit code?
Yes, but I think it uses the operand size override 66H
It also requires a 386 or better :biggrin:
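
A hedged sketch of what that could look like inside an otherwise 8086-targeted 16-bit source file. It assumes a 386 or later CPU at run time, and that the assembler accepts the `.386` and `.8086` processor directives in the middle of a code segment; check the assembler manual before relying on this.

```asm
        ; Assumes DS:SI -> SCREEN_BUFFER and ES:DI -> A000:0000 are already loaded
        CLD                             ; copy upwards
        MOV     CX, 320*200/4           ; 16000 doublewords
        .386                            ; allow 386 instructions from here on
        REP     MOVSD                   ; assembled with a 66h operand-size prefix
        .8086                           ; back to the 8086 instruction set
```

In 16-bit code the repeat count still comes from CX and the pointers are still SI and DI; only the size of each transfer changes.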

daydreamer

Hi popcalent
example of scrolling in mode 13h
and keyboard control and rossler with help of fpu
https://masm32.com/board/index.php?topic=9319.30
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

popcalent

Thank you all for your replies. MOVSB won't work unless using 386 assembly, which I'm not.


Quote from: daydreamer on February 09, 2024, 04:53:02 AM
Hi popcalent
example of scrolling in mode 13h
and keyboard control and rossler with help of fpu
https://masm32.com/board/index.php?topic=9319.30

Thanks! Is this also taking care of the keyboard lag when using arrow keys to move a sprite?

NoCforMe

Quote from: popcalent on February 09, 2024, 08:09:56 AM
MOVSB won't work unless using 386 assembly, which I'm not.
Not true! Where did you get that idea? I've been using that (including REP MOVSB) since 8088 days.

What assembler are you using?

popcalent

Quote from: NoCforMe on February 09, 2024, 09:49:03 AM
Quote from: popcalent on February 09, 2024, 08:09:56 AM
MOVSB won't work unless using 386 assembly, which I'm not.
Not true! Where did you get that idea? I've been using that (including REP MOVSB) since 8088 days.

What assembler are you using?

I'm sorry. I meant MOVSD. If I'm not mistaken, MOVSB is the same as MOVS and forcing the parameters to be bytes, which is what I'm doing, correct? I can't remember if MOVSW works in < 386 assembly...

NoCforMe

Both MOVSB and MOVSW will work with any x86 assembler; those have been in the instruction set since day 1.

You're correct, MOVSD requires 386 or better. But I don't think you need to mess with that in order to see a significant speed increase of your code. Why don't you try my suggestion and see what it does for you? (Including inlining your PUT_PIXEL_IN_BUFFER code.)

If you're moving an even number of bytes, then by all means use MOVSW instead of MOVSB. Take every advantage.
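
Putting the suggestions from this thread together, a whole-buffer copy that runs on any x86 might look like the sketch below. The name DUMP_VIDEO_BUFFER_W is made up for illustration, and the code assumes SCREEN_BUFFER holds exactly 320*200 bytes in the SBUFFER segment as declared in the first post.

```asm
DUMP_VIDEO_BUFFER_W PROC NEAR
        PUSH    AX
        PUSH    CX
        PUSH    SI
        PUSH    DI
        PUSH    DS
        PUSH    ES

        MOV     AX, 0A000H
        MOV     ES, AX                  ; ES:DI = destination, video memory
        XOR     DI, DI
        MOV     AX, SEG SBUFFER
        MOV     DS, AX                  ; DS:SI = source, the off-screen buffer
        MOV     SI, OFFSET SCREEN_BUFFER
        MOV     CX, 320*200/2           ; 32000 words
        CLD                             ; copy upwards
        REP     MOVSW                   ; no loop and no compare needed

        POP     ES
        POP     DS
        POP     DI
        POP     SI
        POP     CX
        POP     AX
        RET
DUMP_VIDEO_BUFFER_W ENDP
```

A typical call sequence would be CALL WAIT_VR followed by CALL DUMP_VIDEO_BUFFER_W, so that the copy starts at the beginning of the retrace.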