Binary to displayable text, any base

Started by ahsat, March 31, 2024, 04:37:00 AM

sinsi

You might need this at the end of your source code for 32-bit
end entry_point

NoCforMe

Quote from: sinsi on April 01, 2024, 09:37:54 AM
You might need this at the end of your source code for 32-bit
end entry_point
Yes! You definitely do need that. (But I know nothing about polink; I just use good old ml.exe.)
Assembly language programming should be fun. That's why I do it.

ahsat

Quote from: sinsi on April 01, 2024, 09:37:54 AM
end entry_point
The entry_point on the end statement did it. Forever, that has been the way the entry point was defined, but they changed that for the 64-bit assembler.

Thank you guys so much for this help. The 32 bit version now links and seems to work. I will see if it needs any clean up.

ahsat

Below is the 32 bit version of Anybase32.exe. It was linked using:

polink.exe /SUBSYSTEM:CONSOLE /OUT:AnyBase32.exe AnyBase.obj

In my day, an algorithm's worth was based on how powerful it was, not just its speed. All the routines you guys are comparing it to will only convert a single base. The algorithm I released will do any base, and is faster for most bases.

So, which is better: coding a routine for each base that you will ever use, or having one routine that will do any base you need? With this algorithm you only need to make one choice: leading zeros or not.

TITLE AnyBase

OPTION PROC:PRIVATE                ;Don't automatically make procs public

include \masm32\include\masm32rt.inc

.data

result    db      33 dup (?)        ;max should be 32, plus a null     
newLine   db      0dh, 0ah, 0
numb      dword   ?                 ;holds the number being converted

.CODE

;Got to keep the Sumerian happy, base 60 is the max here.
digits    db      '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx'
maxBase   equ     $-digits          ;Max base number, should be 60

AnyToAny proc
;----------------------------------------------------------------------------
;          Original by Ray Gwinn
; Any number to any base.
; Parameters:
;   eax    The number to convert
;   ebx    The desired base
;   edi    Where to place the converted string
;----------------------------------------------------------------------------
;
    push  edx                       ;save edx, or a digit
    xor   edx, edx                  ;zero edx
    div   ebx                       ;generate next digit in edx
    test  eax, eax                  ;test if done
    jz    @f                        ;br if done

    invoke AnyToAny                 ;generate next digit
@@: mov   al, digits[edx]           ;get the ascii value
    stosb                           ;save it
    pop   edx                       ;restore edx, or get next digit
    ret
AnyToAny endp

ctAnyToAny proc
;----------------------------------------------------------------------------
;          Original by Ray Gwinn
; Any number to any base with a specified digit length, will pad leading zeros.
; Parameters:
;   eax    The number to convert
;   ebx    The desired base
;   ecx    The desired digit count
;   edi    Where to place the converted string
;----------------------------------------------------------------------------
;
    push  edx                       ;save edx, or a digit
    xor   edx, edx                  ;zero edx
    div   ebx                       ;generate next digit in edx
    dec   ecx                       ;decrement digit count
    jecxz @f                        ;br if done (jecxz tests all of ecx)

    invoke ctAnyToAny               ;generate next digit
@@: mov   al, digits[edx]           ;get the ascii value
    stosb                           ;save it
    pop   edx                       ;restore edx, or get next digit
    ret
ctAnyToAny endp

entry_point proc public
;----------------------------------------------------------------------------
; Program to demo any number to any base, bases 2 to 60.
;----------------------------------------------------------------------------
;
    mov   numb, 123456789           ;convert this number, over and over
    jmp   @f

m00 db    13, 10, 'This is the number we will be converting', 13, 10, 0
@@: mov   eax, numb                 ;the number to eax
    mov   ebx, 10                   ;the base to ebx
    mov   edi, offset result        ;the destination addr to edi
    invoke AnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m00
    invoke StdOut, addr result
    invoke StdOut, addr newLine
    jmp   @f

m01 db    13, 10, 'This is the low byte of the number in hex', 13, 10, 0
@@: mov   ecx,2
    mov   eax, numb                 ;the number to eax
    mov   ebx, 16                   ;the base to ebx
    mov   edi, offset result        ;the destination to edi
    invoke ctAnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m01
    invoke StdOut, addr result
    invoke StdOut, addr newLine
    jmp   @f

m02 db    13, 10, 'This is the 16 bit number, in hex', 13, 10, 0
@@: mov   ecx,4
    mov   eax, numb                 ;the number to eax
    mov   ebx, 16                   ;the base to ebx
    mov   edi, offset result        ;the destination to edi
    invoke ctAnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m02
    invoke StdOut, addr result
    invoke StdOut, addr newLine
    jmp   @f

m03 db    13, 10, 'This is the 32 bit number, in hex', 13, 10, 0
@@: mov   ecx,8
    mov   eax, numb                 ;the number to eax
    mov   ebx, 16                   ;the base to ebx
    mov   edi, offset result        ;the destination to edi
    invoke ctAnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m03
    invoke StdOut, addr result
    invoke StdOut, addr newLine
    jmp   @f

m05 db    13, 10, 'This is the 32 bit number, in binary', 13, 10, 0
@@: mov   ecx,32
    mov   eax, numb                 ;the number to eax
    mov   ebx, 2                    ;the base to ebx
    mov   edi, offset result        ;the destination to edi
    invoke ctAnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m05
    invoke StdOut, addr result
    invoke StdOut, addr newLine
    jmp   @f

m06 db    13, 10, 'And in Sumerian Sexagesimal, base 60', 13, 10, 0
@@: mov   ecx,10
    mov   eax, numb                 ;the number to eax
    mov   ebx, 60                   ;the base to ebx
    mov   edi, offset result        ;the destination to edi
    invoke ctAnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m06
    invoke StdOut, addr result
    invoke StdOut, addr newLine

    invoke ExitProcess, 0           ;Error code 0
    ret
entry_point ENDP

end entry_point
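
For anyone who wants to check the logic without an assembler, the recursion in the two routines above can be modelled in C. This is only a minimal sketch, not the posted code: the names any_to_any and ct_any_to_any, the returned end pointer, and the plain a-x lowercase run in the digit table are all assumptions made for illustration.

```c
#include <stdint.h>

/* 60 symbols, so bases 2..60 are usable (assumed table: 0-9, A-Z, a-x). */
static const char DIGITS[] =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx";

/* Variable-length conversion. Each call peels off the lowest digit, recurses
   while the quotient is non-zero, and stores its digit on the way back out,
   so the most significant digit is written first.
   Returns a pointer just past the last digit; the caller adds the 0.
   Assumes 2 <= base <= 60. */
char *any_to_any(uint32_t n, uint32_t base, char *out)
{
    uint32_t digit = n % base;
    n /= base;
    if (n != 0)
        out = any_to_any(n, base, out);
    *out++ = DIGITS[digit];
    return out;
}

/* Fixed-width conversion. Always writes exactly `count` digits (count >= 1):
   short values get leading zeros, long values lose their high digits. */
char *ct_any_to_any(uint32_t n, uint32_t base, unsigned count, char *out)
{
    uint32_t digit = n % base;
    n /= base;
    if (--count != 0)
        out = ct_any_to_any(n, base, count, out);
    *out++ = DIGITS[digit];
    return out;
}
```

As in the demo, the caller writes the terminating 0 at the returned pointer (the equivalent of mov byte ptr [edi], 0). The only difference between the two routines is the stop condition: quotient reaches zero versus digit count reaches zero.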

NoCforMe

Well, so far in my assembly-coding career I've only ever needed 2 bases: decimal and hex. Well, maybe binary once or twice. So the "any base" thing is a meh in my book; just one more annoying parameter to forget to place properly.

And really, when in the world would you ever use base 60? or really anything except 2, 10 or 16? (If you're a really old Unix coot I guess you could add octal.)

Still, kewl algo.

jj2007

Quote from: ahsat on April 02, 2024, 05:46:03 AM
Below is the 32 bit version of Anybase32.exe

Thanks, Ray. I've added it to the testbed as "Ray II", see attachment. Timings:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
...
Averages:
5348    cycles for dwtoa
3646    cycles for dw2str
22575   cycles for MasmBasic Str$()
16554   cycles for Ray's algo I
15138   cycles for Ray's algo II

20      bytes for dwtoa
74      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
129     bytes for Ray's algo II

dwtoa                                   12345678
dw2a                                    12345678
dw2str                                  12345678
MasmBasic Str$()                        12345678
Ray's algo I                            012345678
Ray's algo II                           123456788

It is certainly faster than my own Str$(), but dw2str is a tick faster. You may also need to take care of the zero delimiter: note the stray trailing digit in 123456788.

ahsat

Quote from: jj2007 on April 02, 2024, 06:23:33 AM
It is certainly faster than my own Str$(), but dw2str is a tick faster
Yes, but my point is that you need two routines to do that.

Even in your timing test you had to use three different programs to compare to one. The one can do what takes three others to do, just for you to test it.

If one assembler can do the combined things that three other assemblers can do, which one are you going to use on a normal basis? And I suspect you would not care if it was a little slower.

jj2007

Ray,

Your solution is certainly very elegant, and I think we all agree that you got an extremely good start as an "old newbie" in this forum. It's nice to have you with us :thup:

Jochen

jj2007

I got the mult version working, at least partially:

Averages:
4372    cycles for dwtoa
2816    cycles for dw2str
16560   cycles for MasmBasic Str$()
12337   cycles for Ray's algo I
4289    cycles for Ray+mul
12338   cycles for Ray's algo II

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
64      bytes for Ray+mul
129     bytes for Ray's algo II

dwtoa                                   123456789
dw2a                                    123456789
dw2str                                  123456789
MasmBasic Str$()                        123456789
Ray's algo I                            123456789
Ray+mul                                 123456789
Ray's algo II                           123456789

.DATA?
result2 db 40 dup(?)
.CODE
_xmul dd 0CCCCCCCDh
ctAnyToAnyMul:
  push ecx ; save edx, or a digit
  mov ecx, eax
  mul _xmul ; something is wrong here
  shr edx, 3
  mov eax, edx
  add edx, edx
  lea edx, [edx*4+edx]
  sub ecx, edx
  neg edx
  test eax, eax
  .if !Zero? ; br if done
    call ctAnyToAnyMul ; generate next digit
  .endif
  mov al, @digits[edx] ; get the ascii value
  stosb ; save it
  pop edx ; restore edx, or get next digit
  ret

NameF equ Ray+mul
TestF proc
  mov esi, AlgoLoops-1 ; loop e.g. 100x
  align 4
  .Repeat
    mov eax, TheNumber
    mov edi, offset result2
    call ctAnyToAnyMul
    dec esi
  .Until Sign?
  mov eax, offset result
  ret
TestF endp
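
The mul / shr edx, 3 sequence above is the usual reciprocal trick for dividing by 10 without div. A minimal C sketch of that arithmetic follows. The names divmod10 and u32_to_dec are hypothetical, and the code only illustrates the technique; it is not a fix for the routine as posted. The constant 0CCCCCCCDh is specific to 10, so this form handles decimal only.

```c
#include <stdint.h>

/* Divide by 10 without DIV. 0xCCCCCCCD is ceil(2^35 / 10), so the 64-bit
   product shifted right by 35 bits (the high dword shifted by 3) equals
   n / 10 for every 32-bit n. The remainder is then n - 10 * quotient. */
void divmod10(uint32_t n, uint32_t *q, uint32_t *r)
{
    uint32_t quot = (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
    *q = quot;
    *r = n - quot * 10u;
}

/* Decimal-only conversion built on divmod10, using the same
   "recurse first, store on the way back" order as the div-based routine.
   Returns a pointer just past the last digit; the caller adds the 0. */
char *u32_to_dec(uint32_t n, char *out)
{
    uint32_t q, r;
    divmod10(n, &q, &r);
    if (q != 0)
        out = u32_to_dec(q, out);
    *out++ = (char)('0' + r);
    return out;
}
```

Supporting another base this way would need a different multiplier and shift for each base, which is the price paid for avoiding div.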

ahsat

A forum member, jimg, managed to eliminate the recursion and I have incorporated his changes.

I will do the 32 bit version tomorrow; a new 64 bit version of AnyToAny is below. It can be linked with:

polink.exe /SUBSYSTEM:CONSOLE /LARGEADDRESSAWARE:NO /ENTRY:main /OUT:AnyBase.exe AnyBase.obj
TITLE AnyBase

OPTION PROC:PRIVATE                ;Don't automatically make procs public

include \masm64\include64\masm64rt.inc

.data

result    db      65 dup (?)        ;max should be 64, plus a null     
newLine   db      0dh, 0ah, 0
numb      qword   ?                 ;holds the number being converted

.CODE

;Got to keep the Sumerian happy, base 60 is the max here.
digits    db      '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx'
maxBase   equ     $-digits          ;Max base number, should be 60

AnyToAny proc
;----------------------------------------------------------------------------
;          Original by Ray Gwinn
; Any number to any base.
; Parameters:
;   rax    The number to convert
;   rbx    The desired base
;   rdi    Where to place the converted string
;----------------------------------------------------------------------------
;
    xor   rcx, rcx                  ;zero rcx
agn:
    push  rdx                       ;save rdx, or a digit
    inc   rcx                       ;bump counts in loop
    xor   rdx, rdx                  ;zero rdx
    div   rbx                       ;generate next digit in rdx
    test  rax, rax                  ;test if done
    jnz   agn                       ;loop until the quotient is zero

@@: mov   al, digits[rdx]           ;get the ascii value
    stosb                           ;save it
    pop   rdx                       ;restore rdx, or get next digit
    loop  @b                        ;loop till all digits processed

    ret
AnyToAny endp

main proc  Public
;----------------------------------------------------------------------------
; Program to demo any number to any base, bases 2 to 60.
;----------------------------------------------------------------------------
;
    xor   rsi, rsi
    mov   rax, 123456789            ;convert this number, over and over
    mov   numb, rax                 ;save it in numb
    jmp   @f

m00 db    13, 10, 'This is the number we will be converting', 13, 10, 0
@@: mov   rax, numb                 ;the number to rax
    mov   ebx, 10                   ;the base to rbx
    mov   edi, offset result        ;the destination addr to edi
    invoke AnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m00
    invoke StdOut, addr result
    invoke StdOut, addr newLine
    jmp   @f

m04 db    13, 10, 'This is the 64 bit number, in hex', 13, 10, 0
@@: mov   rax, numb                 ;the number to rax
    mov   ebx, 16                   ;the base to rbx
    mov   edi, offset result        ;the destination to edi
    invoke AnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m04
    invoke StdOut, addr result
    invoke StdOut, addr newLine
    jmp   @f

m05 db    13, 10, 'This is the 64 bit number, in binary', 13, 10, 0
@@: mov   rax, numb                 ;the number to rax
    mov   ebx, 2                    ;the base to rbx
    mov   edi, offset result        ;the destination to edi
    invoke AnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m05
    invoke StdOut, addr result
    invoke StdOut, addr newLine
    jmp   @f

m06 db    13, 10, 'And in Sumerian Sexagesimal, base 60', 13, 10, 0
@@: mov   rax, numb                 ;the number to rax
    mov   ebx, 60                   ;the base to rbx
    mov   edi, offset result        ;the destination to edi
    invoke AnyToAny
    mov   byte ptr [edi], 0         ;null terminate the result string
    invoke StdOut, addr m06
    invoke StdOut, addr result
    invoke StdOut, addr newLine

done:
    invoke ExitProcess, 0           ;Error code 0
    ret
main ENDP

end
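
The loop version above replaces the recursion by counting the digits as they are pushed and then popping that many back off. A rough C equivalent is sketched here, assuming a small local array in place of the machine stack. The names any_to_any_iter and DIGITS are illustrative, and the a-x lowercase run in the table is an assumption.

```c
#include <stdint.h>

/* 60 symbols, so bases 2..60 are usable (assumed table: 0-9, A-Z, a-x). */
static const char DIGITS[] =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx";

/* Non-recursive conversion of a 64-bit value.
   First loop: generate digits least significant first and stack them.
   Second loop: unstack them, which yields most significant first.
   Returns a pointer just past the last digit; the caller adds the 0.
   Assumes 2 <= base <= 60. */
char *any_to_any_iter(uint64_t n, uint64_t base, char *out)
{
    char stack[64];          /* worst case: base 2, all 64 bits used */
    int count = 0;

    do {                     /* do/while so that 0 still produces "0" */
        stack[count++] = DIGITS[n % base];
        n /= base;
    } while (n != 0);

    while (count > 0)
        *out++ = stack[--count];

    return out;
}
```

The do/while mirrors the assembly, which always performs at least one division before testing the quotient.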

sinsi

mov   edi, offset result
Unless you want a write access violation, that should be RDI

mov   al, digits[rdx]
You might get a link error about fixups, try this
lea   r9,digits
...
mov   al,[r9][rdx]
If you use lea instead of offset then ML uses a RIP-relative instruction which has two benefits - code is smaller by 4(?) bytes and you will have proper position-independent code

One last niggle
xor   rsi,rsi
;the same but 1 byte smaller
xor   esi,esi
An instruction (all? not sure) that changes the low 32-bit part of a 64-bit register will zero the top 32-bits.

NoCforMe

Quote from: sinsi on April 04, 2024, 05:14:09 PM
One last niggle
xor  rsi,rsi
;the same but 1 byte smaller
xor  esi,esi
An instruction (all? not sure) that changes the low 32-bit part of a 64-bit register will zero the top 32-bits.
Interesting. Is that a totally reliable (IOW, documented and all) side effect?

sinsi

Quote from: NoCforMe on April 04, 2024, 06:28:40 PM
Quote from: sinsi on April 04, 2024, 05:14:09 PM
One last niggle
xor  rsi,rsi
;the same but 1 byte smaller
xor  esi,esi
An instruction (all? not sure) that changes the low 32-bit part of a 64-bit register will zero the top 32-bits.
Interesting. Is that a totally reliable (IOW, documented and all) side effect?
It's in the Intel docs

ahsat

Quote from: sinsi on April 04, 2024, 05:14:09 PM
mov   edi, offset result
One last niggle
xor   rsi,rsi
;the same but 1 byte smaller
xor   esi,esi
An instruction (all? not sure) that changes the low 32-bit part of a 64-bit register will zero the top 32-bits.

That is really good information. However, I am not going to use it until I become more comfortable with 64 bit code.

I am going to post yet another 64 bit version, then I will get to work on a 32 bit version.

jj2007

Quote from: ahsat on March 31, 2024, 11:44:31 AM
The program was linked using:
polink.exe /SUBSYSTEM:CONSOLE /LARGEADDRESSAWARE:NO

With that option, you can use mov edi, offset somestring. /LARGEADDRESSAWARE:NO means "stick to the 32-bit world, even in 64-bit code". A dangerous road, because you lose the only real advantage that the 64-bit world offers: a large address space.