Challenge: Hexify #s

Started by NoCforMe, December 02, 2024, 06:57:53 AM


NoCforMe

Here's a challenge specifically for you, @Zedd, since I know you're fond of creating plugins for that editor. If this catches your fancy maybe you want to give it a whirl.

Current problem here at NoCforMe Laboratories, GmbH, was dealing with a ton of hexadecimal data from my BMPinfo program (thread on this elsewhere on this site). F'rinstance, here's one line of pixels:

1800: 1A161A 19161A 19151A 1C181D 1D191E 1B171C 1E1A1F 1E1A1F
1F1B20 1C181D 1D191E 1C181D 191619 1C191B 1C181C 201C21
1B171D 19151B 1B171D 1D191F 1F1B21 1D191F 1A161C 1D181E
1D191F 1C1B21 1C1C22 1B181E 1D171D 1B171C 1B171C 1B171C
1B171C 1C181D 1D1B20 303138 81838B BABFC4 D7DFE0 E1ECEB
D8E7F0 DFEDF5 E6F1F7 ECF6F9 E4ECED E0E7E4 EEF4EF ECF2EE
D4D9D7 A0A5A4 BEC3C2 848A8A 414747 414545 2E3030 282829
232024 252127 28242A 211D23 1B171C 1C181D 1A1719 181517
171416 171416 181517 161315 171416 171416 161315 161315
161315 171214 181315 171214 181315 191416 181416 161315
161315 161315 171416 171416 161415 161315 171417 181418
171318 181418 161315 181518 19151A 1B171C 221D23 2A232C
332C36 423B45 4F4955 5F5A66

I needed to get this into some code to create a BMP from it, which meant I had to massage each and every # there:
  • Put the hex ID ('h') at the end of each #
  • Prefix the string with a '0' if it didn't start with a numeric digit

giving this result:

1800h: 1A161Ah 19161Ah 19151Ah 1C181Dh 1D191Eh 1B171Ch 1E1A1Fh 1E1A1Fh
1F1B20h 1C181Dh 1D191Eh 1C181Dh 191619h 1C191Bh 1C181Ch 201C21h
1B171Dh 19151Bh 1B171Dh 1D191Fh 1F1B21h 1D191Fh 1A161Ch 1D181Eh
1D191Fh 1C1B21h 1C1C22h 1B181Eh 1D171Dh 1B171Ch 1B171Ch 1B171Ch
1B171Ch 1C181Dh 1D1B20h 303138h 81838Bh 0BABFC4h 0D7DFE0h 0E1ECEBh
0D8E7F0h 0DFEDF5h 0E6F1F7h 0ECF6F9h 0E4ECEDh 0E0E7E4h 0EEF4EFh 0ECF2EEh
0D4D9D7h 0A0A5A4h 0BEC3C2h 848A8Ah 414747h 414545h 2E3030h 282829h
232024h 252127h 28242Ah 211D23h 1B171Ch 1C181Dh 1A1719h 181517h
171416h 171416h 181517h 161315h 171416h 171416h 161315h 161315h
161315h 171214h 181315h 171214h 181315h 191416h 181416h 161315h
161315h 161315h 171416h 171416h 161415h 161315h 171417h 181418h
171318h 181418h 161315h 181518h 19151Ah 1B171Ch 221D23h 2A232Ch
332C36h 423B45h 4F4955h 5F5A66h
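
The two rules above (append 'h', prefix '0' when the number starts with a letter) can be sketched in a few lines. This is an illustrative Python sketch, not the EdAsm code; the names `HEX_RUN` and `hexify` are made up here, and it assumes a "number" is any run of hex digits set off by non-word characters.

```python
import re

# Assumption: a "number" is any run of hex digits bounded by non-word characters.
HEX_RUN = re.compile(r"\b[0-9A-Fa-f]+\b")

def hexify(text):
    """Append 'h' to every hex run; prefix '0' when the run starts with a letter."""
    def fix(match):
        token = match.group(0)
        prefix = "" if token[0].isdigit() else "0"
        return prefix + token + "h"
    return HEX_RUN.sub(fix, text)

print(hexify("1800: 1A161A BABFC4 848A8A"))
```

Because a token that already ends in 'h' no longer matches the pattern, running the function twice over the same text should leave it unchanged.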

Ugh. I started doing this by hand; total PITA. Then, with still dozens of lines left to go, I figured it would take less time overall to write some code to automate this process. Which I did; I put the code in my editor (EdAsm). Which worked well.

So in the interest of friendly competition, do you think you want to try this? Warning: it's not quite as simple as you might think. But it is doable. And I think the way I came up with doing it is probably far better than what you might come up with at first, at least, which will be (predicting here) dozens of if-then-else statements.

If you're interested I can show you my method, which uses my famous parsing scheme based on finite-state automata (FSA)--woohoo! Really, it works very nicely to handle any number of situations where you need to interpret textual data. Boils down to a surprisingly small amount of code, driven by data (a state table).
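
To give a flavour of what a table-driven parser for this job might look like, here is a minimal sketch in Python. It is not NoCforMe's actual code or state table (neither appears in this thread); the states, character classes, and the `hexify_fsa` name are assumptions made for illustration, and it presumes the input holds only hex numbers and separators, with no 'h' suffixes yet.

```python
GAP, NUM = 0, 1                  # states: between numbers / inside a number
DIGIT, LETTER, SEP = 0, 1, 2     # character classes

def classify(ch):
    """Map one character to its class (hypothetical classes for this sketch)."""
    if ch in "0123456789":
        return DIGIT
    if ch in "abcdefABCDEF":
        return LETTER
    return SEP

# TABLE[state][char_class] = (next_state, text emitted before the character)
TABLE = [
    [(NUM, ""), (NUM, "0"), (GAP, "")],    # GAP: a leading letter gets a '0'
    [(NUM, ""), (NUM, ""),  (GAP, "h")],   # NUM: a separator closes with 'h'
]

def hexify_fsa(text):
    out = []
    state = GAP
    for ch in text:
        state, prefix = TABLE[state][classify(ch)]
        out.append(prefix + ch)
    if state == NUM:             # input ended in the middle of a number
        out.append("h")
    return "".join(out)

print(hexify_fsa("1800: 1A161A BABFC4"))
```

All the decisions live in `TABLE`; the loop itself has no if-then-else chain, which is the appeal of driving the parser from data.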

BTW, you probably noticed that the hexified data still isn't ready for prime time, as it needs commas inserted after each number (except for the last one). I'm working on a converter for that now. This is even a bit trickier than the hexifier. Another possibility for a plugin?
Assembly language programming should be fun. That's why I do it.

NoCforMe

BTW, interesting side effect of this function. If you try to hexify this

dead beef deaf ace decade

you end up with this

0deadh 0beefh 0deafh 0aceh 0decadeh

'cuz the code just don't know any better (no AI here).

zedd151

Funny you mention it, but yes I have a plugin to convert ascii decimal to ascii hex but only one at a time (by selecting the ascii decimal string and running the plugin) but not what you want - read further...

But looking again at your description, you want hex data converted to comma delimited hex strings. I don't have that. Nor would it work as a plugin since both source and destination are both ascii, read from rich edit control and written back to the rich edit control.

But throwing together an algo to do exactly what you want shouldn't be too difficult. If dealing with hex data and converting to ascii hex string, a table should work well and be fast, methinks. Later today or maybe tomorrow, I'll take a crack at it.    :biggrin:

Is it bytewise hex data or dword hex data? What flavor of endian? I think I know, but I gotta ask...

NoCforMe

Quote from: zedd151 on December 02, 2024, 07:12:15 AM
But looking again at your description, you want hex data converted to comma delimited hex strings. I don't have that. Nor would it work as a plugin since both source and destination are both ascii, read from rich edit control and written back to the rich edit control.

Ah, but grasshopper, it can work: that's exactly what my code (which could be a plugin) does. You copy the selection from the RichEdit control into a buffer, massage it, then copy the buffer back to the RichEdit. The resulting data doesn't have to be the same size as the original selection.

Quote
But throwing together an algo to do exactly what you want shouldn't be too difficult. If dealing with hex data and converting to ascii hex string, a table should work well and be fast, methinks. Later today or maybe tomorrow, I'll take a crack at it.    :biggrin:

Is it bytewise hex data or dword hex data? What flavor of endian? I think I know, but I gotta ask...

I think you misunderstand: the data is all ASCII. No numeric conversion required.

Also, for this particular challenge, forget the comma-ification; that's a whole 'nother can o'worms. This is just doing the hex cleanup (not really conversion; the data is already hex, but lacks what is needed in order for MASM to digest it).

zedd151

I thought you wanted something to incorporate into your program, to output masm compatible hex strings... I musta misunderstood. I'll look at this later, or tomorrow A.M.

Adding commas is not a big deal btw, it should be easy peasy

Basically you want the bitmap data stringified, and comma separated. And for masm compatibility, a leading zero where needed. I got this... my gears are turning already....  :cool:  I've got some ideas already...

I'll work on two versions:
One converting the raw hex data to formatted strings,  the other working on formatting the existing ascii hex data.
Both *should* be trivial...


Hmmm... thinking about this, the bitmap data is definitely not dwords, but sequential byte sequences - your example makes it look like dwords... Do you want to convert them to dwords? Or leave them in the same order?  BGR or RGB?

NoCforMe

Quote from: zedd151 on December 02, 2024, 07:23:36 AM
I thought you wanted something to incorporate into your program, to output masm compatible hex strings... I musta misunderstood. I'll look at this later, or tomorrow A.M.

Adding commas is not a big deal btw, it should be easy peasy

Basically you want the bitmap data stringified, and comma separated. And for masm compatibility, a leading zero where needed. I got this... my gears are turning already....  :cool:  I've got some ideas already...

I'll work on two versions:
One converting the raw hex data to formatted strings,  the other working on formatting the existing ascii hex data.
Both *should* be trivial...


Hmmm... thinking about this, the bitmap data is definitely not dwords, but sequential byte sequences - your example makes it look like dwords... Do you want to convert them to dwords? Or leave them in the same order?  BGR or RGB?

Aaaargh: why are you making this so complicated?
Yes, they're DWORDs (maybe shoulda written that), and in the correct byte order, so you don't need to mess with that. (This is 24 bit-per-pixel data.) The final result will be this for each line:
DD 1A161Ah, 19161Ah, 19151Ah, 1C181Dh, 1D191Eh, 1B171Ch, 1E1A1Fh, 1E1A1Fh

and like I said, forget the comma-ification for now. Because it's complicated: remember, the last "chunk" on a line doesn't get a comma. But there are characters after that last "chunk" before the end-of-line character (CR/LF at the end of each line), in this case spaces. So it's not as straightforward as you might think.
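
The trailing-blanks wrinkle can be sidestepped by splitting each line into chunks first and only then joining them, so the last chunk never gets a comma. A minimal Python sketch under stated assumptions: `commafy` is a hypothetical name, the "1800h:" address label is assumed to have been removed already, and the lines may end in CR, LF or CR-LF.

```python
def commafy(text, directive="DD"):
    """Turn blank-separated chunks into 'DD a, b, c' lines (no trailing comma)."""
    lines_out = []
    for line in text.splitlines():      # accepts CR, LF and CR-LF endings
        chunks = line.split()           # drops leading, trailing and repeated blanks
        if not chunks:
            continue                    # skip empty lines
        lines_out.append(directive + " " + ", ".join(chunks))
    return "\r\n".join(lines_out)

print(commafy("1A161Ah 19161Ah 19151Ah   \r\n332C36h 5F5A66h  \r"))
```

Joining with ", " puts the separator between chunks only, so whatever spaces sit between the last chunk and the end-of-line character no longer matter.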

But I'm sure you can handle the hex-ification.

daydreamer

nice challenge  :thumbsup:
my suggestion is to add something that checks the length of the hex numbers in the source file, to detect big hex numbers (8 characters for 32-bit values and 16 characters for 64-bit values), so the coder doesn't accidentally write only 7 characters, or 9, or 15, or 17 instead
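
A length check along those lines could look roughly like this. It is only a Python sketch: `odd_length_tokens` and its `expected` parameter are invented for illustration, and any address label such as "1800:" would need to be stripped first or it would be reported too.

```python
import re

HEX_TOKEN = re.compile(r"\b[0-9A-Fa-f]+\b")

def odd_length_tokens(text, expected=(8, 16)):
    """Return (line_number, token) for every hex run with an unexpected length."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in HEX_TOKEN.findall(line):
            if len(token) not in expected:
                problems.append((line_no, token))
    return problems

# The pixel data in this thread uses 6-digit values, so pass expected=(6,).
print(odd_length_tokens("1A161A 19161 1C181D\n1D191E7", expected=(6,)))
```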
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

NoCforMe

Quote from: zedd151 on December 02, 2024, 07:23:36 AM
I'll work on two versions:
One converting the raw hex data to formatted strings,  the other working on formatting the existing ascii hex data.
Both *should* be trivial...

I got my comma-delimiting code working.
So how's your hexification code coming?
Looking forward to, what should we call it? the code showdown? code slam?*

* Like a poetry slam. Friendly competition.

zedd151

Haven't really delved into it yet. I might pass that off to daydreamer..  :biggrin:  How about it, Magnus?  :tongue:

I'm  a little busy solving a problem of my own making, at the moment.   :cool:

jj2007

  Let esi=FileRead$("raw.txt")
  add Len(esi), 1000 ; whatever is needed to compensate
  Let edi=New$(eax) ; leading zeros and trailing "h"
  push edi ; for printing
  .Repeat
      lodsb ; 1800: 1A161A D8E7F0 19151A
  .Until al==":" || !al
  mov eax, 20206464h
  stosd
  dec edi
  .While al
      .Repeat
          lodsb
      .Until al>"0" || !al ; D8E7F0
      .Break .if !al
      .if al>="A"
          mov byte ptr [edi], "0" ; add leading zero
          inc edi
      .endif
      .Repeat
          stosb
          lodsb
      .Until al<"0"
      .if al==" "
          mov eax, "  ,h" ; 202C68
      .else
          mov eax, 0A0D68h ; h CrLf
          stosd
          .Break .if dword ptr [esi]<=0A0Dh ; no dd at end of file
          dec edi
          mov eax, 20206464h
      .endif
      stosd
      dec edi ; write three characters
  .Endw
  pop edi
  PrintLine edi

dd 1A161Ah, 19161Ah, 19151Ah, 1C181Dh, 1D191Eh, 1B171Ch, 1E1A1Fh, 1E1A1Fh
dd 1F1B20h, 1C181Dh, 1D191Eh, 1C181Dh, 191619h, 1C191Bh, 1C181Ch, 201C21h
dd 1B171Dh, 19151Bh, 1B171Dh, 1D191Fh, 1F1B21h, 1D191Fh, 1A161Ch, 1D181Eh
dd 1D191Fh, 1C1B21h, 1C1C22h, 1B181Eh, 1D171Dh, 1B171Ch, 1B171Ch, 1B171Ch
dd 1B171Ch, 1C181Dh, 1D1B20h, 303138h, 81838Bh, 0BABFC4h, 0D7DFE0h, 0E1ECEBh
dd 0D8E7F0h, 0DFEDF5h, 0E6F1F7h, 0ECF6F9h, 0E4ECEDh, 0E0E7E4h, 0EEF4EFh, 0ECF2EEh
dd 0D4D9D7h, 0A0A5A4h, 0BEC3C2h, 848A8Ah, 414747h, 414545h, 2E3030h, 282829h
dd 232024h, 252127h, 28242Ah, 211D23h, 1B171Ch, 1C181Dh, 1A1719h, 181517h
dd 171416h, 171416h, 181517h, 161315h, 171416h, 171416h, 161315h, 161315h
dd 161315h, 171214h, 181315h, 171214h, 181315h, 191416h, 181416h, 161315h
dd 161315h, 161315h, 171416h, 171416h, 161415h, 161315h, 171417h, 181418h
dd 171318h, 181418h, 161315h, 181518h, 19151Ah, 1B171Ch, 221D23h, 2A232Ch
dd 332C36h, 423B45h, 4F4955h, 5F5A66h

NoCforMe

Hmm, pretty slick there.
Still digesting this.
One tiny quibble: where you use
.if al>="A"
mov byte ptr [edi], "0" ; add leading zero
what if the hex chars use lowercase "a-f" instead, which is permitted? (My code allows either.)
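
For what it's worth, the quoted comparison on its own looks as though it already covers lowercase, since in ASCII 'a' (61h) sorts above 'A' (41h) just as 'A' sorts above '9' (39h). A quick Python check of that ordering follows; `needs_leading_zero` is a made-up name that only mirrors the one comparison, not the rest of the routine.

```python
def needs_leading_zero(first_char):
    """Mirror the byte comparison al >= "A" using ASCII codes."""
    return ord(first_char) >= ord("A")

for ch in "19AFaf":
    print(ch, hex(ord(ch)), needs_leading_zero(ch))
```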

I don't yet understand how all those stosds (like 20206464h) work, but I'm sure it's very clever.
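
One way to read those constants: x86 stores a DWORD low byte first, so 20206464h lands in memory as the four characters "dd" plus two spaces, and the `dec edi` that follows backs up over the fourth one, leaving three. The Python sketch below imitates that with `struct.pack`. The helper name is invented, and the value 20202C68h for the quoted constant "  ,h" assumes MASM's convention that the first character of a quoted constant is the most significant byte.

```python
import struct

def stosd_then_back_up(buf, value):
    """Imitate 'stosd' followed by 'dec edi': store 4 bytes, keep 3."""
    buf.extend(struct.pack("<I", value))   # little-endian: low byte lands first
    del buf[-1]                            # dec edi: the 4th byte gets overwritten

print(struct.pack("<I", 0x20206464))       # the bytes that 20206464h stores
print(struct.pack("<I", 0x000A0D68))       # 'h', CR, LF and a zero byte

out = bytearray()
stosd_then_back_up(out, 0x20206464)        # "dd" and one space survive
out.extend(b"1A161A")                      # the digits copied by stosb
stosd_then_back_up(out, 0x20202C68)        # 'h', comma and one space survive
print(out)
```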

So we have a contest entry here.

NoCforMe

Also, JJ, something I noticed while coding my comma-delimiting code:
The buffer I got from my RichEdit control (using EM_GETTEXTEX) only had carriage returns (0Dh) with no line feeds (0Ah). Is this the way all (ASCII) text in a RichEdit control is encoded? I'm so used to seeing CR-LF that my code broke because I was assuming that's what I would see for line endings.
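
Code that has to survive either convention can normalise the line endings before doing anything else. A small Python sketch of that idea (the function name `to_crlf` is made up for illustration):

```python
import re

def to_crlf(text):
    """Rewrite CR-only, LF-only and CR-LF line endings as CR-LF."""
    # The alternation tries "\r\n" first, so an existing CR-LF is not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)

sample = "1A161Ah 19161Ah\r332C36h 5F5A66h\r"
print(repr(to_crlf(sample)))
print(sample.splitlines())   # splitlines() also copes with a bare CR
```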

zedd151

Quote from: NoCforMe on December 03, 2024, 07:51:10 PM
Also, JJ, something I noticed while coding my comma-delimiting code:
The buffer I got from my RichEdit control (using EM_GETTEXTEX) only had carriage returns (0Dh) with no line feeds (0Ah). Is this the way all (ASCII) text in a RichEdit control is encoded? I'm so used to seeing CR-LF that my code broke because I was assuming that's what I would see for line endings.
:joking:
Yes rich edit likes to save bytes.  :biggrin:  No 'trailing' line feed.
When I started working with rich edit code years ago, that threw me off too.

@sinsi, I never knew about that one.  Nice.   :smiley:  (post below this one)

sinsi

Quote from: NoCforMe on December 03, 2024, 07:51:10 PM
Also, JJ, something I noticed while coding my comma-delimiting code:
The buffer I got from my RichEdit control (using EM_GETTEXTEX) only had carriage returns (0Dh) with no line feeds (0Ah). Is this the way all (ASCII) text in a RichEdit control is encoded? I'm so used to seeing CR-LF that my code broke because I was assuming that's what I would see for line endings.
You need to use GT_USECRLF as the flags member of the GETTEXTEX structure

NoCforMe

Quote from: sinsi on December 03, 2024, 07:57:23 PM
Quote from: NoCforMe on December 03, 2024, 07:51:10 PM
Also, JJ, something I noticed while coding my comma-delimiting code:
The buffer I got from my RichEdit control (using EM_GETTEXTEX) only had carriage returns (0Dh) with no line feeds (0Ah). Is this the way all (ASCII) text in a RichEdit control is encoded? I'm so used to seeing CR-LF that my code broke because I was assuming that's what I would see for line endings.
You need to use GT_USECRLF as the flags member of the GETTEXTEX structure

Actually, it's nice only having one line terminator instead of two.
CR-LF seems to be an ancient holdover from the days of teletype terminals and such, when CR actually returned the carriage and LF fed a new line, just like a typewriter. How quaint.