Sorts in Masmlib

Started by jimg, March 22, 2020, 04:59:11 AM

jimg

I tried several of the string sorts in Masmlib and got what I think are strange results.
The sorts move high ascii (>127) to the top of the list, that is, they think high ascii is smaller than low ascii characters.
Is this normal??

assort
input   output
10      ½½½½
1.      ¼this
3.      ═╦╩a
11      1
½½½½    1.
1       10
1;      11
¼this   1;
is      3.
═╦╩a    is
test    test

acisort
input   output
10      ½½½½
1.      ¼this
3.      ═╦╩a
11      1
½½½½    1.
1       10
1;      11
¼this   1;
is      3.
═╦╩a    is
test    test

cstsorta
input   output
10      ½½½½
1.      ¼this
3.      ═╦╩a
11      1
½½½½    1.
1       10
1;      11
¼this   1;
is      3.
═╦╩a    is
test    test

HSE

Is correct string order  :thumbsup:

   nothing < symbols < numbers < letter

    But "1;" is wrong!


Equations in Assembly: SmplMath

jimg

Quote from: HSE on March 22, 2020, 05:55:10 AM
Is correct string order  :thumbsup:

   nothing < symbols < numbers < letter

What?   Where did you get that logic?

HSE

Quote from: jimg on March 22, 2020, 07:24:38 AM
What?   Where did you get that logic?

:biggrin:  Windows explorer, DOS "dir", Linux "ls" (if I remember well)

jimg

It is difficult to find a downloadable program that sorts strings for testing.  So far, every program and every online site I tried sorts the high ascii characters to the end.

The only exception I can find is Microsoft's command line sort.  (So in my opinion and only my opinion, Microsoft screwed it up.)

Taking a look at the assort (and the asqsort which does the work) code, the problem is obvious.  All the comparisons are signed (jg jl etc.) rather than unsigned (ja jb etc.).  Why would one think ascii characters are signed?  Well, I just made the same mistake in a program I'm working on and it took me a week to find the problem, so not hard to screw up.
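
To illustrate the signed versus unsigned point outside of assembly, here is a minimal C sketch. It is not the masmlib code; `cmp_signed` and `cmp_unsigned` are hypothetical helpers that only mimic what a byte compare followed by jg/jl or ja/jb would decide, and the byte 0xAB is assumed to be the first character of one of the high ascii test lines.

```c
/* Compare two NUL-terminated strings byte by byte, reading each byte as a
   SIGNED value (the analogue of cmp followed by jg/jl).
   Bytes above 127 become negative, so they sort before '1', 'a', etc. */
static int cmp_signed(const char *a, const char *b)
{
    while (*a && *a == *b) { a++; b++; }
    return (int)(signed char)*a - (int)(signed char)*b;
}

/* Same walk, but each byte is read as UNSIGNED (the analogue of ja/jb).
   Bytes above 127 stay in the range 128..255 and sort after plain ascii. */
static int cmp_unsigned(const char *a, const char *b)
{
    while (*a && *a == *b) { a++; b++; }
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}
```

With these, `cmp_signed("\xABthis", "1")` is negative (high ascii first, as in the tables in the first post), while `cmp_unsigned("\xABthis", "1")` is positive (high ascii last).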

I'll just gen up a copy from masmlib to use for testing.  But others that might use the routines should be aware.


hutch--

Sorting of text means "alphabetical sort order", it is correct that high ascii occur higher up than normal alphabetical text. I am not sure what you are after, perhaps you need to write a custom sort that does what you require.

jimg

Quote from: hutch-- on March 22, 2020, 10:01:59 AM
Sorting of text means "alphabetical sort order", it is correct that high ascii occur higher up than normal alphabetical text. I am not sure what you are after, perhaps you need to write a custom sort that does what you require.


Hi Hutch-

That was very bad of me in the first post to say "higher up".   At the time, I meant visually higher up the list.  Very bad terminology on my part.

I agree with your statement, I think.  "it is correct that high ascii occur higher up than normal alphabetical text" which I interpret to mean high ascii should be put out Later in the list, after the normal characters.  Unfortunately, the alpha sorts in Masmlib do not do this, they put the high ascii characters out first, before the normal characters.

It only took a few minutes to copy the four procs involved into a file and change the jumps as required.    So far it looks good, but more testing is needed.


hutch--

Jim,

I am still not sure what you are after, text sort = a-z and A-Z, the rest do not matter. If you have some special requirement, then design another sort or adapt one of the existing ones, but a string sort works off the alphabet, not the whole range of ascii characters. The problem is you are trying to redefine what a string sort is supposed to do.

jimg

Oh.  Okay then.  No problem.

jj2007

Is this the order that you consider correct?

QSort
input   output
10      1
1.      1.
3.      10
11      11
««««    1;
1       3.
1;      is
¬this   test
is      ««««
ÍËÊa    ¬this
test    ÍËÊa
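
For anyone wanting to reproduce this order without MasmBasic, a rough C sketch follows. It is not QSort's implementation; it just relies on the C standard's `strcmp`, which compares bytes as unsigned char, and `sort_lines` and `by_unsigned_bytes` are made-up names. The high ascii lines are assumed to be the bytes AB AB AB AB, AC + "this" and CD CB CA + "a", which would explain why they display as ½ ¼ ═╦╩ in one codepage and as « ¬ ÍËÊ in another.

```c
#include <stdlib.h>
#include <string.h>

/* qsort callback: each element is a pointer to a NUL-terminated string.
   strcmp orders by unsigned byte value, so bytes above 127 sort last. */
static int by_unsigned_bytes(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcmp(a, b);
}

/* Sort an array of string pointers in place, ascending by unsigned bytes. */
static void sort_lines(const char **lines, size_t count)
{
    qsort(lines, count, sizeof lines[0], by_unsigned_bytes);
}
```

Fed the eleven input lines from the table, this should give the same output column: the plain ascii lines first, then the three high ascii lines.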

jimg

Yes, don't you?

Edit-

Here's what I did to the masmlib procs to get the "correct" results-

jj2007

Quote from: jimg on March 22, 2020, 04:04:04 PM
Yes, don't you?

Me too. QSort() uses this order, also in case-insensitive mode.

It becomes more complex with Utf8 text or Umlaute (äöü), of course:
1.1 motiviation
1.2 Sorting in Java
1. unbelievable point
1 Introduction
Adele
andere
Ändern
anders
ängstlich
zippy
zone


Apparently, Java can do that. It's probably slow, because you need a lookup table to do that (QSort has one, but it's almost undocumented).
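
The lookup table idea can be sketched in C roughly like this. It is only an assumption about how such a table might look, not QSort's actual table and not what Java does: the text is assumed to be single-byte Windows-1252, only ä/ö/ü and their capitals are folded, and `collate`, `init_collate` and `cmp_collated` are invented names.

```c
/* One sort key per byte value. Bytes that should sort together share a key. */
static unsigned char collate[256];

/* Fill the table: identity by default, A-Z folded to a-z, and the
   Windows-1252 umlauts folded to their base letters (assumed encoding). */
static void init_collate(void)
{
    int i;
    for (i = 0; i < 256; i++) collate[i] = (unsigned char)i;
    for (i = 'A'; i <= 'Z'; i++) collate[i] = (unsigned char)(i - 'A' + 'a');
    collate[0xC4] = collate[0xE4] = 'a';   /* Ä ä */
    collate[0xD6] = collate[0xF6] = 'o';   /* Ö ö */
    collate[0xDC] = collate[0xFC] = 'u';   /* Ü ü */
}

/* Compare two NUL-terminated strings by table key instead of raw byte. */
static int cmp_collated(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p && collate[*p] == collate[*q]) { p++; q++; }
    return (int)collate[*p] - (int)collate[*q];
}
```

After calling `init_collate()`, this comparison puts the words in the order andere, Ändern, anders, ängstlich, as in the list above. It costs one extra table read per byte, and it does nothing for Utf8 or for the numbered headings at the top of that list.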

What is your specific problem, i.e. why do you need characters above Ascii 127? Just curious.

jimg

I'm using the masmlib sorts as my "standard" to test some code I'm working on.   The problem showed up when I used Windows.inc from masmlib as a test file, it contains some high ascii on line 104 ( "; ««««"  ).  So now I'm off on this side track trying to get my tools working so I can go back to my program.  Yes, I want my program to handle all possible byte values properly, not specifically related to sorting text files with high ascii.

I'm not trying to be cryptic, it's a stupid program I'm just doing for "fun", but in the end, it all pays the same.

hutch--

Jim,

You can test the two sorts that are in QE. First you left align the entire text so that you avoid leading spaces, then sort the text and you will see what it's supposed to do.

This is the actual QE code that calls the sorts.

    .if flag == 0
      invoke assort,parr,ebx,0
    .else
      invoke dssort,parr,ebx,0
    .endif

jimg

Interesting Hutch, it just throws away any line that starts with a high ascii character.
Below is before and after sort.