The MASM Forum

General => The Workshop => Topic started by: jimg on March 22, 2020, 04:59:11 AM

Title: Sorts in Masmlib
Post by: jimg on March 22, 2020, 04:59:11 AM
I tried several of the string sorts in Masmlib and got what I think are strange results.
The sorts move high ascii (>127) to the top of the list; that is, they treat high ascii as smaller than low ascii characters.
Is this normal??

assort
input   output
10      ½½½½
1.      ¼this
3.      ═╦╩a
11      1
½½½½    1.
1       10
1;      11
¼this   1;
is      3.
═╦╩a    is
test    test

acisort
input   output
10      ½½½½
1.      ¼this
3.      ═╦╩a
11      1
½½½½    1.
1       10
1;      11
¼this   1;
is      3.
═╦╩a    is
test    test

cstsorta
input   output
10      ½½½½
1.      ¼this
3.      ═╦╩a
11      1
½½½½    1.
1       10
1;      11
¼this   1;
is      3.
═╦╩a    is
test    test
Title: Re: Sorts in Masmlib
Post by: HSE on March 22, 2020, 05:55:10 AM
That is the correct string order  :thumbsup:

   nothing < symbols < numbers < letters

    But "1;" is wrong!


Title: Re: Sorts in Masmlib
Post by: jimg on March 22, 2020, 07:24:38 AM
Quote from: HSE on March 22, 2020, 05:55:10 AM
That is the correct string order  :thumbsup:

   nothing < symbols < numbers < letters

What?   Where did you get that logic?
Title: Re: Sorts in Masmlib
Post by: HSE on March 22, 2020, 08:33:14 AM
Quote from: jimg on March 22, 2020, 07:24:38 AM
What?   Where did you get that logic?

:biggrin:  Windows explorer, DOS "dir", Linux "ls" (if I remember well)
Title: Re: Sorts in Masmlib
Post by: jimg on March 22, 2020, 09:22:20 AM
It is difficult to find a program to download to sort strings for testing.  So far, every one, and every online site I tried,  sorts the high ascii characters to the end.

The only exception I can find is Microsoft's command line sort.  (So in my opinion and only my opinion, Microsoft screwed it up.)

Taking a look at the assort (and the asqsort which does the work) code, the problem is obvious.  All the comparisons are signed (jg jl etc.) rather than unsigned (ja jb etc.).  Why would one think ascii characters are signed?  Well, I just made the same mistake in a program I'm working on and it took me a week to find the problem, so not hard to screw up.
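
To illustrate the signed versus unsigned point, here is a minimal sketch in C rather than MASM, so it compiles anywhere. It is not the masmlib code; the names cmp_signed and cmp_unsigned are made up for this example. The first compares bytes as signed values (what jg/jl do), the second as unsigned values (what ja/jb do).

```c
#include <assert.h>
#include <stddef.h>

/* Compare two zero-terminated byte strings with SIGNED byte comparisons,
   which is the effect of using jg/jl. Bytes above 127 become negative
   and therefore sort before every plain ASCII character. */
static int cmp_signed(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        signed char ca = (signed char)*a;
        signed char cb = (signed char)*b;
        if (ca != cb) return (ca < cb) ? -1 : 1;
        if (ca == 0) return 0;          /* both strings ended together */
    }
}

/* Same loop with UNSIGNED byte comparisons, the effect of ja/jb.
   Bytes above 127 now sort after plain ASCII. */
static int cmp_unsigned(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        unsigned char ca = (unsigned char)*a;
        unsigned char cb = (unsigned char)*b;
        if (ca != cb) return (ca < cb) ? -1 : 1;
        if (ca == 0) return 0;
    }
}
```

With a line made of bytes 0xAB and the line "10", cmp_signed reports the high ascii line as smaller, while cmp_unsigned reports it as larger. Both functions agree on any pair of strings that contains only bytes below 128.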

I'll just gen up a copy from masmlib to use for testing.  But others that might use the routines should be aware.

Title: Re: Sorts in Masmlib
Post by: hutch-- on March 22, 2020, 10:01:59 AM
Sorting of text means "alphabetical sort order", it is correct that high ascii occur higher up than normal alphabetical text. I am not sure what you are after, perhaps you need to write a custom sort that does what you require.
Title: Re: Sorts in Masmlib
Post by: jimg on March 22, 2020, 11:09:27 AM
Quote from: hutch-- on March 22, 2020, 10:01:59 AM
Sorting of text means "alphabetical sort order", it is correct that high ascii occur higher up than normal alphabetical text. I am not sure what you are after, perhaps you need to write a custom sort that does what you require.


Hi Hutch-

That was very bad of me in the first post to say "higher up".   At the time, I meant visually higher up the list.  Very bad terminology on my part.

I agree with your statement, I think.  "it is correct that high ascii occur higher up than normal alphabetical text" which I interpret to mean high ascii should be put out Later in the list, after the normal characters.  Unfortunately, the alpha sorts in Masmlib do not do this, they put the high ascii characters out first, before the normal characters.

It only took a few minutes to copy the four procs involved into a file and change the jumps as required.    So far it looks good, but more testing is needed.

Title: Re: Sorts in Masmlib
Post by: hutch-- on March 22, 2020, 11:51:37 AM
Jim,

I am still not sure what you are after, text sort = a-z - A-Z, the rest do not matter. If you have some special requirement, then design another sort or adapt one of the existing ones, but a string sort works off the alphabet, not the whole range of ascii characters. The problem is you are trying to redefine what a string sort is supposed to do.
Title: Re: Sorts in Masmlib
Post by: jimg on March 22, 2020, 12:16:15 PM
Oh.  Okay then.  No problem.
Title: Re: Sorts in Masmlib
Post by: jj2007 on March 22, 2020, 02:27:01 PM
Is this the order that you consider correct?

QSort
input   output
10      1
1.      1.
3.      10
11      11
««««    1;
1       3.
1;      is
¬this   test
is      ««««
ÍËÊa    ¬this
test    ÍËÊa
Title: Re: Sorts in Masmlib
Post by: jimg on March 22, 2020, 04:04:04 PM
Yes, don't you?

Edit-

Here's what I did to the masmlib procs to get the "correct" results-
Title: Re: Sorts in Masmlib
Post by: jj2007 on March 22, 2020, 08:35:56 PM
Quote from: jimg on March 22, 2020, 04:04:04 PM
Yes, don't you?

Me too. QSort() (http://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1175) uses this order, also in case-insensitive mode.

It becomes more complex with Utf8 text or Umlaute (äöü), of course:
1.1 motiviation
1.2 Sorting in Java
1. unbelievable point
1 Introduction
Adele
andere
Ändern
anders
ängstlich
zippy
zone


Apparently, Java can do that (https://medium.com/fme-developer-stories/how-to-sort-umlaute-in-java-correctly-13f3262f15a1). It's probably slow, because you need a lookup table to do that (QSort has one, but it's almost undocumented).
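
A rough sketch of the lookup table idea, in C. This is not how QSort or Java do it; it only assumes single-byte Latin-1 / Windows-1252 text, and the names sort_key, init_sort_key and cmp_collated are invented for the example. Each byte is translated to a sort key before comparing, so upper and lower case fold together and Ä/ä, Ö/ö, Ü/ü sort like a, o, u.

```c
#include <assert.h>
#include <stddef.h>

/* One sort key per possible byte value (hypothetical table for this sketch). */
static unsigned char sort_key[256];

/* Fill the table: identity by default, then fold A-Z onto a-z and map
   the Latin-1 umlauts onto their base letters. */
static void init_sort_key(void)
{
    int i;
    for (i = 0; i < 256; i++) sort_key[i] = (unsigned char)i;
    for (i = 'A'; i <= 'Z'; i++) sort_key[i] = (unsigned char)(i - 'A' + 'a');
    sort_key[0xC4] = sort_key[0xE4] = 'a';   /* A-umlaut, a-umlaut */
    sort_key[0xD6] = sort_key[0xF6] = 'o';   /* O-umlaut, o-umlaut */
    sort_key[0xDC] = sort_key[0xFC] = 'u';   /* U-umlaut, u-umlaut */
}

/* Compare two zero-terminated strings by their translated keys,
   using unsigned comparisons throughout. */
static int cmp_collated(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        unsigned char ka = sort_key[(unsigned char)*a];
        unsigned char kb = sort_key[(unsigned char)*b];
        if (ka != kb) return (ka < kb) ? -1 : 1;
        if (*a == '\0') return 0;   /* only byte 0 maps to key 0, so both ended */
    }
}
```

After calling init_sort_key() once, cmp_collated puts andere, Ändern, anders, ängstlich in that order, as in the list above. The per-byte cost is one extra table read. Note that this sketch has no tie-breaker, so "Ändern" and "andern" would compare equal, and it does nothing useful for Utf8, where one character spans several bytes.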

What is your specific problem, i.e. why do you need characters above Ascii 127? Just curious.
Title: Re: Sorts in Masmlib
Post by: jimg on March 23, 2020, 12:11:30 AM
I'm using the masmlib sorts as my "standard" to test some code I'm working on.   The problem showed up when I used Windows.inc from masmlib as a test file, it contains some high ascii on line 104 ( "; ««««"  ).  So now I'm off on this side track trying to get my tools working so I can go back to my program.  Yes, I want my program to handle all possible byte values properly, not specifically related to sorting text files with high ascii.

I'm not trying to be cryptic, it's a stupid program I'm just doing for "fun", but in the end, it all pays the same.
Title: Re: Sorts in Masmlib
Post by: hutch-- on March 23, 2020, 05:38:27 AM
Jim,

You can test the two sorts that are in QE. First you left align the entire text so that you avoid leading spaces, then sort the text and you will see what it's supposed to do.

This is the actual QE code that calls the sorts.

    .if flag == 0
      invoke assort,parr,ebx,0
    .else
      invoke dssort,parr,ebx,0
    .endif
Title: Re: Sorts in Masmlib
Post by: jimg on March 23, 2020, 07:39:22 AM
Interesting Hutch, it just throws away any line that starts with a high ascii character.
Below is before and after the sort.
Title: Re: Sorts in Masmlib
Post by: Adamanteus on March 23, 2020, 08:10:23 AM
 I could note that C has a collating string comparison function for this, strcoll  :eusa_boohoo:
Title: Re: Sorts in Masmlib
Post by: hutch-- on March 23, 2020, 10:27:17 AM
I should have explained, in QE it's the tokeniser that strips out the high ascii. In order: load the text file, tokenise it, then sort the tokenised array.
Title: Re: Sorts in Masmlib
Post by: daydreamer on March 24, 2020, 09:02:22 PM
Quote from: jj2007 on March 22, 2020, 08:35:56 PM
Quote from: jimg on March 22, 2020, 04:04:04 PM
Yes, don't you?

Me too. QSort() (http://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1175) uses this order, also in case-insensitive mode.

It becomes more complex with Utf8 text or Umlaute (äöü), of course:
1.1 motiviation
1.2 Sorting in Java
1. unbelievable point
1 Introduction
Adele
andere
Ändern
anders
ängstlich
zippy
zone


Apparently, Java can do that (https://medium.com/fme-developer-stories/how-to-sort-umlaute-in-java-correctly-13f3262f15a1). It's probably slow, because you need a lookup table to do that (QSort has one, but it's almost undocumented).

What is your specific problem, i.e. why do you need characters above Ascii 127? Just curious.
This would be interesting to make a custom sort for:
Look, it seems like a random order, almost like how Roman numerals need to be sorted: I=1, X=10, C=100, M=1000, but ascii puts them in a completely different order: C, I, M, X.
/*
1: 一
2: 二
3: 三
4: 四
5: 五
6: 六
7: 七
8: 八
9: 九
10: 十
11: 十一
12: 十二
20: 二十
50: 五十
100: 百 (Japanese: hyaku, Chinese: bai)
1000: 千 (Japanese: sen, Chinese: qian)
10,000: 万 (Japanese: man)
10,000: 萬 (Chinese: wan)
10^8: 億 (Japanese: oku)
10^8: 亿 (Chinese: yi)
10^12: 兆 (Japanese: chou, Chinese: jhao)
*/
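
For the Roman numeral case mentioned above, a custom sort could look something like this C sketch. It only assumes well-formed upper-case numerals, and symbol_value, roman_value and cmp_roman are names invented here. The comparator converts each string to its number and lets the standard qsort order by that instead of by byte value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Value of a single Roman symbol; 0 for anything else (including the terminator). */
static int symbol_value(char c)
{
    switch (c) {
    case 'I': return 1;
    case 'V': return 5;
    case 'X': return 10;
    case 'L': return 50;
    case 'C': return 100;
    case 'D': return 500;
    case 'M': return 1000;
    default:  return 0;
    }
}

/* Convert a Roman numeral to an integer. A symbol that stands before a
   larger one is subtracted (IV = 4, XC = 90), otherwise it is added. */
static int roman_value(const char *s)
{
    int total = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        int v = symbol_value(*s);
        int next = symbol_value(s[1]);   /* s[1] is at worst the terminator */
        total += (v < next) ? -v : v;
    }
    return total;
}

/* qsort comparator for an array of const char * holding Roman numerals. */
static int cmp_roman(const void *pa, const void *pb)
{
    int va = roman_value(*(const char *const *)pa);
    int vb = roman_value(*(const char *const *)pb);
    return (va > vb) - (va < vb);
}
```

Sorting the array {"C", "I", "M", "X"} with qsort(v, 4, sizeof v[0], cmp_roman) gives I, X, C, M. The same pattern (map each string to a number, compare the numbers) would apply to the numerals in the list above, given a table of their values.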
Title: Re: Sorts in Masmlib
Post by: deeR44 on July 21, 2020, 05:07:17 PM
Quote from: hutch-- on March 23, 2020, 10:27:17 AM
I should have explained, in QE it's the tokeniser that strips out the high ascii. In order: load the text file, tokenise it, then sort the tokenised array.
Ok, I give up. What's "QE"? Quality Engineering?
Title: Re: Sorts in Masmlib
Post by: hutch-- on July 21, 2020, 05:54:39 PM
Well, I could be tempted to boast about it being "Quality Engineering" but it's the abbreviation of the editor I write that is installed in the MASM32 SDK. It has been called "Quick Editor" for over 20 years from its pre year 2000 version up to current.
Title: Re: Sorts in Masmlib
Post by: deeR44 on November 02, 2020, 05:05:27 PM
Quote... It has been called "Quick Editor" for over 20 years from its pre year 2000 version up to current.

Thank you, Hutch. I'm not familiar with many editors since I've been using one called "VEDIT" for something like forty years.