I tried several of the string sorts in Masmlib and got what I think are strange results.
The sorts move high ascii (>127) to the top of the list; that is, they think high ascii characters are smaller than low ascii characters.
Is this normal??
assort
input   output
10      ½½½½
1.      ¼this
3.      ═╦╩a
11      1
½½½½    1.
1       10
1;      11
¼this   1;
is      3.
═╦╩a    is
test    test
acisort
input   output
10      ½½½½
1.      ¼this
3.      ═╦╩a
11      1
½½½½    1.
1       10
1;      11
¼this   1;
is      3.
═╦╩a    is
test    test
cstsorta
input   output
10      ½½½½
1.      ¼this
3.      ═╦╩a
11      1
½½½½    1.
1       10
1;      11
¼this   1;
is      3.
═╦╩a    is
test    test
Is correct string order :thumbsup:
nothing < symbols < numbers < letter
But "1;" is wrong!
Quote from: HSE on March 22, 2020, 05:55:10 AM
Is correct string order :thumbsup:
nothing < symbols < numbers < letter
What? Where did you get that logic?
Quote from: jimg on March 22, 2020, 07:24:38 AM
What? Where did you get that logic?
:biggrin: Windows explorer, DOS "dir", Linux "ls" (if I remember well)
It is difficult to find a program to download to sort strings for testing. So far, every one, and every online site I tried, sorts the high ascii characters to the end.
The only exception I can find is Microsoft's command line sort. (So in my opinion and only my opinion, Microsoft screwed it up.)
Taking a look at the assort (and the asqsort which does the work) code, the problem is obvious. All the comparisons are signed (jg jl etc.) rather than unsigned (ja jb etc.). Why would one think ascii characters are signed? Well, I just made the same mistake in a program I'm working on and it took me a week to find the problem, so not hard to screw up.
I'll just gen up a copy from masmlib to use for testing. But others that might use the routines should be aware.
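To make the signed-versus-unsigned point concrete, here is a minimal C sketch. It is my own illustration, not the masmlib source, and the names cmp_signed, cmp_unsigned and sort_lines are hypothetical. Reading each byte as signed char plays the role of jg/jl, reading it as unsigned char plays the role of ja/jb. It assumes a two's-complement machine and that the test strings are stored as code page 437 bytes (½ = 0xAB, ¼ = 0xAC, ═╦╩ = 0xCD 0xCB 0xCA).

```c
#include <stdlib.h>
#include <string.h>

/* Compare two strings byte by byte, treating each byte as SIGNED.
   Bytes 0x80..0xFF become negative, so they sort before plain ASCII.
   This mimics what jg/jl do after a byte compare. */
int cmp_signed(const void *a, const void *b)
{
    const signed char *s = (const signed char *)*(const char *const *)a;
    const signed char *t = (const signed char *)*(const char *const *)b;
    while (*s && *s == *t) { s++; t++; }
    return (int)*s - (int)*t;
}

/* Same loop, but each byte is UNSIGNED (0..255), as with ja/jb.
   High ascii now sorts after plain ASCII. */
int cmp_unsigned(const void *a, const void *b)
{
    const unsigned char *s = (const unsigned char *)*(const char *const *)a;
    const unsigned char *t = (const unsigned char *)*(const char *const *)b;
    while (*s && *s == *t) { s++; t++; }
    return (int)*s - (int)*t;
}

/* Hypothetical helper: sort an array of string pointers in place.
   use_unsigned == 0 selects the signed compare, anything else the unsigned one. */
void sort_lines(const char **lines, size_t n, int use_unsigned)
{
    qsort(lines, n, sizeof lines[0], use_unsigned ? cmp_unsigned : cmp_signed);
}
```

Calling sort_lines on the eleven test lines with use_unsigned = 0 should put the three high ascii lines first, as in the tables at the top of the thread; with use_unsigned = 1 they should land at the end.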
Sorting of text means "alphabetical sort order", it is correct that high ascii occur higher up than normal alphabetical text. I am not sure what you are after, perhaps you need to write a custom sort that does what you require.
Quote from: hutch-- on March 22, 2020, 10:01:59 AM
Sorting of text means "alphabetical sort order", it is correct that high ascii occur higher up than normal alphabetical text. I am not sure what you are after, perhaps you need to write a custom sort that does what you require.
Hi Hutch-
That was very bad of me in the first post to say "higher up". At the time, I meant visually higher up the list. Very bad terminology on my part.
I agree with your statement, I think. "it is correct that high ascii occur higher up than normal alphabetical text" which I interpret to mean high ascii should be put out later in the list, after the normal characters. Unfortunately, the alpha sorts in Masmlib do not do this; they put the high ascii characters out first, before the normal characters.
It only took a few minutes to copy the four procs involved into a file and change the jumps as required. So far it looks good, but more testing is needed.
Jim,
I am still not sure what you are after: text sort = a-z - A-Z, the rest does not matter. If you have some special requirement, then design another sort or adapt one of the existing ones, but a string sort works off the alphabet, not the whole range of ascii characters. The problem is you are trying to redefine what a string sort is supposed to do.
Oh. Okay then. No problem.
Is this the order that you consider correct?
QSort
input   output
10      1
1.      1.
3.      10
11      11
««««    1;
1       3.
1;      is
¬this   test
is      ««««
ÍËÊa    ¬this
test    ÍËÊa
Yes, don't you?
Edit-
Here's what I did to the masmlib procs to get the "correct" results-
Quote from: jimg on March 22, 2020, 04:04:04 PM
Yes, don't you?
Me too. QSort() (http://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1175) uses this order, also in case-insensitive mode.
It becomes more complex with Utf8 text or Umlaute (äöü), of course:
1.1 motiviation
1.2 Sorting in Java
1. unbelievable point
1 Introduction
Adele
andere
Ändern
anders
ängstlich
zippy
zone
Apparently, Java can do that (https://medium.com/fme-developer-stories/how-to-sort-umlaute-in-java-correctly-13f3262f15a1). It's probably slow, because you need a lookup table to do that (QSort has one, but it's almost undocumented).
What is your specific problem, i.e. why do you need characters above Ascii 127? Just curious.
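For the lookup table idea, here is a rough C sketch. It is not QSort's actual table (which, as said, is almost undocumented), and the names fold_table, init_fold, collate_cmp and sort_collated are made up for illustration. It assumes single-byte Windows-1252 / Latin-1 text, not Utf8, and only handles the German Umlaute: each byte is mapped to a sort key, so Ä and ä compare like a, and the raw bytes are only used to break ties.

```c
#include <stdlib.h>
#include <string.h>

/* Sort key for every possible byte value. */
unsigned char fold_table[256];

/* Build the table: identity by default, A-Z folded to a-z,
   Umlaute folded to their base letter (Windows-1252 byte values). */
void init_fold(void)
{
    int i;
    for (i = 0; i < 256; i++) fold_table[i] = (unsigned char)i;
    for (i = 'A'; i <= 'Z'; i++) fold_table[i] = (unsigned char)(i - 'A' + 'a');
    fold_table[0xC4] = fold_table[0xE4] = 'a';   /* Ä ä */
    fold_table[0xD6] = fold_table[0xF6] = 'o';   /* Ö ö */
    fold_table[0xDC] = fold_table[0xFC] = 'u';   /* Ü ü */
}

/* qsort comparator: compare folded keys first, raw bytes as a tie-break. */
int collate_cmp(const void *a, const void *b)
{
    const char *s0 = *(const char *const *)a;
    const char *t0 = *(const char *const *)b;
    const unsigned char *s = (const unsigned char *)s0;
    const unsigned char *t = (const unsigned char *)t0;
    while (*s && fold_table[*s] == fold_table[*t]) { s++; t++; }
    if (fold_table[*s] != fold_table[*t])
        return (int)fold_table[*s] - (int)fold_table[*t];
    return strcmp(s0, t0);
}

/* Hypothetical helper: build the table and sort the pointer array in place. */
void sort_collated(const char **lines, size_t n)
{
    init_fold();
    qsort(lines, n, sizeof lines[0], collate_cmp);
}
```

The table lookup costs one extra memory read per byte compared, so the slowdown is probably modest for single-byte text. Utf8 would need decoding first, which this sketch does not attempt.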
I'm using the masmlib sorts as my "standard" to test some code I'm working on. The problem showed up when I used Windows.inc from masmlib as a test file, it contains some high ascii on line 104 ( "; ««««" ). So now I'm off on this side track trying to get my tools working so I can go back to my program. Yes, I want my program to handle all possible byte values properly, not specifically related to sorting text files with high ascii.
I'm not trying to be cryptic, it's a stupid program I'm just doing for "fun", but in the end, it all pays the same.
Jim,
You can test the two sorts that are in QE. First you left align the entire text so that you avoid leading spaces, then sort the text and you will see what it's supposed to do.
This is the actual QE code that calls the sorts.
.if flag == 0
    invoke assort,parr,ebx,0
.else
    invoke dssort,parr,ebx,0
.endif
Interesting Hutch, it just throws away any line that starts with a high ascii character.
Below is before and after sort.
I could remark that C has a collating string comparison function for this, strcoll :eusa_boohoo:
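The standard C function is spelled strcoll (declared in string.h), and its ordering follows the LC_COLLATE category of the current locale. A small sketch of using it with qsort follows; cmp_strcoll and sort_with_locale are hypothetical names, and which locale names exist (for example a German one) depends entirely on the platform, so treat any name other than "C" as an assumption.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator that defers to the locale's collation rules. */
int cmp_strcoll(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: select a collation locale, then sort in place.
   Returns 1 on success, 0 if the requested locale is not available
   (in which case the array is left untouched). */
int sort_with_locale(const char **lines, size_t n, const char *locale_name)
{
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return 0;
    qsort(lines, n, sizeof lines[0], cmp_strcoll);
    return 1;
}
```

In the "C" locale strcoll orders strings the same way strcmp does, so the interesting behaviour only shows up once a language-specific locale is selected.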
I should have explained: in QE it's the tokeniser that strips out the high ascii. In order: load the text file, tokenise it, then sort the tokenised array.
Quote from: jj2007 on March 22, 2020, 08:35:56 PM
Quote from: jimg on March 22, 2020, 04:04:04 PM
Yes, don't you?
Me too. QSort() (http://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1175) uses this order, also in case-insensitive mode.
It becomes more complex with Utf8 text or Umlaute (äöü), of course:
1.1 motiviation
1.2 Sorting in Java
1. unbelievable point
1 Introduction
Adele
andere
Ändern
anders
ängstlich
zippy
zone
Apparently, Java can do that (https://medium.com/fme-developer-stories/how-to-sort-umlaute-in-java-correctly-13f3262f15a1). It's probably slow, because you need a lookup table to do that (QSort has one, but it's almost undocumented).
What is your specific problem, i.e. why do you need characters above Ascii 127? Just curious.
This would be interesting to make a custom sort for:
Look, it seems like random order. It is almost like sorting Roman numerals: by value the order needs to be I=1, X=10, C=100, M=1000, but ascii puts them in a completely different order: C, I, M, X.
/*
1: 一 一
2: 二 二
3: 三 三
4: 四 四
5: 五 五
6: 六 六
7: 七 七
8: 八 八
9: 九 九
10: 十 十
11: 十一 十一
12: 十二 十二
20: 二十 二十
50: 五十 五十
100: 百 百 (Japanese: hyaku, Chinese: bai)
1000: 千 千 (Japanese: sen, Chinese: qian)
10,000: 万 万 (Japanese: man)
10,000: 萬 萬 (Chinese: wan)
10^8: 億 億 (Japanese: oku)
10^8: 亿 亿 (Chinese: yi)
10^12: 兆 兆 (Japanese: chou, Chinese: jhao)
*/
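For the Roman numeral case, a custom sort could look roughly like this in C. This is only a sketch with made-up names (roman_digit, roman_value, cmp_roman, sort_roman); it assumes upper-case, well-formed numerals and does not validate its input. The idea is to convert each string to its numeric value and compare the values instead of the bytes. The Japanese/Chinese numerals above would need the same approach with a multi-byte aware parser, which is not attempted here.

```c
#include <stdlib.h>

/* Value of a single Roman digit, 0 for anything else (including NUL). */
int roman_digit(char c)
{
    switch (c) {
    case 'I': return 1;
    case 'V': return 5;
    case 'X': return 10;
    case 'L': return 50;
    case 'C': return 100;
    case 'D': return 500;
    case 'M': return 1000;
    default:  return 0;
    }
}

/* Convert a Roman numeral to an integer using the subtractive rule:
   a digit smaller than its right-hand neighbour is subtracted. */
int roman_value(const char *s)
{
    int total = 0;
    for (; *s; s++) {
        int v = roman_digit(s[0]);
        int next = roman_digit(s[1]);   /* s[1] is at worst the terminator */
        total += (v < next) ? -v : v;
    }
    return total;
}

/* qsort comparator: order by numeric value, not by ascii code. */
int cmp_roman(const void *a, const void *b)
{
    int va = roman_value(*(const char *const *)a);
    int vb = roman_value(*(const char *const *)b);
    return (va > vb) - (va < vb);
}

/* Hypothetical helper: sort an array of Roman numeral strings in place. */
void sort_roman(const char **items, size_t n)
{
    qsort(items, n, sizeof items[0], cmp_roman);
}
```

With this comparator the four single-letter numerals come out as I, X, C, M rather than the ascii order C, I, M, X.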
Quote from: hutch-- on March 23, 2020, 10:27:17 AM
I should have explained: in QE it's the tokeniser that strips out the high ascii. In order: load the text file, tokenise it, then sort the tokenised array.
Ok, I give up. What's "QE"? Quality Engineering?
Well, I could be tempted to boast about it being "Quality Engineering", but it's the abbreviation of the editor I write that is installed in the MASM32 SDK. It has been called "Quick Editor" for over 20 years from its pre year 2000 version up to current.
Quote... It has been called "Quick Editor" for over 20 years from its pre year 2000 version up to current.
Thank you, Hutch. I'm not familiar with many editors since I've been using one called "VEDIT" for something like forty years.