Is it true that lcase$ doesn't work with Ü, Ä and Ö?
include \masm32\include\masm32rt.inc
.data
example1 db "ANWEISUNG",0
example2 db "ÜBERWEISUNG",0
example3 db "ÖLTANKER",0
example4 db "ÄCHTUNG",0
.code
start:
INVOKE MessageBox,0,offset example1,offset example1,0
mov eax,lcase$(offset example1)
INVOKE MessageBox,0,offset example1,offset example1,0
INVOKE MessageBox,0,offset example2,offset example2,0
mov eax,lcase$(offset example2)
INVOKE MessageBox,0,offset example2,offset example2,0
INVOKE MessageBox,0,offset example3,offset example3,0
mov eax,lcase$(offset example3)
INVOKE MessageBox,0,offset example3,offset example3,0
INVOKE MessageBox,0,offset example4,offset example4,0
mov eax,lcase$(offset example4)
INVOKE MessageBox,0,offset example4,offset example4,0
INVOKE ExitProcess,0
end start
For single-byte characters, lcase$ uses the library algorithm "szLower", which only handles the uppercase alphabetical range A to Z.
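A note on why the A to Z range matters here: in code page 1252 the umlauts are single bytes above 0x7F (Ä = 0xC4, Ö = 0xD6, Ü = 0xDC), so a routine that only tests for 'A'..'Z' leaves them untouched. The C sketch below is not the szLower source; `lower_ascii` and `lower_cp1252` are hypothetical names used for illustration, and the second function only covers the Latin-1 capitals of CP 1252 (0xC0 to 0xDE, skipping the multiplication sign at 0xD7).

```c
/* ASCII-only lowercasing: roughly what an A..Z range check amounts to. */
void lower_ascii(unsigned char *s)
{
    for (; *s; ++s)
        if (*s >= 'A' && *s <= 'Z')
            *s += 0x20;
}

/* Also lowercases the Latin-1 capitals of CP 1252 (0xC0..0xDE except 0xD7).
   0xDF (sharp s) is already lower case and is left alone. */
void lower_cp1252(unsigned char *s)
{
    for (; *s; ++s) {
        if (*s >= 'A' && *s <= 'Z')
            *s += 0x20;
        else if (*s >= 0xC0 && *s <= 0xDE && *s != 0xD7)
            *s += 0x20;
    }
}
```

With the bytes of "ÜB" in CP 1252 (0xDC, 'B'), `lower_ascii` only changes the 'B', while `lower_cp1252` also turns 0xDC into 0xFC (ü).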
Try Lower$().
Jochen,
I would, but as so often with MasmBasic, I don't understand the syntax.
Let esi=offset example2
Lower$ ??
If you like win32 API: CharLowerBuff (https://docs.microsoft.com/en-us/windows/desktop/api/winuser/nf-winuser-charlowerbuffa)
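#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>  // puts
#include <stdlib.h> // exit
#pragma comment(lib, "user32.lib")
#pragma comment(lib, "msvcrt.lib")
void __cdecl mainCRTStartup(void)
{
// Assumption: this source file is saved as ANSI (CP 1252)
char soem[30], s[] = "GRÜEßEN ÖLTANKER ÄCHTUNG";
CharLowerBuffA(s, sizeof(s)); // lowercase in place
CharToOemBuffA(s, soem, sizeof(s)); // convert to the OEM code page for the console
puts(soem);
MessageBoxA(0, s, 0, 0); // the message box takes the ANSI string directly
exit(0);
}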
GRÜEßEN ÖLTANKER ÄCHTUNG - > grüeßen öltanker ächtung
I attach the C code, but it is easy to convert to ASM in 5 minutes.
Quote from: clamicun on March 07, 2019, 09:09:10 AM
Jochen,
I would, but as so often with MasmBasic, I don't understand the syntax.
Let esi=offset example2
Lower$ ??
include \masm32\MasmBasic\MasmBasic.inc ; download
.data
example db "ÄÖÜ", 0
SetGlobals my$="German umlauts: ÄÖÜß - and some more: ÉÊÈÆÌÍÎÏÒÓÔÕÖØÙÚÛÜÁÂÀÃÅÞÍÎÌÏÑÓÔÒÕÚÛÙ - ДОБРО ПОЖАЛОВАТЬ"
Init
PrintLine Lower$(offset example)
PrintLine my$
Let esi=Lower$(my$)
PrintLine esi
PrintLine Lower$(my$)
Inkey Upper$(esi)
EndOfCode
Output:
C:\>mb
äöü
German umlauts: ÄÖÜß - and some more: ÉÊÈÆÌÍÎÏÒÓÔÕÖØÙÚÛÜÁÂÀÃÅÞÍÎÌÏÑÓÔÒÕÚÛÙ - ДОБРО ПОЖАЛОВАТЬ
german umlauts: äöüß - and some more: éêèæìíîïòóôõöøùúûüáâàãåþíîìïñóôòõúûù - добро пожаловать
german umlauts: äöüß - and some more: éêèæìíîïòóôõöøùúûüáâàãåþíîìïñóôòõúûù - добро пожаловать
GERMAN UMLAUTS: ÄÖÜß - AND SOME MORE: ÉÊÈÆÌÍÎÏÒÓÔÕÖØÙÚÛÜÁÂÀÃÅÞÍÎÌÏÑÓÔÒÕÚÛÙ - ДОБРО ПОЖАЛОВАТЬ
For comparison:
C:\>aw
grãoeãÿen ã-ltanker ã"chtung
(might be a codepage problem...)
You, or your hidden library function, changed the code page of the console to ANSI but forgot to tell the people where to download your Russian Characters Font. :(
(https://www.dropbox.com/s/d7xcmusf7fapq0x/JJGerman.jpg?dl=1)
Then you went to test my proggy in code page 1252 :badgrin:
Another challenge:
Ć-Ń-Į-Ŗ-Ś ) Ar Čia kas Ų nors kalba ...? -> ć-ń-į-ŗ-ś ) ar čia kas ų nors kalba ...? :t
Many thanks to everyone who responded !
Problem resolved.
Quote from: AW on March 07, 2019, 06:01:29 PM
Then you went to test my proggy in code page 1252 :badgrin:
Print (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1110) sets codepage Utf-8. Interesting that it exits with 1252 on your machine. What does the console reply when you try chcp 65001?
Quote from: jj2007 on March 07, 2019, 08:07:54 PM
Print (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1110) sets codepage Utf-8. Interesting that it exits with 1252 on your machine. What does the console reply when you try chcp 65001?
Sorry, I don't see any Print, I just copied and pasted. You need to explain things to me slowly.
My proggy, as it was, will work with the usual OEM codepages - no UTF-8 no Ansi.
The interesting bit is that my exe sets the codepage to 65001 (=Utf8). However, it seems that on your machine it fails to do so - your chcp without args returns 1252. When I launch my exe and do a chcp directly after, it will show 65001.
Jochen,
I know you are tearing your hair out over so much ignorance.
This works:
example db "ÜBERWEISÜNG",0
mov esi,offset example
PrintLine Lower$(esi)
The console prints "überweisüng".
But esi still holds "ÜBERWEISÜNG".
And I need the text in lower case for further use.
How do I redirect the console text into another variable?
Lower$(esi) on its own gives the error "Syntaxerror Use_Let"
I think I got it
mov edi,offset example
Let esi=Lower$(edi)
INVOKE MessageBox,0,esi,offset example,0
???
Quote from: jj2007 on March 07, 2019, 09:40:43 PM
The interesting bit is that my exe sets the codepage to 65001 (=Utf8). However, it seems that on your machine it fails to do so - your chcp without args returns 1252. When I launch my exe and do a chcp directly after, it will show 65001.
Don't assume I will learn MasmBasic. I inserted this in your code:
invoke crt_system, chr$("chcp")
and it displayed Active code page: 1252
Jochen,
My routine searches in strings from a fairly large database.
This is what I get:
Too many strings: check if local or register
Let x$= get cleared, or increase MbHeapStrings
JJ:
Something is not working!
The pasted result is perfect (perhaps you are seeing a screen capture in RichMasm?):
D:\masm32\foro\clamicun>chcp
P gina de c¢digos activa: 850
D:\masm32\foro\clamicun>ascii3
äöü
German umlauts: ÄÖÜß - and some more: ÉÊÈÆÌÍÎÏÒÓÔÕÖØÙÚÛÜÁÂÀÃÅÞÍÎÌÏÑÓÔÒÕÚÛÙ - ????? ??????????
german umlauts: äöüß - and some more: éêèæìíîïòóôõöøùúûüáâàãåþíîìïñóôòõúûù - ????? ??????????
german umlauts: äöüß - and some more: éêèæìíîïòóôõöøùúûüáâàãåþíîìïñóôòõúûù - ????? ??????????
GERMAN UMLAUTS: ÄÖÜß - AND SOME MORE: ÉÊÈÆÌÍÎÏÒÓÔÕÖØÙÚÛÜÁÂÀÃÅÞÍÎÌÏÑÓÔÒÕÚÛÙ - ????? ??????????
D:\masm32\foro\clamicun>chcp
Página de códigos activa: 1252
But that is not what I actually see in the console:
(For a moment I thought the problem was solved, but it is not.)
deleted
AW,
german.zip
_main.cpp
doesn't compile
E:\Dev-Cpp\_Examples\collect2.exe [Error] ld returned 1 exit status
@HSE, AW: What do you see when you do
chcp 65001
chcp
?
Quote from: clamicun on March 07, 2019, 09:52:33 PM
How do I redirect the console text into another variable?
Lower$(esi) on its own gives the error "Syntaxerror Use_Let"
Let esi=Lower$(esi)
Quote from: clamicun on March 07, 2019, 11:01:59 PM
Too many strings: check if local or register
Let x$= get cleared, or increase MbHeapStrings
This means you are using Let esi="something" improperly. Let esi="..." checks whether esi already points to a heap location; if it does, the memory gets reallocated, fine. If not, a new string is created, and there are only 100 slots in total. So somewhere in your code you are using esi for other purposes. Even a simple lodsb or inc esi discards the pointer and forces Let to create a new one:
include \masm32\MasmBasic\MasmBasic.inc
Init
For_ ecx=1 To 200
Let esi=Str$("THIS IS STRING #%i", ecx)
Let esi=Lower$(Mid$(esi, 8 ))
; inc esi ; very bad idea - this discards the pointer to heap memory
Next
Inkey "The last one: ", esi
EndOfCode
I would have to see your relevant code to find the problem. One way out is a global variable:
.DATA?
my$ dd ?
.CODE
Let my$="whatever "+my$+" etc"
Quote from: clamicun on March 08, 2019, 02:10:57 AM
AW,
german.zip
_main.cpp
doesn't compile
E:\Dev-Cpp\_Examples\collect2.exe [Error] ld returned 1 exit status
I don't use Linux tools on Windows, so I can't figure out what "ld returned 1 exit status" means.
Make a new console project in Visual Studio, copy what is in _main.cpp and paste in the initial source file you have.
You should be good to build.
@JJ
chcp 65001 always switches code page for every Windows version for the last 20 years.
Solved :biggrin:
Quote from: M$
Note that successfully displaying Unicode characters to the console requires the following:
- The console must use a TrueType font, such as Lucida Console or Consolas, to display characters.
- A font used by the console must define the particular glyph or glyphs to be displayed. The console can take advantage of font linking to display glyphs from linked fonts if the base font does not contain a definition for that glyph.
I changed the font and German characters suddenly appear!
deleted
Quote from: nidud on March 08, 2019, 05:43:15 AM
This should work locally (in Germany that is):
:t
In this case it can be solved without Unicode, and for anyone (not only Germans); the only condition is that the string is stored in CP 1252. (It may also work when the string is stored in other code pages such as 850, with adjustments, but I did not confirm that.)
#include <stdlib.h>
#include <stdio.h>
#include <string.h> // strlen
#include <conio.h>
#include <locale.h>
#include <stdint.h>
#include <Windows.h>
// Assuming the string is in ANSI (CP 1252)
const char * strMB = "GRÜEßEN ÖLTANKER ÄCHTUNG";
#define _LEN_ 30
int main()
{
_locale_t loc;
int len, i;
char LowerCase[_LEN_];
len = (int)strlen(strMB);
loc = _create_locale(LC_ALL, "German");
// i <= len so that the terminating null is copied as well
for (i = 0; i <= len; i++) {
// Cast to unsigned char so bytes above 0x7F are not passed as negative values
LowerCase[i] = (char)_tolower_l((unsigned char)strMB[i], loc);
}
_free_locale(loc);
SetConsoleOutputCP(1252);
printf("%s\n", LowerCase);
_getch();
return 0;
}
deleted
Quote from: nidud on March 08, 2019, 07:21:21 AM
It is this which make the whole thing so mind boggling.
That will work for everyone who uses CP 1252 in Windows for non-Unicode text, because Germans also use CP 1252. However, SetConsoleOutputCP(1252) or system("chcp 1252") is still necessary if the console's code page is not 1252.
Thanks a lot, guys,
I'll check it out.
;=====================
Jochen,
I think I understand what you mean.
Within my search routine, before and after Let esi=Lower$(edi), esi is used very frequently with different offsets.
I am not going to change this. It has worked perfectly for years.
Three days ago I wanted to make the routine case-insensitive by converting the haystack and the needle to lower case, and realized that lcase$(offset string) is not going to do the job on Ü Ä Ö.
Of course it would be nicer if it worked case-insensitively.
Now, lower-casing Russian:
Original: КТО-ТО СКАЗАЛ — НЕ ПОМНЮ, КОГДА И ГДЕ
Small case: кто-то сказал — не помню, когда и где
It is true, they will reduce the font size! :biggrin:
#include <stdlib.h>
#include <stdio.h>
#include <conio.h>
#include <locale.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>
#include <Windows.h>
// Assuming the string is in UTF8
char strMB[] = "КТО-ТО СКАЗАЛ — НЕ ПОМНЮ, КОГДА И ГДЕ";
#define _LEN_ 100
int main()
{
wchar_t strWide[_LEN_] = { 0 };
_locale_t loc;
int len, i;
char LowerCaseMB[_LEN_] = { 0 };
wchar_t LowerCase[_LEN_];
len = strlen(strMB);
loc = _create_locale(LC_CTYPE, "Russian");
MultiByteToWideChar(CP_UTF8, 0, strMB, len, strWide, _LEN_);
len = wcslen(strWide);
for (i = 0; i <= len; i++) {
LowerCase[i] = _towlower_l(strWide[i], loc);
}
SetConsoleOutputCP(CP_UTF8);
printf("Original: %s\n", strMB);
WideCharToMultiByte(CP_UTF8, 0, LowerCase, len, LowerCaseMB, _LEN_, NULL, NULL);
printf("Small case: %s\n", LowerCaseMB);
_getch();
return 0;
}
Edited:
Some #include files were not pasted to the original source file.
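For comparison, the basic Russian capitals can also be lowercased directly on the UTF-8 bytes, without the locale functions or the conversion to wide characters. This is only a sketch under narrow assumptions: `utf8_lower_cyrillic` is a hypothetical name, and it handles nothing beyond ASCII A to Z, U+0410 to U+042F (А to Я) and U+0401 (Ё). It works in place because each of these letters occupies two bytes in both cases.

```c
/* Lowercases ASCII and basic Russian capitals in a UTF-8 string, in place. */
void utf8_lower_cyrillic(unsigned char *s)
{
    while (*s) {
        if (s[0] == 0xD0 && s[1] >= 0x90 && s[1] <= 0x9F) {
            s[1] += 0x20;      /* U+0410..U+041F -> U+0430..U+043F (D0 B0..BF) */
            s += 2;
        } else if (s[0] == 0xD0 && s[1] >= 0xA0 && s[1] <= 0xAF) {
            s[0] = 0xD1;       /* U+0420..U+042F -> U+0440..U+044F (D1 80..8F) */
            s[1] -= 0x20;
            s += 2;
        } else if (s[0] == 0xD0 && s[1] == 0x81) {
            s[0] = 0xD1;       /* U+0401 -> U+0451 (D1 91) */
            s[1] = 0x91;
            s += 2;
        } else {
            if (*s >= 'A' && *s <= 'Z')
                *s += 0x20;    /* plain ASCII letter */
            ++s;               /* any other byte is copied unchanged */
        }
    }
}
```

Anything outside those ranges (the em dash in the sample string, for example) passes through untouched, which is why a general solution still goes through the wide-character API as in the code above.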
Quote from: clamicun on March 08, 2019, 08:11:24 AM
;=====================
Jochen,
I think I understand what you mean.
Within my search routine, before and after Let esi=Lower$(edi), esi is used very frequently with different offsets.
I am not going to change this. It has worked perfectly for years.
Three days ago I wanted to make the routine case-insensitive by converting the haystack and the needle to lower case, and realized that lcase$(offset string) is not going to do the job on Ü Ä Ö.
Of course it would be nicer if it worked case-insensitively.
You don't have to change it! I often use Let esi="..." because esi is a non-volatile register, and that produces short code. But a global variable, as in Let some$="hello", works equally well, and there is no risk of trashing the pointer.
Re case-sensitivity, Instr_ (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1153)("Sometest", "TEST", 1) works fine.
Jochen,
Tip of the day. My searchroutine used "mov eax,find$(1,haystack,needle)".
It is not case insensitive and beside that returns -1 if the strings have the same length.
So - in this case - you have to compare them.
mov pos,Instr_(1,haystack,needle,1) is definitely much better !
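For readers who want to see the idea behind a case-insensitive search without lowercasing the haystack first, here is a minimal C sketch. It is not how Instr_ is implemented; `fold_cp1252` and `find_nocase` are hypothetical names, the strings are assumed to be CP 1252, and the 1-based return value (0 when not found) is a choice made for this example.

```c
#include <stddef.h>

/* Folds one CP 1252 byte to lower case (ASCII and Latin-1 capitals only). */
static unsigned char fold_cp1252(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7))
        return (unsigned char)(c + 0x20);
    return c;
}

/* Returns the 1-based position of needle in haystack, or 0 if absent.
   Both strings are compared byte by byte after folding; neither is modified. */
size_t find_nocase(const char *haystack, const char *needle)
{
    size_t i, j;
    for (i = 0; haystack[i]; ++i) {
        for (j = 0; needle[j]; ++j) {
            /* Hitting the end of haystack yields 0, which never matches
               a non-null needle byte, so the loop stops safely. */
            if (fold_cp1252((unsigned char)haystack[i + j]) !=
                fold_cp1252((unsigned char)needle[j]))
                break;
        }
        if (needle[j] == '\0')
            return i + 1;
    }
    return 0;
}
```

Because the comparison folds on the fly, the original buffers stay intact, and two strings of the same length are handled like any other match.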
Many thanks Michael
MASM BASIC !!
:biggrin:
Note the special modes, too:
mov pos, Instr_(1,haystack,needle, 1+4) ; case-insensitive, full word
mov pos, Instr_(1,haystack,needle, 2) ; case-insensitive for the first character (e.g. Hello = hello)
deleted
Yeah, Norwegian is the only valid codepage and programming language :t
Quote from: nidud on March 09, 2019, 03:28:40 AM
As for the use of SetConsoleOutputCP() this is rather intrusive
Microsoft Windows [Versione 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. Tutti i diritti riservati.
C:\Masm32>chcp
Tabella codici attiva: 850
C:\Masm32>testmb
Hällö Wörld and Привет, мир!
C:\Masm32>chcp
Tabella codici attiva: 65001
And the next time you open a console, it's 850 again. So what?
deleted
@nidud :biggrin:
First and foremost: SetConsoleOutputCP does not change the code page for good; when the console application terminates, all is back to normal.
As for your considerations and wishes about who uses what and when, let's leave that behavioural part aside. I do know that the best approach is Unicode; the other approaches sometimes work and sometimes don't. Germans don't have a code page of their own, but upper-to-lowercase conversion under CP 1252 appears to work well for German text. Under OEM CP 850 it may not work so well.
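One concrete reason why CP 850 is less convenient: the German capitals and their lowercase forms do not sit at a fixed distance from each other there (Ä = 0x8E and ä = 0x84, Ö = 0x99 and ö = 0x94, Ü = 0x9A and ü = 0x81), so the "add 0x20" rule that works for the Latin-1 part of CP 1252 does not apply. The sketch below shows an explicit mapping; `lower_cp850_german` is a hypothetical name and only the three umlauts plus ASCII are covered.

```c
/* Lowercases ASCII letters and the three German umlauts in a CP 850 string. */
void lower_cp850_german(unsigned char *s)
{
    for (; *s; ++s) {
        switch (*s) {
        case 0x8E: *s = 0x84; break;   /* Ä -> ä */
        case 0x99: *s = 0x94; break;   /* Ö -> ö */
        case 0x9A: *s = 0x81; break;   /* Ü -> ü */
        default:
            if (*s >= 'A' && *s <= 'Z')
                *s += 0x20;            /* plain ASCII letter */
            break;
        }
    }
}
```

The same bytes mean something else in CP 1252, so a routine like this must only be applied to text that is known to be stored in CP 850.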
deleted
Quote from: AW on March 09, 2019, 06:40:06 AM
SetConsoleOutputCP does not change the code page, when the console application terminates all is back to normal.
Actually, what I observe (Win7-64) is that the code page remains active within the same console instance. If you exit and restart the console, it's back to 850 (in my case), but if my application sets cp 65001, the next application launched in the same console will work with 65001, too. Which is pretty irrelevant in most cases - in a batch file, you either continue with 65001, or you set chcp manually.
@nidud
These old shells you like to use redraw themselves while the code page is still changed, when they should probably redraw only after the application has terminated and restored the code page.
Actually, this is not the only problem these shells face. I prefer Explorer.exe as my shell, although I have purchased Take Command & TCC, which is very good but a little buggy in my opinion.
deleted
Quote
Actually, what I observe (Win7-64) is that the code page remains active within the same console instance
I observe it neither in Windows 10 nor in Windows 7 64-bit when I build the above code:
(https://www.dropbox.com/s/zom9hoczak8aqyv/russian.png?dl=1)
However, I can observe it if I change the Font in the console System menu after the application terminates. I have no explanation for that, right now. :(
Quote from: AW on March 09, 2019, 06:18:49 PM
I observe it neither in Windows 10 nor in Windows 7 64-bit when I build the above code:
Strange. Can you post your exe, please? I tried to build it but my 3 C settings report errors.
C:\code\PellesC\Console>chcp
Active code page: 850
C:\code\PellesC\Console>TestCP.exe
Текст на кирилица
C:\code\PellesC\Console>chcp
Active code page: 850
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
void __cdecl mainCRTStartup(void)
{
char szCyr[] = u8"Текст на кирилица\r\n";
SetConsoleOutputCP(65001);
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), szCyr, sizeof(szCyr)-1, NULL, NULL);
ExitProcess(0);
}
EDIT: uppercase with Win32 API
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#pragma comment(lib, "user32.lib")
void __cdecl mainCRTStartup(void)
{
char szCyr[] = u8"Текст на кирилица\r\n";
wchar_t wczCyr[sizeof(szCyr)];
char szCyrC[sizeof(szCyr)];
int cch;
UINT uiCP = GetConsoleOutputCP(); // remember the current output code page
// UTF-8 to UTF-16; the returned count includes the terminating null
cch = MultiByteToWideChar(CP_UTF8, 0, szCyr, sizeof(szCyr), wczCyr, sizeof(szCyr));
CharUpperBuffW(wczCyr, cch); // the length is in characters, not bytes
WideCharToMultiByte(CP_UTF8, 0, wczCyr, -1, szCyrC, sizeof(szCyrC), 0, 0);
if (uiCP != 65001)
SetConsoleOutputCP(65001);
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), szCyr, sizeof(szCyr)-1, NULL, NULL);
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), szCyrC, sizeof(szCyrC)-1, NULL, NULL);
SetConsoleOutputCP(uiCP); // restore the original output code page
ExitProcess(0);
}
@JJ
It is here.
deleted
Right, we need to save the output CP at the beginning and restore it at the end. This will definitely make the Necromancer's DOS Navigator :biggrin: , the Doszip Commander, and the Asmc main shell happy.
Quote from: AW on March 09, 2019, 08:37:54 PM
@JJ
It is here.
Thanks, José. Interesting:
00E010C0 |> 68 E9FD0000 push 0FDE9 ; |Arg1 = 0FDE9
00E010C5 |. FF15 0820E000 call near [<&KERNEL32.SetConsoleOutp ; \kernel32.SetConsoleOutputCP
There is no call that would reset the codepage. So I made some more tests and voilà, mystery solved:
SetConsoleOutputCP (https://docs.microsoft.com/en-us/windows/console/setconsoleoutputcp) function: Sets the output code page used by the console associated with the calling process
SetConsoleCP (https://docs.microsoft.com/en-us/windows/console/setconsolecp) function: Sets the input code page used by the console associated with the calling process
Apart from one being output, the other input, do you see the difference?
No? Me neither, but one of the two sets the CP permanently, the other doesn't. Greetings to Redmond, Micros**t at its best :bgrin:
To set a console's output code page, use the SetConsoleOutputCP function. To set and query a console's input code page, use the SetConsoleCP and GetConsoleCP functions. (https://docs.microsoft.com/en-us/windows/console/getconsoleoutputcp)
Also here. (https://docs.microsoft.com/en-us/windows/console/console-code-pages)
In our cases we are dealing with output so we must restore the change. In my previous code:
Quote
#include <stdlib.h>
#include <stdio.h>
#include <conio.h>
#include <locale.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>
#include <Windows.h>
// Assuming the string is in UTF8
char strMB[] = "КТО-ТО СКАЗАЛ — НЕ ПОМНЮ, КОГДА И ГДЕ";
#define _LEN_ 100
int main()
{
wchar_t strWide[_LEN_] = { 0 };
_locale_t loc;
int len, i;
char LowerCaseMB[_LEN_] = { 0 };
wchar_t LowerCase[_LEN_];
UINT oldCodePage = GetConsoleOutputCP();
len = strlen(strMB);
loc = _create_locale(LC_CTYPE, "Russian");
MultiByteToWideChar(CP_UTF8, 0, strMB, len, strWide, _LEN_);
len = wcslen(strWide);
for (i = 0; i <= len; i++) {
LowerCase[i] = _towlower_l(strWide[i], loc);
}
SetConsoleOutputCP(CP_UTF8);
printf("Original: %s\n", strMB);
WideCharToMultiByte(CP_UTF8, 0, LowerCase, len, LowerCaseMB, _LEN_, NULL, NULL);
printf("Small case: %s\n", LowerCaseMB);
SetConsoleOutputCP(oldCodePage);
_getch();
return 0;
}
Quote from: AW on March 09, 2019, 09:56:14 PM
In our cases we are dealing with output so we must restore the change
The point is, see my post immediately before yours: No, you don't have to. The output bit is limited to the current process. It is the input function that causes trouble because it is permanent.
Below a session made with this little testbed built as "setcp" - pure Masm32:
include \masm32\include\masm32rt.inc
.code
start:
MsgBox 0, "Do you want to use SetConsoleCP", "Hi", MB_YESNO
.if eax==IDYES
invoke SetConsoleCP, 65001
print "SetConsoleCP 65001", 13, 10
.endif
invoke SetConsoleOutputCP, 437
print "SetConsoleOutputCP 437"
exit
end start
Microsoft Windows [Versione 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. Tutti i diritti riservati.
C:\Masm32\MasmBasic\Members\aw27>chcp
Tabella codici attiva: 850
C:\Masm32\MasmBasic\Members\aw27>setcp
SetConsoleOutputCP 437
C:\Masm32\MasmBasic\Members\aw27>chcp
Tabella codici attiva: 850
C:\Masm32\MasmBasic\Members\aw27>setcp
SetConsoleCP 65001
SetConsoleOutputCP 437
C:\Masm32\MasmBasic\Members\aw27>chcp
Tabella codici attiva: 65001
C:\Masm32\MasmBasic\Members\aw27>
In the first run of setcp, I reply "no" to the MsgBox, and the codepage remains 850.
In the second run of setcp, I reply "yes" to the MsgBox, and the codepage gets set permanently to 65001 (of course, when exiting the console and launching a new one, it will be 850 again - yet another undocumented behaviour).
deleted
Quote from: nidud on March 09, 2019, 10:59:08 PM
Both of these functions makes "permanent changes" to the current console.
SetConsoleOutputCP changes only the codepage of its own process, not of the current console. At least on my Windows versions (Win XP, Win7-64, Win 10).
Quote from: jj2007 on March 10, 2019, 01:50:29 AM
SetConsoleOutputCP changes only the codepage of its own process, not of the current console. At least on my Windows versions (Win XP, Win7-64, Win 10).
Your program replaces the 850 code page. After closing it, the code page is 1252 in the current console.
deleted
Quote from: nidud on March 10, 2019, 03:16:43 AM
Given the result from HSE, MasmBasic must invoke both of these functions by default.
That is correct, and I wonder what would change from a user perspective if I dropped the input CP setting.
I have put together a little testbed - pure Masm32. Extract to a folder and launch the batch file.
What is confusing here is that chcp corresponds to SetConsoleCP. There is no DOS command for SetConsoleOutputCP.
deleted