Macro lcase$

Started by clamicun, March 07, 2019, 12:50:41 AM

aw27

#30
Now, lower-casing Russian:

Original: КТО-ТО СКАЗАЛ — НЕ ПОМНЮ, КОГДА И ГДЕ
Small case: кто-то сказал — не помню, когда и где

It is true, they will reduce the font size!  :biggrin:



#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <conio.h>
#include <locale.h>
#include <stdint.h>
#include <wchar.h>
#include <Windows.h>

// Assuming the string is in UTF8
char strMB[] = "КТО-ТО СКАЗАЛ — НЕ ПОМНЮ, КОГДА И ГДЕ";
#define _LEN_ 100

int main()
{
    wchar_t strWide[_LEN_] = { 0 };
    _locale_t loc;
    int len, i;
    char LowerCaseMB[_LEN_] = { 0 };
    wchar_t LowerCase[_LEN_];

    len = (int)strlen(strMB);

    loc = _create_locale(LC_CTYPE, "Russian");
    // UTF-8 -> UTF-16; the terminator is not converted, but strWide is zero-filled
    MultiByteToWideChar(CP_UTF8, 0, strMB, len, strWide, _LEN_);
    len = (int)wcslen(strWide);

    // i <= len also copies the terminating zero
    for (i = 0; i <= len; i++) {
        LowerCase[i] = _towlower_l(strWide[i], loc);
    }
    SetConsoleOutputCP(CP_UTF8);
    printf("Original: %s\n", strMB);

    // UTF-16 -> UTF-8; LowerCaseMB is zero-filled, so the result stays terminated
    WideCharToMultiByte(CP_UTF8, 0, LowerCase, len, LowerCaseMB, _LEN_, NULL, NULL);
    printf("Small case: %s\n", LowerCaseMB);

    _free_locale(loc);
    _getch();
    return 0;
}


Edited:
Some #include files were not pasted to the original source file.
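
For readers without the Windows API at hand, here is a minimal portable sketch of the same idea. It is not the method used in the post above: instead of converting to UTF-16 and calling a locale function, it lower-cases directly in the UTF-8 buffer, which works for the basic Russian capitals because upper and lower case occupy the same number of bytes. The function name utf8_lower_basic_cyrillic is hypothetical, and only ASCII A-Z and U+0410..U+042F are handled (Ё, for example, is left alone).

```c
#include <stddef.h>

/* Hypothetical helper: lower-cases ASCII A-Z and the basic Cyrillic
   capitals U+0410..U+042F in place in a zero-terminated UTF-8 buffer.
   Every other byte sequence is left untouched. */
void utf8_lower_basic_cyrillic(char *s)
{
    unsigned char *p = (unsigned char *)s;

    while (*p) {
        if (*p >= 'A' && *p <= 'Z') {
            *p += 0x20;                 /* ASCII: A-Z -> a-z */
            p += 1;
        } else if (*p == 0xD0 && p[1] >= 0x90 && p[1] <= 0x9F) {
            /* U+0410..U+041F -> U+0430..U+043F: lead byte stays D0 */
            p[1] += 0x20;
            p += 2;
        } else if (*p == 0xD0 && p[1] >= 0xA0 && p[1] <= 0xAF) {
            /* U+0420..U+042F -> U+0440..U+044F: lead byte becomes D1 */
            p[0] = 0xD1;
            p[1] -= 0x20;
            p += 2;
        } else {
            p += 1;                     /* anything else: skip one byte */
        }
    }
}
```

Calling it on a writable char array holding the UTF-8 text converts the buffer in place; no locale or code page setting is involved.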

jj2007

Quote from: clamicun on March 08, 2019, 08:11:24 AM

;=====================
Jochen,
I think I understand what you mean.
Within my search routine - before and after Let esi=Lower$(edi) - esi is very frequently used with different offsets.
I am not going to change this. It has worked perfectly for years.

Three days ago I wanted to make the routine case-insensitive by changing the haystack and the needle to lower-case letters, and realized that lcase$(offset string) is not going to do the job on Ü Ä Ö.
Of course it would be nicer if it worked case-insensitively.

You don't have to change it! I often use Let esi="..." because esi is a non-volatile register, and that produces short code. But a global variable Let some$="hello" works equally well, and there is no risk of trashing the pointer.

Re case-sensitivity, Instr_("Sometest", "TEST", 1) works fine.
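
A small sketch of one plausible reason why a simple lower-casing routine misses Ü Ä Ö: a mapping that only knows A-Z leaves everything above 0x7F unchanged. Whether lcase$ is implemented this way is an assumption, not something stated in this thread. The function names are hypothetical, and the values are Unicode code points (identical to the byte values in CP 1252 for this range, but not in OEM CP 850).

```c
#include <stdint.h>

/* ASCII-only mapping: A-Z -> a-z, everything else unchanged. */
uint32_t lower_ascii(uint32_t cp)
{
    return (cp >= 'A' && cp <= 'Z') ? cp + 0x20 : cp;
}

/* Extended mapping: additionally handles the Latin-1 capitals
   U+00C0..U+00DE, skipping U+00D7 (the multiplication sign). */
uint32_t lower_latin1(uint32_t cp)
{
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
        return cp + 0x20;
    return lower_ascii(cp);
}
```

With these, lower_ascii(0xDC) returns 0xDC (Ü stays Ü), while lower_latin1(0xDC) returns 0xFC (ü).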

clamicun

Jochen,
Tip of the day. My search routine used "mov eax,find$(1,haystack,needle)".
It is not case-insensitive, and besides that it returns -1 if the strings have the same length.
So - in this case - you have to compare them.

mov pos,Instr_(1,haystack,needle,1) is definitely much better!
Many thanks Michael
MASM BASIC !!

jj2007

 :biggrin:

Note the special modes, too:
mov pos, Instr_(1,haystack,needle, 1+4)  ; case-insensitive, full word
mov pos, Instr_(1,haystack,needle, 2)  ; case-insensitive for the first character (e.g. Hello = hello)
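
For comparison, a rough C sketch of what a case-insensitive search with a start position could look like. This is not MasmBasic's implementation: instr_ci is a hypothetical name, the 1-based result with 0 for "not found" is an assumption modelled on BASIC-style InStr, only ASCII letters are folded, and the full-word and first-character modes are not covered.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the 1-based position of needle in
   haystack, searching from the 1-based position start and ignoring
   ASCII case. Returns 0 if there is no match. */
size_t instr_ci(size_t start, const char *haystack, const char *needle)
{
    size_t hlen = strlen(haystack);
    size_t nlen = strlen(needle);
    size_t i, j;

    if (start == 0 || nlen == 0 || nlen > hlen)
        return 0;

    for (i = start - 1; i + nlen <= hlen; i++) {
        for (j = 0; j < nlen; j++) {
            if (tolower((unsigned char)haystack[i + j]) !=
                tolower((unsigned char)needle[j]))
                break;              /* mismatch at offset j */
        }
        if (j == nlen)
            return i + 1;           /* convert to 1-based position */
    }
    return 0;
}
```

Strings of equal length need no special handling here: instr_ci(1, "same", "SAME") simply reports a match at position 1.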

nidud

#34
deleted

jj2007

Yeah, Norwegian is the only valid codepage and programming language :t

Quote from: nidud on March 09, 2019, 03:28:40 AM
As for the use of SetConsoleOutputCP() this is rather intrusive

Microsoft Windows [Versione 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. Tutti i diritti riservati.

C:\Masm32>chcp
Tabella codici attiva: 850

C:\Masm32>testmb
Hällö Wörld and Привет, мир!

C:\Masm32>chcp
Tabella codici attiva: 65001


And the next time you open a console, it's 850 again. So what?

nidud

#36
deleted

aw27

@nidud  :biggrin:

First and foremost: SetConsoleOutputCP does not change the code page; when the console application terminates, all is back to normal.

As for your considerations and desiderata about who uses what and when, let's leave this behavioural part aside. However, I know the best approach is Unicode; the other approaches sometimes work and sometimes do not. Germans don't have a code page for themselves, but upper-to-lowercase conversions under CP 1252 appear to work well for German. Under OEM CP 850 they may not work so well.

nidud

#38
deleted

jj2007

Quote from: AW on March 09, 2019, 06:40:06 AM
SetConsoleOutputCP does not change the code page, when the console application terminates all is back to normal.

Actually, what I observe (Win7-64) is that the code page remains active within the same console instance. If you exit and restart the console, it's back to 850 (in my case), but if my application sets cp 65001, the next application launched in the same console will work with 65001, too. Which is pretty irrelevant in most cases - in a batch file, you either continue with 65001, or you set chcp manually.

aw27

@nidud

These old shells you like to use redraw themselves while the code page is changed, when they probably should redraw after the application has terminated and the code page has been restored.
Actually, this is not the only problem these shells face. I prefer Explorer.exe as my shell, although I have purchased Take Command & TCC, which is very good but a little buggy in my opinion.

nidud

#41
deleted

aw27

Quote
Actually, what I observe (Win7-64) is that the code page remains active within the same console instance

I observe it neither in Windows 10 nor in Windows 7 64-bit when I build the above code:



However, I can observe it if I change the Font in the console System menu after the application terminates. I have no explanation for that, right now.  :(

jj2007

Quote from: AW on March 09, 2019, 06:18:49 PM
I don't observe it neither in Windows 10 nor in Windows 7 64-bit when I build the above code:

Strange. Can you post your exe, please? I tried to build it but my 3 C settings report errors.

TimoVJL

#44
C:\code\PellesC\Console>chcp
Active code page: 850

C:\code\PellesC\Console>TestCP.exe
Текст на кирилица

C:\code\PellesC\Console>chcp
Active code page: 850

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

void __cdecl mainCRTStartup(void)
{
    char szCyr[] = u8"Текст на кирилица\r\n";
    SetConsoleOutputCP(65001);
    WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), szCyr, sizeof(szCyr)-1, NULL, NULL);
    ExitProcess(0);
}

EDIT: uppercase with Win32 API

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#pragma comment(lib, "user32.lib")

void __cdecl mainCRTStartup(void)
{
    char szCyr[] = u8"Текст на кирилица\r\n";
    wchar_t wczCyr[sizeof(szCyr)];
    char szCyrC[sizeof(szCyr)];
    UINT uiCP = GetConsoleOutputCP();
    // cch = number of wide characters written, including the terminating zero
    int cch = MultiByteToWideChar(CP_UTF8, 0, szCyr, sizeof(szCyr), wczCyr, sizeof(szCyr));
    // CharUpperBuffW expects a character count, not a byte count
    CharUpperBuffW(wczCyr, cch);
    WideCharToMultiByte(CP_UTF8, 0, wczCyr, -1, szCyrC, sizeof(szCyrC), 0, 0);
    if (uiCP != 65001)
        SetConsoleOutputCP(65001);
    WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), szCyr, sizeof(szCyr)-1, NULL, NULL);
    WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), szCyrC, sizeof(szCyrC)-1, NULL, NULL);
    // restore the code page that was active at startup
    SetConsoleOutputCP(uiCP);
    ExitProcess(0);
}