The MASM Forum

General => The Laboratory => Topic started by: guga on April 27, 2019, 09:32:07 AM

Title: URL Encoder/decode/escape/unescape
Post by: guga on April 27, 2019, 09:32:07 AM
Hi guys

Does anyone have an example of how to use the necessary APIs to do URI encoding/decoding (and also URI escape/unescape)?

The goal is to do things like this:

Input text:
"Hello World.
From 2019."

Output text:
Hello%20World.%0D%0AFrom%202019.

or...

Input:
Hello world. My name is Guga.How are you doing ?
Output:
Hello%20world.%20My%20name%20is%20Guga.How%20are%20you%20doing%20?


I also want to use the escape/unescape routines, as described in:
http://xkr.us/articles/javascript/encode-compare/

(https://www.freeformatter.com/json-escape.html also has some information on the escape/unescape method.)
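The xkr.us article compares JavaScript's escape(), encodeURI() and encodeURIComponent(). In practice, the main difference is which ASCII characters each one leaves untouched. That also explains why the second example above keeps '?' unencoded (as encodeURI() would), while a component encoder turns it into %3F. Below is a minimal C sketch of those rules. The names js_keeps, js_percent_encode and enum js_mode are hypothetical. Non-ASCII input is only handled the encodeURI/encodeURIComponent way, one %XX per UTF-8 byte; real escape() emits %XX or %uXXXX code-point escapes for non-ASCII text instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASCII punctuation each JavaScript function leaves unencoded
   (letters and digits are always left as they are). */
static const char ESCAPE_SAFE[]    = "@*_+-./";     /* escape() */
static const char COMPONENT_SAFE[] = "-_.!~*'()";   /* encodeURIComponent() */
static const char URI_EXTRA_SAFE[] = ";,/?:@&=+$#"; /* encodeURI() keeps these as well */

enum js_mode { JS_ESCAPE, JS_ENCODE_URI, JS_ENCODE_URI_COMPONENT };

/* Returns 1 if byte c is copied unchanged in the given mode. */
static int js_keeps(enum js_mode mode, unsigned char c)
{
    if (c == 0 || c >= 0x80)   /* NUL (strchr would match the terminator) and non-ASCII */
        return 0;
    if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return 1;
    switch (mode) {
    case JS_ESCAPE:
        return strchr(ESCAPE_SAFE, c) != NULL;
    case JS_ENCODE_URI:
        return strchr(COMPONENT_SAFE, c) != NULL || strchr(URI_EXTRA_SAFE, c) != NULL;
    case JS_ENCODE_URI_COMPONENT:
        return strchr(COMPONENT_SAFE, c) != NULL;
    }
    return 0;
}

/* Percent-encodes in into out; out must hold 3 * strlen(in) + 1 bytes.
   Non-ASCII bytes are encoded one by one (%XX), which matches
   encodeURI/encodeURIComponent on UTF-8 input but NOT escape(). */
void js_percent_encode(enum js_mode mode, const char *in, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    assert(in != NULL && out != NULL);
    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;
        if (js_keeps(mode, c)) {
            *out++ = (char)c;
        } else {
            *out++ = '%';
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    *out = '\0';
}
```

For example, "are you doing ?" becomes "are%20you%20doing%20?" with JS_ENCODE_URI but "are%20you%20doing%20%3F" with JS_ENCODE_URI_COMPONENT.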


Note: What I need is to convert (encode/escape) a large text file, around 45 MB of plain text in UTF-8 on both input and output, and then perform the reverse operation.
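For a file of that size, one option is to stream it rather than load it all at once. The following C sketch is hedged: url_encode_stream is a hypothetical name. It percent-encodes every byte except the RFC 3986 unreserved characters, so each byte of a multi-byte UTF-8 character becomes its own %XX (the same way encodeURIComponent treats non-ASCII text). The output can be up to three times the input size, so 45 MB may grow to roughly 135 MB. Open both files in binary mode ("rb"/"wb") so that CR LF pairs come out as %0D%0A, as in the first example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Streams in -> out, percent-encoding every byte except the RFC 3986
   unreserved characters (A-Z a-z 0-9 - . _ ~). Only one byte is held in
   memory at a time; stdio buffering keeps per-byte calls reasonably fast.
   Returns the number of bytes written, or -1 on an I/O error. */
long url_encode_stream(FILE *in, FILE *out)
{
    static const char hex[] = "0123456789ABCDEF";
    long written = 0;
    int ch;

    assert(in != NULL && out != NULL);
    while ((ch = fgetc(in)) != EOF) {
        unsigned char c = (unsigned char)ch;
        int keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') ||
                   (c != 0 && strchr("-._~", c) != NULL);
        if (keep) {
            if (fputc(c, out) == EOF) return -1;
            written += 1;
        } else {
            if (fputc('%', out) == EOF || fputc(hex[c >> 4], out) == EOF ||
                fputc(hex[c & 0x0F], out) == EOF) return -1;
            written += 3;
        }
    }
    return ferror(in) ? -1 : written;
}
```

Typical use would be fopen() on the two files with "rb" and "wb", one call to url_encode_stream, then fclose() on both. The reverse direction works the same way, reading "%XX" triples back into bytes.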

I'm writing translation routines using the Google API, but I ran into this small encoding problem (or escaping problem; I don't know exactly which one is the proper fix, so I'll test both). It seems Google handles escaped text better than encoded text.
Title: Re: URL Encoder/decode/escape/unescape
Post by: aw27 on April 27, 2019, 04:03:39 PM
You don't need to (I mean, you should not) URL-encode large amounts of data.
URL encoding is for GET, where the data is sent in the query string after the URL.

Large amounts of data are sent using POST. With POST, you declare the type of data you are sending in the Content-Type header. You fill in the remaining header fields, end the header block with an empty line (so the last header is followed by CR LF CR LF), and then append the data you want to send.
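To make the POST layout concrete, here is a minimal C sketch that only assembles a raw HTTP/1.1 POST request in a buffer; it does no networking. The function name build_post_request and the host, path and content type used with it are hypothetical. A real Windows program would more likely let WinINet or WinHTTP build the request.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds: request line, headers, an empty line (CR LF CR LF ends the
   header block), then the body copied as raw bytes, NOT URL-encoded.
   Returns the total length, or -1 if buf is too small. */
int build_post_request(char *buf, size_t size, const char *host, const char *path,
                       const char *content_type, const char *body, size_t body_len)
{
    int n;

    assert(buf && host && path && content_type && (body || body_len == 0));
    n = snprintf(buf, size,
                 "POST %s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "Content-Type: %s\r\n"
                 "Content-Length: %zu\r\n"
                 "Connection: close\r\n"
                 "\r\n",
                 path, host, content_type, body_len);
    if (n < 0 || (size_t)n + body_len >= size)
        return -1;                      /* headers or body would not fit */
    memcpy(buf + n, body, body_len);    /* body follows the blank line unchanged */
    buf[n + body_len] = '\0';
    return n + (int)body_len;
}
```

Because the body travels after the blank line as raw bytes, a UTF-8 text file needs no percent-encoding at all; only Content-Length has to match its byte count.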

This is how servers send HTML web pages to your browser, and as you know, those are not URL-encoded. The same goes for .zip files you download.
Title: Re: URL Encoder/decode/escape/unescape
Post by: morgot on April 27, 2019, 07:39:29 PM
There is a sample in pure MASM, but the comments are in Russian:
https://kaimi.io/wp-content/uploads/2009/05/converter.zip

Description (in Russian): https://kaimi.io/2009/06/%D1%83%D0%BD%D0%B8%D0%B2%D0%B5%D1%80%D1%81%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9-%D0%BA%D0%BE%D0%BD%D0%B2%D0%B5%D1%80%D1%82%D1%80-%D1%82%D0%B5%D0%BA%D1%81%D1%82%D0%B0-1-0/
Title: Re: URL Encoder/decode/escape/unescape
Post by: aw27 on April 28, 2019, 02:08:20 AM
Quote from: morgot on April 27, 2019, 07:39:29 PM
There is a sample in pure MASM, but the comments are in Russian:
https://kaimi.io/wp-content/uploads/2009/05/converter.zip

Description (in Russian): https://kaimi.io/2009/06/%D1%83%D0%BD%D0%B8%D0%B2%D0%B5%D1%80%D1%81%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9-%D0%BA%D0%BE%D0%BD%D0%B2%D0%B5%D1%80%D1%82%D1%80-%D1%82%D0%B5%D0%BA%D1%81%D1%82%D0%B0-1-0/

At first sight it appears to work, except for the transliteration part (the last option in the combo box). Of course, that part may only work with the Windows-1251 (Cyrillic) encoding.
Title: Re: URL Encoder/decode/escape/unescape
Post by: TimoVJL on April 28, 2019, 03:42:53 AM
One idea, which could be converted to asm code:
#include <stdio.h>

// https://en.wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters
//! # $ & ' ( ) * + , / : ; = ? @ [ ]
//%21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D
// "!*'();:@&=+$,/?%#[]";
//https://secure.n-able.com/webhelp/NC_9-1-0_SO_en/Content/SA_docs/API_Level_Integration/API_Integration_URLEncoding.html
// url_table[c] is 1 if byte c must be percent-encoded, 0 if it is copied unchanged
char url_table[256] =
{// 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x 1x
1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1, // 2x 3x
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0, // 4x 5x
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1, // 6x 7x
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
};
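// hex_table[c] holds the two ASCII hex digits of c, with the high-nibble digit in the
// low byte, so one little-endian 16-bit store writes them in reading order
short hex_table[256] =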
{
0x3030,0x3130,0x3230,0x3330,0x3430,0x3530,0x3630,0x3730,0x3830,0x3930,0x4130,0x4230,0x4330,0x4430,0x4530,0x4630,
0x3031,0x3131,0x3231,0x3331,0x3431,0x3531,0x3631,0x3731,0x3831,0x3931,0x4131,0x4231,0x4331,0x4431,0x4531,0x4631,
0x3032,0x3132,0x3232,0x3332,0x3432,0x3532,0x3632,0x3732,0x3832,0x3932,0x4132,0x4232,0x4332,0x4432,0x4532,0x4632,
0x3033,0x3133,0x3233,0x3333,0x3433,0x3533,0x3633,0x3733,0x3833,0x3933,0x4133,0x4233,0x4333,0x4433,0x4533,0x4633,
0x3034,0x3134,0x3234,0x3334,0x3434,0x3534,0x3634,0x3734,0x3834,0x3934,0x4134,0x4234,0x4334,0x4434,0x4534,0x4634,
0x3035,0x3135,0x3235,0x3335,0x3435,0x3535,0x3635,0x3735,0x3835,0x3935,0x4135,0x4235,0x4335,0x4435,0x4535,0x4635,
0x3036,0x3136,0x3236,0x3336,0x3436,0x3536,0x3636,0x3736,0x3836,0x3936,0x4136,0x4236,0x4336,0x4436,0x4536,0x4636,
0x3037,0x3137,0x3237,0x3337,0x3437,0x3537,0x3637,0x3737,0x3837,0x3937,0x4137,0x4237,0x4337,0x4437,0x4537,0x4637,
0x3038,0x3138,0x3238,0x3338,0x3438,0x3538,0x3638,0x3738,0x3838,0x3938,0x4138,0x4238,0x4338,0x4438,0x4538,0x4638,
0x3039,0x3139,0x3239,0x3339,0x3439,0x3539,0x3639,0x3739,0x3839,0x3939,0x4139,0x4239,0x4339,0x4439,0x4539,0x4639,
0x3041,0x3141,0x3241,0x3341,0x3441,0x3541,0x3641,0x3741,0x3841,0x3941,0x4141,0x4241,0x4341,0x4441,0x4541,0x4641,
0x3042,0x3142,0x3242,0x3342,0x3442,0x3542,0x3642,0x3742,0x3842,0x3942,0x4142,0x4242,0x4342,0x4442,0x4542,0x4642,
0x3043,0x3143,0x3243,0x3343,0x3443,0x3543,0x3643,0x3743,0x3843,0x3943,0x4143,0x4243,0x4343,0x4443,0x4543,0x4643,
0x3044,0x3144,0x3244,0x3344,0x3444,0x3544,0x3644,0x3744,0x3844,0x3944,0x4144,0x4244,0x4344,0x4444,0x4544,0x4644,
0x3045,0x3145,0x3245,0x3345,0x3445,0x3545,0x3645,0x3745,0x3845,0x3945,0x4145,0x4245,0x4345,0x4445,0x4545,0x4645,
0x3046,0x3146,0x3246,0x3346,0x3446,0x3546,0x3646,0x3746,0x3846,0x3946,0x4146,0x4246,0x4346,0x4446,0x4546,0x4646,
};

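//char *IntToHex(int nNum, char *szBuf, int nLen);
// Percent-encode s into so; so must hold at least 3 * strlen(s) + 1 bytes.
void url_escape(char *s, char *so)
{
while (*s) {
unsigned char c = (unsigned char)*s; // plain char may be signed: UTF-8 bytes >= 0x80 would index out of bounds
if (url_table[c])  {
*so++ = '%';
//IntToHex(*s, so, 2);
*(short*)so = hex_table[c]; // writes both hex digits at once (assumes little-endian, e.g. x86)
so += 2;
} else *so++ = *s;
s++;
}
*so = 0;
}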

int __cdecl main(void)
{
char s1[] = "Hello World.\nFrom 2019.\nHello world. My name is Guga.How are you doing ?\n";
char s2[512];
url_escape(s1, s2);
printf("%s\n", s2);
return 0;
}
Output:
Hello%20World.%0AFrom%202019.%0AHello%20world.%20My%20name%20is%20Guga.How%20are%20you%20doing%20%3F%0A

The objconv result is in the attached zip.
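The thread also asks for the reverse operation. A minimal decoding counterpart to url_escape() could look like the sketch below; url_unescape and hex_value are hypothetical names. It deliberately does not turn '+' into a space, because that convention belongs to HTML form encoding (application/x-www-form-urlencoded), not to percent-encoding in general. It also does not handle the %uXXXX format produced by JavaScript's escape().

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Turns every "%XX" back into the byte 0xXX; a '%' that is not followed
   by two hex digits is copied literally. Returns the number of bytes
   written, excluding the terminator (a decoded %00 would end the string
   early, so rely on the return value for binary data). */
size_t url_unescape(const char *s, char *so)
{
    char *start = so;
    assert(s != NULL && so != NULL);
    while (*s) {
        int hi, lo;
        /* short-circuit: s[2] is only read if s[1] was a hex digit */
        if (s[0] == '%' && (hi = hex_value(s[1])) >= 0 && (lo = hex_value(s[2])) >= 0) {
            *so++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *so++ = *s++;
        }
    }
    *so = '\0';
    return (size_t)(so - start);
}
```

Since the output is never longer than the input, the same buffer can be passed as both arguments to decode in place.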
Title: Re: URL Encoder/decode/escape/unescape
Post by: guga on April 28, 2019, 09:40:15 AM
Thanks a lot, guys.

This will be really helpful. I'm writing an English-to-Portuguese translator using Google Translate, which returns its results in those encodings. I'll open another thread on that subject, because the server returns an HTTP 503 error after I download a certain amount of data.