Author Topic: URL Encoder/decode/escape/unescape  (Read 867 times)

guga

URL Encoder/decode/escape/unescape
« on: April 27, 2019, 09:32:07 AM »
Hi guys

Does anyone have an example of how to use the necessary APIs to do a URI encode/decode (and also a URI escape/unescape)?

The goal is to do things like this:
 
Input text:
"Hello World.
From 2019."

Output text:
Hello%20World.%0D%0AFrom%202019.

or...

Input:
Hello world. My name is Guga.How are you doing ?
Output:
Hello%20world.%20My%20name%20is%20Guga.How%20are%20you%20doing%20?


I would also like to use the escape/unescape routines, as in:
http://xkr.us/articles/javascript/encode-compare/

https://www.freeformatter.com/json-escape.html (this also has some info on the escape/unescape method)


Note: What is needed is to convert/encode/escape a large text file (around 45 MB of plain text, UTF-8 in and out) and to be able to reverse the operation.
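
For a file of that size, one option is to encode it as a stream instead of loading it all into memory. The following is only a minimal sketch: the function name `url_escape_stream` is hypothetical, and it assumes that everything except the RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) should be percent-encoded, which may be stricter than what a particular server needs.

```c
#include <stdio.h>

/* Percent-encode everything read from in and write it to out.
 * Only the RFC 3986 unreserved characters are copied unchanged.
 * Returns 0 on success, -1 on a read or write error. */
int url_escape_stream(FILE *in, FILE *out)
{
    static const char hex[] = "0123456789ABCDEF";
    int c;

    /* getc() returns each byte as 0..255, so UTF-8 bytes >= 0x80 are safe here. */
    while ((c = getc(in)) != EOF) {
        int keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') ||
                   c == '-' || c == '.' || c == '_' || c == '~';
        if (keep) {
            putc(c, out);
        } else {
            putc('%', out);
            putc(hex[c >> 4], out);   /* high nibble */
            putc(hex[c & 15], out);   /* low nibble */
        }
    }
    return (ferror(in) || ferror(out)) ? -1 : 0;
}
```

On Windows both files would need to be opened in binary mode ("rb" / "wb"), otherwise the C runtime translates CR/LF pairs and the %0D bytes are lost.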

I'm writing translation routines using the Google API, but I ran into this small problem of encoding (or escaping; I don't know exactly which of them is the proper fix, so I'll test both). It seems Google handles escaped text better than encoded text.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

AW

Re: URL Encoder/decode/escape/unescape
« Reply #1 on: April 27, 2019, 04:03:39 PM »
You don't need to, or rather you should not, URL-encode large amounts of data.
URL encoding is for GET, where the data is sent on the request line after the URL.

Large amounts of data are sent using POST. With POST, you declare the type of data you are sending in the Content-Type header field. You fill in the remaining header fields, then insert a blank line (the CR/LF that ends the last header line, followed by one more CR/LF) and append the data you want to send.

This is the way servers send HTML web pages to your browser; as you know, they are not URL-encoded. The same applies when you receive zipped files.
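
The layout described above could be sketched in C roughly as follows. This is an illustration only, not a complete HTTP client: `build_post_request` is a hypothetical helper, and the host, path and content type passed to it are placeholders, not the real values for any particular service.

```c
#include <stdio.h>
#include <string.h>

/* Build a minimal HTTP/1.1 POST request in buf.
 * host, path and content_type are caller-supplied placeholder values.
 * Returns the total request length, or -1 if buf is too small.
 * The result is not NUL-terminated after the body. */
int build_post_request(char *buf, size_t cap,
                       const char *host, const char *path,
                       const char *content_type,
                       const char *body, size_t body_len)
{
    /* Every header line ends in CR/LF; an empty line ends the header. */
    int n = snprintf(buf, cap,
                     "POST %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host, content_type, body_len);
    if (n < 0 || (size_t)n >= cap || body_len > cap - (size_t)n)
        return -1;

    /* The body follows the blank line as-is, with no URL encoding. */
    memcpy(buf + n, body, body_len);
    return n + (int)body_len;
}
```

The returned buffer would then be written to a connected socket; the body bytes go out exactly as they are, which is why no percent-encoding is needed for this route.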


AW

Re: URL Encoder/decode/escape/unescape
« Reply #3 on: April 28, 2019, 02:08:20 AM »
There is a sample in pure MASM, but the comments are in Russian:
https://kaimi.io/wp-content/uploads/2009/05/converter.zip

Description (in Russian): https://kaimi.io/2009/06/%D1%83%D0%BD%D0%B8%D0%B2%D0%B5%D1%80%D1%81%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9-%D0%BA%D0%BE%D0%BD%D0%B2%D0%B5%D1%80%D1%82%D1%80-%D1%82%D0%B5%D0%BA%D1%81%D1%82%D0%B0-1-0/

At first sight it appears to work, except for the transliteration part (the last option in the combo box). Of course, that part may work with the Windows-1251 (Cyrillic) encoding.

TimoVJL

Re: URL Encoder/decode/escape/unescape
« Reply #4 on: April 28, 2019, 03:42:53 AM »
One idea, which could be converted to asm code.
Code:
#include <stdio.h>

// https://en.wikipedia.org/wiki/Percent-encoding#Types_of_URI_characters
// Reserved characters and their percent-encoded forms:
// ! # $ & ' ( ) * + , / : ; = ? @ [ ]
// %21 %23 %24 %26 %27 %28 %29 %2A %2B %2C %2F %3A %3B %3D %3F %40 %5B %5D
// "!*'();:@&=+$,/?%#[]";
// https://secure.n-able.com/webhelp/NC_9-1-0_SO_en/Content/SA_docs/API_Level_Integration/API_Integration_URLEncoding.html
// url_table[c] is 1 when byte c must be percent-encoded, 0 when it is copied as-is.
char url_table[256] =
{// 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x 1x
1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1, // 2x 3x
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0, // 4x 5x ('A'..'Z' and '_' are copied as-is)
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1, // 6x 7x
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
};
short hex_table[256] =
{
0x3030,0x3130,0x3230,0x3330,0x3430,0x3530,0x3630,0x3730,0x3830,0x3930,0x4130,0x4230,0x4330,0x4430,0x4530,0x4630,
0x3031,0x3131,0x3231,0x3331,0x3431,0x3531,0x3631,0x3731,0x3831,0x3931,0x4131,0x4231,0x4331,0x4431,0x4531,0x4631,
0x3032,0x3132,0x3232,0x3332,0x3432,0x3532,0x3632,0x3732,0x3832,0x3932,0x4132,0x4232,0x4332,0x4432,0x4532,0x4632,
0x3033,0x3133,0x3233,0x3333,0x3433,0x3533,0x3633,0x3733,0x3833,0x3933,0x4133,0x4233,0x4333,0x4433,0x4533,0x4633,
0x3034,0x3134,0x3234,0x3334,0x3434,0x3534,0x3634,0x3734,0x3834,0x3934,0x4134,0x4234,0x4334,0x4434,0x4534,0x4634,
0x3035,0x3135,0x3235,0x3335,0x3435,0x3535,0x3635,0x3735,0x3835,0x3935,0x4135,0x4235,0x4335,0x4435,0x4535,0x4635,
0x3036,0x3136,0x3236,0x3336,0x3436,0x3536,0x3636,0x3736,0x3836,0x3936,0x4136,0x4236,0x4336,0x4436,0x4536,0x4636,
0x3037,0x3137,0x3237,0x3337,0x3437,0x3537,0x3637,0x3737,0x3837,0x3937,0x4137,0x4237,0x4337,0x4437,0x4537,0x4637,
0x3038,0x3138,0x3238,0x3338,0x3438,0x3538,0x3638,0x3738,0x3838,0x3938,0x4138,0x4238,0x4338,0x4438,0x4538,0x4638,
0x3039,0x3139,0x3239,0x3339,0x3439,0x3539,0x3639,0x3739,0x3839,0x3939,0x4139,0x4239,0x4339,0x4439,0x4539,0x4639,
0x3041,0x3141,0x3241,0x3341,0x3441,0x3541,0x3641,0x3741,0x3841,0x3941,0x4141,0x4241,0x4341,0x4441,0x4541,0x4641,
0x3042,0x3142,0x3242,0x3342,0x3442,0x3542,0x3642,0x3742,0x3842,0x3942,0x4142,0x4242,0x4342,0x4442,0x4542,0x4642,
0x3043,0x3143,0x3243,0x3343,0x3443,0x3543,0x3643,0x3743,0x3843,0x3943,0x4143,0x4243,0x4343,0x4443,0x4543,0x4643,
0x3044,0x3144,0x3244,0x3344,0x3444,0x3544,0x3644,0x3744,0x3844,0x3944,0x4144,0x4244,0x4344,0x4444,0x4544,0x4644,
0x3045,0x3145,0x3245,0x3345,0x3445,0x3545,0x3645,0x3745,0x3845,0x3945,0x4145,0x4245,0x4345,0x4445,0x4545,0x4645,
0x3046,0x3146,0x3246,0x3346,0x3446,0x3546,0x3646,0x3746,0x3846,0x3946,0x4146,0x4246,0x4346,0x4446,0x4546,0x4646,
};

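// Percent-encode the NUL-terminated string s into so.
// so must have room for 3 * strlen(s) + 1 bytes (worst case: every byte is escaped).
void url_escape(const char *s, char *so)
{
	while (*s) {
		// Index the tables with an unsigned value: with a plain (signed) char,
		// UTF-8 bytes >= 0x80 would become negative indexes.
		unsigned char c = (unsigned char)*s;
		if (url_table[c]) {
			// hex_table[c] holds both hex digits: first digit in the low byte,
			// second digit in the high byte. Store them one byte at a time.
			short h = hex_table[c];
			*so++ = '%';
			*so++ = (char)(h & 0xFF);
			*so++ = (char)(h >> 8);
		} else *so++ = *s;
		s++;
	}
	*so = 0;
}

int __cdecl main(void)
{
	char s1[] = "Hello World.\nFrom 2019.\nHello world. My name is Guga.How are you doing ?\n";
	char s2[512]; // large enough: 3 * strlen(s1) + 1 is well below 512
	url_escape(s1, s2);
	printf("%s\n", s2);
	return 0;
}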
Code:
Hello%20World.%0AFrom%202019.%0AHello%20world.%20My%20name%20is%20Guga.How%20are%20you%20doing%20%3F%0A
objconv result in zip
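
The reverse direction (decoding) is not covered by the code above. A minimal sketch of it might look like this; the names `hex_value` and `url_unescape` are hypothetical, and it assumes plain percent-encoding, so a '+' is not turned into a space the way form decoding would do.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode percent-escapes from s into so.
 * The output is never longer than the input, so so may be the same buffer as s.
 * A '%' that is not followed by two hex digits is copied through unchanged.
 * Returns the decoded length; the output is also NUL-terminated. */
size_t url_unescape(const char *s, char *so)
{
    char *start = so;

    while (*s) {
        int hi, lo;
        /* The checks short-circuit, so s[2] is only read when s[1] is a hex digit. */
        if (*s == '%' &&
            (hi = hex_value((unsigned char)s[1])) >= 0 &&
            (lo = hex_value((unsigned char)s[2])) >= 0) {
            *so++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *so++ = *s++;
        }
    }
    *so = 0;
    return (size_t)(so - start);
}
```

Because a decoded %00 produces an embedded NUL byte, the returned length is the reliable size of the result, not strlen.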
May the source be with you

guga

Re: URL Encoder/decode/escape/unescape
« Reply #5 on: April 28, 2019, 09:40:15 AM »
Thanks a lot, guys.

This will be really helpful. I'm writing an English-to-Portuguese translator using Google Translate, which uses those encodings in its results. I'll open another thread on this subject, because the server returns a 503 error after a certain amount of data has been downloaded.