Screenshot (http://www.phrio.biz/download/$CatchWebImages.jpg)
Use the Up and Down arrow keys to see all the images. I removed the scrollbar, which was not very sexy.
Now I think it is finished. When I began this program I said it would take two days of programming... I had not realized how curious and strange web programmers are. In the previous version I stopped when the image file extension was not found, rather than continuing...
If you try it on Pinterest, it downloads image files that you can't open with any image editor. That is because these files are really HTML files: inside them you can see the HTML code!
Some parts of the code have been improved.
I search for the following patterns:
- src="
- src='
- href="
- href='
- url="
- url='
- href="/\/\
- href='/\/\
- "https:
- 'https:
- "http:
- 'http:
- data-desktop-url="
- data-desktop-url='
- data-tablet-url="
- data-tablet-url='
- data-phone-url="
- data-phone-url='
- data-full-url="
- data-full-url='
- data-url="
- data-url='
- data-src="
- data-src='
- data-highres="
- data-highres='
- .jpg"
- .png"
- .jpg?
- .png?
Download: from my site (http://www.phrio.biz/mediawiki/Catch_Web_Pages)
Thanks to JJ2007, Siekmanski, ToutEnMasm and Guga for their help and their comments.
great work :t
Hello Grincheux
I have tested your CatchWeb, but your dialog froze for a while during the download :icon_confused:
Why doesn't your downloader run in a thread?
Greets,
I had the same problem with and without a thread.
New version, shorter, cleaner, quicker! Even if the program is... bigger, it's Hutch's fault!
Now I only search by extension: once an extension is found, I loop backward to find the opening " or '.
This part could be improved using SSE, but I don't know how.
I don't check whether the file is an HTML file; instead I check whether it is a valid image file, as Siekmanski said:
Quote from: Siekmanski on December 29, 2015, 07:27:12 AM
Hi ToutEnMasm,
Just skip "PHPSESSID=8b66972e915875a2a5ed239bbb76b96b&"; no cookies are needed to download the avatar.
To know the kind of image format, you could read the first 10 bytes of the file, check them, and then write the correct file extension.
BMP = BM at offset 0
GIF = GIF87a or GIF89a at offset 0
JPG = JFIF at offset 6
PNG = PNG at offset 1
If you download the second HTML page of the InString topic (http://masm32.com/board/index.php?topic=4946.15)
and then search for "avatar" src=", you'll find 5 different avatars:
http://masm32.com/board/index.php?action=dlattach;attach=2;type=avatar hutch
http://masm32.com/board/index.php?action=dlattach;attach=4892;type=avatar Grincheux
http://masm32.com/board/index.php?action=dlattach;attach=164;type=avatar Siekmanski
http://masm32.com/board/index.php?action=dlattach;attach=407;type=avatar ToutEnMasm
http://masm32.com/board/index.php?action=dlattach;attach=6;type=avatar jj2007
Download (http://www.phrio.biz/mediawiki/Catch_Web_Pages)
Hello Grincheux
There's another web image format, SVG, Scalable Vector Graphics (http://www.webopedia.com/TERM/S/SVG.html), which is definitely not easy to handle in an application like yours.
Copy this and paste it into your browser's address bar:

Don't be afraid, it's an image. Here's how I got it:
Yes, I knew. It's the format used by Wikimedia Commons.
I did base64 decoding for Phillipe in the RosAsm port I sent him. Base64 decoding is not that hard. Here is a base64 decoder:
[b64_table_decode: B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 62, 0-1, 0-1, 0-1, 63, 52, 53
B$ 54, 55, 56, 57, 58, 59, 60, 61, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0, 01, 02, 03, 04
B$ 05, 06, 07, 08, 09, 10, 11, 12, 13, 14
B$ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
B$ 25, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 26, 27, 28
B$ 29, 30, 31, 32, 33, 34, 35, 36, 37, 38
B$ 39, 40, 41, 42, 43, 44, 45, 46, 47, 48
B$ 49, 50, 51, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1]
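;;
input : pointer to the base64-encoded string.
output: pointer to the buffer that receives the decoded bytes.
length: length of the input, including the zero terminator.
Returns in eax the number of bytes written. Characters outside the base64
alphabet ('=' padding, line breaks, the zero terminator) are skipped.
Note: '=' padding is not subtracted, so the count is always a multiple of 3
("aGVsbG8=" returns 6, not 5), and every group is stored as a full dword,
so the output buffer needs one spare byte.
Example:
[Guga: B$ 0 #128]
call base64_decode {B$ "aGVsbG8=", 0}, Guga, 9
;;
Proc base64_decode:
Arguments @input, @output, @length
Uses ebx, esi, edi
mov ecx D@length ; ecx = characters left to read
mov ebx b64_table_decode ; translation table for xlatb
mov edi D@output
mov esi D@input
@loop:
xor eax eax
L1:
dec ecx | js @no_input
lodsb
xlatb
test al al | js L1< ; 1st char (a); skip invalid ones (table value 0-1)
mov edx eax
shl edx 2 ; byte 0 = a shl 2
L2:
dec ecx | js @save_last
lodsb
xlatb
test al al | js L2< ; 2nd char (b)
mov ah al
shr al 4
shl ah 4
or edx eax ; byte 0 |= b shr 4, byte 1 = b shl 4
L3:
dec ecx | js @save_last
lodsb
xlatb
test al al | js L3< ; 3rd char (c)
mov ah al
shr al 2
shl ah 6
shl eax 8
or edx eax ; byte 1 |= c shr 2, byte 2 = c shl 6
L4:
dec ecx | js @save_last
lodsb
xlatb
test al al | js L4< ; 4th char (d)
shl eax 16
or edx eax ; byte 2 |= d (byte 3 gets junk, overwritten by the next store)
mov D$edi edx
add edi 3
jmp @loop
@save_last:
mov D$edi edx ; partial group: stored with the same 3-byte step
add edi 3
@no_input:
sub edi D@output
mov eax edi ; eax = number of bytes written
EndP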
It would also be good to analyse the image from the img src tag (or the other tags described in the first post), and not only the extension. This ensures that everything you are trying to grab is an image. When the address has an extension (jpg, gif, etc.), fine, you do it the normal way; but when the data is encoded with base64, for example, you need to pass it to the proper decoder function. In other cases, you can simply check the image signatures.
Another case is an address inside an img src tag like this one:
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQZyYQTPbazTzPEtYgTdu4ypPm_K6X44KecNqnoSWjTvbfVkeZ3PqetsSYs
The link above is an image. All that has to be done is to set the headers parameter of InternetOpenUrl to point to the specific host, in this case "encrypted-tbn2.gstatic.com".
The pseudo code could look like this:
;;
[lpszHeaders: B$ "Host: encrypted-tbn2.gstatic.com
Accept: */*
", 0]
call 'wininet.InternetOpenUrlA' D@hInternet, D@lpszUrl, lpszHeaders, 0-1, &INTERNET_FLAG_RELOAD, 0
;;
Then, once the URL is opened and the Internet API has grabbed the data, all you need to do is check the initial signature ("JFIF", "GIF", "PNG", etc.) to know which format it is ;)
So, basically, you can use 3 methods to grab the data:
1) the direct way, checking the extension as it is done now (no need to set a host)
2) a deeper way, checking for encodings such as base64 (no need to set a host)
3) another direct method for when no extension is found: collect the data and analyze its signature (setting a host is better here)
And all of them should always start by checking what is inside the "img src" tag (or the other tags described by Phillipe). So the first step, before any analysis, is simply to collect all the data and parse it into a list of everything inside "img src" tags. Once the list is ready, the other methods can take over to distinguish what is what ;)
I am not courageous; I know that I must add base64.
In the next version...
Now I am decoding opcode 81h.
What is code 81h?
I think you will understand once you follow the link: http://www.phrio.biz/mediawiki/Code_81h
Ah, it's for the disassembler.
But you realize that it can also be part of a jno (0F 81), or an add, adc, sbb, cmp or xor instruction, right?
To tell them apart, the easiest way is to analyse the byte that immediately follows it (the ModRM byte). Take a look at the DisEngine title (inside RosAsm) and check the Op81 function.
; esi points to the 1st byte after 081
mov bl B$esi | inc esi | DigitMask bl To al ; al = /digit, the reg field (bits 5-3) of the ModRM byte
.If al = 0
mov B$LockPrefix &FALSE
mov D$edi 'add ' | add edi 4
.Else_If al = 1 ; OR r/m16,imm16 // OR r/m32,imm32
mov B$LockPrefix &FALSE
mov D$edi 'or ' | add edi 3
.Else_If al = 2
mov B$LockPrefix &FALSE
mov D$edi 'adc ' | add edi 4
.Else_If al = 3 ; 81 /3 iw SBB r/m16,imm16 ; 81 /3 id SBB r/m32,imm32
mov B$LockPrefix &FALSE
mov D$edi 'sbb ' | add edi 4
.Else_If al = 4
mov B$LockPrefix &FALSE
mov D$edi 'and ' | add edi 4
.Else_If al = 5 ; 81 /5 iw SUB r/m16,imm16 ; 81 /5 id SUB r/m32,imm32
mov B$LockPrefix &FALSE
mov D$edi 'sub ' | add edi 4
.Else_If al = 6 ; 81 /6 iw XOR r/m16,imm16 ; 81 /6 id XOR r/m32,imm32
mov B$LockPrefix &FALSE
mov D$edi 'xor ' | add edi 4
.Else_If al = 7
mov D$edi 'cmp ' | add edi 4
.End_If
The DigitMask macro expands to:
MOV AL BL
AND AL 00111000
SHR AL 3
Opcodes 80h, 81h and 83h are the biggest ones to decode, and the 66h and 67h prefixes add even more combinations...
I have found that JWasm does not accept AAD 16 or AAM 16 in 16-bit applications!
Opcodes D5 xx and D4 xx, where xx is the numeric base. By default it is ten, which is why they are encoded D5 0A and D4 0A. But you can change the base: if you select base 16, they become D5 10 and D4 10!
Good night and thank you.
Guga, what happens when you decode this: http://www.phrio.biz/mediawiki/Strange_Codings
AAD 16 and AAM 16 are also named AADB 16 and AAMB 16.
Op67 is, in general, a prefix (the address-size override).
It only becomes an instruction (packuswb) when it follows 0F; 66 0F 67 is the SSE2 form (66 is another prefix, btw).