CatchWebImages

Started by Grincheux, January 02, 2016, 03:03:54 PM

Grincheux



Use the Up and Down arrow keys to browse all the images. I removed the scrollbar, which was not very sexy.

Now I think it is finished. When I began this program I said it would take two days of programming... I did not realize how curious and strange web programmers are. In the previous version I stopped if the image file extension was not found, rather than continuing...

If you try it on Pinterest, it downloads image files that you can't open with any image editor. That is because these files are really HTML files. Inside them you can see the HTML code!

Some parts of the code have been improved.


I search for the following strings:


  • src="
  • src='
  • href="
  • href='
  • url="
  • url='
  • href="/\/\
  • href='/\/\
  • "https:
  • 'https:
  • "http:
  • 'http:
  • data-desktop-url="
  • data-desktop-url='
  • data-tablet-url="
  • data-tablet-url='
  • data-phone-url="
  • data-phone-url='
  • data-full-url="
  • data-full-url='
  • data-url="
  • data-url='
  • data-src="
  • data-src='
  • data-highres="
  • data-highres='
  • .jpg"
  • .png"
  • .jpg?
  • .png?
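
To make the search list above concrete, here is a minimal sketch (in Python rather than assembly, to keep it short) of scanning a page for some of those prefixes and copying each value up to the matching closing quote. The names `PREFIXES` and `find_quoted_values` are made up for illustration, and only a subset of the list is included; this is not the actual CatchWebImages code.

```python
# A subset of the attribute prefixes from the list above (extend as needed).
PREFIXES = ("src=", "href=", "url=", "data-src=", "data-url=")

def find_quoted_values(html, prefixes=PREFIXES):
    """Return every value that follows prefix + quote, up to the matching quote."""
    found = []
    for prefix in prefixes:
        for quote in ('"', "'"):
            needle = prefix + quote
            pos = html.find(needle)
            while pos != -1:
                start = pos + len(needle)
                end = html.find(quote, start)
                if end == -1:          # unterminated value: stop this search
                    break
                found.append(html[start:end])
                pos = html.find(needle, end + 1)
    # "src=" also matches inside "data-src=", so drop duplicates, keeping order.
    return list(dict.fromkeys(found))
```

Because short prefixes such as `src=` also match inside longer ones such as `data-src=`, the same value can be found twice, hence the de-duplication at the end.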


Download : From my site


Thanks to: jj2007, Siekmanski, ToutEnMasm and Guga for their help and their comments.

guga

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

ragdog

Hello Grincheux

I have tested your CatchWeb, but your dialog froze for a while during the download.
Why doesn't your downloader use a thread?

Greets,

Grincheux

I had the same problem with/without a thread.

Grincheux

New version, shorter, cleaner, quicker! Even if the program is... bigger, it's Hutch's fault!

I now search only by extension; once the extension is found, I loop backward to find the opening " or '.
This part could perhaps be improved using SSE, but I don't know.
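
The backward loop described above could be sketched like this (Python for brevity; `EXTENSIONS` and `urls_by_extension` are illustrative names, not taken from the program). It assumes the URL sits inside a quoted attribute value, and a real scanner would probably also stop at `<`, `>` or whitespace.

```python
EXTENSIONS = (".jpg", ".png")

def urls_by_extension(html, extensions=EXTENSIONS):
    """Find each extension, then walk backward to the opening quote."""
    urls = []
    for ext in extensions:
        pos = html.find(ext)
        while pos != -1:
            start = pos
            # Step back until the character before `start` is " or '.
            while start > 0 and html[start - 1] not in "\"'":
                start -= 1
            if start > 0:              # a quote was found before the extension
                urls.append(html[start:pos + len(ext)])
            pos = html.find(ext, pos + 1)
    return urls
```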
I don't verify whether the file is an HTML file, but whether it is a valid image file, as Siekmanski said:

Quote from: Siekmanski on December 29, 2015, 07:27:12 AM
Hi ToutEnMasm,

Just skip "PHPSESSID=8b66972e915875a2a5ed239bbb76b96b&" no cookies needed to download the avatar.

To know the kind of image-format, you could check the file by reading the first 10 bytes and check those and then write the correct file-extension.

BMP = BM at offset 0
GIF = GIF87a or GIF89a at offset 0
JPG = JFIF at offset 6
PNG = PNG at offset 1

If you download the second html-page of Topic InString = http://masm32.com/board/index.php?topic=4946.15

Then search for "avatar" src=" you'll find 5 different avatars

http://masm32.com/board/index.php?action=dlattach;attach=2;type=avatar     hutch
http://masm32.com/board/index.php?action=dlattach;attach=4892;type=avatar  Grincheux
http://masm32.com/board/index.php?action=dlattach;attach=164;type=avatar   Siekmanski
http://masm32.com/board/index.php?action=dlattach;attach=407;type=avatar   ToutEnMasm
http://masm32.com/board/index.php?action=dlattach;attach=6;type=avatar     jj2007
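
The signature table in the quote above translates almost directly into a lookup. This is a minimal sketch in Python; `SIGNATURES` and `guess_extension` are hypothetical names. It checks only the four formats listed in the quote, so a JPEG without a JFIF marker (for example an Exif file straight from a camera) would not be recognized.

```python
# (offset, magic bytes, extension), taken from the quoted post.
SIGNATURES = [
    (0, b"BM", ".bmp"),
    (0, b"GIF87a", ".gif"),
    (0, b"GIF89a", ".gif"),
    (6, b"JFIF", ".jpg"),
    (1, b"PNG", ".png"),
]

def guess_extension(head):
    """Return the extension matching the first bytes of a file, or None."""
    for offset, magic, ext in SIGNATURES:
        if head[offset:offset + len(magic)] == magic:
            return ext
    return None
```

An HTML page saved under an image name returns `None` here, which is the Pinterest case described in the first post.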

Download

GoneFishing

Hello Grincheux

There's another web image format, SVG (Scalable Vector Graphics), which is definitely not easy to handle in an application like yours.

Copy this and paste into your browser's address bar




Don't be afraid, it's an image. Here's how I got it:


Grincheux

Yes, I knew. It's the format used by Wikimedia Commons.

guga

I did the base64 decoding for Phillipe in the RosAsm port I sent to him. Decoding base64 is not that hard. Here is a base64 decoder:


[b64_table_decode:  B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 62, 0-1, 0-1, 0-1, 63, 52, 53
                    B$ 54, 55, 56, 57, 58, 59, 60, 61, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0, 01, 02, 03, 04
                    B$ 05, 06, 07, 08, 09, 10, 11, 12, 13, 14
                    B$ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
                    B$ 25, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 26, 27, 28
                    B$ 29, 30, 31, 32, 33, 34, 35, 36, 37, 38
                    B$ 39, 40, 41, 42, 43, 44, 45, 46, 47, 48
                    B$ 49, 50, 51, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1, 0-1
                    B$ 0-1, 0-1, 0-1, 0-1, 0-1, 0-1]

;;
    input : Input the encoded base64 string.
    output: Output the decoded string
    lenght: The len of the input including the zero termination char.


    Example:
            [Guga: B$ 0 #128]
            call base64_decode {B$ "aGVsbG8=", 0}, Guga, 9
;;

Proc base64_decode:
    Arguments @input, @output, @lenght
    Uses ebx, esi, edi

    mov ecx D@lenght
    mov ebx b64_table_decode
    mov edi D@output
    mov esi D@input

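@loop:
    xor eax eax                 ; ah and the upper bits start at zero for each group

; 1st char: its 6 bits become the top 6 bits of output byte 0
L1:
    dec ecx | js @no_input
    lodsb
    xlatb                       ; al = b64_table_decode[al], -1 for an invalid char
    test al al | js L1<         ; skip invalid chars ("=", CR/LF, the final zero)
    mov edx eax
    shl edx 2

; 2nd char: top 2 bits complete byte 0, low 4 bits start byte 1
L2:
    dec ecx | js @save_last
    lodsb
    xlatb
    test al al | js L2<
    mov ah al
    shr al 4
    shl ah 4
    or edx eax

; 3rd char: top 4 bits complete byte 1, low 2 bits start byte 2
L3:
    dec ecx | js @save_last
    lodsb
    xlatb
    test al al | js L3<
    mov ah al
    shr al 2
    shl ah 6
    shl eax 8
    or edx eax

; 4th char: its 6 bits complete byte 2
L4:
    dec ecx | js @save_last
    lodsb
    xlatb
    test al al | js L4<
    shl eax 16
    or edx eax
    mov D$edi edx               ; stores a dword, so the buffer needs one spare byte
    add edi 3                   ; only 3 of the 4 bytes are kept

    jmp @loop

; input ended inside a group: flush the partial dword (the count still grows by 3)
@save_last:
    mov D$edi edx
    add edi 3

@no_input:
    sub edi D@output            ; eax = number of bytes counted as written
    mov eax edi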

EndP
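
For readers who do not use RosAsm, the same table-driven idea can be sketched in a few lines of Python. This is not a translation of the routine above: `ALPHABET`, `DECODE` and `b64_decode` are illustrative names, and the sketch emits a byte only once eight bits have been collected, so padding never produces extra output bytes.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}   # char -> 6-bit value

def b64_decode(text):
    """Decode base64 text, skipping any character outside the alphabet."""
    bits = 0          # bit accumulator
    nbits = 0         # number of valid bits currently in the accumulator
    out = bytearray()
    for ch in text:
        value = DECODE.get(ch)
        if value is None:              # "=", newlines, NUL: same role as -1 in the table
            continue
        bits = (bits << 6) | value
        nbits += 6
        if nbits >= 8:                 # a full byte is available
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1   # keep only the leftover bits
    return bytes(out)
```

The standard library's `base64.b64decode` does the same job and is the better choice in practice; the loop is shown only to make the 4-characters-to-3-bytes packing visible.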



Also, it would be good to analyse the image from the img src tag (or the other tags described in the first post), and not only the extension. This guarantees that everything you grab is an image. When the URL has an extension (jpg, gif, etc.), you proceed the normal way; but when the data is encoded with base64, for example, you need to pass the data to the proper decoder function. In other cases, you can simply check for the image signatures.

Examples of such other cases are addresses inside the img src tag that look like this:

https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQZyYQTPbazTzPEtYgTdu4ypPm_K6X44KecNqnoSWjTvbfVkeZ3PqetsSYs

The above link is an image. All that has to be done is to set the header parameter of InternetOpenUrl to point to a specific host, in this case "encrypted-tbn2.gstatic.com".

The pseudo code could look like this:


;;

    [lpszHeaders: B$ "Host: encrypted-tbn2.gstatic.com
    Accept: */*
    ", 0]

    call 'wininet.InternetOpenUrlA' D@hInternet, D@lpszUrl, lpszHeaders, 0-1, &INTERNET_FLAG_RELOAD, 0

    Then, once the URL is opened, you simply check for the initial tags "JFIF", "GIF", "PNG", etc.
;;



Once the internet API has grabbed the data, all you need to do is check the signature to know what format it has ;)
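
As a rough analogue of the WinINet pseudo code, here is the same idea with Python's standard `urllib` instead of `InternetOpenUrlA` (a swapped-in library, not what the program uses). `build_image_request` and `fetch_head` are hypothetical helper names; the Host header is derived from the URL itself, and only the first few bytes are read because that is all a signature check needs.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

def build_image_request(url):
    """Build a request carrying explicit Host and Accept headers."""
    host = urlsplit(url).hostname
    return Request(url, headers={"Host": host, "Accept": "*/*"})

def fetch_head(url, count=16):
    """Download only the first `count` bytes, enough for a signature check."""
    with urlopen(build_image_request(url)) as response:
        return response.read(count)
```

`fetch_head` needs network access; `build_image_request` can be inspected offline.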

So, basically, you can use 3 methods to grab the data:
1) the direct way, checking for the extension as it is done already (no need to set a host)
2) a deeper way, checking for encodings such as base64 (no need to set a host)
3) another direct method when no extension is found: collecting the data and analyzing its signature (it is better to set a host)

All of them should first check what is inside the "img src" tag (or the other tags described by Phillipe). So, before any analysis, simply collect all the data and parse it to list everything that exists inside the "img src" tags. Once the list is ready, the other methods can take place to distinguish what is what ;)
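
The three methods above amount to a small dispatcher over the collected src values. A minimal sketch, assuming the values have already been extracted from the page; `IMAGE_EXTENSIONS`, `classify_source` and `decode_data_uri` are names invented for this example.

```python
import base64

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp")

def classify_source(src):
    """Pick one of the three methods for a value found inside img src."""
    if src.startswith("data:") and ";base64," in src:
        return "base64"                    # method 2: decode in place
    path = src.split("?", 1)[0].lower()    # ignore any query string
    if path.endswith(IMAGE_EXTENSIONS):
        return "extension"                 # method 1: download as-is
    return "signature"                     # method 3: download, then check magic bytes

def decode_data_uri(src):
    """Return the raw bytes of a base64 data: URI."""
    _header, payload = src.split(",", 1)
    return base64.b64decode(payload)
```

A gstatic-style address with no extension falls through to "signature", so it is downloaded first and identified by its magic bytes afterwards.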

Grincheux

I am not courageous; I know that I must add base64.
In the next version...
Right now I am decoding code 81h.

guga


Grincheux

I think you will understand once you follow the link : http://www.phrio.biz/mediawiki/Code_81h

guga

Ahn... it's for the disassembler.

But you realize that this can also be a jno, sbb, adc, add, cmp or xor instruction, right?

If you want to distinguish them, the easiest way is to analyse the byte that immediately follows it. Take a look at the DisEngine title (inside RosAsm) and check the Op81 function.


         ; esi points to the 1st byte after 081
        mov bl B$esi | inc esi | DigitMask bl To al              ; ModRm with /2 ?

        .If al = 0
            mov B$LockPrefix &FALSE
            mov D$edi 'add ' | add edi 4
        .Else_If al = 1      ; OR r/m16,imm16 // OR r/m32,imm32
            mov B$LockPrefix &FALSE
            mov D$edi 'or  ' | add edi 3
        .Else_If al = 2
            mov B$LockPrefix &FALSE
            mov D$edi 'adc ' | add edi 4
        .Else_If al = 3  ; 81 /3 iw SBB r/m16,imm16  ; 81 /3 id SBB r/m32,imm32
            mov B$LockPrefix &FALSE
            mov D$edi 'sbb ' | add edi 4
        .Else_If al = 4
            mov B$LockPrefix &FALSE
            mov D$edi 'and ' | add edi 4
        .Else_If al = 5  ; 81 /5 iw SUB r/m16,imm16 ; 81 /5 id SUB r/m32,imm32
            mov B$LockPrefix &FALSE
            mov D$edi 'sub ' | add edi 4
        .Else_If al = 6  ; 81 /6 iw XOR r/m16,imm16 ; 81 /6 id XOR r/m32,imm32
            mov B$LockPrefix &FALSE
            mov D$edi 'xor ' | add edi 4
        .Else_If al = 7
            mov D$edi 'cmp ' | add edi 4
        .End_If


The DigitMask macro unfolds as:
    MOV AL BL
    AND AL 00111000
    SHR AL 3
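
The same lookup fits in a few lines of Python: bits 3 to 5 of the byte after 81h (the /digit field of ModRM) select the mnemonic. `GROUP1` and `op81_mnemonic` are illustrative names and not part of RosAsm.

```python
# Mnemonics for opcode 81h, indexed by the /digit field of the ModRM byte.
GROUP1 = ("add", "or", "adc", "sbb", "and", "sub", "xor", "cmp")

def op81_mnemonic(modrm):
    """Return the mnemonic selected by bits 3..5 of the ModRM byte."""
    digit = (modrm & 0b00111000) >> 3   # same mask and shift as DigitMask
    return GROUP1[digit]
```

For example, the bytes 81 C0 decode as add and 81 F8 as cmp, both with eax as the r/m operand.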
   

Grincheux

80h, 81h and 83h are the biggest. +66h +67h...
I have found that JWasm does not allow AAD 16 or AAM 16 for 16-bit applications!
The opcodes are D5 xx and D4 xx, where xx is the numeric base. By default the base is ten, which is why they are encoded as D5 0A and D4 0A. But you can change the base, and if you select base 16 they become D5 10 and D4 10!
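
To show what the base byte does, here is a small Python model of the two instructions. The function names are made up for this sketch, flags are ignored, and a base of 0 (which raises a divide error for AAM on a real CPU) is not handled.

```python
def aam(al, base=10):
    """AAM (D4 ib): AH = AL div base, AL = AL mod base. Returns (ah, al)."""
    return al // base, al % base

def aad(ah, al, base=10):
    """AAD (D5 ib): AL = (AL + AH * base) and 0FFh, AH = 0. Returns (ah, al)."""
    return 0, (al + ah * base) & 0xFF

def encode(mnemonic, base=10):
    """Return the two instruction bytes: the opcode followed by the base."""
    opcode = {"aam": 0xD4, "aad": 0xD5}[mnemonic]
    return bytes([opcode, base])
```

With base 16, AAM simply splits AL into its two hex nibbles: 4Bh gives AH = 4 and AL = 0Bh.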

Good Night and Thank you.

Grincheux

Guga, what happens when you decode this : http://www.phrio.biz/mediawiki/Strange_Codings

Grincheux

The opcodes AAD 16 and AAM 16 are also named AADB 16 and AAMB 16.