The MASM Forum

General => The Campus => Topic started by: colinr on November 05, 2019, 10:27:21 AM

Title: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 05, 2019, 10:27:21 AM
Hello all,

Firstly, thanks for letting me join this forum.

I'm trying to figure out how to compare 128-bit unsigned integers; I've been trying for months without getting anywhere.

Essentially, I have a data set of IPv6 addresses, where each entry comprises a start address, an end address and the two-letter ISO country code to which that block of IPv6 addresses is assigned. The addresses are stored as 128-bit little-endian integers, and I would like to iterate through the data set to establish which block a particular IP address sits in.

So, I'm not testing for equality, but rather if the IP address that I'm looking for is greater or less than each of these blocks.
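For reference, the ordering test being asked for can be sketched in C as a limb-by-limb comparison. This is only a minimal sketch, assuming each address is held as four little-endian 32-bit limbs (index 0 is the least significant DWORD); the name `cmp_u128` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: compare two 128-bit unsigned values stored as four
   32-bit limbs, limb[0] being the least significant DWORD.
   Returns -1, 0 or 1 for a < b, a == b, a > b. */
int cmp_u128(const uint32_t a[4], const uint32_t b[4])
{
    for (int i = 3; i >= 0; i--) {        /* most significant limb first */
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;  /* first differing limb decides */
    }
    return 0;                             /* all four limbs are equal */
}
```

An address `ip` lies inside a block when `cmp_u128(ip, start) >= 0` and `cmp_u128(ip, end) <= 0`. Lower limbs are only looked at when all higher limbs are equal, which is why comparing the top DWORD alone is not enough.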

IPv4 was no problem, these 128-bit IPv6 addresses are killing me!

I'm using WinAsm Studio and MASM32 if that makes any difference.

Thank you in advance.

Title: Re: 128 bit Comparison on 32-bit CPU
Post by: nidud on November 05, 2019, 11:16:53 AM
deleted
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: hutch-- on November 05, 2019, 12:16:18 PM
Hi Colin,

If the task is what I think it is, I would create a structure to match the data layout and then just read the bits you want.
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: aw27 on November 05, 2019, 03:25:33 PM
Quote
I'm trying to figure out how to conduct comparison of 128-bit unsigned integers, I've been trying this for months without getting anywhere

The solution is quite easy, but there are two cases: either the list of Start IPs is sorted or it is unsorted.
Start by making a flowchart; this will make a difference.
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 05, 2019, 06:46:19 PM
Thanks for your replies so far (I've not checked the sample above yet).

Essentially, the data set is in numerical order, low to high, and each entry that I iterate through is 34 bytes in length (arbitrary example below).

20010df7 82000000 00000000 00000000 / 20010df7 8200ffff ffffffff ffffffff / ID

u_int128, u_int128, CHAR, CHAR

The IPv6 data is little endian, so I've flipped it around above for ease on the eyeball!

So, for example, if I was looking for this IP address: 20010df7 82000000 1234abcde 12345678, it would fit within the block above.

I've tried using XMM but it would appear that these are floating point registers and not supported as a GP register for direct comparison.

As can be seen from the example from the data set above, it's not possible to simply compare the first DWORD, as this value may also appear in the next entry (not shown) of the data set.


EDIT: The sample above does not seem to work


Title: Re: 128 bit Comparison on 32-bit CPU
Post by: aw27 on November 05, 2019, 09:14:09 PM
This is a sample that is suitable when all ranges are contiguous (i.e. no empty IP zones between ranges). So, here you don't need to know about the End IP. If the ranges were not contiguous, the last Start IP at or below your test IP would not be enough on its own, so you would also need to check the End IP to decide what the country is.
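The contiguous case described above can be sketched in C as "take the last entry whose start is not above the test IP". This is a rough sketch only, assuming the start addresses are sorted ascending and already split into a high and a low 64-bit half; the `u128` type and the function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } u128;   /* hypothetical 128-bit holder */

/* Returns the index of the last entry whose start <= ip,
   or -1 if ip lies below the first start address. */
long last_start_not_above(const u128 *starts, size_t n, u128 ip)
{
    long found = -1;
    for (size_t i = 0; i < n; i++) {
        /* unsigned 128-bit "start > ip": high halves decide unless equal */
        int above = (starts[i].hi != ip.hi) ? (starts[i].hi > ip.hi)
                                            : (starts[i].lo > ip.lo);
        if (above)
            break;              /* sorted input: no later entry can match */
        found = (long)i;
    }
    return found;
}
```

With gaps between ranges, the returned entry would still need its end address checked before the country is trusted.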

Here, this is done with SSE4 vector instructions (PTEST is SSE4.1, PCMPGTQ is SSE4.2). Everybody has that (except here, some people don't).


.model flat, stdcall

includelib \masm32\lib\msvcrt.lib
printf proto C :ptr, :vararg

IPV6Struct struct
country db 20 dup (0)
startIP OWORD (0)
endIP OWORD (0)
IPV6Struct ends

.data

ipv6_1 IPV6Struct <'country1',20010df7820000000000000000000000h, 20010df78200ffffffffffffffffffffh>
ipv6_2 IPV6Struct <'country2',20020df7820000000000000000000000h, 20020df78200ffffffffffffffffffffh>
ipv6_3 IPV6Struct <'country3',20030df7820000000000000000000000h, 20030df78200ffffffffffffffffffffh>
ipv6_4 IPV6Struct <'country4',20040df7820000000000000000000000h, 20040df78200ffffffffffffffffffffh>
myIP oword 20030df7820000000000000000000001h
msg db "Country is: %s",10,0

.code

main proc uses esi
lea esi, ipv6_1
@checkStart:
movups xmm0, (IPV6Struct ptr [esi]).startIP
movups xmm1, myIP
PCMPGTQ  xmm0, xmm1
or eax,-1 ; clear zero flag
ptest xmm0, xmm1
jnz @exit
add esi, sizeof IPV6Struct
jmp short @checkStart
@exit:
sub esi, sizeof IPV6Struct
invoke printf, offset msg, esi
ret
main endp

end


Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 05, 2019, 09:57:34 PM
Thanks,

When I try to assemble this code, the PCMPGTQ and ptest instructions are not recognised.

I'm setting this at the top of the code:

.686
.xmm

I'm using WinAsm Studio and MASM32 (unsure of which version).

Title: Re: 128 bit Comparison on 32-bit CPU
Post by: aw27 on November 05, 2019, 10:10:23 PM
You don't need to change anything. Paste exactly what is above in Notepad and save as test.asm.
Use a recent version of MASM (one comes free with Visual Studio Community edition). The MASM that comes with MASM32 predates SSE4, so it does not know these instructions.

Run a batch file containing:
ml -c -coff test.asm
link /entry:main /machine:x86 test.obj
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 06, 2019, 09:07:58 PM
@AW - Thank you

I've got VS2017 installed on my workstation, and all I needed to do was copy ml.exe, link.exe, msobj140.dll and mspdb140.dll to my existing MASM32 folder. It seems to assemble fine.

So my next steps are to scrutinise the code and attempt to integrate it into my current test code. I'll let you know how I get on.
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 06, 2019, 10:07:42 PM
OK, I've implemented the code, but not getting the expected results.

It's probably the jump condition that I'm getting wrong, but in the sample posted below, I thought that the first instance of the test should fail and jump back into the loop, as the sample IP address should fit within the second block (GB).

The code at the beginning is simply to swap the Endianness to allow the IP addresses to be viewed easier.

I replaced jnz with jb and it appeared to work, but will need to do further testing to make sure, I think it would also be prudent to test against the end address too.

Sorry if I'm being stupid with this, but my brain is mashed.

.data

ipv6 dd 02A002381h,0E8C00000h,000000000h,000000000h
dd 02A002381h,0E8C0FFFFh,0FFFFFFFFh,0FFFFFFFFh
db "CH" ;Switzerland
dd 02A002381h,0E8C10000h,000000000h,000000000h
dd 02A0023FFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
db "GB" ;Great Britain
dd 02A002400h,000000000h,000000000h,000000000h
dd 02A003FFFh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
db "XX" ;Unassigned
dd 02A004000h,000000000h,000000000h,000000000h
dd 02A004007h,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
db "IR" ;Iran
dd 02A004008h,000000000h,000000000h,000000000h
dd 02A00401Fh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
db "XX" ;Unassigned
dd 02A004020h,000000000h,000000000h,000000000h
dd 02A004020h,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
db "DK" ;Denmark
dd 02A004021h,000000000h,000000000h,000000000h
dd 02A00403Fh,0FFFFFFFFh,0FFFFFFFFh,0FFFFFFFFh
db "XX" ;Unassigned

myIP dd 02a0023a8h, 04825eda1h, 0ac528803h, 0001b9000h

.data?
hInstance HINSTANCE ?

V6Start DWORD 4 dup (?)
V6End DWORD 4 dup (?)
ToFind DWORD 4 dup (?)

.const

.code

start:
lea edi, myIP
mov eax, dword ptr[edi]
mov ebx, dword ptr[edi+4]
mov ecx, dword ptr[edi+8]
mov edx, dword ptr[edi+12]

lea edi, ToFind
mov dword ptr[edi], edx
mov dword ptr[edi+4], ecx
mov dword ptr[edi+8], ebx
mov dword ptr[edi+12], eax

lea esi, ipv6

checkStart:
mov eax, dword ptr[esi]
mov ebx, dword ptr[esi+4]
mov ecx, dword ptr[esi+8]
mov edx, dword ptr[esi+12]

mov dword ptr[V6Start], edx
mov dword ptr[V6Start+4], ecx
mov dword ptr[V6Start+8], ebx
mov dword ptr[V6Start+12], eax

mov eax, dword ptr[esi+16]
mov ebx, dword ptr[esi+20]
mov ecx, dword ptr[esi+24]
mov edx, dword ptr[esi+28]

mov dword ptr[V6End], edx
mov dword ptr[V6End+4], ecx
mov dword ptr[V6End+8], ebx
mov dword ptr[V6End+12], eax

movups xmm0, V6Start ;[esi]
movups xmm1, ToFind
PCMPGTQ xmm0, xmm1
or eax,-1 ; clear zero flag
ptest xmm0, xmm1
jnz exit
lea esi, [esi+34]
jmp short checkStart
exit:
sub esi, 34
ret

invoke ExitProcess,0h

End start
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: hutch-- on November 06, 2019, 11:57:33 PM
Colin,

If you begin at the starting offset plus the offset for the member you are after, then you just ADD that data length to get the next member offset in the following data set. If I was working on a data set of this type, I would construct an array of pointers to the start of each data set, then just reference the data set plus offset. You will find techniques of this type reasonably fast.
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 07, 2019, 12:10:37 AM
Thanks Hutch

The dataset has over 527,000 entries, and changes on a monthly basis as allocations change.

I generate these datasets myself from an ANSI-encoded CSV file which contains the IP addresses in their decimal notation.

I think I might have figured it out, just testing it today, I'll report back.
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: aw27 on November 07, 2019, 12:35:29 AM
Well, my example is broken  :sad:
What happens is that PCMPGTQ does a signed compare. So my example worked with my data, but you chose an IP where the qword 0ac528803001b9000h is negative. I am not seeing another easy SSE solution at the moment because there is no equivalent packed unsigned compare. You should probably try to do it using general-purpose registers as suggested here: http://masm32.com/board/index.php?topic=8148.msg89445#msg89445.
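To illustrate the signed-versus-unsigned problem in scalar C: a common workaround (not something proposed in this thread) is to flip the top bit of both operands before a signed compare, which gives the unsigned ordering. The sketch below is only an illustration with made-up function names. It also shows that the two 64-bit halves must be combined explicitly, because PCMPGTQ compares each qword independently and has no notion of a 128-bit value.

```c
#include <stdint.h>

/* Unsigned "a > b" using only a signed comparison: XOR-ing the sign bit
   into both operands maps unsigned order onto signed order.
   Assumes the usual two's-complement conversion to int64_t. */
int gt_u64_via_signed(uint64_t a, uint64_t b)
{
    const uint64_t bias = UINT64_C(0x8000000000000000);
    return (int64_t)(a ^ bias) > (int64_t)(b ^ bias);
}

/* 128-bit unsigned "a > b"; element [1] is the high qword, [0] the low. */
int gt_u128(const uint64_t a[2], const uint64_t b[2])
{
    if (a[1] != b[1])                 /* high halves differ: they decide */
        return gt_u64_via_signed(a[1], b[1]);
    return gt_u64_via_signed(a[0], b[0]);
}
```

The qword 0ac528803001b9000h has its top bit set, so a plain signed compare treats it as negative, while the biased compare ranks it above zero as intended.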
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 07, 2019, 12:46:20 AM
@AW  - Don't be so disappointed! Actually you have been a great help.

I've implemented the code into my DLL that does the lookup, and it seems to work!

The code that swaps the endianness of the addresses is still in, but that is just a necessary evil at the moment.

Take a look at the routine below, is there anything you can think of that will cause a lookup failure?

EDIT: I've tested against the IP address previously quoted and tested against the very last entry in the data-set by 'faking' an IP address to sit within this final range, both worked!

LookUpIPv6Country:

xor edx, edx
xor eax, eax
mov eax, dword ptr[IPv6CountryFileSize]
mov ebx, 34 ;Will be dividing by 34 bytes
div ebx

mov edx, eax
mov esi, dword ptr[IPv6CountryMem] ;Pointer to IPv6 data

lea edi, CountryFormedIpV6_1 ;The IP to find
FindV6CountryLoop:

push edx ;Save loop counter

mov eax, dword ptr[esi]
mov ebx, dword ptr[esi+4]
mov ecx, dword ptr[esi+8]
mov edx, dword ptr[esi+12]
bswap eax
bswap ebx
bswap ecx
bswap edx
mov dword ptr[TempIpV6StartBlock1], edx
mov dword ptr[TempIpV6StartBlock1+4], ecx
mov dword ptr[TempIpV6StartBlock1+8], ebx
mov dword ptr[TempIpV6StartBlock1+12], eax

movups xmm0, [TempIpV6StartBlock1]
movups xmm1, [edi]
PCMPGTQ xmm0, xmm1
or eax,-1 ; clear zero flag
ptest xmm0, xmm1
jb short V6CountryNext

pop edx
lea esi, [esi+34]
dec edx
jne FindV6CountryLoop
jmp NotAV6CountryAddress

V6CountryNext:
lea esi,[esi-34]
mov eax, dword ptr[esi+16]
mov ebx, dword ptr[esi+20]
mov ecx, dword ptr[esi+24]
mov edx, dword ptr[esi+28]
bswap eax
bswap ebx
bswap ecx
bswap edx
mov dword ptr[TempIpV6EndBlock1], edx
mov dword ptr[TempIpV6EndBlock1+4], ecx
mov dword ptr[TempIpV6EndBlock1+8], ebx
mov dword ptr[TempIpV6EndBlock1+12], eax

movups xmm0, [TempIpV6EndBlock1] ;[esi]
movups xmm1, [edi]
PCMPGTQ xmm0, xmm1
or eax,-1 ; clear zero flag
ptest xmm0, xmm1
jb short V6CountryFound

pop edx
lea esi, [esi+34]
dec edx
jne FindV6CountryLoop
jmp short NotAV6CountryAddress

V6CountryFound:
pop edx

lea eax, IPv6CountryISOCode
xor ecx, ecx
mov cl, byte ptr[esi+32]
mov byte ptr[eax], cl
mov cl, byte ptr[esi+33]
mov byte ptr[eax+1], cl
ret

;Routine unused - maybe use in proxy lookup
IPv6CountryIsSame:
ret

NotAV6CountryAddress:
mov byte ptr[IPv6CountryISOCode], "Z"
mov byte ptr[IPv6CountryISOCode+1], "Z"
lea eax, IPv6CountryISOCode
ret
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: hutch-- on November 07, 2019, 01:12:46 AM
Colin,

With a data layout of this type, what is the search criterion you need to use ?
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 07, 2019, 01:21:02 AM
Just identify which country an IP address is assigned to.

I've written a tool to parse log files to identify accounts that may have been compromised.

So I'm looking up the countries (someone who logs in from the UK, should not have account access from Nigeria!).

I then use another data set that's published daily to identify proxies/VPNs and known malicious IP addresses. Everything is fine with IPv4; just IPv6 is painful.

So the tool needs to identify which range a particular IP address sits in; from that I then pull out the two-letter ISO country code.
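As a rough C sketch of that lookup, assuming the 34-byte record layout given earlier in the thread (16-byte little-endian start, 16-byte little-endian end, two country characters) and the "ZZ" fallback used in my routine above. The function names are hypothetical. The comparison works byte by byte, so the unaligned 34-byte stride is not a problem.

```c
#include <stddef.h>
#include <string.h>

enum { REC_SIZE = 34 };   /* 16-byte start + 16-byte end + 2 country chars */

/* Compare two 16-byte little-endian values; byte 15 is most significant. */
int cmp_le128(const unsigned char *a, const unsigned char *b)
{
    for (int i = 15; i >= 0; i--)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

/* Linear scan: writes the two-letter code (or "ZZ" if no range matches)
   into out and returns out. */
const char *lookup_country(const unsigned char *table, size_t count,
                           const unsigned char ip[16], char out[3])
{
    for (size_t i = 0; i < count; i++) {
        const unsigned char *rec = table + i * REC_SIZE;
        if (cmp_le128(ip, rec) >= 0 && cmp_le128(ip, rec + 16) <= 0) {
            out[0] = (char)rec[32];
            out[1] = (char)rec[33];
            out[2] = '\0';
            return out;
        }
    }
    memcpy(out, "ZZ", 3);   /* not found */
    return out;
}
```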
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: jj2007 on November 07, 2019, 01:53:10 AM
Can you translate them to strings? They are easier to sort and compare.
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 07, 2019, 01:58:09 AM
Would that help? I'm not after equality, more establishing whether they fit between two bounds (start and end) x 500,000 times!
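Since the data set is stored low to high, one option that is not what the posted samples do is a binary search, which needs roughly 20 probes for about 527,000 entries instead of a full scan. The following is only a sketch under assumptions: the ranges are sorted by start address and do not overlap, and the `u128` and `range_t` types are hypothetical in-memory forms, not the 34-byte file layout.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t hi, lo; } u128;              /* hypothetical */
typedef struct { u128 start, end; char cc[3]; } range_t;

/* Unsigned 128-bit "a < b". */
int u128_less(u128 a, u128 b)
{
    return (a.hi != b.hi) ? (a.hi < b.hi) : (a.lo < b.lo);
}

/* Returns the index of the range containing ip, or -1 if none does. */
long find_range(const range_t *r, size_t n, u128 ip)
{
    size_t lo = 0, hi = n;                 /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (u128_less(ip, r[mid].start))
            hi = mid;                      /* ip is below this range */
        else if (u128_less(r[mid].end, ip))
            lo = mid + 1;                  /* ip is above this range */
        else
            return (long)mid;              /* start <= ip <= end */
    }
    return -1;                             /* ip falls in a gap */
}
```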
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: aw27 on November 07, 2019, 03:26:48 AM
This should work except when there is no range that fits the IP. I will not cover that case (or any other case  :biggrin:)




.model flat, stdcall

includelib \masm32\lib\msvcrt.lib
printf proto C :ptr, :vararg

IPV6Struct struct
country db 20 dup (0)
startIP OWORD (0)
endIP OWORD (0)
IPV6Struct ends

.data
COMMENT #
ipv6_1 IPV6Struct <'country1',20010df7820000000000000000000000h, 20010df78200ffffffffffffffffffffh>
ipv6_2 IPV6Struct <'country2',20020df7820000000000000000000000h, 20020df78200ffffffffffffffffffffh>
ipv6_3 IPV6Struct <'country3',20030df7820000000000000000000000h, 20030df78200ffffffffffffffffffffh>
ipv6_4 IPV6Struct <'country4',20040df7820000000000000000000000h, 20040df78200ffffffffffffffffffffh>
myIP oword 20030df7820000000000000000000001h ; passed
#
ipv6_1 IPV6Struct <"CH",02A002381E8C000000000000000000000h, 02A002381E8C0FFFFFFFFFFFFFFFFFFFFh>
ipv6_2 IPV6Struct <"GB",02A002381E8C100000000000000000000h, 02A0023FFFFFFFFFFFFFFFFFFFFFFFFFFh>
ipv6_3 IPV6Struct <"XX",02A002400000000000000000000000000h, 02A003FFFFFFFFFFFFFFFFFFFFFFFFFFFh>
ipv6_4 IPV6Struct <"IR",02A004000000000000000000000000000h, 02A004007FFFFFFFFFFFFFFFFFFFFFFFFh>
ipv6_5 IPV6Struct <"XX",02A004008000000000000000000000000h, 02A00401FFFFFFFFFFFFFFFFFFFFFFFFFh>
ipv6_6 IPV6Struct <"DK",02A004020000000000000000000000000h, 02A004020FFFFFFFFFFFFFFFFFFFFFFFFh>
ipv6_7 IPV6Struct <"XX",02A004021000000000000000000000000h, 02A00403FFFFFFFFFFFFFFFFFFFFFFFFFh>

myIP oword 02a0023a84825eda1ac528803001b9000h ; Passed
;myIP oword 02A0040200003300070000000000000eeh ;Passed
msg db "Country is: %s",10,0
msgerror db "Error", 10,0

.code

main proc
lea esi, ipv6_1
lea edi, myIP
mov eax, [edi+3*sizeof dword]
@checkStart:
.while 1
.while 1
.if eax > dword ptr (IPV6Struct ptr [esi]).startIP+12
add esi, sizeof IPV6Struct
.continue
.endif
jb @checkEnd
mov eax, [edi + 2*sizeof dword]
.if eax > dword ptr (IPV6Struct ptr [esi]).startIP+8
add esi, sizeof IPV6Struct
.continue
.endif
jb @checkEnd
mov eax, [edi + sizeof dword]
.if eax > dword ptr (IPV6Struct ptr [esi]).startIP+4
add esi, sizeof IPV6Struct
.continue
.endif
jb @checkEnd
mov eax, [edi]
.if eax > dword ptr (IPV6Struct ptr [esi]).startIP
.break
.else
jmp @error
.endif

.endw
lea edi, myIP
mov eax, [edi+3*sizeof dword]
@checkEnd:
.while 1
.if eax > dword ptr (IPV6Struct ptr [esi]).endIP+12
jmp short @preTop
.elseif eax < dword ptr (IPV6Struct ptr [esi]).endIP+12
jmp short @exit
.endif
mov eax, [edi + 2*sizeof dword]
.if eax > dword ptr (IPV6Struct ptr [esi]).endIP+8
jmp short @preTop
.elseif eax < dword ptr (IPV6Struct ptr [esi]).endIP+8
jmp short @exit
.endif
mov eax, [edi + sizeof dword]
.if eax > dword ptr (IPV6Struct ptr [esi]).endIP+4
jmp short @preTop
.elseif eax < dword ptr (IPV6Struct ptr [esi]).endIP+4
jmp short @exit
.endif
mov eax, [edi]
.if eax > dword ptr (IPV6Struct ptr [esi]).endIP
jmp short @preTop
.else
jmp @exit
.endif
@preTop:
add esi, sizeof IPV6Struct
jmp @checkStart
.endw
.endw
@error:
ret
@exit:
sub esi, sizeof IPV6Struct
invoke printf, offset msg, esi
ret
main endp



end


Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 07, 2019, 05:12:38 AM
Good grief!

I'll have to throw that one into OllyDbg as I tend not to use the if statement to figure out what's going on.

Like I said earlier, your previous example, by using jb instead of jnz appears to work just fine.

The data set has now been converted to little endian so I can get rid of all that horrible byte order swapping nonsense.
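For anyone doing the conversion offline: the BSWAP-and-reorder sequence in my earlier routine (BSWAP each of the four DWORDs, then store them in reverse order) amounts to reversing all 16 bytes. A minimal C sketch of that one-off conversion, with a made-up function name, could look like this:

```c
#include <stddef.h>

/* Reverse a 16-byte value in place, converting a 128-bit integer between
   big-endian (network order) and little-endian byte order. */
void reverse16(unsigned char v[16])
{
    for (size_t i = 0; i < 8; i++) {
        unsigned char t = v[i];
        v[i] = v[15 - i];
        v[15 - i] = t;
    }
}
```

Applying it twice restores the original bytes, so the same helper works in both directions.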

Slightly off topic, but as I've previously said, I've been using WinAsm Studio and a legacy version of MASM32 (now updated), what would your recommendations be for an IDE to replace WinAsm Studio (ideally one that will support x86 and x64 using MASM32 and MASM64) and a debugger that will support x86/x64 to replace OllyDbg.

Kind Regards
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: aw27 on November 07, 2019, 06:07:53 AM
Quote from: colinr on November 07, 2019, 05:12:38 AM
Slightly off topic, but as I've previously said, I've been using WinAsm Studio and a legacy version of MASM32 (now updated), what would your recommendations be for an IDE to replace WinAsm Studio (ideally one that will support x86 and x64 using MASM32 and MASM64) and a debugger that will support x86/x64 to replace OllyDbg.

This is a matter of personal preferences. I use Notepad++ as an editor for masm and an editor for many other things, so it is always open. Sometimes I use Visual Studio as well for masm when I am doing a masm module for HLL. This is very handy because it builds everything at the same time and the VS debugger is top notch.
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 07, 2019, 08:54:06 PM
@AW Back again!

I've tried to assemble the code above with WinAsm and the method that you described before, both report the following error:

LNK1120: 1 unresolved externals

LINK : error LNK2001: unresolved external symbol _WinMainCRTStartup

I have no clue what this means


Title: Re: 128 bit Comparison on 32-bit CPU
Post by: hutch-- on November 07, 2019, 09:48:14 PM
It means you have not specified an entry point so the linker wants the C entry point code instead.
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 07, 2019, 10:06:01 PM
Spot on! I didn't realise the Start: label was missing.

Thanks for that
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: TimoVJL on November 07, 2019, 10:59:07 PM
just for fun for x64 :smiley:
if main symbol exists, link.exe wants mainCRTStartup as entry point
so just give it
; ml64.exe hello64m.asm
includelib msvcrt
extern printf : proc
extern exit : proc

.data
msg   db "Hello world!",0

.code
public main
main: ; just tells to link.exe of console program
mainCRTStartup proc ; for link.exe if main defined
sub rsp, 28h
mov rcx, offset msg
call printf
call exit
mainCRTStartup endp
end

EDIT: ip list:
ip2location.com/lite/ (https://download.ip2location.com/lite/)
IP to Country Lite (https://www.db-ip.com/db/download/ip-to-country-lite)
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 08, 2019, 12:35:40 AM
@AW

I've implemented the new code that you wrote into my project (after reversing it), and initial testing (with 1 IP) indicates that it may be working, however, there were a couple of bugs.

If you take a look at labels L0085100F and L00851050, they were originally on the next instruction down (CMP). I found that EAX was being trashed by the value written to EAX from EDI+8, thus causing the lookup to fail. I suppose I could optimise this by using EBX instead of EAX, and save some cycles by not re-reading EDI+12 on each iteration that returns here.

Other than that, thanks very much for helping me out with this; what is easy in theory proved extremely difficult to achieve.

I also don't understand what condition would cause L008510A2 to be reached.

I'll do some more testing and post back.

LookUpIPv6Country:
xor edx, edx
xor eax, eax
mov eax, dword ptr[IPv6CountryFileSize]
mov ebx, 34 ;Will be dividing by 34 bytes
div ebx

mov edx, eax
mov esi, dword ptr[IPv6CountryMem] ;Pointer to IPv6 data

lea edi, CountryFormedIpV6_1 ;The IP to find

L0085100F:
mov eax,dword ptr [edi+12]

cmp eax,dword ptr [esi+12]
jbe short L00851019
dec edx
je NotAV6CountryAddress
add esi,34
jmp short L0085100F
L00851019:
jb short L00851050
mov eax,dword ptr [edi+8]
cmp eax,dword ptr [esi+8]
jbe short L00851028
dec edx
je short NotAV6CountryAddress
add esi,34
jmp short L0085100F
L00851028:
jb short L00851050
mov eax,dword ptr [edi+4]
cmp eax,dword ptr [esi+4]
jbe short L00851037
dec edx
je short NotAV6CountryAddress
add esi,34
jmp short L0085100F
L00851037:
jb short L00851050
mov eax,dword ptr [edi]
cmp eax,dword ptr [esi]
jbe short L00851043
jmp short L00851047
L00851043:
jmp short L008510A2
L00851045:
jmp short L0085100F
L00851047:
lea edi, CountryFormedIpV6_1
L00851050:
mov eax,dword ptr [edi+12]

cmp eax,dword ptr [esi+28]
jbe short L00851059
jmp short L00851093
L00851059:
cmp eax,dword ptr [esi+28]
jae short L00851060
jmp short L008510A3
L00851060:
mov eax,dword ptr [edi+8]
cmp eax,dword ptr [esi+24]
jbe short L0085106C
jmp short L00851093
L0085106C:
cmp eax,dword ptr [esi+24]
jae short L00851073
jmp short L008510A3
L00851073:
mov eax,dword ptr [edi+4]
cmp eax,dword ptr [esi+20]
jbe short L0085107F
jmp short L00851093
L0085107F:
cmp eax,dword ptr [esi+20]
jae short L00851086
jmp short L008510A3
L00851086:
mov eax,dword ptr [edi]
cmp eax,dword ptr [esi+16]
jbe short L00851091
jmp short L00851093
L00851091:
jmp short L008510A3
L00851093:
dec edx
je short NotAV6CountryAddress
add esi,34
jmp L0085100F

L008510A2:
nop
retn
L008510A3:
nop
sub esi,34

retn

NotAV6CountryAddress:
lea eax, IPv6CountryISOCode
mov byte ptr[eax], "Z"
mov byte ptr[eax+1], "Z"
ret
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: aw27 on November 08, 2019, 12:51:24 AM
 :biggrin:

Before I look at the code, make sure you are using the latest releases either of VS 2017 or 2019 because there was a "while" bug only recently fixed.
Either way, what I made was just a sketch. I can't spend much more time with that. It is a good exercise for you.
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 08, 2019, 01:19:06 AM
No problem,

Figured out that L008510A2 is equality!
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: TimoVJL on November 09, 2019, 08:58:09 PM
Something with C too, but not tested well.
//#include <intrin.h>
//#include <emmintrin.h>
//#pragma comment(lib, "msvcrt.lib")
#pragma comment(linker, "-defaultlib:msvcrt.lib -subsystem:console")
__declspec(dllimport) int printf(const char * format, ...);
__declspec(dllimport) void exit(int status);
//__declspec(dllimport) void __stdcall ExitProcess(long);

typedef struct _IPV6ACC {
//__m128i ipv6a, ipv6a2;
union ipv6a {
unsigned char b[16];
unsigned short w[8];
unsigned long l[4];
unsigned long long ll[2];
//__m128i ul128;
} ipv6a;
union ipv6a2 {
unsigned char b[16];
unsigned short w[8];
unsigned long l[4];
unsigned long long ll[2];
//__m128i ul128;
} ipv6a2;
char cc[3];
} IPV6ACC;

IPV6ACC ipv6acc[] = {
//{.ipv6a.l={0x2A002381,0xE8C00000},.ipv6a2.l={0x2A002381,0xE8C0FFFF,0xFFFFFFFF,0xFFFFFFFF}, {'C','H'}}, // Switzerland
//{.ipv6a.l={0x2A002381,0xE8C10000},.ipv6a2.l={0x2A0023FF,0xE8C1FFFF,0xFFFFFFFF,0xFFFFFFFF}, {'G','B'}}, // Great Britain
{.ipv6a.ll={0x2A002381E8C10000},.ipv6a2.ll={0x2A0023FFE8C1FFFF,0xFFFFFFFFFFFFFFFF}, {'G','B'}}, // Great Britain
};
//unsigned long ipv6ax[4] = {0x2a0023a8,0x4825eda1,0xac528803,0x001b9000};
unsigned long long ipv6ax[4] = {0x2a0023a84825eda1,0xac528803001b9000};

int IsIp6InMask(unsigned long *ip1, unsigned long *ip2, unsigned long *ip3)
{ // ip1 find ip2 lower limit ip3 upper limit
int cnt = 3;
if (*ip1 >= *ip2 && *ip1 <= *ip3) { // is in first part ?
//printf(" %lXh >= %lXh\n", *ip1, *ip2);
do {
ip1++; ip2++; ip3++; // next semi octects
if (*ip1 == *ip2 && *ip1 < *ip3) { // same or below lowerlimit
//printf(" %lXh == %lXh\n", *ip1, *ip3);
return 0;
} else if (*ip1 <= *ip3) { // over lower limit and below upper limit
//printf(" %lXh > %lXh - %lXh\n", *ip1, *ip2, *ip3);
cnt--; // passed
} else return 0; // failed
} while (cnt);
return 1; // passed
}
return 0;
}

int IsIp6InMask32(unsigned long *ip1, unsigned long *ip2, unsigned long *ip3)
{ // ip1 find ip2 lower limit ip3 upper limit
int cnt = 3;
if (*(ip1+1) >= *(ip2+1) && *(ip1+1) <= *(ip3+1)) { // is in first part ?
//printf(" %lXh >= %lXh\n", *ip1, *ip2);
do {
//ip1++; ip2++; ip3++; // next semi octects
if (*ip1 == *ip2 && *ip1 < *ip3) { // same or below lowerlimit
//printf(" %lXh == %lXh\n", *ip1, *ip3);
return 0;
} else if (*ip1 > *ip3) { // over lower limit and below upper limit
//printf(" %lXh > %lXh - %lXh\n", *ip1, *ip2, *ip3);
return 0; // failed
}
ip1++; ip2++; ip3++; // next semi octects
cnt--;
} while (cnt);
return 1; // passed
}
return 0;
}

int IsIp6InMask64(unsigned long long *ip1, unsigned long long *ip2, unsigned long long *ip3)
{ // ip1 find ip2 lower limit ip3 upper limit
if (*ip1 >= *ip2 && *ip1 <= *ip3) { // is in first part ?
//printf(" %llXh >= %llXh\n", *ip1, *ip2);
ip1++; ip2++; ip3++; // next semi octects
if (*ip1 == *ip2 && *ip1 < *ip3) { // same or below lowerlimit
//printf(" %lXh == %lXh\n", *ip1, *ip3);
return 0;
} else if (*ip1 > *ip3) { // over lower limit and below upper limit
//printf(" %lXh > %lXh - %lXh\n", *ip1, *ip2, *ip3);
return 0; // failed
}
return 1; // passed
}
return 0;
}

void __cdecl mainCRTStartup(void)
{
//printf("search:\n%llX %llX\n", *(unsigned long long*)&ipv6ax, *(unsigned long long*)&ipv6ax);
printf("search:\n%lX %lX-%lX %lX\n", ipv6ax[0], ipv6ax[1], ipv6ax[0], ipv6ax[1]);
for (int i=0; i<sizeof(ipv6acc)/sizeof(ipv6acc[0]); i++) {
//printf("%llX %llX\n", ipv6acc[i].ipv6a.ll[0], ipv6acc[i].ipv6a2.ll[0]);
printf("%lX %lX-%lX %lX\n", ipv6acc[i].ipv6a.l[0], ipv6acc[i].ipv6a.l[1], ipv6acc[i].ipv6a2.l[0], ipv6acc[i].ipv6a2.l[1]);
//if (IsIp6InMask(ipv6ax, (unsigned long *)&ipv6acc[i].ipv6a, (unsigned long *)&ipv6acc[i].ipv6a2)) {
if (IsIp6InMask32((unsigned long *)ipv6ax, (unsigned long *)&ipv6acc[i].ipv6a, (unsigned long *)&ipv6acc[i].ipv6a2)) {
//if (IsIp6InMask64((unsigned long long *)ipv6ax, (unsigned long long *)&ipv6acc[i].ipv6a, (unsigned long long *)&ipv6acc[i].ipv6a2)) {
printf("i: %d found %s\n", i+1, ipv6acc[i].cc);
break;
}
}
exit(0);
//ExitProcess(0);
}
_IsIp6InMask32:
00000060  53                       push ebx
00000061  56                       push esi
00000062  8B44240C                 mov eax, dword ptr [esp+Ch]
00000066  8B542410                 mov edx, dword ptr [esp+10h]
0000006A  8B4C2414                 mov ecx, dword ptr [esp+14h]
0000006E  8B5804                   mov ebx, dword ptr [eax+4h]
00000071  3B5A04                   cmp ebx, dword ptr [edx+4h]
00000074  7234                     jb L_AA
00000076  8B5804                   mov ebx, dword ptr [eax+4h]
00000079  3B5904                   cmp ebx, dword ptr [ecx+4h]
0000007C  772C                     jnbe L_AA
0000007E  BB03000000               mov ebx, 3h
00000083  8B30                     mov esi, dword ptr [eax]
00000085  3B32                     cmp esi, dword ptr [edx]
00000087  7506                     jnz L_8F
00000089  8B30                     mov esi, dword ptr [eax]
0000008B  3B31                     cmp esi, dword ptr [ecx]
0000008D  721F                     jb L_AE
0000008F  8B30                     mov esi, dword ptr [eax]
00000091  3B31                     cmp esi, dword ptr [ecx]
00000093  771D                     jnbe L_B2
00000095  83C004                   add eax, 4h
00000098  83C204                   add edx, 4h
0000009B  83C104                   add ecx, 4h
0000009E  4B                       dec ebx
0000009F  85DB                     test ebx, ebx
000000A1  75E0                     jnz L_BF
000000A3  B801000000               mov eax, 1h
000000A8  EB0A                     jmp L_B4
000000AA  31C0                     xor eax, eax
000000AC  EB06                     jmp L_B4
000000AE  31C0                     xor eax, eax
000000B0  EB02                     jmp L_B4
000000B2  31C0                     xor eax, eax
000000B4  5E                       pop esi
000000B5  5B                       pop ebx
000000B6  C3                       ret

IsIp6InMask64:
000000A0  488B02                   mov rax, qword ptr [rdx]
000000A3  483901                   cmp qword ptr [rcx], rax
000000A6  7234                     jb L_DC
000000A8  498B00                   mov rax, qword ptr [r8]
000000AB  483901                   cmp qword ptr [rcx], rax
000000AE  772C                     jnbe L_DC
000000B0  4883C108                 add rcx, 8h
000000B4  4883C208                 add rdx, 8h
000000B8  4983C008                 add r8, 8h
000000BC  488B12                   mov rdx, qword ptr [rdx]
000000BF  483911                   cmp qword ptr [rcx], rdx
000000C2  750B                     jnz L_CF
000000C4  498B00                   mov rax, qword ptr [r8]
000000C7  483901                   cmp qword ptr [rcx], rax
000000CA  7303                     jnb L_CF
000000CC  31C0                     xor eax, eax
000000CE  C3                       ret
000000CF  4D8B00                   mov r8, qword ptr [r8]
000000D2  4C3901                   cmp qword ptr [rcx], r8
000000D5  0F96D0                   setbe al
000000D8  0FB6C0                   movzx eax, al
000000DB  C3                       ret
000000DC  31C0                     xor eax, eax
000000DE  C3                       ret
msvc 2019
_IsIp6InMask32:
00000050  8B442408                 mov eax, dword ptr [esp+8h]
00000054  56                       push esi
00000055  57                       push edi
00000056  8B7C240C                 mov edi, dword ptr [esp+Ch]
0000005A  BE03000000               mov esi, 3h
0000005F  8B5704                   mov edx, dword ptr [edi+4h]
00000062  3B5004                   cmp edx, dword ptr [eax+4h]
00000065  722D                     jb L_94
00000067  8B4C2414                 mov ecx, dword ptr [esp+14h]
0000006B  3B5104                   cmp edx, dword ptr [ecx+4h]
0000006E  7724                     jnbe L_94
00000070  2BF8                     sub edi, eax
00000072  8B1438                   mov edx, dword ptr [eax+edi*1]
00000075  3B10                     cmp edx, dword ptr [eax]
00000077  7504                     jnz L_7D
00000079  3B11                     cmp edx, dword ptr [ecx]
0000007B  7217                     jb L_94
0000007D  3B11                     cmp edx, dword ptr [ecx]
0000007F  7713                     jnbe L_94
00000081  83C104                   add ecx, 4h
00000084  83C004                   add eax, 4h
00000087  83EE01                   sub esi, 1h
0000008A  75E6                     jnz L_A2
0000008C  5F                       pop edi
0000008D  B801000000               mov eax, 1h
00000092  5E                       pop esi
00000093  C3                       ret
00000094  5F                       pop edi
00000095  33C0                     xor eax, eax
00000097  5E                       pop esi
00000098  C3                       ret
IsIp6InMask64:
000000A0  488B01                   mov rax, qword ptr [rcx]
000000A3  483B02                   cmp rax, qword ptr [rdx]
000000A6  7221                     jb L_C9
000000A8  493B00                   cmp rax, qword ptr [r8]
000000AB  771C                     jnbe L_C9
000000AD  488B4108                 mov rax, qword ptr [rcx+8h]
000000B1  483B4208                 cmp rax, qword ptr [rdx+8h]
000000B5  7506                     jnz L_BD
000000B7  493B4008                 cmp rax, qword ptr [r8+8h]
000000BB  720C                     jb L_C9
000000BD  493B4008                 cmp rax, qword ptr [r8+8h]
000000C1  7706                     jnbe L_C9
000000C3  B801000000               mov eax, 1h
000000C8  C3                       ret
000000C9  33C0                     xor eax, eax
000000CB  C3                       ret

ip list:
ip2location.com/lite/ (https://download.ip2location.com/lite/)
IP to Country Lite (https://www.db-ip.com/db/download/ip-to-country-lite)
convert program for csv
// https://www.db-ip.com/db/download/ip-to-country-lite
//#pragma comment(lib, "msvcrt.lib")
#pragma comment(linker, "-defaultlib:msvcrt.lib -subsystem:console")
__declspec(dllimport) int printf(const char * format, ...);
__declspec(dllimport) void exit(int status);
typedef void* FILE;
typedef unsigned long size_t;
__declspec(dllimport) FILE * fopen(const char *name, const char *mode);
__declspec(dllimport) int fclose(FILE *stream);
__declspec(dllimport) char * fgets(char * restrict dst, int max, FILE * restrict stream);
__declspec(dllimport) size_t fwrite(const void * restrict src, size_t size, size_t num, FILE * restrict stream);
//__declspec(dllimport) void __stdcall ExitProcess(long);
#pragma pack(push, 1)
typedef struct _IPV6ACC {
//__m128i ipv6a, ipv6a2;
union ipv6a {
unsigned char b[16];
unsigned short w[8];
unsigned long l[4];
unsigned long long ll[2];
//__m128i ul128;
} ipv6a;
union ipv6a2 {
unsigned char b[16];
unsigned short w[8];
unsigned long l[4];
unsigned long long ll[2];
//__m128i ul128;
} ipv6a2;
char cc[3];
} IPV6ACC;
#pragma pack(pop)

unsigned short hex2ushort(char *hex, char **next);
void fix_endian(IPV6ACC *ipv6acc);

void __cdecl mainCRTStartup(void)
{
char buf[100];
IPV6ACC ipv6acc;
FILE *fp1, *fp2;
//fp1 = fopen("test.csv", "r");
fp1 = fopen("dbip-country-lite-2019-11.csv", "r");
if (fp1) {
fp2 = fopen("test.bin", "wb+");
while (fgets(buf, sizeof(buf), fp1)) {
char *p = buf;
if (*(p+4) == ':') {
memset(&ipv6acc, 0, sizeof(ipv6acc));
for (int i=0; i<4; i++) {
if (*p == ',') break;
//ipv6acc.ipv6a.w[i] = hex2ushort(p, &p);
ipv6acc.ipv6a.w[i] = hex2ushort(p, &p);
}
//while (*p != ',') p++;
if (*p == ',') p++;
for (int i=0; i<8; i++) {
if (*p == ',') break;
ipv6acc.ipv6a2.w[i] = hex2ushort(p, &p);
}
if (*p == ',') {
ipv6acc.cc[0] = *++p;
ipv6acc.cc[1] = *++p;
}
//fix_endian(&ipv6acc);
// fix endian
unsigned short wtmp;
for (int i=0; i<2; i++) {
wtmp = ipv6acc.ipv6a.w[i];
ipv6acc.ipv6a.w[i] = ipv6acc.ipv6a.w[3-i];
ipv6acc.ipv6a.w[3-i] = wtmp;
}
for (int i=0; i<2; i++) {
wtmp = ipv6acc.ipv6a2.w[i];
ipv6acc.ipv6a2.w[i] = ipv6acc.ipv6a2.w[3-i];
ipv6acc.ipv6a2.w[3-i] = wtmp;
}
fwrite(&ipv6acc, sizeof(ipv6acc), 1, fp2);
}
}
fclose(fp1);
if (fp2) fclose(fp2);
}
exit(0);
}

unsigned short hex2ushort(char *hex, char **next)
{
unsigned short val = 0;
while (*hex && !(*hex == ':' || *hex == ','))
{
char byte = *hex++;
if (byte < '0' || byte > 'f')
break;
if (byte >= '0' && byte <= '9')
byte = byte - '0';
else if (byte >= 'a' && byte <= 'f')
byte = byte - 'a' + 10;
else if (byte >= 'A' && byte <= 'F')
byte = byte - 'A' + 10;
val = (val << 4) | (byte & 0xF);
}
if (*hex == ':') hex++;
*next = hex;
return val;
}
/*
void fix_endian(IPV6ACC *ipv6acc)
{
unsigned short wtmp;
for (int i=0; i<2; i++) {
wtmp = ipv6acc->ipv6a.w[i];
ipv6acc->ipv6a.w[i] = ipv6acc->ipv6a.w[3-i];
ipv6acc->ipv6a.w[3-i] = wtmp;
}
for (int i=0; i<2; i++) {
wtmp = ipv6acc->ipv6a2.w[i];
ipv6acc->ipv6a2.w[i] = ipv6acc->ipv6a2.w[3-i];
ipv6acc->ipv6a2.w[3-i] = wtmp;
}
}
*/
/*
::,1fff:ffff:ffff:ffff:ffff:ffff:ffff:ffff,ZZ
2000::,2001:4:ffff:ffff:ffff:ffff:ffff:ffff,CH
2001:5::,2001:5:0:ffff:ffff:ffff:ffff:ffff,NL
2001:5:1::,2001:5:1:ffff:ffff:ffff:ffff:ffff,TR
*/
Title: Re: 128 bit Comparison on 32-bit CPU
Post by: colinr on November 13, 2019, 08:03:19 PM
Wow, C looks even more complicated  :dazzled:

The sample by AW seems to be working like a charm; I'm going to push a load of IPs through it shortly to give it some real testing.

Thanks everyone for your input with this, I'm sure others will find this thread useful too, perhaps even experienced ASM coders.