Find duplicates strings

Started by ragdog, November 01, 2014, 02:39:56 AM

ragdog

Hello Community,

For my project I need a way to find duplicate strings in a string array.
I have searched all the boards and Google without any result.

I sort the strings with the Quicksort example from Vortex:
http://masm32.com/board/index.php?topic=1267.msg12713#msg12713

Must I sort the list in order to find duplicate strings, or does it work without sorting?

And how can I find all the duplicate strings?

mov esi, ptrMem  ;<< Alloc mem after QuickSort
xor ecx,ecx
.repeat
lea eax,(Items ptr[esi+sizeof Items])._String
push eax
lea eax,(Items ptr[esi])._String
push eax
call strcmp
.if (eax)
push ecx
invoke crt_printf ,chr$ ("%s",CR,LF),addr (Items ptr[esi])._String
pop ecx
.endif
inc ecx
add esi,sizeof Items
.until (ecx>=nOfEntys)


Is there any information or old code on the Masm32 board that shows how to solve this?

Regards,

TouEnMasm

Can I see a small sample of the text? Without it, the code is useless. Is the comparison case sensitive or not?
Fa is a musical note to play with CL

ragdog

Here is my text file. Each entry is 140 bytes long:

Items struct
   _String BYTE 140 dup (?)
Items ends

After qsort I have this sorted list:


Anaconda
Anaconda
Flatfish
Dalmatian
Dalmatian
Dalmatian
Elephant
Llama
Giraffe
Canary



TouEnMasm

Here is some test code. When the file is loaded into memory, it may or may not be zero-terminated.
If it is not, use the size of the file to stop the loop.


;################################################################
RechercheDoublons PROC uses esi edi ebx pmem:DWORD ; pmem points to a FICHMEM
    Local cpt:DWORD
    Local retour:SDWORD
    mov retour, 0
    mov cpt, 0

    mov edx, pmem
    mov esi, [edx].FICHMEM.Mpoint   ; pointer to the loaded memory
    mov ecx, 1                      ; the first item exists
cherche:
    mov retour, ecx
    add esi, sizeof Items           ; advance by the size of the structure
    .if byte ptr [esi] != 0         ; esi points to each item in turn
        inc ecx                     ; one more item
        jmp cherche
    .endif
    mov retour, ecx                 ; return the number of items (10)
FindeRechercheDoublons:
    mov eax, retour
    ret
RechercheDoublons endp
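
For the other case mentioned above, where the loaded data is not zero-terminated, here is a minimal sketch that derives the item count from the file size instead. CountItems is a hypothetical name, not something from the Masm32 library. It assumes the Items struct posted earlier in this thread and that the file size in bytes is already known. It is untested.

```asm
; Hypothetical helper (assumption, not part of Masm32):
; returns fileSize / sizeof Items in eax.
CountItems proc fileSize:DWORD
    mov eax, fileSize
    xor edx, edx                ; clear the high half of the dividend
    mov ecx, sizeof Items       ; 140 bytes per entry in this thread
    div ecx                     ; eax = whole items, edx = leftover bytes
    ret
CountItems endp
```

A non-zero remainder in edx after the call would suggest that the file is not an exact array of Items.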

TouEnMasm


It's not a text file
Quote
001434e0  41 6e 61 63 6f 6e 64 61-00 00 00 00 00 00 00 00  Anaconda........

jj2007

Plain Masm32:

include \masm32\include\masm32rt.inc
.code
start:
  strsize=140
  mov esi, InputFile("text.txt")
  add rv(filesize, "text.txt"), esi
  push eax
  .Repeat
   mov edi, esi
   .Repeat
      add edi, strsize
      .if !cmp$(esi, edi)
         print edi, 13, 10
      .endif
   .Until edi>=[esp]
   add esi, strsize
  .Until esi>=[esp]
  pop eax
  inkey "ok"
  exit
end start

ragdog

Hello, thanks for your posts.


@ToutEnMasm

No, it is not really a text file; it is a string array of structs.
I only sent you the entries in a text file.

@Jochen

Thanks, it looks good, but when I tried to compile it for a test I got an error here:

.if !cmp$(esi, edi)

error A2008: syntax error : ,
fatal error A1011: directive must be in control block



Greets,


jj2007

Quote from: ragdog on November 01, 2014, 08:09:28 PM
@Jochen

Thanks, it looks good, but when I tried to compile it for a test I got an error here:

.if !cmp$(esi, edi)

Either you have an incredibly old version of the Masm32 library, or you didn't copy the code properly. It works like a charm with JWasm and Masm 6.14 ... 10.0.

ragdog

OK, I have found this macro, but I get the wrong result.

All "Dalmatian" strings are found, but only one "Anaconda".

Here is my list, sorted with qsort, in a text file.

jj2007

Quote from: ragdog on November 01, 2014, 08:36:30 PM
OK, I have found this macro, but I get the wrong result.

All "Dalmatian" strings are found, but only one "Anaconda".

Depends on what you want. If you just want to identify the duplicates, my code is good enough. But when Dalmatian #2 enters esi, it lists #3 as a duplicate again. If you want different logic, you might have to mark the duplicates as read etc. There are many ways to do this, as you certainly know ;-)
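
One possible "other logic", as a minimal sketch: if the array really is sorted, equal strings sit next to each other, so a single pass over adjacent pairs can report each duplicated string exactly once. PrintDuplicatesSorted is a hypothetical name. The sketch assumes masm32rt.inc is included (for crt_strcmp, crt_printf and chr$), the 140-byte Items struct from this thread, and zero-terminated strings. It is untested.

```asm
; Hypothetical procedure (assumption): prints every string that occurs
; more than once, one line per distinct duplicated string.
; Requires a SORTED array so that equal strings are adjacent.
PrintDuplicatesSorted proc uses esi edi ebx pArray:DWORD, nItems:DWORD
    mov esi, pArray                 ; esi -> current item
    mov ebx, nItems
    .if ebx >= 2
        dec ebx                     ; nItems-1 adjacent pairs to check
        xor edi, edi                ; edi = 1 once the current run was printed
        .repeat
            lea eax, [esi + sizeof Items]   ; eax -> next item
            invoke crt_strcmp, esi, eax
            .if eax == 0            ; current item equals the next one
                .if edi == 0
                    invoke crt_printf, chr$("%s", 13, 10), esi
                    mov edi, 1      ; do not report this run again
                .endif
            .else
                xor edi, edi        ; a new run of strings starts
            .endif
            add esi, sizeof Items
            dec ebx
        .until ebx == 0
    .endif
    ret
PrintDuplicatesSorted endp
```

esi, edi and ebx are used because the C runtime preserves them across calls, whereas eax, ecx and edx may be overwritten by crt_strcmp and crt_printf.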

ragdog

Quote
you want another logic, you might have to mark the duplicates as read etc

Correct, Jochen, marking the duplicates was one idea, but I thought it would work without marking them.

The problem with your code is this: when the first loop starts, esi holds the first string;
then, in the second loop, edi holds the next string.

After jumping back in the second loop, it compares the 2nd with the 3rd string, and these are not equal,
so the second Anaconda is now lost.

OK, thanks anyway.
I will try marking the strings, or do you or anyone else have another idea?
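
A minimal sketch of the marking idea, which does not need a sorted array: every item that has not been marked yet is compared with all later items, each match is marked in a separate flag buffer, and the string is printed once. ReportDuplicatesMarked and pFlags are hypothetical names. The sketch assumes masm32rt.inc, the Items struct from this thread, and a caller-supplied, zero-filled buffer of nItems bytes. It is untested.

```asm
; Hypothetical procedure (assumption): pFlags points to nItems zeroed bytes.
; After the call, pFlags[j] = 1 for every item that repeats an earlier one.
ReportDuplicatesMarked proc uses esi edi ebx pArray:DWORD, pFlags:DWORD, nItems:DWORD
    LOCAL iIdx:DWORD                ; index of the item in esi
    LOCAL reported:DWORD            ; 1 once the current string was printed
    mov iIdx, 0
    mov esi, pArray                 ; esi -> item i
outer:
    mov eax, iIdx
    cmp eax, nItems
    jae done
    mov edx, pFlags
    cmp byte ptr [edx+eax], 0
    jne nextI                       ; already known as a duplicate: skip it
    mov reported, 0
    mov ebx, eax                    ; ebx = j, starts at i
    mov edi, esi                    ; edi -> item j
inner:
    inc ebx
    add edi, sizeof Items
    cmp ebx, nItems
    jae nextI
    invoke crt_strcmp, esi, edi
    test eax, eax
    jnz inner                       ; different strings: try the next j
    mov edx, pFlags
    mov byte ptr [edx+ebx], 1       ; mark item j as a duplicate
    cmp reported, 0
    jne inner
    invoke crt_printf, chr$("%s", 13, 10), esi
    mov reported, 1
    jmp inner
nextI:
    inc iIdx
    add esi, sizeof Items
    jmp outer
done:
    ret
ReportDuplicatesMarked endp
```

This does on the order of n*n/2 comparisons, so for large arrays the sorted, single-pass approach is likely the better choice.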