The MASM Forum

General => The Laboratory => Topic started by: jj2007 on December 23, 2022, 12:55:38 AM

Title: A collection of programming problems
Post by: jj2007 on December 23, 2022, 12:55:38 AM
Just in case you have nothing better to do (https://www.quora.com/What-are-some-programming-problems-that-look-hard-at-a-first-glance-but-are-actually-easy) :cool:

Example:
QuoteGiven a list of one million random 7-digit phone numbers write a program that prints a sorted list of the phone numbers with duplicates removed, using less than 2MB of memory and that runs in linear time
Title: Re: A collection of programming problems
Post by: zedd151 on December 23, 2022, 02:06:59 AM
Quote from: jj2007 on December 23, 2022, 12:55:38 AM
Given a list of one million random 7-digit phone numbers write a program that prints a sorted list of the phone numbers with duplicates removed
Okay, that's easy.  :cool:
Quote, using less than 2MB of memory
Still easy...  :cool: :cool:


Quoteand that runs in linear time
Does that mean no loops, jumps or calls? Not so easy.  :sad:


I tried to visit that site (quora) but can not do that on my computer due to Opera certificate issues. Even my dated iPad won't go there. I'll look at that link later today. I need a break from another project that I'm working on.  :tongue:


By 'printing' does that mean physical printing to sheets of paper, or to console window?  :toothy: 
Title: Re: A collection of programming problems
Post by: jj2007 on December 23, 2022, 02:10:31 AM
Quote from: zedd151 on December 23, 2022, 02:06:59 AM
Quote from: jj2007 on December 23, 2022, 12:55:38 AM
Given a list of one million random 7-digit phone numbers write a program that prints a sorted list of the phone numbers with duplicates removed
Okay, that's easy.  :cool:

Good to see that you find it easy - my solution is attached :thumbsup:

The algo is almost 3 times as fast as QuickSort (but you can't keep the duplicates):
5465 µs for  creating an array of random phone numbers
4317 µs for  setting the bits
37 ms  for  sorting & storing 735828 phone numbers
1000024
1000030
1000038
1000045
1000048
1000081
1000093
1000094
1000099
1000105
96 ms for nrQsortA (there are duplicates)
1000024
1000030
1000030
1000038
1000045
1000048
1000081
1000093
1000094
1000099
Title: Re: A collection of programming problems
Post by: zedd151 on December 23, 2022, 02:44:55 AM

Okay, you win.  :tongue:
I don't think I will try working with bits. But also ...
Quote735828 phone numbers
a little short from one million.   :undecided:  You cheated.  :badgrin:
My code would be more usual stuff. I'll post it later.  :cool: 

... maybe
Title: Re: A collection of programming problems
Post by: jj2007 on December 23, 2022, 03:41:56 AM
Quote from: zedd151 on December 23, 2022, 02:44:55 AMYou cheated

Not really. When you create one Million random 7-digit phone numbers, there are lots of duplicates.
Title: Re: A collection of programming problems
Post by: FORTRANS on December 23, 2022, 03:54:30 AM
Hi Jochen,

   Thanks for pointing that problem out.  I liked the solution,
but I would (probably) have never figured it out on my own.
Now one should find an actual application for it.

Cheers,

Steve
Title: Re: A collection of programming problems
Post by: zedd151 on December 23, 2022, 03:54:53 AM
Quote from: jj2007 on December 23, 2022, 03:41:56 AM
Not really. When you create one Million random 7-digit phone numbers, there are lots of duplicates.
Okay, I'll buy that.  :tongue:  But wheres the source code?  :biggrin:
Also, are they valid phone numbers? The article didn't give enough data, also didn't mentioned if the phone numbers need to be formatted in any particular manner. Are invalid phone numbers acceptable? And just a dead listing of the numbers opposed to formatted phone numbers (i.e., in U.S. = "267-4950" vs. '2674950')? So many unanswered questions.  :tongue:  I gotta go to take the dogs for a walk...

Depending upon the answers to my query, it could just be one million decimal numbers as opposed to phone numbers per se.
Title: Re: A collection of programming problems
Post by: jj2007 on December 23, 2022, 04:01:57 AM
Hi Steve,

Quote from: FORTRANS on December 23, 2022, 03:54:30 AM
Now one should find an actual application for it.

The application is an extra fast integer sort: it beats QuickSort by a factor three for the 7-digit range, in spite of bts being a rather slow instruction.

@Z: There is something odd with the number of elements. Too many duplicates for my taste, but I can't see the error in my logic. Source attached, maybe your eyes are better ;-)
Title: Re: A collection of programming problems
Post by: zedd151 on December 23, 2022, 04:12:48 AM
Quote from: jj2007 on December 23, 2022, 04:01:57 AM
Too many duplicates for my taste, but I can't see the bug. Source attached, maybe your eyes are better ;-)
You're a funny guy.  :tongue:  Of course it's coded in MB.
But I agree, ~26% duplicates indicates that the random generator used might not be very random. I would expect some duplicates, but not so many. Seems a very high number of duplicates. Some tests with different random generator algorithms may be in order to test how other PRNG's do. (But of course I'm busy at the moment :tongue: )
Title: Re: A collection of programming problems
Post by: jj2007 on December 23, 2022, 04:14:39 AM
The random generator is excellent. It's something else :rolleyes:
Title: Re: A collection of programming problems
Post by: zedd151 on December 23, 2022, 04:16:03 AM
Also, I noticed that your solution always has 735828 phone numbers generated (after dup removal) no matter how many times run. Obviously a pattern there, indicating issues with randomness? Your program does not print the list of numbers to see if definitely patterns are there. (i.e., same list of numbers generated)
If I had to guess, possibly the same initial seed used?
Title: Re: A collection of programming problems
Post by: zedd151 on December 23, 2022, 04:38:23 AM
Quote from: FORTRANS on December 23, 2022, 03:54:30 AM
Now one should find an actual application for it.
A call center that generates spam robocalls.  :tongue:  Of course the phone numbers would have to be valid.

Hi Steve!
Title: Re: A collection of programming problems
Post by: jj2007 on December 23, 2022, 04:59:26 AM
Quote from: zedd151 on December 23, 2022, 04:16:03 AMsame initial seed used?

Yes indeed, makes spotting bugs easier.
Title: Re: A collection of programming problems
Post by: zedd151 on December 23, 2022, 05:08:50 AM
Quote from: jj2007 on December 23, 2022, 04:59:26 AM
Yes indeed, makes spotting bugs easier.
Did you write a list to file and compare to the list from another run?  I will bet that the results are identical. If it is, then the results are predictable and thus non-random. I had similar issues in writing my sudoku game in generating random game boards. While your algo may be fine for some uses, there are likely instances where it is not. Just my two cents.
In fact some programs actually depend on this anomaly; i.e., the freecell game from windows xp. A user can select a numbered game to run (1-1000000). Guess what? The number the user enters is the seed for the PRNG in that program in that case (user selected game #). That is how that program numbers its games, by using a seed in the 1-1000000 range with predictable results.
Title: Re: A collection of programming problems
Post by: Vortex on December 23, 2022, 05:51:41 AM
Powershell script to create the list of random numbers. No duplicate removal :

for ($num = 1 ; $num -le 1000000 ; $num++)
{ Get-Random -Minimum 1000000 -Maximum 9999999 }

Title: Re: A collection of programming problems
Post by: daydreamer on December 23, 2022, 05:59:03 AM
Use it in American movie universe : every phone number starts with 555-four digit number?
Linear ? Does he mean limited to scalar single thread code, prohibit simd and simt solutions ?
Isnt restricting to same fixed random seed, on purpose to be able to compare speed, with different solutions from different coders?


Title: Re: A collection of programming problems
Post by: Siekmanski on December 23, 2022, 06:01:20 AM
You could use the same principle as my "CalculateMedian" routine.

http://masm32.com/board/index.php?topic=8671.msg94735#msg94735

Highest phone number = 9999999 = 1250000 bytes to populate = 1.1921 MB memory usage.

Read in all phone numbers and set the corresponding bit in the memory table.
for example: 1234567
1234567 shr 3 = 154320 -> jump to byte offset 154320 in memory.
1234567 and 7 = 7    -> set bit 7

Do this for all phone numbers and set the corresponding bits.

When done start reading the first bit, if set the phone number = 0000000
Read the next bit, if set the phone number is 0000001
Read the next bit, if set the phone number is 0000002 etc. till you reach bit number 9999999.

If a bit is not set it is not a phone number.

This way you have automatically sorted the telephone numbers and you will get rid of duplicate numbers.
Very fast routine and it uses only 1.1921 MB memory
Title: Re: A collection of programming problems
Post by: daydreamer on December 23, 2022, 06:05:40 AM
Quote from: jj2007 on December 23, 2022, 04:14:39 AM
The random generator is excellent. It's something else :rolleyes:
As in Great entropy? But if you choose crappiest entropy rndgenerator might result in only 10k phone numbers ?

Title: Re: A collection of programming problems
Post by: jj2007 on December 23, 2022, 06:56:39 AM
Quote from: Siekmanski on December 23, 2022, 06:01:20 AM
You could use the same principle as my "CalculateMedian" routine.

http://masm32.com/board/index.php?topic=8671.msg94735#msg94735

Sorry, I can't see what the two things have in common. Can you explain?

QuoteHighest phone number = 9999999 = 1250000 bytes to populate = 1.1921 MB memory usage.

Read in all phone numbers and set the corresponding bit in the memory table.

That is exactly what my program does, see attachment in reply #7.

In the meantime, I found the little bugger - it's fixed, see attachment.

  elements=1000000
Dim Tel(elements-1) As DWORD
For_ ecx=0 To elements-1 ; one Million random numbers
add Rand(900000), 100000 ; range 100,000 ... 999,999
mov Tel(ecx), eax
Next
...
  lea edi, Tel(0)
  xor ebx, ebx
  .Repeat
mov eax, [edi+4*ebx] ; get a number
bts [esi], eax ; set a bit at this number's distance
inc ebx
  .Until ebx>=elements ; n input numbers


8426 µs for creating an array of random phone numbers
1042 µs for getting the max = 999997
3193 µs for setting the bits
10 ms   for sorting & storing 603368 phone numbers
100000
100001
100002
100003
100005
100006
100007
100008
100009
100012
96 ms  for nrQsortA (there are duplicates)
100000
100001
100002
100002
100003
100005
100006
100007
100008
100009


As you can see, with these parameters we have about 40% duplicates. The algo is damn fast, 10x QuickSort with the chosen range, but, as mentioned before, there is no way to keep the duplicates. The interesting question is now what is the actual usage of a DWORD sorter? Should the result have duplicates or not? I guess we need both algos, depending on the actual requirements.

@Z: There is even a Sudoku example (https://www.quora.com/What-are-some-simple-programming-problems-that-may-seem-difficult-at-first-glance/answer/Gerry-Rzeppa?no_redirect=1)

(https://qph.cf2.quoracdn.net/main-qimg-3ce5f227151fbf030af02014612ad230-lq)
Title: Re: A collection of programming problems
Post by: zedd151 on December 23, 2022, 08:48:26 AM
Quote from: jj2007 on December 23, 2022, 06:56:39 AM
@Z: There is even a Sudoku example (https://www.quora.com/What-are-some-simple-programming-problems-that-may-seem-difficult-at-first-glance/answer/Gerry-Rzeppa?no_redirect=1)

(https://qph.cf2.quoracdn.net/main-qimg-3ce5f227151fbf030af02014612ad230-lq)

:greensml:  I'll look at that later...

edit... a couple minutes later. lol I though you made that puzzle. Anyway, I checked it for validity and it is valid with one solution (for the uniquiness rule). It is a rather easy puzzle though, rated at only 214 (via 'hodoku.exe') for anyone interested in such things.   :biggrin:
Title: Re: A collection of programming problems
Post by: Siekmanski on December 23, 2022, 09:41:00 AM
> Sorry, I can't see what the two things have in common. Can you explain?

Hi Jochen,

I remembered I used the simulair approach to count colors used in a picture and the median calculation to exclude sorting a while ago.
The number storing as a position in a table to exclude sorting would also be usefull for the phone number approuch.
Sorry, didn't know you used the same technique because I missed the source code in your second .zip  :thumbsup:

Title: Re: A collection of programming problems
Post by: jj2007 on December 23, 2022, 10:09:05 AM
Quote from: Siekmanski on December 23, 2022, 09:41:00 AMThe number storing as a position in a table to exclude sorting would also be usefull for the phone number approuch.
Sorry, didn't know you used the same technique because I missed the source code in your second .zip  :thumbsup:

Hi Marinus,

I didn't invent the technique - I just translated David Maynard's instructions (https://www.quora.com/What-are-some-programming-problems-that-look-hard-at-a-first-glance-but-are-actually-easy) to Assembly.

QuoteProblem: Given a list of one million random 7-digit phone numbers write a program that prints a sorted list of the phone numbers with duplicates removed, using less than 2 M Bytes of memory and that runs in linear time.

Solution:

Create a bit vector of 10 million bits (10/8 = 1.25 million bytes) and initialize all bits to 0. Now read through the list of telephone numbers and convert each telephone number into in integer, i, between 0 an 9,999,999. Now set the ith bit of the bit vector to 1. bitVector = 1;
Now iterate thought the bit vector from bitVector
  • through bitVector[9,999,999] and if the bit is set format the index i, as a 7 digit telephone number, and print it.
    Note: This problem was used as a Google Interview question.
Title: Re: A collection of programming problems
Post by: hutch-- on December 23, 2022, 12:55:52 PM
Generate an exhaustive linear list of 7 digit numbers then shuffle them to make them random, no duplicates.
Title: Re: A collection of programming problems
Post by: zedd151 on December 23, 2022, 04:09:47 PM
Quote from: hutch-- on December 23, 2022, 12:55:52 PM
Generate an exhaustive linear list of 7 digit numbers then shuffle them to make them random, no duplicates.
I see your line of thinking,  except that deviates from the given criteria:
QuoteProblem: Given a list of one million random 7-digit phone numbers write a program that prints a sorted list of the phone numbers with duplicates removed, using less than 2 M Bytes of memory and that runs in linear time.
:tongue:
Title: Re: A collection of programming problems
Post by: jj2007 on December 23, 2022, 09:28:45 PM
Quote from: hutch-- on December 23, 2022, 12:55:52 PM
Generate an exhaustive linear list of 7 digit numbers then shuffle them to make them random, no duplicates.

QuoteRand(1, 50, numbers() As DWORD, unique (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1030):6)            ; min, max, array, unique:n: get 6 unique elements in the range 1...49

Feasible but attention, if you set elements=1000000, unique:elements, you run into serious performance issues :cool:
Title: Re: A collection of programming problems
Post by: hutch-- on December 23, 2022, 10:08:25 PM
 :biggrin:

Do it with a hash table and keep pouring them through until you have the count you want.
Title: Re: A collection of programming problems
Post by: jj2007 on December 24, 2022, 06:37:25 AM
Quote from: hutch-- on December 23, 2022, 10:08:25 PMDo it with a hash table and keep pouring them through until you have the count you want.

We are talking DWORDs here, not strings.
Title: Re: A collection of programming problems
Post by: hutch-- on December 24, 2022, 06:56:28 AM
That can be done with a hash table but there is another technique, a bucket sort if you have enough memory, one pass in, one pass out. It may not fit into the memory constraint criterion but done right, its fast.
Title: Re: A collection of programming problems
Post by: jj2007 on December 24, 2022, 10:10:42 PM
I'll call it BitSort, and it will be added to the next MasmBasic version - source & exe attached. The algo is simple and 160 bytes short. Credits go to David Maynard (https://www.quora.com/What-are-some-programming-problems-that-look-hard-at-a-first-glance-but-are-actually-easy/answer/David-Maynard-3) :thumbsup:

invoke BitSort, pArray, elements

BitSort returns a ptr to the array in eax (or zero if HeapAlloc fails), plus #elements in ecx

It is blazing fast for not-so-high maximum values, such as "7 digit phone numbers". For example, with 100 Mio elements and range 1,000,000 ... 9,999,999, it's a factor 16 faster than nrQsortA. And, of course - no duplicates:

604 ms for sorting 7600084 elements with BitSort
10 s for sorting 100000000 elements with nrQsortA
BitSort 1000000
BitSort 1000001
BitSort 1000004
BitSort 1000005
BitSort 1000006
BitSort 1000007
BitSort 1000008
BitSort 1000010
BitSort 1000011
BitSort 1000012
BitSort 1000013
nrQsortA 1000000
nrQsortA 1000000
nrQsortA 1000000
nrQsortA 1000000
nrQsortA 1000000
nrQsortA 1000000
nrQsortA 1000001
nrQsortA 1000001
nrQsortA 1000001
nrQsortA 1000001
nrQsortA 1000001


For settings with few duplicates, nrQsortA and ArraySort (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1176) can be faster, though.
Title: Re: A collection of programming problems
Post by: NoCforMe on December 31, 2022, 03:42:57 PM
Quote from: jj2007 on December 23, 2022, 12:55:38 AM
Just in case you have nothing better to do (https://www.quora.com/What-are-some-programming-problems-that-look-hard-at-a-first-glance-but-are-actually-easy) :cool:

Example:
QuoteGiven a list of one million random 7-digit phone numbers write a program that prints a sorted list of the phone numbers with duplicates removed, using less than 2MB of memory and that runs in linear time

Interesting challenge, but can you explain what "runs in linear time" means? That's a term I'm not familiar with.
Title: Re: A collection of programming problems
Post by: NoCforMe on December 31, 2022, 04:52:59 PM
OK, I read the post where David Maynard threw down his gauntlet (made quite a bang) and explained the bit-vector thing. Which is dead simple: 1 million bits, each representing an integer depending on its position. Even I can understand that.

So what a guy would need in order to access the ith bit in this "vector" are two things: a pointer to a byte and a bit mask to isolate the bit. Here's my take on how to do that:


MOV EAX, i
MOV ECX, EAX ;Save original i
SHR EAX, 3 ;Divide by 8
MOV EDX, EAX ;Pointer to the right byte
SHL EAX, 3 ;Back to original modulo 8
SUB ECX, EAX ;Get the "remainder"
MOV AL, 1 ;Set bit 0
SHL AL, CL ;Set bit [remainder]
MOV theMask, AL

; Now you can access the ith bit with:
MOV AL, [EDX + BitField]
TEST AL, theMask ;0 or 1?


Amiright? Totally untested.

BTW, his use of the term "vector" kind of made me chuckle. It's a $5 word that mathematicians use, where us ordinary folks would say "table" or "bitfield" or some other term. Vector? That's the kind of language one uses with one's nose in the air.
Title: Re: A collection of programming problems
Post by: jj2007 on December 31, 2022, 07:51:23 PM
Quote from: NoCforMe on December 31, 2022, 03:42:57 PM
Quote from: jj2007 on December 23, 2022, 12:55:38 AM
Just in case you have nothing better to do (https://www.quora.com/What-are-some-programming-problems-that-look-hard-at-a-first-glance-but-are-actually-easy) :cool:

Example:
QuoteGiven a list of one million random 7-digit phone numbers write a program that prints a sorted list of the phone numbers with duplicates removed, using less than 2MB of memory and that runs in linear time

Interesting challenge, but can you explain what "runs in linear time" means? That's a term I'm not familiar with.

It means that if sorting 1000 items takes 1 millisecond, sorting 10000 will take ten milliseconds. Many sorting algos behave differently: they would take 100 milliseconds then. Try a bubble sort, for example.

It's irrelevant for small item counts below a few thousand, but once you have a Million to sort, you wish for "linear time" behaviour :cool: