Author Topic: console and low freq cpu programming?  (Read 1566 times)

daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
console and low freq cpu programming?
« on: January 18, 2021, 07:21:00 PM »
If you are spoiled by a 4.5 GHz CPU on a PC, you don't notice any lag or problems with your code.
But if you program for a 1.6+ GHz console, I guess the way to get the most out of it is SIMT first, to distribute work between many cores, and then maybe to convert critical sections from scalar code to SIMD.

It must also be an advantage that such code runs more easily on Atom laptops.

Also one SIMT question:
do I get a separate stack space for each worker thread? Then I could keep it running in a PROC and do stack tricks there without affecting the main thread's stack.
SIMD fan and macro fan
why assembly is fastest is because its switch has no (brakes) breaks
:P
only in 16bit assembly you can get away with "Only words" :P

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8493
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: console and low freq cpu programming?
« Reply #1 on: January 19, 2021, 01:28:00 AM »
Magnus, this is why you write efficient assembler: minimum instruction counts and well designed algorithms. Writing code for old machines is an art form and, apart from some hardware differences, you generally get better code.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: console and low freq cpu programming?
« Reply #2 on: January 20, 2021, 04:57:46 AM »
Yes, I already try to do a lot of SIMD. It would be good to get some suggested SIMT exercises for 2-4 cores.
Should I split execution between several threads and use a timer to synchronize them, or use the lower-level LOCK prefix?
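
A minimal sketch of one such exercise, written in Python only to show the structure. Each worker sums its own slice privately and then adds its partial result to a shared total. The names `parallel_sum` and `worker` are hypothetical. The `threading.Lock` plays the role that a critical section or a single LOCK-prefixed add would play in assembly, and `join` replaces timer-based synchronization. CPython threads share one interpreter lock, so this illustrates the pattern, not a speedup.

```python
import threading


def parallel_sum(values, workers=4):
    """Sum `values` by giving each worker thread one slice of the data."""
    total = 0
    lock = threading.Lock()  # guards the shared total
    # Slice length per worker; at least 1 so range() below never gets step 0.
    chunk = max(1, (len(values) + workers - 1) // workers)

    def worker(part):
        nonlocal total
        partial = sum(part)  # private work, no locking needed
        with lock:           # short critical section for the shared update
            total += partial

    threads = [
        threading.Thread(target=worker, args=(values[i:i + chunk],))
        for i in range(0, len(values), chunk)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for completion instead of polling a timer
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

Keeping the locked region down to one addition is the design point: the threads only contend once each, after their real work is done.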


mikeburr

  • Member
  • **
  • Posts: 134
Re: console and low freq cpu programming?
« Reply #3 on: January 21, 2021, 02:04:23 AM »
Here's a rough outline of a program I wrote in COBOL nearly half a century ago,
primarily to merge a lot of files [originally on an ICL 1900 and a few years later on an IBM 360/158].
Maybe you'd like to try this?
 

Spec for a sort ("but largely merge") of data.
Assume the data is the key in this case, i.e. chop the data up into doubleword-length pieces
if you're doing 32-bit, or qwords if you're trying out x64.
The strategy in outline:

Sort small batches of data.
When you have enough of them, release them to a merge.
When you have enough batches of merged data, release them to subsequent merges.

Divide the data into chunks whose size is a power of 2 [i.e. the nearest power of 2].
It's probably best to start with a power-of-2 amount of data to avoid any messy end processing,
though that's not at all difficult.
.................................................................

loop a) until the data is exhausted
a)
in a new thread, sort a small "power of 2" chunk of this data,
e.g. 64 or 128 items
  if the number of sorted batches reaches a convenient number, e.g. a power of 2 such as 16 or 32,
    release them to the merge in b)
  otherwise
    go to a)
 
......................

loop b) on the return of a small "power of 2" number of sorted batches, say 16 or 32

b)
in a new thread,
keep removing the lowest key across the sorted batches and writing it to an "outfile"
as an example this can be done roughly as follows
   I'm going to assume you've used [16] batches of data, to avoid any confusion with stage a) the sort, and
   128 data items in each sort
   
   initially
   b0)  compare the [16] lowest keys, one from each batch, so you know which is the lowest
     
   b1)  remove the lowest and place it in the "outfile"
     take the next key from the batch you just removed the item from
     is it lower than the new lowest key?
        if it is [this is quite likely with general user data]
            go to b1)
        if it isn't [this is quite likely with random data]
           place the new key in sequence in the look-up
             go to b0)
           
c) you now have an "outfile" of 16 * 128 = 2048 sorted items
   in a new thread
     take all the lots of 2048 items, e.g. [16] of them,
     and do b)
     
repeat stages b) and c) until all the data is sorted
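
The stages above can be sketched roughly as follows, in Python purely for illustration. The names `merge_batches` and `sort_merge` and the parameters `batch_size`, `fan_in` and `workers` are hypothetical, not from the original COBOL program. One deliberate swap: the ordered look-up of b0)/b1) is replaced here by a binary heap (`heapq`), which keeps the current lowest key of every batch without rescanning. CPython threads will not run the sorts in parallel, so treat this as a map of the structure to port, not a benchmark.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def merge_batches(batches):
    """Stage b): repeatedly move the lowest head key to the output list."""
    # One heap entry per non-empty batch: (key, batch index, position).
    heap = [(batch[0], i, 0) for i, batch in enumerate(batches) if batch]
    heapq.heapify(heap)
    out = []
    while heap:
        key, i, pos = heapq.heappop(heap)
        out.append(key)
        if pos + 1 < len(batches[i]):
            # Take the next key from the batch we just removed an item from.
            heapq.heappush(heap, (batches[i][pos + 1], i, pos + 1))
    return out


def sort_merge(data, batch_size=128, fan_in=16, workers=4):
    """Sort small batches, then merge `fan_in` batches at a time until one is left."""
    if batch_size < 1 or fan_in < 2:
        raise ValueError("batch_size must be >= 1 and fan_in >= 2")
    if not data:
        return []
    chunks = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))  # stage a): sort each small batch
        while len(runs) > 1:                   # stages b) and c): merge rounds
            groups = [runs[i:i + fan_in] for i in range(0, len(runs), fan_in)]
            runs = list(pool.map(merge_batches, groups))
    return runs[0]


if __name__ == "__main__":
    sample = [(i * 7919) % 5000 for i in range(5000)]
    print(sort_merge(sample)[:10])
```

Varying `batch_size` and `fan_in` is the experiment suggested in the post: smaller batches mean more thread hand-offs, larger ones mean more work per sort.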


The machines then were not multi-core but did use virtual paging [well, the IBM did; I can't remember whether the ICL
did, but it was generally a superior machine to the American offerings].
Quite a lot of data went through the IBM version, merging 8 files of user-related data, and it was quite quick,
but I'd be interested to see how threading affects the performance,
as there's the opportunity to process much of the sort/merge concurrently.
I can't remember what the limit on the number of threads is on x32 [64 ????].
Also,
you can vary the number of batches and their size and see how this impacts the speed, because you've now got
a thread-control overhead implicit in the M$ software and its coordination, as well as the sort and merge,
both of which provide different challenges.

  I hope you like this, Magnus, and anyone else who is interested in trying it.
   Obviously, if you're merging a lot of files and they're already in a convenient similar sequence, omit stage a).
   Regards, mike b


daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: console and low freq cpu programming?
« Reply #4 on: January 21, 2021, 05:50:39 AM »
thanks Mike :thumbsup:

daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: console and low freq cpu programming?
« Reply #5 on: January 25, 2021, 01:46:41 AM »
After benchmarking PeekMessage (millions per second) against a worker thread (billions per second),
I decided WndProc should handle a minimum of messages: mouse messages just store coordinates and flags for mouse-move and mouse-button messages in global variables, and likewise keyboard messages.
Maybe it could also include some code detecting what the mouse is pointing at or has clicked on.
Most of the work is done in worker threads.
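
A rough sketch of that split, in Python only to show the shape. The class `InputState` and its methods are hypothetical stand-ins: `on_mouse` and `on_key` are what a message handler would call, doing nothing except storing the latest values, and `snapshot` is what a worker thread would read. A lock is assumed here to keep each snapshot consistent. Plain globals, as described in the post, would need the same care once several threads read them.

```python
import threading


class InputState:
    """Latest input, written by the message handler and read by workers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._mouse = (0, 0, 0)  # x, y, button flags
        self._keys = set()       # key codes currently held down

    def on_mouse(self, x, y, flags):
        # Called from the message handler: store and return immediately.
        with self._lock:
            self._mouse = (x, y, flags)

    def on_key(self, code, down):
        # Called from the message handler for key down/up messages.
        with self._lock:
            if down:
                self._keys.add(code)
            else:
                self._keys.discard(code)

    def snapshot(self):
        # Called from a worker thread: a consistent copy of the current input.
        with self._lock:
            return self._mouse, frozenset(self._keys)


if __name__ == "__main__":
    state = InputState()
    state.on_mouse(10, 20, 1)
    state.on_key(65, True)
    print(state.snapshot())
```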


daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor

mineiro

  • Member
  • ****
  • Posts: 725
Re: console and low freq cpu programming?
« Reply #7 on: January 26, 2021, 04:29:15 AM »
I suppose it is 'exception handling'.
http://masm32.com/board/index.php?topic=6614.0

A debugger will receive 2 attempts to terminate your program: on the first one you need to deal with the supposed error, and the second will terminate your program.
To test, you can do a division by zero.

The link below has a document that talks a bit about exception handling (swconventions).
http://masm32.com/board/index.php?topic=5455.0
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: console and low freq cpu programming?
« Reply #8 on: January 26, 2021, 05:27:45 AM »
Thanks mineiro :thumbsup:
The very common GP fault would be nice to catch, especially when I have several threads that can cause bugs or GP faults, so a try/catch block can show a message box or something similar to indicate which thread is causing the problem.
On 32-bit it says the OS automatically allocates stack space for a thread if you don't specify a size. Does it work the same with 64-bit shadow space? And how much?
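
The try/catch idea above can be sketched like this, with Python exceptions standing in for structured exception handling. A real GP fault in native code would need the OS mechanism rather than a language-level `try`. The function `run_guarded` and the thread names are hypothetical. Each worker body is wrapped so that a failure is recorded under the name of the thread that raised it, and the division by zero follows mineiro's suggested test.

```python
import threading


def run_guarded(tasks):
    """Run each (name, func) pair in its own thread; return failures by thread name."""
    failures = {}
    lock = threading.Lock()

    def guarded(func):
        try:
            func()
        except Exception as exc:
            # Record which thread failed instead of letting it die silently.
            with lock:
                failures[threading.current_thread().name] = repr(exc)

    threads = [
        threading.Thread(target=guarded, args=(func,), name=name)
        for name, func in tasks
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures


if __name__ == "__main__":
    report = run_guarded([
        ("loader", lambda: 1 + 1),
        ("physics", lambda: 1 // 0),  # deliberate division by zero
    ])
    for name, error in report.items():
        print(f"thread {name} failed: {error}")
```

In a GUI program the `print` would be where the message box goes.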


mineiro

  • Member
  • ****
  • Posts: 725
Re: console and low freq cpu programming?
« Reply #9 on: January 26, 2021, 07:41:46 AM »
Hello sir daydreamer,
I don't know the answer.
I only played with that so as not to get bored, but after some tries I got more bored.
Maybe Mark Russinovich's book has an answer.


daydreamer

  • Member
  • *****
  • Posts: 1721
  • building nextdoor
Re: console and low freq cpu programming?
« Reply #11 on: January 27, 2021, 01:14:06 AM »
BTW:
TIOBE Index for January 2021
Thanks mineiro and TimoVJL.
Assembler has risen from 15th to 11th place :greenclp: :thumbsup:
@Hutch more masm videos and we'll soon reach #1  :greenclp: :thumbsup:

Now I have found some exercises and am trying the producer/consumer way,
and learning which algorithms are most suitable for parallel execution and which are less so.
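
A minimal producer/consumer sketch, assuming a bounded `queue.Queue` between one producer (the calling thread) and a few consumer threads. The names `run_pipeline`, `work` and `_DONE` are hypothetical. One sentinel per consumer tells it to stop, which avoids any timer or polling. The sketch assumes `work` does not raise. If every consumer died, the producer could block on a full queue.

```python
import queue
import threading

_DONE = object()  # sentinel telling a consumer to stop


def run_pipeline(items, work, consumers=2):
    """Feed `items` through a bounded queue to consumer threads running `work`."""
    jobs = queue.Queue(maxsize=8)  # bounded: the producer waits when consumers lag
    results = []
    lock = threading.Lock()

    def consumer():
        while True:
            item = jobs.get()      # blocks until the producer supplies something
            if item is _DONE:
                break
            value = work(item)
            with lock:
                results.append(value)

    threads = [threading.Thread(target=consumer) for _ in range(consumers)]
    for t in threads:
        t.start()
    for item in items:             # the calling thread acts as the producer
        jobs.put(item)
    for _ in threads:
        jobs.put(_DONE)            # one stop marker per consumer
    for t in threads:
        t.join()
    return results                 # completion order, not input order


if __name__ == "__main__":
    print(sorted(run_pipeline(range(10), lambda x: x * x)))
```

Results arrive in completion order, which is a quick way to see which algorithms tolerate being split up and which depend on sequence.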