BT instruction family

Started by Gunther, April 06, 2014, 02:54:40 AM


hutch--

Gunther,

If you need to use the BT family of instructions and you carefully select the instructions around them, you can probably fit a few of the following instructions into any stall they may leave.

Gunther

Steve,

Quote from: hutch-- on April 07, 2014, 11:15:18 AM
If you need to use the BT family of instructions and you carefully select the instructions around them, you can probably fit a few of the following instructions into any stall they may leave.

That's my plan. I've finished the BCD-to-Chen-Ho compression. It's a bit lengthy, but fast: no data cache or RAM accesses are necessary, because all values are kept in registers. The code contains no loops or jumps, only register-to-register AND, OR, BTC and MOV, so the prefetch queue stays intact over the entire procedure. I think that's a good starting point.
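Gunther's register-only code isn't posted here, so the following is only a rough illustration of the same idea: packing three BCD digits (12 bits) into 10 bits using nothing but AND, OR and NOT on single bits, with no branches. It swaps in densely packed decimal (DPD, the declet encoding used by IEEE 754-2008 decimal formats) instead of Chen-Ho, because its published boolean encoding equations are easy to check; it maps 999 to 001 111 1111, the DPD value Gunther gives in this thread. The function name bcd_to_dpd is hypothetical, and valid BCD input (each digit 0 to 9) is assumed.

```python
def bcd_to_dpd(bcd: int) -> int:
    """Compress three packed BCD digits (12 bits) into a 10-bit DPD declet.

    Only AND, OR and NOT on single bits are used, so there are no
    data-dependent branches. Assumes valid BCD input (each digit 0-9).
    """
    # Digit 2 = a b c d, digit 1 = e f g h, digit 0 = i j k m (a is the MSB)
    a, b, c, d = ((bcd >> sh) & 1 for sh in (11, 10, 9, 8))
    e, f, g, h = ((bcd >> sh) & 1 for sh in (7, 6, 5, 4))
    i, j, k, m = ((bcd >> sh) & 1 for sh in (3, 2, 1, 0))
    na, ne, ni = a ^ 1, e ^ 1, i ^ 1  # NOT of the "digit is 8 or 9" flags

    # DPD boolean encoding equations; output bits are p (MSB) .. y (LSB)
    p = (a & f & i) | (a & j) | b
    q = (a & g & i) | (a & k) | c
    r = d
    s = (na & e & j) | (f & ni) | (na & f) | (e & i)
    t = (na & e & k) | (a & i) | g
    u = h
    v = a | e | i
    w = a | (e & i) | (ne & j)
    x = e | (a & i) | (na & k)
    y = m

    # Shift the ten bits together into one 10-bit value
    out = 0
    for bit in (p, q, r, s, t, u, v, w, x, y):
        out = (out << 1) | bit
    return out


print(format(bcd_to_dpd(0x999), "010b"))  # 0011111111
print(format(bcd_to_dpd(0x123), "010b"))  # 0010100011
```

Small digits (0 to 7) pass through almost unchanged; only the "large digit" flags a, e and i steer the rearrangement, which is what makes a jump-free version practical.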

Now I have to write the Chen-Ho-to-BCD expansion, which should work in a similar way. If it works, I'll open a new thread with a test program. I need all of this for a special encoding problem in Geneva.
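The expansion direction can be sketched the same way. This is again DPD rather than Chen-Ho, and, unlike the jump-free approach Gunther describes, it uses an if/elif chain over the indicator bits for readability. The name dpd_to_bcd is hypothetical.

```python
def dpd_to_bcd(dpd: int) -> int:
    """Expand a 10-bit DPD declet into three packed BCD digits (12 bits)."""
    # Name the declet bits p (bit 9, MSB) .. y (bit 0, LSB)
    p, q, r, s, t, u, v, w, x, y = ((dpd >> sh) & 1 for sh in range(9, -1, -1))
    pq, st, wx = (p << 1) | q, (s << 1) | t, (w << 1) | x

    def small(two_bits: int, low: int) -> int:
        return (two_bits << 1) | low  # a digit 0-7 built from 3 bits

    def large(low: int) -> int:
        return 8 | low                # a digit 8 or 9

    if v == 0:            # all three digits are 0-7
        d2, d1, d0 = small(pq, r), small(st, u), small(wx, y)
    elif wx == 0b00:      # only the last digit is 8/9
        d2, d1, d0 = small(pq, r), small(st, u), large(y)
    elif wx == 0b01:      # only the middle digit is 8/9
        d2, d1, d0 = small(pq, r), large(u), small(st, y)
    elif wx == 0b10:      # only the first digit is 8/9
        d2, d1, d0 = large(r), small(st, u), small(pq, y)
    elif st == 0b00:      # first and middle digits are 8/9
        d2, d1, d0 = large(r), large(u), small(pq, y)
    elif st == 0b01:      # first and last digits are 8/9
        d2, d1, d0 = large(r), small(pq, u), large(y)
    elif st == 0b10:      # middle and last digits are 8/9
        d2, d1, d0 = small(pq, r), large(u), large(y)
    else:                 # all three digits are 8/9
        d2, d1, d0 = large(r), large(u), large(y)
    return (d2 << 8) | (d1 << 4) | d0


print(hex(dpd_to_bcd(0b0011111111)))  # 0x999
```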

Gunther
You have to know the facts before you can distort them.

FORTRANS

Hi Gunther,

Quote from: Gunther on April 07, 2014, 08:20:58 AM
interesting results. The second pre-P4 looks very strange.

   Oh?  Well, it is a Pentium MMX in a laptop.  Looking at some old
opcode timings, the bit test instructions are a bit slow compared
to logical or arithmetic operations, but not too bad.  The bit scan
instructions do look rather bad for the 386/486/Pentium.  Maybe that
is where the bit test instructions got a reputation for being slow.

   The other two processors were a P-III and a P-II.

   By the way, using the Chen-Ho encoding (or DPD) seems
somewhat complex when compared to BCD or binary.  (At first
glance to me anyway.)  When you finish coding your project I
would be interested in a timing comparison of Chen-Ho, BCD,
and binary processing if that is possible.  Idle curiosity, so don't
do it if it does not interest you.

Regards,

Steve N.

Gunther

Hi Steve,

Quote from: FORTRANS on April 07, 2014, 11:56:52 PM
   By the way, using the Chen-Ho encoding (or DPD) seems
somewhat complex when compared to BCD or binary.  (At first
glance to me anyway.)  When you finish coding your project I
would be interested in a timing comparison of Chen-Ho, BCD,
and binary processing if that is possible.  Idle curiosity, so don't
do it if it does not interest you.

You're right. The execution times rank Chen-Ho > BCD > binary. We can test it, but that will be the result. Chen-Ho or DPD is only of interest to save bandwidth for temporary storage.

Let me give you an example. For some image compression techniques, it's a good idea to save some transformation parameters, like scale and offset. Those parameters are stored in 12 bits each (three BCD digits), 24 bits together. Chen-Ho or DPD can store those 12 bits in 10 bits without information loss, saving 4 bits per transformation, or 400 bits for 100 transformations. Large images often have several thousand such transformations.
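The savings figure is simple arithmetic: two parameters per transformation, each shrinking from 12 to 10 bits. A small sketch with a hypothetical bits_saved helper; the 5000-transformation case is only an illustrative stand-in for "several thousand".

```python
def bits_saved(transformations: int, params_per_transform: int = 2,
               bcd_bits: int = 12, packed_bits: int = 10) -> int:
    """Bits saved by storing each 3-digit BCD parameter in 10 bits instead of 12."""
    return transformations * params_per_transform * (bcd_bits - packed_bits)


print(bits_saved(100))   # 400, the figure from the post
print(bits_saved(5000))  # 20000 bits, i.e. 2500 bytes
```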

Gunther 

FORTRANS

Hi Gunther,

   Well, Chen-Ho will save bytes over BCD, but not over binary.  So
I guess my question would be: why do you need to use decimal digits
in your code?  Or maybe I have missed something.

Regards,

Steve N.

Gunther

Steve,

Quote from: FORTRANS on April 08, 2014, 08:07:47 AM
   Well Chen-Ho will save bytes over BCD, but not over binary.  So
I guess my question would be, why do you need to use decimal digits
in your code.  Or maybe I have missed something.

Careful: Chen-Ho and DPD have the same efficiency as the binary representation. For example, the decimal value 999 has the binary representation 1111100111. The Chen-Ho representation is 111 111 1001, and as a DPD value it would be 001 111 1111.
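To put numbers on "the same efficiency", here is a quick comparison of bits per decimal digit, taking log2(10) as the theoretical minimum. It only does the arithmetic; it doesn't encode anything.

```python
import math

# Bits spent per decimal digit in each representation
ideal = math.log2(10)  # information-theoretic minimum, about 3.3219
bcd = 12 / 3           # BCD: 4 bits per digit
declet = 10 / 3        # Chen-Ho / DPD: 10 bits per 3 digits, about 3.3333

print(f"{ideal:.4f} {bcd:.4f} {declet:.4f}")  # 3.3219 4.0000 3.3333
print(format(999, "b"))                       # 1111100111
print(f"{1000 / 1024:.1%} of the 10-bit codes are used")  # 97.7%
```

A 10-bit declet holds 1000 of the 1024 possible codes, so it comes within a fraction of a percent of the ideal, while BCD spends about 20% more bits per digit.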

I wrote in reply #18 about scale and offset. The offset isn't the issue, because offsets are integer values. But the scale is often a fractional decimal value, and that's a real problem in binary because of rounding errors. BCD avoids them, but it isn't very space-efficient. That's the reason for DPD or Chen-Ho.
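A quick illustration of the rounding problem with a fractional decimal scale such as 0.1. Python's standard decimal module stands in here for decimal (BCD/DPD-style) arithmetic in general; it says nothing about how Gunther's encoding code works.

```python
from decimal import Decimal

# 0.1 has no exact binary representation; this shows the double actually stored
print(Decimal(0.1))

# Accumulating the binary approximation drifts away from the exact result
print(sum([0.1] * 10) == 1.0)                        # False

# In decimal arithmetic, 0.1 is exact and the sum is exact as well
print(sum([Decimal("0.1")] * 10) == Decimal("1.0"))  # True
```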

To put it in image-processing terms: the scale is the contrast and the offset is the brightness of an image region.

Gunther

FORTRANS

Hi Gunther,

   Okay.  Thank you for the explanation.  I will have to consider
it to see whether I find it practical.

Quote
Chen-Ho and DPD have the same efficiency as the binary representation.

   An interesting observation on its own.  Though 10 bits hold
1024 binary values versus 1000 decimal ones (000 to 999), if decimal
digits are not required.  I suppose I should investigate the
subject further.

Regards,

Steve N.

dedndave

Let's say you have 5 values:

10000   2710h   14 bits
10001   2711h   14 bits
10003   2713h   14 bits
10005   2715h   14 bits
10007   2717h   14 bits

They can be stored as a base value plus offsets from it:

10000   2710h   14 bits
    1       1    3 bits
    3       3    3 bits
    5       5    3 bits
    7       7    3 bits
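This is a form of delta (base-plus-offset) encoding. A minimal sketch, using a hypothetical delta_encode helper that assumes a non-empty list in which no value is smaller than the first:

```python
def delta_encode(values):
    """Keep the first value in full and store the rest as offsets from it."""
    base = values[0]
    offsets = [v - base for v in values[1:]]  # assumes every v >= base
    offset_bits = max((o.bit_length() for o in offsets), default=0)
    return base, offsets, offset_bits


values = [10000, 10001, 10003, 10005, 10007]
base, offsets, offset_bits = delta_encode(values)
full_bits = max(v.bit_length() for v in values)

print(base, offsets, offset_bits)              # 10000 [1, 3, 5, 7] 3
print(len(values) * full_bits)                 # 70 bits stored directly
print(full_bits + len(offsets) * offset_bits)  # 26 bits as base + offsets
```

The gain depends entirely on the values being clustered; widely spread values need wide offsets and save little.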


MichaelW


AMD-K5(tm) Processor
9882    cycles for 100 * bt
6703    cycles for 100 * test

9883    cycles for 100 * bt
6687    cycles for 100 * test

9886    cycles for 100 * bt
6685    cycles for 100 * test

15      bytes for bt
16      bytes for test
Well Microsoft, here's another nice mess you've gotten us into.