Jochen,
thank you for the test bed. Here are the results:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
7433 cycles for 100 * bt
7453 cycles for 100 * test
6849 cycles for 100 * bt
6830 cycles for 100 * test
6833 cycles for 100 * bt
6822 cycles for 100 * test
15 bytes for bt
16 bytes for test
--- ok ---
The same results here. On the other hand, the test bed isn't realistic enough:
btc eax, 7
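selects bit 7 in EAX, stores the value of that bit in the CF flag, and complements bit 7 in EAX. TEST, by contrast, only computes the bit-wise logical AND of its operands and sets the flags; it leaves EAX unchanged. For a fair comparison, we have to add the time needed to complement the appropriate bit to the TEST variant.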
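To make the difference concrete, here is a small C model of the two instructions. It is my own sketch, not code from the test bed, and the function names (btc_model, test_model, test_xor_model, check_equivalence) are made up for illustration. It only mirrors what the instructions do to the register and the tested bit; it says nothing about cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Model of BTC r32, imm8: returns the old value of the selected bit
   (what CF would hold) and complements that bit in *reg. */
static unsigned btc_model(uint32_t *reg, unsigned bit)
{
    uint32_t mask = (uint32_t)1 << (bit & 31); /* bit offset is taken modulo 32 */
    unsigned cf = (*reg & mask) != 0;
    *reg ^= mask;
    return cf;
}

/* Model of TEST r32, mask: reports whether the bit is set (ZF would be 0)
   and does not modify the register. */
static unsigned test_model(uint32_t reg, unsigned bit)
{
    uint32_t mask = (uint32_t)1 << (bit & 31);
    return (reg & mask) != 0;
}

/* TEST plus a separate XOR: the extra step needed to match BTC. */
static unsigned test_xor_model(uint32_t *reg, unsigned bit)
{
    unsigned was_set = test_model(*reg, bit);
    *reg ^= (uint32_t)1 << (bit & 31);
    return was_set;
}

/* Checks that BTC and TEST+XOR agree on both the reported bit and the register. */
static void check_equivalence(uint32_t value, unsigned bit)
{
    uint32_t a = value, b = value;
    unsigned r1 = btc_model(&a, bit);
    unsigned r2 = test_xor_model(&b, bit);
    assert(r1 == r2);
    assert(a == b);
}
```

Calling check_equivalence(0x80, 7) passes, while test_model alone leaves the value at 0x80. In real code the XOR also overwrites the flags that TEST has set, so the result of TEST must be consumed (or saved) before the XOR executes.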
Gunther