how to efficiently align code

Started by flipflop, February 21, 2013, 05:27:52 AM

flipflop

Hi all,

I've heard about code alignment. How is it done? How do I align code and data? What happens when, for example, I use the directive ALIGN 4? I've heard that the CPU cache line size (in bytes) is important here.

Thanks in advance.

Gunther

Hi flipflop,

Quote from: flipflop on February 21, 2013, 05:27:52 AM
I've heard about code alignment. How is it done? How do I align code and data? What happens when, for example, I use the directive ALIGN 4? I've heard that the CPU cache line size (in bytes) is important here.

With ALIGN 4, whatever follows the directive (code or data) will start at an address that is evenly divisible by 4.

Gunther
You have to know the facts before you can distort them.

flipflop

So if, for example, a DWORD variable "VAR1" that would sit at address 403003h without alignment is preceded by ALIGN 4, it will be placed at address 403008h. Right?

dedndave

for data, it is often important that data is aligned to the word-size of the machine
(even-aligned for 16-bit code, 4-aligned for 32-bit code, 8-aligned for 64-bit code)
this helps speed up accesses to the data in code

in some cases, it is required that the data be aligned
this is true for many SSE instructions

as for aligning code, it seems to help CALL's if the target is 16-aligned
for NEAR loops, 16-aligned
for SHORT loops, alignment doesn't seem to be critical

this is done using the ALIGN directive
    ALIGN   4
in the data sections, the assembler may place bytes of 0's to pad in order to achieve alignment
in the code section, the assembler may use NOP's or JMP's to achieve alignment

Adamanteus

I myself did not fully understand how it needs to be done, so could someone explain in more detail:
1 - segment alignment by the dot segment directives is always appropriate for the system
2 - when code and data are in one module, the ALIGN directive is needed before PROCs and some data to suppress misalignment
3 - structure fields are aligned manually with reserved db fields
4 - stack alignment should not be broken by pushing word vars (always push them doubled, or use dword vars instead)
5 - a label that is the target of far jmps needs nops before it, for proper alignment

dedndave

Quote from: flipflop on February 21, 2013, 06:51:00 AM
So if, for example, a DWORD variable "VAR1" that would sit at address 403003h without alignment is preceded by ALIGN 4, it will be placed at address 403008h. Right?

not quite
403004h is the next 4-aligned address
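
The round-up that ALIGN performs can be sketched in C. This is only an illustration of the address arithmetic, not a description of how MASM is implemented; the helper names align_up and pad_bytes are made up for this example, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment.
   Hypothetical helper; alignment is assumed to be a power of two. */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Number of padding bytes needed to move addr up to that boundary. */
uint32_t pad_bytes(uint32_t addr, uint32_t alignment)
{
    return align_up(addr, alignment) - addr;
}
```

For example, align_up(0x403003, 4) gives 0x403004 with one byte of padding, while an address that is already 4-aligned, such as 0x403004, is left unchanged.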

flipflop

Adamanteus, I'm not so good at assembly. The things you're writing about are not familiar to me. I wanted to learn the basics of alignment. Could you explain each of them a bit?

dedndave

i think those are questions   :biggrin:
a little hard to understand exactly what he is asking, though

Gunther

Hi flipflop,

As Dave explained, you can align your code and you can align your data. In some cases, for example at the hot spot of a loop, an alignment of 16 is advisable for performance. Another example is SSE instructions. To use, for example:


        movaps        xmm0, [value]         


the variable value must be aligned on a 16-byte boundary; otherwise the CPU raises a general-protection exception. You can avoid that by using:


        movups        xmm0, [value]         


but that can be slower, particularly on older CPUs or when the access crosses a cache line.
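
For readers coming from C rather than assembly, the same 16-byte requirement can be met by asking the compiler for the alignment. A minimal sketch, assuming a C11 compiler; the names value and is_aligned are chosen for this example only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte aligned storage for four floats, suitable as the memory operand
   of an aligned load such as movaps. */
alignas(16) float value[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

/* Returns 1 if p is a multiple of n, else 0.
   Hypothetical helper; n is assumed to be a power of two. */
int is_aligned(const void *p, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return ((uintptr_t)p & (n - 1)) == 0;
}
```

Here is_aligned(value, 16) holds, whereas the address four bytes into value is only 4-aligned, so an aligned 16-byte load from that address would fault.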

Gunther
You have to know the facts before you can distort them.

Adamanteus

Quote from: flipflop on February 21, 2013, 07:10:57 AM
Adamanteus, I'm not so good at assembly. The things you're writing about are not familiar to me. I wanted to learn the basics of alignment. Could you explain each of them a bit?
The answer is to follow the good examples given here. I would add that the easiest way to align everything properly is to put each procedure and variable in a separate file, so that the dot segment directives do everything for you.

MichaelW

flipflop:

This code shows the effect of the alignments.

;==============================================================================
    include \masm32\include\masm32rt.inc
;==============================================================================

;----------------------------------------
; Returns the maximum alignment of _ptr.
;----------------------------------------

alignment MACRO _ptr
    push ecx
    xor eax, eax
    mov ecx, _ptr
    bsf ecx, ecx
    jz @F
    mov eax, 1
    shl eax, cl
  @@:
    pop ecx
    EXITM <eax>
ENDM

;==============================================================================
    .data
        D0 dd 0
           db 0
        D1 db 0
        align 16
        D2 db 0
           db 0
        D3 db 0
        align 8
        D4 db 0
           db 0
        D5 db 0
        align 4
        D6 db 0
           db 0
    .code
;==============================================================================
start:
;==============================================================================
    ;------------------------------------------------------------------------
    ; The OFFSET operator specifies the offset address of a memory location.
    ; To get the offset address of a data label you must use the OFFSET
    ; operator, but to get the offset address of a code label you can omit
    ; the operator.
    ;
    ; The align directive aligns the next variable or instruction on a byte
    ; address that is a multiple of the specified number. This ensures that
    ; the minimum alignment will be as specified, but note that the actual
    ; alignment can be greater than specified. At least for ML 6.15 the
    ; number must be 1, 2, 4, 8, or 16.
    ;------------------------------------------------------------------------

    printf("start\t%Xh\t%d\n", start, alignment(start))
  L1:
    align 16
  L2:
    nop         ; 1-byte
  L3:
    align 8
  L4:
    nop
  L5:
    align 4
  L6:

    printf("L1\t%Xh\t%d\n", L1, alignment(L1))
    printf("*L2\t%Xh\t%d\n", L2, alignment(L2))
    printf("L3\t%Xh\t%d\n", L3, alignment(L3))
    printf("*L4\t%Xh\t%d\n", L4, alignment(L4))
    printf("L5\t%Xh\t%d\n", L5, alignment(L5))
    printf("*L6\t%Xh\t%d\n\n", L6, alignment(L6))

    printf("D0\t%Xh\t%d\n", OFFSET D0, alignment(OFFSET D0))
    printf("D1\t%Xh\t%d\n", OFFSET D1, alignment(OFFSET D1))
    printf("*D2\t%Xh\t%d\n", OFFSET D2, alignment(OFFSET D2))
    printf("D3\t%Xh\t%d\n", OFFSET D3, alignment(OFFSET D3))
    printf("*D4\t%Xh\t%d\n", OFFSET D4, alignment(OFFSET D4))
    printf("D5\t%Xh\t%d\n", OFFSET D5, alignment(OFFSET D5))
    printf("*D6\t%Xh\t%d\n\n", OFFSET D6, alignment(OFFSET D6))

    inkey "Press any key to exit..."
    exit
;==============================================================================
end start


start   401000h 4096
L1      401029h 1
*L2     401030h 16
L3      401031h 1
*L4     401038h 8
L5      401039h 1
*L6     40103Ch 4

D0      403000h 4096
D1      403005h 1
*D2     403010h 16
D3      403012h 2
*D4     403018h 8
D5      40301Ah 2
*D6     40301Ch 4
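
The third column in the output above is the largest power of two that divides each address, which the macro finds with BSF. The same value can be computed in C by isolating the lowest set bit. This is a sketch of the idea only; the function name max_alignment is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides addr, or 0 when addr is 0
   (the same result the BSF-based macro leaves in eax). */
uint32_t max_alignment(uint32_t addr)
{
    /* addr & -addr keeps only the lowest set bit of addr. */
    uint32_t lowest = addr & (0u - addr);
    assert(lowest == 0 || (addr % lowest) == 0);
    return lowest;
}
```

Feeding it the addresses from the listing reproduces the printed values, for example 16 for 401030h, 2 for 403012h, and 4096 for 403000h.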


As Dave stated, to achieve the alignment the assembler pads the data or code, using bytes with a value of zero to pad the data and various forms of NOP to pad the code. In the example code I used the actual 1-byte NOP instruction to disturb the alignment, but depending on the size of the required pad the assembler may use various combinations of selected instructions. Since the instructions may become part of the instruction stream, and may be executed, the goal of the selection (a goal not necessarily met, see the 5-byte NOP in the list below) is to pick instructions that will have no adverse effects on the code they are placed in. This is a listing of the NOP sequences used by ML 6.14:


; -------------------------------------------------------------
; No-op sequences inserted for align, MASM 6.14, 1 to 15 bytes:
; -------------------------------------------------------------

00401001 90                     nop

00401006 8BFF                   mov     edi,edi

00401009 8D4900                 lea     ecx,[ecx]

00401014 8D642400               lea     esp,[esp]

0040101B 0500000000             add     eax,0

00401022 8D9B00000000           lea     ebx,[ebx]

00401029 8DA42400000000         lea     esp,[esp]

00401038 8DA42400000000         lea     esp,[esp]
0040103F 90                     nop

00401047 8DA42400000000         lea     esp,[esp]
0040104E 8BFF                   mov     edi,edi

00401056 8DA42400000000         lea     esp,[esp]
0040105D 8D4900                 lea     ecx,[ecx]

00401065 8DA42400000000         lea     esp,[esp]
0040106C 8D642400               lea     esp,[esp]

00401074 8DA42400000000         lea     esp,[esp]
0040107B 0500000000             add     eax,0

00401083 8DA42400000000         lea     esp,[esp]
0040108A 8D9B00000000           lea     ebx,[ebx]

00401092 8DA42400000000         lea     esp,[esp]
00401099 8DA42400000000         lea     esp,[esp]

004010A1 8DA42400000000         lea     esp,[esp]
004010A8 8DA42400000000         lea     esp,[esp]
004010AF 90                     nop


I recall that the GNU assembler at some point used different sequences, and the 15-byte sequence consisted of some number of NOPs preceded by a jump instruction that effectively jumped over the NOPs instead of executing them.
Well Microsoft, here's another nice mess you've gotten us into.