Why Write MASM In The Modern Era?

There are many reasons why experienced programmers choose to write assembler code: performance where speed matters, the architectural freedom to lay out code in any way you like, and the capacity to do things that cannot be done in many compilers, but the main reason is simply because you can. Many conjure up the image of cobbling together a few DOS interrupts in unintelligible notation to prop up the shortcomings of compilers, yet a modern assembler like MASM has the range of a high level language and can be written that way for high level code while retaining all of its power at the lowest level.

With the introduction of the 32 bit Windows API functions, MASM gained access to the same operating system functions that compilers had, but without the clutter and assumptions of many of the compilers available. When you write Windows API code in MASM you get perfectly clear, minimal, precise code that leverages the full power of the Windows operating system, and you get it at the code size you write, not with a pile of unwanted extras dumped into your executable by a compiler.

MASM has never been for the faint of heart. It is an uncompromised tool that has never been softened into a user friendly toy and it requires the development of expertise to use correctly, but for the programmer who already has experience in low level C and similar code, MASM offers power and flexibility that the best of compilers cannot deliver. Contrary to popular opinion, it can be written in about the same development time as C code.

MASM handles C style structures with ease, as it does a number of other familiar high level constructions.

  ; ---------------------------------------------------
  ; set window class attributes in WNDCLASSEX structure
  ; ---------------------------------------------------
    mov wc.cbSize,         sizeof WNDCLASSEX
    m2m wc.lpfnWndProc,    OFFSET WndProc
    mov wc.cbClsExtra,     NULL
    mov wc.cbWndExtra,     NULL
    m2m wc.hInstance,      hInstance
    m2m wc.hbrBackground,  COLOR_BTNFACE+1
    mov wc.lpszMenuName,   NULL
    mov wc.lpszClassName,  OFFSET szClassName
    m2m wc.hIcon,          hIcon
    m2m wc.hCursor,        hCursor
    m2m wc.hIconSm,        hIcon

Instead of the nightmares of old with unreadable and unmaintainable code, the high level aspects of MASM look and read very much like traditional C code yet at any time it can work directly in mnemonics where performance matters.

Windows API calls are clean, clear code.

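  ; -----------------------------------------------------------------
  ; create the main window with the size and attributes defined above
  ; (the listing was cut off after the window title; the remaining
  ; arguments follow the CreateWindowEx parameter order, and Wtx, Wty,
  ; Wwd and Wht are assumed names for the position and size)
  ; -----------------------------------------------------------------
    invoke CreateWindowEx,WS_EX_LEFT or WS_EX_ACCEPTFILES,
                          ADDR szClassName,
                          ADDR szDisplayName,
                          WS_OVERLAPPEDWINDOW,
                          Wtx,Wty,Wwd,Wht,
                          NULL,NULL,
                          hInstance,NULL
    mov hWnd,eax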

For non critical high level code it has a number of built in loop techniques that make for clear and maintainable code.

    .while eax > 0
      sub eax, 1
    .endw

And the matching,

    .repeat
      sub eax, 1
    .until eax < 1
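
For readers coming from C, the difference between the two loop forms can be sketched as below. This is only an illustration, not MASM output: the function names are hypothetical and uint32_t stands in for the EAX register, on the assumption that a comparison on a plain register operand is treated as unsigned.

```c
#include <stdint.h>

/* Pre-tested loop, in the manner of .while eax > 0 ... .endw.
   Returns the number of times the loop body ran. */
uint32_t count_while(uint32_t eax)
{
    uint32_t passes = 0;
    while (eax > 0) {           /* condition is checked before each pass */
        eax -= 1;
        passes += 1;
    }
    return passes;
}

/* Post-tested loop, in the manner of .repeat ... .until eax < 1.
   The body always runs at least once. */
uint32_t count_repeat(uint32_t eax)
{
    uint32_t passes = 0;
    do {
        eax -= 1;
        passes += 1;
    } while (!(eax < 1));       /* .until exits when its condition is true */
    return passes;
}
```

Both forms run the body three times for a starting value of 3, but for a starting value of 0 the pre-tested form runs it zero times while the post-tested form decrements first and wraps around, so the choice between them matters at the boundary.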

Where genuine performance matters, loop code is written directly in mnemonics using any instruction that the processor supports, and you are by no means restricted to high level style loop code or the theory behind it. You can write multi-entry and exit loops, nested loops, interdependent loops and manually unrolled loops of many different designs, most of which will give even the best compilers nightmares.

Block structure conditional testing is routine in MASM with an efficient and clear notation that is useful in all but the most demanding algorithms where direct mnemonic code can bypass unrequired structure for maximum performance.

    .if eax == 1
      ; do something
    .elseif eax == 2
      ; do something else
    .else
      ; otherwise do this
    .endif

This of course can be nested in the normal manner for the construction of WndProc() style message processing, and with any of the high level flow control notations you can use C style runtime comparisons for complex condition evaluation.

   Operator       Meaning

    ==             Equal
    !=             Not equal
    >              Greater than
    >=             Greater than or equal to
    <              Less than
    <=             Less than or equal to
    &              Bit test (format: expression & bitnumber)
    !              Logical NOT
    &&             Logical AND
    ||             Logical OR
    CARRY?         Carry bit set
    OVERFLOW?      Overflow bit set
    PARITY?        Parity bit set
    SIGN?          Sign bit set
    ZERO?          Zero bit set

This combined capacity alone makes MASM a formidable tool, yet its true designation is that of a MACRO assembler, a capacity that few would be familiar with, and it has a very powerful pre-processor that substantially extends this already formidable capacity. MASM macros are known to be quirky and at a more advanced level they can be reasonably difficult to develop and debug, but there are a large number of well proven, reliable macros available that increase the throughput of reliable code.

For normal integer evaluation.

    switch variable
      case value1
        ; do something here
      case value2
        ; do something else
      default
        ; do any default processing
    endsw

A variation combined with a hand written string evaluation procedure.

    switch$ string_address
      case$ "quoted_text1"
        ; perform action here
        ; if word matches
      case$ "quoted_text2"
        ; perform action here
        ; if word matches
      else$
        ; perform default action
    endsw$
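
Conceptually, a string switch of this kind is a chain of string comparisons with a default branch. The C sketch below shows that idea only; it is not the expansion of the macro, and the function name and return codes are hypothetical.

```c
#include <string.h>

/* Hypothetical dispatcher: returns 1 or 2 for the two known words
   and 0 for anything else (the default branch). */
int dispatch(const char *word)
{
    if (strcmp(word, "quoted_text1") == 0) {
        return 1;               /* first case matched */
    } else if (strcmp(word, "quoted_text2") == 0) {
        return 2;               /* second case matched */
    }
    return 0;                   /* default action */
}
```

Each case costs one string comparison, so for a long list of words the order of the cases affects the average time taken to reach a match.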

The range of macros to automate tasks in MASM is almost unlimited, yet while it may look like high level code it is in fact inlined hand written assembler code that improves your code throughput and maintenance without compromising performance.

While the pseudo high level capacity in MASM is very useful and has a proven track record over time, its real power is in its ability to write the complete Intel instruction set, from conventional integer instructions to floating point instructions, MMX and the later series of XMM (SSE) instructions, with near complete freedom of architecture.

With manually written mnemonic code you have as much or as little structure as you require and this is one of its great advantages over even the best compilers, the near total freedom to write anything you need without artificially imposed limitations.

Below is a simple algorithm using SSE instructions to XOR a random pad against data to be encrypted. It is written using a normal stack frame as the stack overhead is not a factor in its timing and the source and pad must be 16 byte aligned for the SSE instructions used. It is unrolled by a factor of 4 and uses non-temporal writes to avoid cache pollution with incoming data.

Built as an object module it can be used in either a MASM program or linked directly into a Microsoft VC application, one of the major targets of MASM as it is supplied with Visual C.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

SSExor proc src:DWORD,padd:DWORD,ln:DWORD

    push ebx
    push esi
    push edi

    mov esi, src
    mov edi, padd
    mov edx, -64
    mov ebx, ln
    shr ebx, 6                            ; int divide ln by 64
    jz remain                             ; under 64 bytes, skip the SSE loop

  align 16
  lbl0:
    add edx, 64
    movdqa xmm0, [esi+edx]                ; read the source
    movdqa xmm1, [esi+edx+16]             ; read the source
    movdqa xmm2, [esi+edx+32]             ; read the source
    movdqa xmm3, [esi+edx+48]             ; read the source

    pxor xmm0, [edi+edx]                  ; xor pad to source
    pxor xmm1, [edi+edx+16]               ; xor pad to source
    pxor xmm2, [edi+edx+32]               ; xor pad to source
    pxor xmm3, [edi+edx+48]               ; xor pad to source

    movntdq [esi+edx],    xmm0            ; write result back to source
    movntdq [esi+edx+16], xmm1            ; write result back to source
    movntdq [esi+edx+32], xmm2            ; write result back to source
    movntdq [esi+edx+48], xmm3            ; write result back to source

    sub ebx, 1
    jnz lbl0

  remain:
    add edx, 64                           ; offset of first unprocessed byte
    mov eax, ln
    sub eax, edx                          ; eax = number of trailing bytes
    jz lbl2                               ; none left over, exit

  align 4
  lbl1:
    movzx ecx, BYTE PTR [edi+edx]         ; read one pad byte
    xor [esi+edx], cl                     ; xor it to the source byte
    add edx, 1
    sub eax, 1
    jnz lbl1

  lbl2:
    pop edi
    pop esi
    pop ebx

    ret

SSExor endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
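
As a plain C reference for what the procedure is intended to do, the sketch below keeps the same two stage layout of whole 64 byte blocks followed by a byte at a time tail. It is an illustration under stated assumptions rather than a replacement: the name xor_pad is hypothetical, and it uses ordinary byte stores, so it has none of the alignment requirements or non-temporal write behaviour of the SSE version.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference sketch: XOR ln bytes of pad into src, in place. */
void xor_pad(uint8_t *src, const uint8_t *pad, size_t ln)
{
    size_t blocks = ln >> 6;            /* number of whole 64 byte blocks */
    size_t i = 0;                       /* running byte offset */

    for (; blocks != 0; blocks--) {     /* the part the SSE loop handles */
        for (size_t j = 0; j < 64; j++)
            src[i + j] ^= pad[i + j];
        i += 64;
    }

    for (; i < ln; i++)                 /* trailing bytes, one at a time */
        src[i] ^= pad[i];
}
```

Because XOR is its own inverse, running the same routine a second time with the same pad restores the original data, which makes a simple round trip a convenient check when comparing an assembler version against a reference like this.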

Algorithms written in the normal integer instructions are commonplace and still do the lion's share of work in most applications. Below is an algorithm to perform the mundane task of replacing ASCII zeros with spaces in large text files that occasionally have embedded zeros that prevent them from being read in a normal editor.

It is written without a stack frame and is unrolled by a factor of 4 to improve its throughput by reducing loop code overhead.

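; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE        ; no stack frame, arguments are read from ESP
OPTION EPILOGUE:NONE

zrep proc ptxt:DWORD,ltxt:DWORD

    mov edx, [esp+4]        ; ptxt
    mov ecx, [esp+8]        ; ltxt

    add edx, ecx            ; edx = end of text
    neg ecx                 ; ecx = negative index that counts up to zero
    jz quit                 ; nothing to do for an empty buffer
    jmp label0

  quit:
    ret 8

  align 4
  pre:
    mov BYTE PTR [edx+ecx], 32      ; overwrite the zero with a space
    add ecx, 1
    jz quit

  align 4
  label0:
    cmp BYTE PTR [edx+ecx], 0
    je pre
    add ecx, 1
    jz quit

    cmp BYTE PTR [edx+ecx], 0
    je pre
    add ecx, 1
    jz quit

    cmp BYTE PTR [edx+ecx], 0
    je pre
    add ecx, 1
    jz quit

    cmp BYTE PTR [edx+ecx], 0
    je pre
    add ecx, 1
    jnz label0

    ret 8

zrep endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤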
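
The indexing technique in this procedure, a pointer to the end of the buffer combined with a negative index that counts up to zero, can be sketched in C as follows. The name zrep_c is hypothetical and the loop is left unrolled by nothing, as the point is only to show why a single add both advances the position and provides the loop exit test.

```c
#include <stddef.h>

/* Replace every zero byte in txt[0..len) with a space (ASCII 32). */
void zrep_c(char *txt, size_t len)
{
    char *end = txt + len;              /* one past the last byte */
    ptrdiff_t i = -(ptrdiff_t)len;      /* runs from -len up to 0 */

    while (i != 0) {                    /* reaching zero ends the scan */
        if (end[i] == 0)
            end[i] = 32;                /* overwrite the zero with a space */
        i++;
    }
}
```

In the assembler version the same idea saves a separate compare against the length on every byte, since the add that steps the index also sets the zero flag when the end is reached.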

With MASM the sky is the limit. There are many tasks that are not worth the effort of writing in assembler, but where you do need to target the speed of a particular process in an application, MASM is the tool that can produce any algorithm you know enough about to write.

MASM is not for the faint of heart. It is a technically demanding tool that requires a good working knowledge of both the operating system and the mnemonics that the processor supports, but it puts the final control of an algorithm in the hands of the programmer who chooses to master a tool of this type.