Need Help / Recommendations for my compiler

Started by newbiedude, November 23, 2017, 03:58:08 PM


newbiedude

Hey guys, I'm new here (sorry in advance for my English). I have a school project that involves writing a compiler. We are at the final phase and only the generation of assembly code is missing, so I am thinking of using MASM32 and your libs (include \masm32\include\masm32rt.inc) for it (I wanted x64, but it seems way harder) and running the assembler and the resulting program from my WPF app.
My language only has 5 data types: booleans, chars, real numbers, integers and strings. What would you recommend for each data type (byte, dw, sdword, etc.)?
It also has basic console input and output:
For empty reads I think I'll be using inkey " "
and then I have no idea what to use for the outputs or inputs:
I tested an example from here that used crt_printf and crt_scanf. It worked well in the example calculator (using SDWORD as signed integers, I think), but I don't know how well it would work with the rest of the data types. I already ran into a problem when trying to load a string into an SDWORD: it was too long for it. What do you think would be good choices? Sorry for my ignorance; I have experience with AVR assembly, but that was more on the hardware side than actual programs, so this seems a bit complicated.
Thanks for your possible input :)

jj2007

Good morning, and Welcome to the Forum

Your project seems to be pretty advanced, and I guess we can help you, in spite of the no homework rule; after all, this goes well beyond the standard homework assignment.

But we need more to sink our teeth into. For the time being, these could be your reference and/or source of inspiration:
\Masm32\help\masmlib.chm
\Masm32\help\hlhelp.chm

Syntax-wise, to which language family should your compiler belong? You can go a long way with macros, see MasmBasic, which is "inline Basic for Assembler", but I suppose you plan to produce a real compiler that parses code line by line and translates it to assembler, right?

newbiedude

Hi. Yeah, it's a school project. My teacher and classmates are going to use emu8086, but I'm not excited about it, since we are well beyond the 8086 and 16 bits and I don't want to emulate. The syntax is very similar to C: it uses { } for blocks, the same basic operators, etc. It has syntax support for objects and 1D and 2D arrays, but since time is short it will be limited to local methods and no arrays. And yes, the plan is to generate all the assembly code; there's already an intermediate code that will be the main source for the assembly. I declare every variable from a symbol table, and plan to use registers for temporary values from operations instead of creating additional variables for them.
I was looking at the help files but couldn't find the crt_ functions. I'm still confused about which functions to use to read and write my data types, and which data type to use for each one. I'm also having trouble loading a string into a variable:
cad1_00 SDWORD ?
...
mov cad1_00,"Hola"
invoke crt_printf,OFFSET cad1_00
...
It prints < aloH > and is restricted to 4 chars long.
Do I have to make a special case for strings?

Thanks for your help.
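
A note on why this prints < aloH >: a quoted literal used as an immediate operand is treated as a 32-bit number, and x86 stores numbers in memory with the lowest byte first, so the four characters end up reversed and nothing longer than 4 bytes fits. A string is normally declared as a sequence of bytes ending in a zero, and only its address is passed around. A minimal sketch, assuming the standard masm32rt.inc setup used elsewhere in this thread (the text "Hola, mundo" is just an illustration):

```masm
include \masm32\include\masm32rt.inc

.const
frmtString BYTE "%s",13,10,0        ; format string: text plus CR/LF

.data
cad1_00 BYTE "Hola, mundo",0        ; any length, terminated by a zero byte

.code
start:
    ; pass the ADDRESS of the string; the "%s" format keeps the text
    ; itself from being interpreted as a format string
    invoke crt_printf, OFFSET frmtString, OFFSET cad1_00
    inkey
    exit
end start
```

So yes, strings are a special case: the variable is a byte buffer (or a DWORD holding its address), not a number.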

jj2007


newbiedude

Indeed, I tried with "sas" and it also worked, one issue less. Thanks

newbiedude

Hi again, any tips on converting chars, reals and integers to strings? I'm using m2m for assignments between variables, but I've had no luck figuring out how to assign chars or reals to strings.
I'm using FLOAT8 for real numbers and load them using "crt_atof".
I load characters directly: mov var, 'c'.
This is the operation table for assignments. The numbers are tokens; as you can see, the idea is to be able to assign multiple data types to strings, like in any modern programming language.
// 48-boolean, 49-char, 50-real, 51-int, 52-string
        int?[,] opAssignment = new int?[,] // "=" operator
        {//    48    49    50    51    52
            { 0048, null, null, null, null }, // 48
            { null, 0049, null, null, null }, // 49
            { null, null, 0050, null, null }, // 50
            { null, null, null, 0051, null }, // 51
            { 0052, 0052, 0052, 0052, 0052 }  // 52
        };
I'll be using the conversions for concatenation too, but like I said, I can't figure out how. I've been through the help files with no luck.
Thanks!

hutch--

newbie,

Strings are always a sequence of either 1 byte characters OR with UNICODE 2 byte characters. Shifting from a string to a numerical representation always requires a conversion, the MSVCRT runtime will do at least some of these for you in both directions. When you are converting from a number you need to allocate enough memory to write the string representation into and you determine its size by the maximum character range for each numeric data type. A simple and reliable way to do this is make a LOCAL string that is big enough for any number that is converted.

Something like this.

LOCAL pString :DWORD     ; the string pointer
LOCAL sBuffer[32]:BYTE   ; the buffer it points to

lea eax, sBuffer         ; load the buffer address into eax
mov pString, eax         ; copy the address into a pointer

You then use pString as the address of the buffer you need to write to with your conversion. This can be done a bit more efficiently but it works OK.
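
To make the idea concrete, here is a minimal sketch of an integer-to-string conversion into such a LOCAL buffer, assuming crt_sprintf from the MSVCRT runtime does the conversion. The procedure name IntToText and the test value are made up for illustration; the other names only mirror the style of the generated code.

```masm
include \masm32\include\masm32rt.inc

.const
frmtInt    BYTE "%d",0
frmtString BYTE "%s",13,10,0

.data
hlpVar SDWORD -12345                ; illustrative value to convert

.code
IntToText proc                      ; hypothetical name
    LOCAL pString :DWORD            ; the string pointer
    LOCAL sBuffer[32]:BYTE          ; the buffer it points to

    lea eax, sBuffer                ; load the buffer address into eax
    mov pString, eax                ; copy the address into a pointer

    ; write the decimal text of hlpVar into the buffer
    invoke crt_sprintf, pString, OFFSET frmtInt, hlpVar
    ; print the resulting zero-terminated string
    invoke crt_printf, OFFSET frmtString, pString
    ret
IntToText endp

start:
    call IntToText
    inkey
    exit
end start
```

The buffer lives on the stack, so the pointer is only valid until the procedure returns; 32 bytes is more than enough for the text of any 32-bit integer.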

newbiedude

Thanks a lot, used that plus left$(pString,1) for char to string conversion.
Ended up using real8$(var) for reals, can't believe I missed that macro before.

newbiedude

And... hi again!
Stuck again, now on concatenating with the add$ macro.
To test it I'm reading a char, then trying to concatenate another char to it and print the result. This is the generated code:

.const
frmtInt CHAR "%d",0
frmtFloat CHAR "%f",0
frmtString CHAR "%s",13,10,0
frmtChar CHAR "%c",0
.data
hlpFloat REAL8 ?
hlpVar SDWORD ?
c1_00 BYTE ?
c2_00 BYTE ?
...
LOCAL pString : DWORD
LOCAL pString2 : DWORD
...
invoke crt_scanf,OFFSET frmtChar,ADDR c1_00
mov c2_00,'b'
lea eax,c1_00
mov pString2,eax
lea eax,c2_00
mov pString,eax
mov edx,add$(left$(pString2,1),left$(pString,1))
invoke crt_printf,OFFSET frmtString,edx

It prints nothing, tried many things but again no luck.
Sorry for this kind of question. I never really learned the concepts of pointers, addressing and buffers; they are totally new to me, since I never needed them on AVR or in any high-level language.

jj2007

Looks like a small problem, but currently I have no time to add the missing headers and test it, sorry. If you posted the complete code, I would give it a try.

hutch--

newbie,

Most probably what is happening is the edx register is being overwritten. Use another register. Try this.

push esi

; replace the use of EDX in your code with "ESI"

pop esi

ret
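
Background for this suggestion: under the Win32 calling conventions a called function may change EAX, ECX and EDX, while EBX, ESI and EDI must come back unchanged. A minimal sketch of keeping a value alive across calls that way (the procedure name KeepAcrossCall and the value 1234 are made up for illustration):

```masm
include \masm32\include\masm32rt.inc

.const
frmtInt BYTE "%d",13,10,0

.code
KeepAcrossCall proc                 ; hypothetical name
    push esi                        ; our own caller expects ESI back intact

    mov esi, 1234                   ; value that must survive the calls
    invoke crt_printf, OFFSET frmtInt, esi   ; may change EAX, ECX, EDX
    invoke crt_printf, OFFSET frmtInt, esi   ; ESI still holds 1234

    pop esi                         ; restore before returning
    ret
KeepAcrossCall endp

start:
    call KeepAcrossCall
    inkey
    exit
end start
```

This only addresses register clobbering; it does nothing about what the string macros themselves do to the data.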


jj2007

mov esi, add$(left$(pString2,1), left$(pString,1))
invoke crt_printf,OFFSET frmtString,esi


Won't help. The macros add$ and left$ have their limitations (in contrast to MB Left$).

I had 5 minutes spare time, and I am n00b-friendly, so I added the missing headers and constructed a full example for you:
include \masm32\include\masm32rt.inc
.const
frmtInt CHAR "%d",0
frmtFloat CHAR "%f",0
frmtString CHAR "%s",13,10,0
frmtChar CHAR "%c",0
.data
hlpFloat REAL8 ?
hlpVar SDWORD ?
c1_00 BYTE "what the heck", 0
c2_00 BYTE "is that, dear Masm32 fan club?", 0

.code
mytest proc
LOCAL pString : DWORD
LOCAL pString2 : DWORD

; invoke crt_printf,OFFSET frmtString, Cat$(Left$(addr c1_00, 5)+Left$(addr c2_00, 7)) ; MasmBasic equivalent

lea eax,c1_00
mov pString2,eax

lea eax,c2_00
mov pString,eax

; put an int 3 here and debug it
push left$(pString,7)  ; doc: "The original string data is overwritten"
nop
push left$(pString2,5)
pop eax
pop edx

mov edx, cat$(eax, edx)

invoke crt_printf,OFFSET frmtString,edx
ret
mytest endp
start:
call mytest
inkey
exit
end start

newbiedude

Hey, tried your example and it works fine, thanks.
But back to the reading example: it only prints the contents of c1_00 when executed. I'll post the whole code this time:

Programa Prueba {
   Principal () {
      caracter c1;
      c1 = Leer();
      caracter c2 = 'b';
      Escribir(c1 ¬ c2);
      Leer();
   }
}

include \masm32\include\masm32rt.inc
.const
frmtInt CHAR "%d",0
frmtFloat CHAR "%f",0
frmtString CHAR "%s",13,10,0
frmtChar CHAR "%c",0
.data
hlpFloat REAL8 ?
hlpVar SDWORD ?
c1_00 BYTE ?
c2_00 BYTE ?
.code
main PROC
LOCAL pString : DWORD
LOCAL pString2 : DWORD
Programa:
Principal:
invoke crt_scanf,OFFSET frmtChar,ADDR c1_00
mov c2_00,'b'
lea eax,c1_00
mov pString2,eax
lea eax,c2_00
mov pString,eax
push left$(pString,1)
nop
push left$(pString2,1)
pop eax
pop edx
mov edx,cat$(eax,edx)
invoke crt_printf,OFFSET frmtString,edx
inkey " "
exit
main ENDP
end main

It only prints the char that was read.
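
One plausible explanation, going by the doc line quoted in the example above ("The original string data is overwritten"): left$ seems to work in place and write a terminating zero right after the requested length. With c1_00 and c2_00 declared as adjacent single bytes, left$(pString2,1) would then put that zero into c2_00, which leaves an empty second string and matches the output. This has not been verified against the macro source. A sketch that avoids the in-place macros by formatting both chars into a separate buffer with crt_sprintf; frmtTwoChars and sBuffer are hypothetical additions to the generated code:

```masm
include \masm32\include\masm32rt.inc

.const
frmtString   BYTE "%s",13,10,0
frmtChar     BYTE "%c",0
frmtTwoChars BYTE "%c%c",0          ; hypothetical extra format string

.data
c1_00 BYTE ?
c2_00 BYTE ?

.code
main PROC
    LOCAL sBuffer[32]:BYTE          ; hypothetical scratch buffer

    invoke crt_scanf, OFFSET frmtChar, ADDR c1_00
    mov c2_00, 'b'

    ; chars are passed by value, widened to 32 bits
    movzx ecx, c1_00
    movzx edx, c2_00
    ; build the zero-terminated two-character string in sBuffer
    invoke crt_sprintf, ADDR sBuffer, OFFSET frmtTwoChars, ecx, edx
    invoke crt_printf, OFFSET frmtString, ADDR sBuffer

    inkey " "
    exit
main ENDP
end main
```

The char variables stay single bytes, so the symbol table layout does not change; only the concatenation step writes somewhere else.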

nidud

#13
deleted

newbiedude

Hey, well yeah, it seems everything is caused by the char-to-string conversion; I'm having no problems with the rest of the data types, so I'll leave it like this in the meantime.
Quote from: nidud on November 29, 2017, 03:40:06 AM

main proc

  local string[3]:byte

    _getch() 

    mov string[0],al
    mov string[1],'b'
    mov string[2],0

    printf("%s\n", &string)
    xor eax,eax
    ret

main endp



I can't really change the structure of the assembly code, since it's generated automatically and I would need to add extra stuff for the chars. Maybe I'll use only the numeric representation of the char; it's something.