HTML to RTF

Started by guga, June 08, 2013, 01:56:26 PM

guga

Hi JJ, the stack is working now. I succeeded in making it work in a dialog. It checks for the existence of other converters, etc. It converts Word 97 to RTF fine, but when it comes to html.iec, it does the conversion, then a message box pops up saying that there is a problem with the file and the conversion may not work properly, and the result is not displayed in the RichEdit control.

I'm following the WordPad source to do this. From what I understood of the code, it was fine. But when debugging the app, it does not even call the conversion routines (even though they are in the source and active).

I'm fixing a couple of routines in the test I'm making and will post the complete version later. I plan to build a DLL for us that does the conversions, so we can use it in whatever converters or editors we want to build. But understanding exactly how the APIs work by just following the SDK documentation turns out to be a pain.


Tout en masm, thanks, I'll analyse this source to see what I'm doing wrong.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Tout en masm, the source works fine.

I compiled it and placed a teste.bat in the debug directory. It does convert some HTML files. The file "teste.html" is converted correctly (err... kind of: the converter doesn't seem to like some JavaScript tokens, so I needed to parse the buffer and remove some of them for it to work properly. It also doesn't seem to parse some HTML files containing comments properly; again, writing a parser to remove the comments seems the better way to avoid bad conversions).
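A minimal C sketch of that pre-parsing pass, assuming a plain NUL-terminated HTML buffer: it deletes every region between an opening and a closing marker, so it can drop <!-- ... --> comments and, with different markers, <script>...</script> blocks before the buffer is handed to html.iec. The name strip_regions is hypothetical, and the matching is deliberately naive (it does not understand comments inside quoted strings or CDATA, and "<script" also matches longer tag names).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test: does `s` begin with `prefix`? Stops at the first
 * mismatch, so it never reads past the end of `s`. */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Removes, in place, every region of `html` that starts with `open` and ends
 * with `close` (both markers included). An unterminated region is removed up
 * to the end of the buffer. Returns the new length. */
size_t strip_regions(char *html, const char *open, const char *close)
{
    size_t open_len = strlen(open), close_len = strlen(close);
    char *src = html, *dst = html;

    while (*src) {
        if (starts_with_ci(src, open)) {
            src += open_len;
            while (*src && !starts_with_ci(src, close))
                src++;
            if (*src)
                src += close_len;      /* skip the closing marker as well */
        } else {
            *dst++ = *src++;           /* keep ordinary text */
        }
    }
    *dst = '\0';
    return (size_t)(dst - html);
}
```

Running it twice, once with "<!--" / "-->" and once with "<script" / "</script>", gives a buffer without comments or inline scripts.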

For images, it is a question of configuration, but I'll look at it later, when I try to reproduce the source.

The compiled file is here
The executable is only 22 KB, since I used tinylib and some MASM libs inside to reduce the compiled size.

TouEnMasm

Quote
The compiled file is here
The link just shows an advertisement for a browser.
Fa is a musical note to play with CL

guga

?

That's weird. I posted the link to the new Mega upload.

Try it here:
https://mega.co.nz/#!X0wU2TBR!DVMPjm-JIs4E2c-QbqnI3DVR6ANRdV75dTHSIAgVLBI

The filename is Testingsrc.zip.

If it still doesn't work, try it on the RosAsm board here:
http://rosasm.freeforums.org/rtftohtml-t126.html

TouEnMasm


Ok thanks,
Here is a sample of what can be done: the PECOFF document in HTML format. I could make a help file with it.
I can now import any file into RTF, but I can only export to HTML. Any idea?

TouEnMasm


Your build of wordcnv.c shows a problem with the -l option.
I recompiled it with Visual C++ Express 10 and it seems to work.

guga

It exports only to HTML because I changed the C source to use only the html.iec file. (I need to debug the working version to compare it with what I'm doing wrong. Yours is better than the WordPad source, since it correctly points to the converter functions to compare.)

To make it import and export other formats, you need to get the converters from the registry and use them (.wpc, .iec and .cnv extensions). I made an app that is able to find all converters on the system by scanning the registry. I'm currently cleaning up the code before I upload the first test (actually, it is the 20th test I have made  :greensml:).

After cleaning up, I'll write an alternative routine that uses the known system directories where the converters are stored, plus an error check for when the user has no converters either in the proper folders or in the registry. In such cases, the best option is to open a message box asking the user to browse to the folder containing his converters.
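For the directory-scanning fallback, the first filter is simply the file extension. A small C sketch, assuming the file names come from something like FindFirstFile/FindNextFile over whichever converter folders the app decides to check (which folders those are is not settled here). is_converter_file is a hypothetical helper that accepts .wpc, .iec and .cnv case-insensitively, since Windows file names are case-insensitive.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if `filename` ends in one of the converter extensions used in
 * this thread (.wpc, .iec, .cnv), compared case-insensitively; else 0. */
int is_converter_file(const char *filename)
{
    static const char *const exts[] = { ".wpc", ".iec", ".cnv" };
    const char *dot = strrchr(filename, '.');
    size_t i, k;

    if (dot == NULL || strlen(dot) != 4)   /* need exactly ".xyz" */
        return 0;
    for (i = 0; i < sizeof exts / sizeof exts[0]; i++) {
        for (k = 0; k < 4; k++)
            if (tolower((unsigned char)dot[k]) != exts[i][k])
                break;
        if (k == 4)
            return 1;                      /* all four characters matched */
    }
    return 0;
}
```

If no file passes this check and the registry also has no entries, that is the point where the "browse for your converters folder" message box would be shown.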

I could upload it now so you can take a look at the work I'm doing, but the code is kind of messy. It was a real pain: not only enumerating all the registry subkeys, comparing converter versions to pick the newer one, building the proper filter buffer to display in a dialog box and checking which converters can export, but also making those crappy converters work relatively well.
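The filter buffer for the open/save dialog follows the OPENFILENAME lpstrFilter convention: pairs of NUL-terminated strings (a description, then patterns separated by semicolons), with one extra NUL ending the whole list. A C sketch of appending one converter's entry, assuming its extensions arrive as a space-separated list such as "htm html" (adjust the parsing to whatever the registry actually returns); append_filter is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Appends "desc\0*.e1;*.e2\0" to an OPENFILENAME-style filter buffer and keeps
 * the list double-NUL terminated. `exts` is assumed to be space-separated,
 * e.g. "htm html". `*used` counts the bytes already in `buf`, not including
 * the final list terminator (start with 0). Returns 0 on success, or -1 if
 * there are no extensions or `buf` is too small (nothing is written then). */
int append_filter(char *buf, size_t cap, size_t *used,
                  const char *desc, const char *exts)
{
    size_t dlen = strlen(desc), pos = *used, count = 0, chars = 0, need, n;
    const char *p;
    int first = 1;

    /* First pass: measure the "*.e1;*.e2" pattern. */
    for (p = exts; ; p += n) {
        while (*p == ' ')
            p++;
        n = strcspn(p, " ");
        if (n == 0)
            break;
        count++;
        chars += 2 + n;                    /* "*." plus the extension */
    }
    if (count == 0)
        return -1;
    /* desc + NUL, pattern + separators, pattern NUL, list terminator */
    need = dlen + 1 + chars + (count - 1) + 1 + 1;
    if (pos > cap || need > cap - pos)
        return -1;

    /* Second pass: write the entry. */
    memcpy(buf + pos, desc, dlen + 1);     /* description and its NUL */
    pos += dlen + 1;
    for (p = exts; ; p += n) {
        while (*p == ' ')
            p++;
        n = strcspn(p, " ");
        if (n == 0)
            break;
        if (!first)
            buf[pos++] = ';';
        buf[pos++] = '*';
        buf[pos++] = '.';
        memcpy(buf + pos, p, n);
        pos += n;
        first = 0;
    }
    buf[pos++] = '\0';                     /* ends the pattern string */
    buf[pos] = '\0';                       /* list terminator, overwritten by the next append */
    *used = pos;
    return 0;
}
```

Calling it once per converter, starting with used = 0, leaves a buffer that can be passed as lpstrFilter.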

The good news is that once they are up and running, it will be relatively easy to understand the values they return, so the user can choose whether to export images, waves and other embedded objects that a given converter may be able to export. For instance, I found out that internally html.iec (and the others) varies the export when the app name is WORDPAD, WINWORD or other tokens. So, in effect, each converter can be configured by using those tokens as simple equates related to the quality of the export (or the type of objects the converter can embed on export).

There is still some road to go before it works with all converters, but I'm succeeding with some of them. The problem is making it work for HTML to RTF, because a weird error message shows up and I suspect it is due to the converter not being registered. The damn SDK document does not help at all. The C source you found will be extremely helpful here, by the way, since the one I had (the WordPad source) was a mess and had those damn stack faults, while the one you provided seems to have no stack problems whatsoever  :t

Tell me if you want to take a look at the file in its current state of development and I'll upload it.


All other export problems related to html.iec, like table sizes (they are not exactly the same as in the import), can be fixed by simply parsing the exported RTF (or, even better, by parsing the HTML on input for the table tokens that html.iec is unable to parse) and altering it so the tables look roughly the same as the imported ones. For tables, I suspect html.iec is unable to understand some tokens related to table size; that's why tables look different on export, but it seems easy to fix once all of this is fully working.
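One way to do the RTF-side fix, sketched in C under stated assumptions: in RTF, each \cellxN control word gives the right boundary of a table cell in twips, so rescaling those values changes the exported column widths. The ratio num/den is a hypothetical input (for example, computed by comparing against the widths in the source HTML), scale_cellx is an illustrative name, and whether a uniform scale is enough for html.iec's output has not been verified.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies `rtf` into `out` (capacity `cap`), multiplying the argument of every
 * \cellxN control word (a cell's right boundary, in twips) by num/den; `den`
 * must be non-zero. Everything else is copied unchanged. Returns the output
 * length, or -1 if `out` is too small. */
long scale_cellx(const char *rtf, char *out, size_t cap, long num, long den)
{
    static const char tag[] = "\\cellx";
    const size_t tlen = sizeof tag - 1;
    size_t o = 0;

    if (cap == 0)
        return -1;
    while (*rtf) {
        if (strncmp(rtf, tag, tlen) == 0 &&
            (isdigit((unsigned char)rtf[tlen]) ||
             (rtf[tlen] == '-' && isdigit((unsigned char)rtf[tlen + 1])))) {
            char *end;
            long v = strtol(rtf + tlen, &end, 10);
            int n = snprintf(out + o, cap - o, "%s%ld", tag, v * num / den);

            if (n < 0 || (size_t)n >= cap - o)
                return -1;
            o += (size_t)n;
            rtf = end;   /* a delimiting space after the number is copied normally */
        } else {
            if (o + 1 >= cap)
                return -1;
            out[o++] = *rtf++;
        }
    }
    out[o] = '\0';
    return (long)o;
}
```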

guga

A note of interest: despite all the annoyances of their internal code, the M$ converters use a Bison/Yacc parser to parse the RTF and HTML tags. I keep wondering why M$ programmers don't write their own code to parse those tokens and avoid 16-bit programming techniques. It would be much better if html.iec and all the other Office converters could simply be migrated from 16 to full 32 bits and use simpler parsers inside the Office package.

Analysing html.iec shows that it has most of what is needed for an excellent converter, but the code is such a mess that it keeps me wondering why they don't update the source to make Office an even more efficient word processor.

The converters' sources don't seem to be that big, and they are relatively similar, except for the format-specific routines used in the parsing functions.

TouEnMasm


I am now working on my own source code (in asm):
General imports and exports are solved.
Enumeration of exports and imports is also done through a menu.
The only difficulty I have is exporting RTF to DOCX.
It doesn't want to work.
Here is the draft of my code.
Just select the import and export files (File menu) and use the "conversion" menu. It will show you the export and import functions on your computer.
The result will be in test.rtf and testeur.doc (same directory as richedit.exe).
If exporting RTF to DOC fails, perhaps it is because I don't have Word. It just gives an unknown error.


guga

docx

The converter is located in C:\Program Files\Microsoft Office\Office12\Wordcnvpxy.cnv

Do you want me to send it to you so you can test it?

jj2007

Quote from: ToutEnMasm on June 21, 2013, 11:40:58 PM
Here is the draft of my code.

No source?? Which role does Kernel32's "Cancello" have in your code??
And why do I get an empty "format" box, then testeur.doc "réussite" (success) but with 0 bytes??

guga

What is kernel32 Cancello?

TouEnMasm

Here is the source code.
The most important file is convert.inc.
The lib does the dynamic linking to whichever converter you want.
There is nothing to search for: the export and import menus give everything needed to run the various converters present on your system.
The last question to solve is why the Word export converters (RTF to Word) don't work.
Quote
Which role does Kernel32's "Cancello" have in your code??
I don't see Cancello in my code ??????.

One answer:
RtfToForeign32 returns a 16-bit error code. To get a proper 32-bit value, sign-extend AX into EAX:

      ; negative error codes stay negative, 0 and 1 are left unchanged
      movsx eax, ax
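The same 16-to-32-bit sign extension, written as portable C in case the converter is driven from C code such as wordcnv.c. It keeps only the low word of the returned value and sign-extends it, so negative error codes stay negative while 0 and 1 come through unchanged. fce_to_int32 is a hypothetical name, and treating the high word as meaningless is an assumption based on the 16-bit return described above.

```c
#include <stdint.h>

/* Sign-extends the low 16 bits of a value returned in a 32-bit register
 * (the high word is assumed to be meaningless), giving the 32-bit error code:
 * 0x0000FFFB -> -5, while 0 and 1 are unchanged. Same effect as
 * MOVSX EAX, AX in assembly. */
int32_t fce_to_int32(uint32_t raw)
{
    int32_t low = (int32_t)(raw & 0xFFFFu);   /* 0 .. 65535 */
    return (low >= 0x8000) ? low - 0x10000 : low;
}
```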


guga

Another issue about the converters

This time it is with write.wpc.

On my machine (WinXP), there seems to be an error in the registry. Internally, this converter has extensions for both import and export, but the registration is disallowed somehow. In the Text Converters registry key, it has the extensions and path for import (this converter doesn't seem to export), but when we try to get the extensions with GetReadNames, the function returns 1 (which, according to the SDK, should also be equivalent to no error). In fact, I suspect it more likely means that the converter is not registered.

I say this because write.wpc should have a registry key at: Software\Microsoft\Windows\CurrentVersion\Applets\Wordpad

But this key simply doesn't exist on my OS.

The same problem seems to happen with lotus32.cnv.

Even though both seem correctly registered in the Text Converters registry key, using them fails if the other keys are not found.
The same happens with html32.cnv, mswrd832.cnv and mswrd632.cnv. The last one causes a stack error inside ntdll.dll, due to a failure to release memory allocated with GlobalAlloc. Also, for that converter the extension value, instead of being ".doc" or similar, is binary data starting with 0FF.
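Since a converter can hand back binary data instead of an extension list, it seems worth validating the returned buffer before building filters or comparing extensions, and warning the user when the check fails. A C sketch, assuming the extensions should be plain printable ASCII terminated by a NUL within the buffer; extensions_look_valid is a hypothetical helper, not part of the converter SDK.

```c
#include <stddef.h>

/* Hypothetical sanity check for an extension list returned by a converter.
 * Accepts a non-empty run of printable ASCII (letters, digits, '.', ' ', ...)
 * that is NUL-terminated within the first `len` bytes; rejects anything else,
 * such as data beginning with 0xFF. Returns 1 if it looks like text, else 0. */
int extensions_look_valid(const unsigned char *buf, size_t len)
{
    size_t i;

    if (buf == NULL || len == 0 || buf[0] == '\0')
        return 0;
    for (i = 0; i < len; i++) {
        if (buf[i] == '\0')
            return 1;              /* reached the terminator: all text so far */
        if (buf[i] < 0x20 || buf[i] > 0x7E)
            return 0;              /* control or non-ASCII byte */
    }
    return 0;                      /* no terminator inside the buffer */
}
```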

All of these files internally point to msconv97.dll functions.

I'll analyse them further to see whether this return value (1) really means what I think it does (an unregistered converter).

Why analyse all of this instead of simply forcing the conversion?

Well... it is better to know what those converters really return and warn the user to update his system, or let the app update it for him.

There is some material about the write.wpc errors here:
http://www.wincert.net/tips/microsoft-windows/windows-7/1786-word-cannot-start-the-converter-mswrd632wpc-error
http://support.microsoft.com/kb/973904/pt-br
http://support.microsoft.com/kb/973904/en-us

TouEnMasm

Mswrite and mswrd632 (\Import\MSWord6.wpc) don't work for me either.

The fixes from Microsoft are simple: delete the key, that is, don't use the converter.
I can do that myself by choosing another converter.
On XP I have six Word converters; two may be faulty, four are enough.

The Word export converters seem to have a special usage.
Contrary to what you would expect from the RtfToForeign32 function, they don't accept an RTF file but need a Word file to export. With RTF they give an "invalid format" error.
Some research is needed to find which output format they accept (ghclass).
The value of ghclass in:
RtfToForeign32, ghszFile, NULL, ghBuf, ghclass, ReadCallback
An explanation of this is in sdk32.doc:
Quote
Converting RTF into a Foreign Binary File or Embedding
Word saves documents to non-Word formats by calling RtfToForeign32.
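The ReadCallback in the call above is what feeds RTF to the converter. As far as we understand sdk32.doc, the converter repeatedly calls the application's callback, which places the next chunk of RTF in the shared buffer and returns the number of bytes supplied, with 0 meaning end of input. The C sketch below only simulates that pull loop with ordinary pointers; RtfSource, read_callback and fake_converter are hypothetical names, and real code would use the SDK's HANDLE buffer and callback signature as documented in sdk32.doc.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *rtf;     /* whole RTF document owned by the application */
    size_t      pos;     /* bytes handed to the converter so far */
    char       *shared;  /* buffer the converter reads each chunk from */
    size_t      chunk;   /* maximum bytes per callback */
} RtfSource;

/* Application side: copy the next chunk into the shared buffer and return its
 * size; 0 signals that the RTF is exhausted. */
static long read_callback(RtfSource *src)
{
    size_t left = strlen(src->rtf) - src->pos;
    size_t n = left < src->chunk ? left : src->chunk;

    memcpy(src->shared, src->rtf + src->pos, n);
    src->pos += n;
    return (long)n;
}

/* Simulated converter side: keep calling the callback until it returns 0,
 * collecting the chunks in `out` (capacity `cap`). Returns the total number
 * of bytes received, or -1 if `out` is too small. */
long fake_converter(RtfSource *src, char *out, size_t cap)
{
    size_t total = 0;
    long n;

    if (cap == 0)
        return -1;
    while ((n = read_callback(src)) > 0) {
        if (total + (size_t)n >= cap)
            return -1;
        memcpy(out + total, src->shared, (size_t)n);
        total += (size_t)n;
    }
    out[total] = '\0';
    return (long)total;
}
```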
