Hi guys
I'm writing a translator that uses Google's server to translate a huge text (around 45 MB) from English to Portuguese.
So far I have managed to make it work, but I hit a weird HTTP status code 503 error after downloading a few chunks. How can I overcome this?
The main function I built to download chunks of data from Google (or any other site) is:
Proc XMLDownloadtoFile:
Arguments @pUrlString, @pSzAgent, @pSzFeedHeaders
Local @hFeed, @hFeedURL, @XML_Size, @IsMemFilled, @lpdwNumberOfBytesAvailable, @XML_lpdwNumberOfBytesRead, @OldMemBuffer, @StoredBuffer, @CurMem, @StatusCode, @BuffLen, @dwIndex
Uses edi, ecx, edx, esi, ebx
mov D@OldMemBuffer 0
mov D@lpdwNumberOfBytesAvailable 0
mov D@XML_Size 0
mov D@IsMemFilled 0
mov D@XML_lpdwNumberOfBytesRead 0
mov D@StoredBuffer 0
mov D@StatusCode 0
mov D@BuffLen 4
mov D@dwIndex 0
;call 'wininet.InternetOpenA' D@pSzAgent, &INTERNET_OPEN_TYPE_DIRECT, &NULL, &NULL, 0
call 'wininet.InternetOpenA' D@pSzAgent, &INTERNET_OPEN_TYPE_PRECONFIG, &NULL, &NULL, 0
On eax = 0, ExitP
mov D@hFeed eax
call 'wininet.InternetOpenUrlA' eax, D@pUrlString, D@pSzFeedHeaders, 0-1, &INTERNET_FLAG_RELOAD__&INTERNET_FLAG_PRAGMA_NOCACHE, 0
...If eax <> 0
mov D@hFeedURL eax
lea ebx D@StatusCode
lea ecx D@BuffLen
lea edx D@dwIndex
call 'wininet.HttpQueryInfoA' eax, &HTTP_QUERY_STATUS_CODE__&HTTP_QUERY_FLAG_NUMBER, ebx, ecx, edx
If D@StatusCode <> 200
; Report Http Status Code Error
call HttpErrorCode D@StatusCode
call 'wininet.InternetCloseHandle' D@hFeedURL
call 'wininet.InternetCloseHandle' D@hFeed
;mov eax D@StoredBuffer
xor eax eax
ExitP
End_If
mov eax D@hFeedURL
..While eax <> 0
lea edx D@lpdwNumberOfBytesAvailable
call 'wininet.InternetQueryDataAvailable' D@hFeedURL, edx, 0, 0
..If D@lpdwNumberOfBytesAvailable = 0
xor eax eax
..Else
mov ecx D@lpdwNumberOfBytesAvailable
add ecx D@XML_Size
;call AllocateMemory ecx
mov D@CurMem 0 | lea eax D@CurMem
call 'RosMem.VMemAlloc' eax, ecx
mov D@StoredBuffer eax
.If D@IsMemFilled = &TRUE
call CopyMemory eax, D@OldMemBuffer, D@XML_Size
add eax D@XML_Size | mov B$eax 0
; mov esi D@OldMemBuffer
; mov edi eax
; While B$esi <> 0 | movsb | End_While
;call FreeMemory D@OldMemBuffer
call 'RosMem.VMemFree' D@OldMemBuffer
mov D@OldMemBuffer 0
.End_If
mov eax D@StoredBuffer
add eax D@XML_Size
lea edx D@XML_lpdwNumberOfBytesRead
call 'wininet.InternetReadFile' D@hFeedURL, eax, D@lpdwNumberOfBytesAvailable, edx
mov ecx D@XML_lpdwNumberOfBytesRead
add D@XML_Size ecx
mov D@IsMemFilled &TRUE
move D@OldMemBuffer D@StoredBuffer
..End_If
..End_While
call 'wininet.InternetCloseHandle' D@hFeedURL
call 'wininet.InternetCloseHandle' D@hFeed
mov eax D@StoredBuffer
...Else
call 'wininet.InternetCloseHandle' D@hFeed
xor eax eax
...End_If
EndP
Proc HttpErrorCode:
Arguments @ErrorValue
Structure @StringtoAdd 64, @StringtoAdd_DataDis 0
Uses ebx, ecx, edx
C_call 'msvcrt.sprintf' D@StringtoAdd, {B$ 'HTTP Error Code value = %d', 0}, D@ErrorValue
call 'User32.MessageBoxA' &NULL, D@StringtoAdd, { B$ "Connection Error", 0}, &MB_ICONERROR
EndP
Despite the name I gave it, XMLDownloadtoFile does not only download XML files; it is used to download any kind of file.
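For anyone who does not read RosAsm, the read loop in that function boils down to the sketch below. It is in Python only to keep it short, and `read_all`, `stream` and `chunk_size` are names I made up for the illustration; `stream.read` stands in for the InternetQueryDataAvailable / InternetReadFile pair, and the growing `bytearray` replaces my allocate, copy and free steps.

```python
import io

def read_all(stream, chunk_size=4096):
    """Read a stream to the end, appending each piece to one growing buffer."""
    buffer = bytearray()
    while True:
        piece = stream.read(chunk_size)  # stands in for InternetReadFile
        if not piece:                    # zero bytes read means end of data
            break
        buffer += piece                  # the bytearray grows in place
    return bytes(buffer)

# Demo with an in-memory stream instead of a real connection
data = read_all(io.BytesIO(b"x" * 10000), chunk_size=4096)
print(len(data))
```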
The main problem is that after translating some chunks of the text, the server returns this annoying error. I tried to avoid it by calling a sleep function before each call that ends up in XMLDownloadtoFile, like this:
Proc FullTranslate:
Arguments @pString
Local @FirstPass, @StringLen, @pReturnTranslatedBuffer, @TmpBuffSize, @TmpOutBuffer, @TranslatedText, @CharsCount, @EndString
Uses esi, edi, ecx, ebx
mov D@FirstPass 0
; 1st we get the total size of our string to be translated
call StrLenProc D@pString | mov D@StringLen eax
...If D@StringLen <= MAX_GOOGLE_BYTES ; Google's limit is only 5000 chars. So if our text is smaller than or equal to 5000 we go to this routine and translate it at once
;lea eax D@TranslatedText | mov D@TranslatedText 0
lea eax D@pReturnTranslatedBuffer | mov D@pReturnTranslatedBuffer 0
call CreateTextTranslateGoogle D@pString, eax ; <----------- This function contains the main routines for the buffer allocation and the call to XMLDownloadtoFile
If eax = 0-1
mov eax D@pReturnTranslatedBuffer
Else_If eax <> 0
;mov esi D@TranslatedText
;mov D@pReturnTranslatedBuffer esi
;mov eax D@pReturnTranslatedBuffer
mov eax D@pReturnTranslatedBuffer
End_If
...Else ; <----------------------------------------------- Otherwise, if the text contains more than 5000 chars, we translate it in chunks, taking care of sentence boundaries.
; <----------------------------------------------- I mean, when a chunk reaches 5000 chars, it searches backward for a "." (dot, end of sentence) and translates the text up to this new end
mov D@CharsCount 0
mov esi D@pString
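mov eax D@StringLen ; reload the length explicitly instead of relying on eax surviving from StrLenProc
shl eax 2 | Align_On 4 eax | mov D@TmpBuffSize eax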
lea eax D@TmpOutBuffer | mov D@TmpOutBuffer 0
call CreateOutputDataBase eax, D@TmpBuffSize
;mov edi eax | mov D@CopyStart edi
mov edi eax | mov D@pReturnTranslatedBuffer edi
mov ecx D@StringLen
mov edx esi | add edx ecx | mov D@EndString edx
.Do
lea eax D@TranslatedText | mov D@TranslatedText 0
call GetTranslationChunck esi, eax, D@EndString ; <----------- This function contains the main routines for the buffer allocation and the call to XMLDownloadtoFile
If eax = 0
jmp L9>
Else_If eax = 0-1 ; <---------------------------------------- This is the result of the error generated by HTTP status code 503. I set eax to 0-1 only to distinguish the error type and avoid crashing
call 'RosMem.VMemFree' D@TranslatedText
mov eax D@pReturnTranslatedBuffer
ExitP
End_If
add esi eax
sub ecx MAX_GOOGLE_BYTES
;add D@CharsCount eax
call 'KERNEL32.Sleep' 500 ; <----- A sleep to try to avoid the error 503
;..If ecx > MAX_GOOGLE_BYTES
;mov eax eax
mov ebx D@TranslatedText
.If D@FirstPass <> 0
; The next string at esi starts with '[[["'. Bypass this to only '['
add ebx 2 ; we are at 1st '['
; The previous string at edi ends with "],null,"en"]". Step back to that point
Do
dec edi
Loop_Until D$edi = '],nu'
; it must be replaced with ],[
mov B$edi "," | inc edi
.Else
mov D@FirstPass 1
.End_If
;call CopyString D@TranslatedText, edi
call CopyString ebx, edi
add edi eax
;..End_If
;call CopyString D@TranslatedText, edi
;add edi eax
; free allocated memory from translated chunck
call 'RosMem.VMemFree' D@TranslatedText
inc D$DummyTextPass
;.Loop_Until ecx =<s MAX_GOOGLE_BYTES;= D@StringLen
.Loop_Until B$esi = 0
L9:
; mov ecx esi | sub ecx D@pString
.If B$esi <> 0;ecx > 0
; we still have remainder
lea eax D@TranslatedText | mov D@TranslatedText 0
call CreateTextTranslateGoogle esi, eax
If eax = 0
ExitP
Else_If eax = 0-1
call 'RosMem.VMemFree' D@TranslatedText
mov eax D@pReturnTranslatedBuffer
ExitP
End_If
mov esi D@TranslatedText
; The next string at esi starts with '[[["'. Bypass this to only '['
add esi 2 ; we are at 1st '['
; The previous string at edi ends with "],null,"en"]". Step back to that point
Do
dec edi
Loop_Until D$edi = '],nu'
; it must be replaced with ],[
mov B$edi "," | inc edi
call CopyString esi, edi
call 'RosMem.VMemFree' D@TranslatedText
.End_If
;mov esi D@CopyStart
mov eax D@pReturnTranslatedBuffer
...End_If
EndP
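To make the chunking rule easier to follow, here is the same idea as a small Python sketch (Python only for brevity; the real code is the RosAsm above). `split_into_chunks` is a name invented for this illustration, and it counts characters where my assembly code counts bytes, so treat it as an approximation of what GetTranslationChunck does rather than a copy of it.

```python
MAX_GOOGLE_BYTES = 5000  # same limit the assembly code uses

def split_into_chunks(text, limit=MAX_GOOGLE_BYTES):
    """Split text into pieces of at most `limit` characters,
    preferring to end each piece right after a '.'."""
    chunks = []
    start = 0
    while len(text) - start > limit:
        window_end = start + limit
        # Search backward from the end of the window for the last '.'
        dot = text.rfind(".", start, window_end)
        if dot == -1:
            cut = window_end   # no sentence end found: hard cut at the limit
        else:
            cut = dot + 1      # keep the dot inside the current chunk
        chunks.append(text[start:cut])
        start = cut
    if start < len(text):
        chunks.append(text[start:])  # remainder, sent as the last request
    return chunks

# Small demo with a tiny limit so the cuts are visible
for piece in split_into_chunks("aaa. bbb. ccc. ddd.", limit=10):
    print(repr(piece))
```

Joining the pieces back together gives the original text, so nothing is lost between requests.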
The text to translate is simply passed to the Google server in a URL, like this:
https://translate.googleapis.com/translate_a/single?client=gtx&sl=en&tl=pt&dt=t&q=Hello world. My name is Guga
Or I can also use percent-encoded escape characters in the URL. (This is what the EncodeString function actually does. For Google's purposes, though, it seems that only the carriage return and line feed need to be converted, to %0D%0A; all other characters seem to be translated fine.) One minor issue is that the resulting text may contain odd sequences not necessarily related to UTF-8, such as a literal \r\n (the decoded form of carriage return and line feed).
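I have not tested every character, and I suspect that something like "&" or "#" inside the text would cut the q parameter short, so encoding everything outside the unreserved set may be safer than converting only CR and LF. A sketch of that stricter encoding in Python (`build_url` and `BASE` are names made up for the example; the URL is the one shown above):

```python
from urllib.parse import quote

BASE = ("https://translate.googleapis.com/translate_a/single"
        "?client=gtx&sl=en&tl=pt&dt=t&q=")

def build_url(text):
    # quote() encodes the text as UTF-8 and escapes every character
    # except letters, digits and "_.-~" (safe="" also escapes "/")
    return BASE + quote(text, safe="")

print(build_url("Hello world.\r\nMy name is Guga"))
```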
So the question is: how can I stop Google from returning this status 503 error, so that I can translate the whole text in one run?
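One thing I am considering, in place of the fixed Sleep 500, is retrying the same chunk with a delay that doubles after each 503. I do not know whether the 503 is really rate limiting, so this is only a sketch under that assumption. It is in Python for brevity; `fetch_with_backoff`, `fetch`, `max_tries` and `base_delay` are all names invented here, and `fetch` stands in for XMLDownloadtoFile returning a status code and a body.

```python
import time

def fetch_with_backoff(fetch, url, max_tries=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url) -> (status, body). On a 503, wait and try again,
    doubling the wait each time, for at most max_tries attempts."""
    delay = base_delay
    for attempt in range(max_tries):
        status, body = fetch(url)
        if status != 503:
            return status, body      # success, or an error we do not retry
        if attempt < max_tries - 1:
            sleep(delay)             # wait before the next attempt
            delay *= 2               # 0.5 s, 1 s, 2 s, ...
    return status, body              # still 503 after the last attempt

# Demo with a fake server that fails twice, then succeeds (no real waiting)
replies = iter([(503, b""), (503, b""), (200, b"ok")])
waits = []
result = fetch_with_backoff(lambda u: next(replies), "chunk-url", sleep=waits.append)
print(result, waits)
```

The point of this over the fixed sleep is that a chunk that fails is retried instead of aborting the whole translation, and the pauses only get long when the server keeps refusing.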
By the way, an example of how the XMLDownloadtoFile function is called:
[lpszHeaders5: B$ "Host: translate.googleapis.com
Accept: */*
", 0]
call XMLDownloadtoFile D@StringtoTranslate, {B$ "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko", 0}, lpszHeaders5