Hi guys
I'm writing a translator that uses Google's server to translate a huge text (around 45 MB) from English to Portuguese.
So far I have managed to make it work, but I get an HTTP 503 status code after downloading some chunks of the text. How can I overcome this?
The main function I built to download chunks of data from Google (or any other site) is:
Proc XMLDownloadtoFile:
Arguments @pUrlString, @pSzAgent, @pSzFeedHeaders
Local @hFeed, @hFeedURL, @XML_Size, @IsMemFilled, @lpdwNumberOfBytesAvailable, @XML_lpdwNumberOfBytesRead, @OldMemBuffer, @StoredBuffer, @CurMem, @StatusCode, @BuffLen, @dwIndex
Uses edi, ecx, edx, esi, ebx
mov D@OldMemBuffer 0
mov D@lpdwNumberOfBytesAvailable 0
mov D@XML_Size 0
mov D@IsMemFilled 0
mov D@XML_lpdwNumberOfBytesRead 0
mov D@StoredBuffer 0
mov D@StatusCode 0
mov D@BuffLen 4
mov D@dwIndex 0
;call 'wininet.InternetOpenA' D@pSzAgent, &INTERNET_OPEN_TYPE_DIRECT, &NULL, &NULL, 0
call 'wininet.InternetOpenA' D@pSzAgent, &INTERNET_OPEN_TYPE_PRECONFIG, &NULL, &NULL, 0
On eax = 0, ExitP
mov D@hFeed eax
call 'wininet.InternetOpenUrlA' eax, D@pUrlString, D@pSzFeedHeaders, 0-1, &INTERNET_FLAG_RELOAD__&INTERNET_FLAG_PRAGMA_NOCACHE, 0
...If eax <> 0
mov D@hFeedURL eax
lea ebx D@StatusCode
lea ecx D@BuffLen
lea edx D@dwIndex
call 'wininet.HttpQueryInfoA' eax, &HTTP_QUERY_STATUS_CODE__&HTTP_QUERY_FLAG_NUMBER, ebx, ecx, edx
If D@StatusCode <> 200
; Report Http Status Code Error
call HttpErrorCode D@StatusCode
call 'wininet.InternetCloseHandle' D@hFeedURL
call 'wininet.InternetCloseHandle' D@hFeed
;mov eax D@StoredBuffer
xor eax eax
ExitP
End_If
mov eax D@hFeedURL
..While eax <> 0
lea edx D@lpdwNumberOfBytesAvailable
call 'wininet.InternetQueryDataAvailable' D@hFeedURL, edx, 0, 0
..If D@lpdwNumberOfBytesAvailable = 0
xor eax eax
..Else
mov ecx D@lpdwNumberOfBytesAvailable
add ecx D@XML_Size
;call AllocateMemory ecx
mov D@CurMem 0 | lea eax D@CurMem
call 'RosMem.VMemAlloc' eax, ecx
mov D@StoredBuffer eax
.If D@IsMemFilled = &TRUE
call CopyMemory eax, D@OldMemBuffer, D@XML_Size
add eax D@XML_Size | mov B$eax 0
; mov esi D@OldMemBuffer
; mov edi eax
; While B$esi <> 0 | movsb | End_While
;call FreeMemory D@OldMemBuffer
call 'RosMem.VMemFree' D@OldMemBuffer
mov D@OldMemBuffer 0
.End_If
mov eax D@StoredBuffer
add eax D@XML_Size
lea edx D@XML_lpdwNumberOfBytesRead
call 'wininet.InternetReadFile' D@hFeedURL, eax, D@lpdwNumberOfBytesAvailable, edx
mov ecx D@XML_lpdwNumberOfBytesRead
add D@XML_Size ecx
mov D@IsMemFilled &TRUE
move D@OldMemBuffer D@StoredBuffer
..End_If
..End_While
call 'wininet.InternetCloseHandle' D@hFeedURL
call 'wininet.InternetCloseHandle' D@hFeed
mov eax D@StoredBuffer
...Else
call 'wininet.InternetCloseHandle' D@hFeed
xor eax eax
...End_If
EndP
Proc HttpErrorCode:
Arguments @ErrorValue
Structure @StringtoAdd 64, @StringtoAdd_DataDis 0
Uses ebx, ecx, edx
C_call 'msvcrt.sprintf' D@StringtoAdd, {B$ 'HTTP Error Code value = %d', 0}, D@ErrorValue
call 'User32.MessageBoxA' &NULL, D@StringtoAdd, { B$ "Connection Error", 0}, &MB_ICONERROR
EndP
Despite the name I gave it, XMLDownloadtoFile does not only download XML files; it is used to download any kind of file.
The main problem is that, after translating some chunks of the text, the server returns this annoying error. I tried to avoid it by calling a sleep function before the main call to XMLDownloadtoFile, like this:
Proc FullTranslate:
Arguments @pString
Local @FirstPass, @StringLen, @pReturnTranslatedBuffer, @TmpBuffSize, @TmpOutBuffer, @TranslatedText, @CharsCount, @EndString
Uses esi, edi, ecx, ebx
mov D@FirstPass 0
; 1st we get the total size of our string to be translated
call StrLenProc D@pString | mov D@StringLen eax
...If D@StringLen <= MAX_GOOGLE_BYTES ; Google's limit is only 5000 chars. So if our text is 5000 chars or fewer, we go to this routine and translate it all at once
;lea eax D@TranslatedText | mov D@TranslatedText 0
lea eax D@pReturnTranslatedBuffer | mov D@pReturnTranslatedBuffer 0
call CreateTextTranslateGoogle D@pString, eax ; <----------- This function contains the main routines for the buffer allocation and the call to XMLDownloadtoFile
If eax = 0-1
mov eax D@pReturnTranslatedBuffer
Else_If eax <> 0
;mov esi D@TranslatedText
;mov D@pReturnTranslatedBuffer esi
;mov eax D@pReturnTranslatedBuffer
mov eax D@pReturnTranslatedBuffer
End_If
...Else ; <----------------------------------------------- Otherwise, if the text contains more than 5000 chars, we translate it in chunks, taking care of the lexical routines.
; <----------------------------------------------- That is, when a chunk reaches 5000 chars, it searches backward for a "." (dot, the end of a sentence) and translates only up to this new end
mov D@CharsCount 0
mov esi D@pString
shl eax 2 | Align_On 4 eax | mov D@TmpBuffSize eax
lea eax D@TmpOutBuffer | mov D@TmpOutBuffer 0
call CreateOutputDataBase eax, D@TmpBuffSize
;mov edi eax | mov D@CopyStart edi
mov edi eax | mov D@pReturnTranslatedBuffer edi
mov ecx D@StringLen
mov edx esi | add edx ecx | mov D@EndString edx
.Do
lea eax D@TranslatedText | mov D@TranslatedText 0
call GetTranslationChunck esi, eax, D@EndString ; <----------- This function contains the main routines for the buffer allocation and the call to XMLDownloadtoFile
If eax = 0
jmp L9>
Else_If eax = 0-1 ; <---------------------------------------- This is the result of the error generated by HTTP status code 503. I set eax to 0-1 only to distinguish the error type and avoid crashing
call 'RosMem.VMemFree' D@TranslatedText
mov eax D@pReturnTranslatedBuffer
ExitP
End_If
add esi eax
sub ecx MAX_GOOGLE_BYTES
;add D@CharsCount eax
call 'KERNEL32.Sleep' 500 ; <----- A sleep to try to avoid the Error 503
;..If ecx > MAX_GOOGLE_BYTES
;mov eax eax
mov ebx D@TranslatedText
.If D@FirstPass <> 0
; The next string at ebx starts with '[[["'. Bypass this to only '['
add ebx 2 ; we are at 1st '['
; The next string starts with '[[["' and ends with "],null,"en"]"
; The previous string at edi ends with "],null,"en"]", go back to it
Do
dec edi
Loop_Until D$edi = '],nu'
; it must be replaced with ],[
mov B$edi "," | inc edi
.Else
mov D@FirstPass 1
.End_If
;call CopyString D@TranslatedText, edi
call CopyString ebx, edi
add edi eax
;..End_If
;call CopyString D@TranslatedText, edi
;add edi eax
; free the memory allocated for the translated chunk
call 'RosMem.VMemFree' D@TranslatedText
inc D$DummyTextPass
;.Loop_Until ecx =<s MAX_GOOGLE_BYTES;= D@StringLen
.Loop_Until B$esi = 0
L9:
; mov ecx esi | sub ecx D@pString
.If B$esi <> 0;ecx > 0
; we still have remainder
lea eax D@TranslatedText | mov D@TranslatedText 0
call CreateTextTranslateGoogle esi, eax
If eax = 0
ExitP
Else_If eax = 0-1
call 'RosMem.VMemFree' D@TranslatedText
mov eax D@pReturnTranslatedBuffer
ExitP
End_If
mov esi D@TranslatedText
; The next string at esi starts with '[[["'. Bypass this to only '['
add esi 2 ; we are at 1st '['
; The next string starts with '[[["' and ends with "],null,"en"]"
; The previous string at edi ends with "],null,"en"]", go back to it
Do
dec edi
Loop_Until D$edi = '],nu'
; it must be replaced with ],[
mov B$edi "," | inc edi
call CopyString esi, edi
call 'RosMem.VMemFree' D@TranslatedText
.End_If
;mov esi D@CopyStart
mov eax D@pReturnTranslatedBuffer
...End_If
EndP
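As an alternative to stitching the raw responses together by hand (searching back for '],nu' and overwriting it), each response can be parsed as JSON and only the translated segments joined. The sketch below is in Python rather than RosAsm, purely to keep it short. It assumes the response shape described in the comments above: an outer array whose first element is a list of segments, each segment starting with the translated string. The function name merge_responses and the two sample responses are made up for illustration; real responses may carry extra fields.

```python
import json

def merge_responses(raw_responses):
    """Join the translated segments found in several raw response strings."""
    parts = []
    for raw in raw_responses:
        data = json.loads(raw)  # JSON null becomes None, escapes such as \r\n are decoded
        # Assumption: data[0] is the list of segments and segment[0] is the translated text.
        for segment in data[0]:
            if segment and segment[0]:
                parts.append(segment[0])
    return "".join(parts)

# Hypothetical sample responses, shaped like the ones described in the thread.
chunk_a = '[[["Olá mundo. ","Hello world. ",null,null,1]],null,"en"]'
chunk_b = '[[["Meu nome é Guga.","My name is Guga.",null,null,1]],null,"en"]'
print(merge_responses([chunk_a, chunk_b]))
```

Parsing also takes care of escaped sequences such as \r\n inside the strings, since the JSON decoder turns them back into the real characters.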
The strings to translate are simply passed to Google's server like this:
https://translate.googleapis.com/translate_a/single?client=gtx&sl=en&tl=pt&dt=t&q=Hello world. My name is Guga
Or I can also use URL escape sequences for some of the chars. (This is what the EncodeString function actually does. For Google's purposes, though, it seems only the carriage return and line feed need to be converted, to %0D%0A. All the other chars seem to be translated OK.) One minor issue is that the resulting text may contain odd sequences not necessarily related to UTF-8, such as \r\n (the escaped form of carriage return and line feed), etc.
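A minimal sketch of the escaping step, in Python rather than RosAsm for brevity. It percent-encodes the whole query text, not only CR/LF, which is the safe default for a URL query value; whether the server strictly needs that is not confirmed in this thread. The helper name build_translate_url is made up; the URL and its parameters are the ones shown above.

```python
from urllib.parse import quote

BASE_URL = "https://translate.googleapis.com/translate_a/single"

def build_translate_url(text, source="en", target="pt"):
    """Build the request URL, percent-encoding the text to translate."""
    # safe="" also escapes "/" so that nothing in the text can alter the URL structure.
    encoded = quote(text, safe="")
    return f"{BASE_URL}?client=gtx&sl={source}&tl={target}&dt=t&q={encoded}"

url = build_translate_url("Hello world.\r\nMy name is Guga")
print(url)
```

Here the carriage return and line feed come out as %0D%0A and the spaces as %20.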
So the question is: how can I keep Google from returning this 503 status, so that I can translate the whole text at once?
By the way, an example of using the XMLDownloadtoFile function is this:
[lpszHeaders5: B$ "Host: translate.googleapis.com
Accept: */*
", 0]
call XMLDownloadtoFile D@StringtoTranslate, {B$ "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko", 0}, lpszHeaders5
I don't understand what you posted because my linguistic capabilities have a limit. However, I can tell you that you receive from Google in direct relation to what you pay for. Its translation API will produce errors after too many uses.
Last century, I had an application called Babel Fish Direct (https://www.trichview.com/applications/babelfishdirect.html) and I had to discontinue it after AltaVista (Google) started causing issues. Then they ended up removing the API. I understand they have a new one, but history always repeats itself.
Do you recreate the session between calls / translations?
Quote from: AW on April 28, 2019, 07:32:41 PM
I don't understand what you posted because my linguistic capabilities have a limit. However, I can tell you that you receive from Google in direct relation to what you pay for. Its translation API will produce errors after too many uses.
Last century, I had an application called Babel Fish Direct (https://www.trichview.com/applications/babelfishdirect.html) and I had to discontinue it after AltaVista (Google) started causing issues. Then they ended up removing the API. I understand they have a new one, but history always repeats itself.
Babel Fish Direct? Sounds interesting. Do you still have it? I would like to take a look and see how you managed to download from Babel Fish. Maybe I could use it instead of Google Translate. Which server is used to perform the translation? (I mean, the strings that are used with the InternetOpenUrl API.)
Quote from: TimoVJL on April 28, 2019, 10:26:19 PM
Do you recreate the session between calls / translations?
Hi Timo
I'm not sure I understood. What do you mean by recreating sessions? What I did was split the translation into chunks so that each one stays under the 5000-char limit.
For huge files, the steps are:
1 - Get the 1st chunk, containing a maximum of 5000 chars.
2 - Seek inside the chunk for its true end, i.e. search for the last "?", "!" or "." before the limit, since those chars mark the true end of a sentence. So, if the last "." "?" or "!" in the limited text (5000 chars) is at position 4700, it translates the text from pos 0 (the beginning) to 4700. Pos 4700 is saved to continue later.
3 - After translating the chunk (4700 bytes, for example), it restarts from pos 4700 and takes the next 5000 chars (so, up to a max pos of 9700), then looks for "?", "." or "!" again and translates this 2nd chunk up to that point. Ex: the 2nd chunk may have its last "?", "." or "!" at pos 9251.
4 - This process of finding the true end and translating is repeated until the end of the huge text (the terminating 0 byte).
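The four steps above can be sketched as follows, in Python rather than RosAsm to keep it short. The 5000-char limit is the one stated in the thread; the function name split_at_sentences and the hard cut used when a chunk contains no sentence terminator are assumptions of this sketch, not something the original routine is known to do.

```python
MAX_CHARS = 5000  # per-request limit, as stated in the thread

def split_at_sentences(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` chars, ending at '.', '?' or '!' when possible."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Search backward from the limit for the last sentence terminator.
            cut = max(text.rfind(ch, start, end) for ch in ".?!")
            if cut >= start:
                end = cut + 1  # keep the terminator in this chunk
            # Otherwise no terminator was found: fall back to a hard cut at the limit.
        chunks.append(text[start:end])
        start = end  # the saved position from which the next chunk continues
    return chunks

sample = "One. Two! Three? Four"
print(split_at_sentences(sample, limit=10))
```

Concatenating the chunks gives back the original text, so nothing is lost or translated twice at the boundaries.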
One minor question: how can I allow pausing/resuming the download/translation? And how do I restart the internet connection after it hits the 503 error? I tried using the Sleep API to force a 5-minute wait before going into the XMLDownloadtoFile function again, but it didn't work.
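One possible way to get pause/resume is to write each translated chunk to disk as soon as it arrives and to record the position reached, so that a later run continues from there instead of starting over. The sketch below is in Python for brevity. Everything in it is hypothetical: the names load_offset, save_offset and translate_resumable, the JSON state file, and the translate_chunk callback, which stands in for the real download call and returns None on failure (for example on a 503). It cuts at a fixed size only to stay short; the sentence-end search would go where the piece is taken.

```python
import json
import os
import tempfile

def load_offset(state_path):
    """Return the saved offset, or 0 when there is no usable checkpoint."""
    try:
        with open(state_path, "r", encoding="utf-8") as f:
            return json.load(f)["offset"]
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        return 0

def save_offset(state_path, offset):
    """Write the checkpoint to a temporary file first, then swap it into place."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp_path, state_path)

def translate_resumable(text, translate_chunk, state_path, out_path, limit=5000):
    """Translate from the last checkpoint; return False if a chunk fails."""
    offset = load_offset(state_path)
    while offset < len(text):
        piece = text[offset:offset + limit]
        translated = translate_chunk(piece)
        if translated is None:
            return False  # progress so far is already on disk
        with open(out_path, "a", encoding="utf-8") as out:
            out.write(translated)
        offset += len(piece)
        save_offset(state_path, offset)
    return True

# Demo with a fake translator whose second call fails.
workdir = tempfile.mkdtemp()
state_file = os.path.join(workdir, "progress.json")
out_file = os.path.join(workdir, "translated.txt")
replies = iter(["AB", None, "CD", "EF"])
fake_translate = lambda piece: next(replies)
print(translate_resumable("abcdef", fake_translate, state_file, out_file, limit=2))  # stops at the simulated failure
print(translate_resumable("abcdef", fake_translate, state_file, out_file, limit=2))  # resumes from the checkpoint
```

This does not make the server answer again after a 503; it only ensures that what was already translated is not lost and not requested twice.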
I meant opening / closing the connection between queries.
How many times does it allow you to use the service?
I may have it somewhere but it is in Delphi and closed source.
Anyway, it used a protocol called SOAP (https://en.wikipedia.org/wiki/SOAP), which may have disappeared in the meantime because I have not heard about it for a long time.
Quote from: AW on April 29, 2019, 12:01:47 AM
I may have it somewhere but it is in Delphi and closed source.
Anyway, it used a protocol called SOAP (https://en.wikipedia.org/wiki/SOAP), which may have disappeared in the meantime because I have not heard about it for a long time.
Thanks a lot. If you manage to find it and post it, I'll take a look. I had never heard of the SOAP protocol before.
Quote from: TimoVJL on April 28, 2019, 11:56:33 PM
I meant opening / closing the connection between queries.
How many times does it allow you to use the service?
I'm not sure. It depends. If I don't use a sleep function, it stops working after around 30-40 open/close cycles, so it translates only about 200 KB of text before it stops. (The translation is fast; it takes a couple of seconds to translate 200 KB.) But if I use a sleep function and wait around 10-13 seconds between each 5000-char chunk, it gets further, but still stops after translating around 2 MB of text. (The main problem with using a sleep function is that it then takes 1 hour to translate only 2 MB of text, or less.)
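Instead of a fixed sleep before every request, a common pattern is to send requests at full speed and wait only after a 503, doubling the wait on each consecutive failure. The Python sketch below shows that pattern. It is an assumption that this helps here at all: the thread reports that even a 5-minute pause did not lift the block, so it may only reduce wasted time, not get past the limit. The fetch callback is hypothetical; it stands in for the real download routine and returns a (status, body) pair.

```python
import time

def fetch_with_backoff(fetch, url, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on a 503, wait and retry, doubling the wait each time."""
    delay = base_delay
    for attempt in range(max_tries):
        status, body = fetch(url)
        if status == 200:
            return body
        if status != 503 or attempt == max_tries - 1:
            break  # a different error, or no tries left
        sleep(delay)
        delay *= 2  # 1 s, 2 s, 4 s, ...
    return None

# Demo with a fake server that refuses twice, then answers.
answers = iter([(503, None), (503, None), (200, "translated text")])
waits = []
print(fetch_with_backoff(lambda u: next(answers), "https://example.invalid/", sleep=waits.append))
print(waits)
```

The sleep function is passed in as a parameter only so that the demo can record the waits instead of actually pausing.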
Perhaps forcing it to present a different IP address (a renewed one), or different machine information, to the server for each 5000-char request would fix it, but I don't know how to change the IP address programmatically, or what information needs to be sent to Google's server so that I can continue translating without being stopped/blocked.
I read that the error is related to solving a captcha. But how do I do that in MASM? I mean, whenever I hit the 503 error, open the web page to solve the captcha and then go back?
https://support.google.com/websearch/answer/86640
And here there seems to be some info about those captchas:
http://codewa.com/question/107867.html
How can I make it appear in an HTTP web dialog?
Here are some other examples similar to what I'm doing, except that mine is for large text files.
https://www.codeproject.com/Articles/12711/Google-Translator?msg=5161148#xx5161148xx
I don't think you can handle that easily in any programming language, very likely it will be close to impossible.
Google wants you to pay for heavy usage.
Quote from: guga on April 29, 2019, 02:33:14 AM
How can I make it appear in an HTTP web dialog?
keywords: AtlAxWinInit CreateWindow("AtlAxWin"
EDIT: doesn't work with the translate.google.com captcha :(
Thanks a lot Timo.
Can you make it work? I mean, it shows the captcha but does not return after pressing the verify button.
By the way, here is part of the translated files.
Parte01_en - The part of the English text to be translated.
Parte01_PT_Decoded - The same part as above, translated to Portuguese.
I couldn't upload them here due to the size, so I posted them at https://we.tl/t-DOtUDR8R9p
You can also test on this site (actually, it is mine but I have never used it, lol):
http://tests.vgpt.com/captach.aspx
The captcha has a "site key" and a "secret key", which we get free from Google. So, this is not a demo like the other page from Google. On the other hand, the ATL control host fails to load it; it chokes on the JavaScript.