The MASM Forum

General => The Workshop => Topic started by: jj2007 on April 25, 2023, 08:52:48 PM

Title: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: jj2007 on April 25, 2023, 08:52:48 PM
Raymond Chen, May 15th, 2018 (https://devblogs.microsoft.com/oldnewthing/20180515-00/?p=98755):
Quote
One of the terms of the license is that the compression and decompression code for Zip folders should be tied to UI actions and not be programmatically drivable

There is C:\Windows\System32\zipfldr.dll but it's not really usable. There are several threads on the VB6 forum (https://www.vbforums.com/showthread.php?808681-VB6-Create-a-ZIP-file-without-any-DLL-depends-using-IStorage-and-IDropTarget) dealing with zipping files without external DLLs; one day, I'll try to port it to Assembly.

This is Micros*t: no zipping and unzipping because, well, that was a third party thing - too complicated for Redmond. Essential but simple functions such as zipping or the RichEdit control (http://masm32.com/board/index.php?topic=5383.0) suck because apparently all the M$ staff is busy adding another gigabyte of useless crap to the C:\Windows\winsxs folder :cool:

P.S. unzipping is easy, here is code to install UAsm on your machine:

include \masm32\MasmBasic\MasmBasic.inc
  Init
  UnzipInit "http://www.terraspace.co.uk/uasm256_x64.zip" ; UnzipInit expects a filename or URL, returns #files in edx
  lea ecx, [edx-1] ; start with the index of the last file
  .Repeat
    .if Instr_(Files$(ecx), "uasm64.exe")
      .if Exist("\Masm32\bin\uasm64.exe")
        PrintLine "Old: ", GfDate$(-1), Spc2$, GfTime$(-1), CrLf$, "New: ", GfDate$(ecx), Spc2$, GfTime$(ecx)
        MsgBox 0, "Overwrite existing Uasm64.exe?", "Hi", MB_YESNO
      .endif
      .Break .if eax==IDNO ; user declined: leave without extracting
      UnzipFile(ecx, Cat$(Left$(MbExeFolder$)+":\Masm32\bin\"))
      MsgBox 0, "UAsm successfully installed", "Hi", MB_OK
      .Break
    .endif
    dec ecx
  .Until Sign? ; stop once the index drops below zero
  UnzipExit
EndOfCode
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: NoCforMe on April 26, 2023, 02:18:33 AM
Quote from: jj2007 on April 25, 2023, 08:52:48 PM
This is Micros*t: no zipping and unzipping because, well, that was a third party thing - too complicated for Redmond.

I hate to defend Micro$oft, I really do, but according to what I read on Ray Chen's blog, it's a matter of contractual limitations, not laziness on Redmond's part:

Quote
Bonus chatter: One of the terms of the license is that the compression and decompression code for Zip folders should be tied to UI actions and not be programmatically drivable. The main product for the company that provided the compression and decompression code is the compression and decompression code itself. If Windows allowed programs to compress and decompress files by driving the shell namespace directly, then that company would have given away their entire business!
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: jj2007 on April 26, 2023, 03:58:37 AM
Quote from: NoCforMe on April 26, 2023, 02:18:33 AM
it's a matter of contractual limitations, not laziness on Redmond's part

It's laziness, because they should have done it in-house. Zipping is a pretty essential feature of an OS.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: Vortex on April 26, 2023, 04:58:52 AM
Quote
It's laziness, because they should have done it in-house. Zipping is a pretty essential feature of an OS.

I agree with Jochen. The Linux command prompt, the Terminal, has provided the zip and unzip tools for a long time.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: daydreamer on April 26, 2023, 02:58:29 PM
Quote from: Vortex on April 26, 2023, 04:58:52 AM
Quote
It's laziness, because they should have done it in-house. Zipping is a pretty essential feature of an OS.
I agree with Jochen. The Linux command prompt, the Terminal, has provided the zip and unzip tools for a long time.

Even my latest Android tablet provides zip and unzip.
As for programmatically drivable: long ago I saw a library that lets you read directly from zip files.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: jj2007 on April 26, 2023, 06:23:18 PM
Quote from: daydreamer on April 26, 2023, 02:58:29 PM
As for programmatically drivable: long ago I saw a library that lets you read directly from zip files.
I've seen one, too (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1234) :thumbsup:

TinyZip looks promising (https://github.com/bitstorm/tiny-zip), but it's Java :rolleyes:
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: mineiro on April 26, 2023, 10:50:10 PM
Quote from: jj2007 on April 26, 2023, 03:58:37 AM
Zipping is a pretty essential feature of an OS.

I disagree.

There is tons of source code on the internet about data compression: dictionaries (LZ family), transforms (BWT, FIB), entropy coding (AC, ANS, Huffman), PPM (prediction by partial matching). Join A.I. (slow processing) to this as context mixing, and the entropy of the target model is reached.

The entropy can be reached (arithmetic coding, asymmetric numeral systems, and under some circumstances Huffman). The main problem to this day is the model used. The best model (the universal model) cannot be predicted.

So, the real question is:
Why should it be ZIP?
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: TimoVJL on April 26, 2023, 10:50:30 PM
My favorite is MiniZ.
My help system uses it.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: jj2007 on April 26, 2023, 10:55:12 PM
My favourite is tinf.lib: it weighs in at 3,662 bytes and can be called from Assembly to extract a file:
include \masm32\MasmBasic\MasmBasic.inc
  Init
  UnzipInit "http://www.jj2007.eu/Bible.zip"
  UnzipFile(0, "C:\Masm32")     ; extract C:\Masm32\Bible.txt
EndOfCode


Unfortunately, it only unzips; it does not compress. That's why I am looking for a tiny compression library.


Quote from: mineiro on April 26, 2023, 10:50:10 PM
There is tons of source code on the internet about data compression.

Great. Can you post Assembly code that compresses some files using one of these source codes?
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: mineiro on April 26, 2023, 11:13:57 PM
I prefer paq series.
https://mattmahoney.net/dc/paq.html

Z seems to be the default these days among the big tech companies.
https://mattmahoney.net/dc/zpaq.html

Well, I know it's outside the scope of this thread, but look at the big bang theory.
James Webb discovered some galaxies that should not exist under the current model (3.4 to 3.6 billion years). A galaxy needs time to form, so discovering those galaxies means that the model is wrong.
What will scientists do? Either keep updating the model with each new discovery, or throw the model away and choose another one.
That's the importance of the model in science (and in data compression).
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: mineiro on April 26, 2023, 11:21:54 PM
Quote from: jj2007 on April 26, 2023, 10:55:12 PM
Great. Can you post Assembly code that compresses some files using one of these source codes?
Oh, really sorry, I only read this now.
Yes, give me 2 days and I can release something. I will release it for Linux, is that OK? (Keeping in mind that it will have to be translated to Windows.) I have some of this code here; I only need to find it and polish it.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: mineiro on April 28, 2023, 12:14:25 PM
This is what I found on this PC, only 2 compression/decompression programs.
One is arithmetic coding with an order-0 model, byte-based.
The other is Dynamic Markov Coding with Guazzo arithmetic coding, bit-based.

I have tried to clean up and comment as much as I can, but there are probably unused variables in the source code.
I used UASM version 255 on Linux x86_64.

PS: If you try to compress a big file (above 100MB), keep in mind that dmc can fail. I think the reason is loss of floating-point (real4 or real8) precision. As an alternative, you can divide that giant file into N-byte blocks and compress each block.
PS2: The easiest one to play with and change is ac. Try an order-1 model (digraphs) or order-2 (trigraphs) and compression will be better. The current compression result is not the best, but this example is the best one to modify.
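
Why an order-1 model should beat order 0 can be checked without any arithmetic coder, since the coder can get close to the empirical entropy of its model but not below it. A minimal sketch, in Python rather than UASM and not taken from the programs described above; the function names `order0_entropy` and `order1_entropy` are made up for illustration:

```python
import math
from collections import Counter, defaultdict

def order0_entropy(data: bytes) -> float:
    """Bits per byte when every byte is modelled without any context."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def order1_entropy(data: bytes) -> float:
    """Bits per byte when every byte is modelled given the one byte before it."""
    followers = defaultdict(Counter)      # previous byte -> counts of next byte
    for prev, cur in zip(data, data[1:]):
        followers[prev][cur] += 1
    total_bits = 0.0
    for counts in followers.values():
        ctx_total = sum(counts.values())
        for c in counts.values():
            total_bits -= c * math.log2(c / ctx_total)
    return total_bits / (len(data) - 1)

# Illustrative input only: a strictly alternating sequence.
sample = b"ab" * 512
h0 = order0_entropy(sample)
h1 = order1_entropy(sample)
print(f"order 0: {h0:.3f} bits/byte, order 1: {h1:.3f} bits/byte")
```

On this alternating sample the order-0 model needs 1 bit per byte, while the order-1 model needs essentially none, because the previous byte fully predicts the next one. Real files fall somewhere in between.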
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: jj2007 on April 28, 2023, 04:35:21 PM
Great stuff, mineiro, but what I wanted is
a) a library that works under Windows
b) a library that understands invoke AddToZipArchive, pZipfile, pAddfile

Can you write a little wrapper, please?

I need it for zipping only; for unzipping, I already have UnzipFile (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1234):

include \masm32\MasmBasic\MasmBasic.inc
  Init
  UnzipInit "http://www.jj2007.eu/Bible.zip" ; file or URL
  UnzipFile(0, "C:\Masm32") ; extract C:\Masm32\Bible.txt
EndOfCode
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: mineiro on April 28, 2023, 08:51:25 PM
I'm not working with Zip, sir jj2007. I don't know how its headers work, so it would be hard for me to create something compatible with the zip standard.
I think I can't do this.
If I find a usable Windows library (.dll) on the internet and my tests with it work, I'll come back. But for now, I can't.

Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: morgot on April 29, 2023, 09:56:47 PM
I also don't understand why WinAPI doesn't have any ZIP archiving function. There is a lot of garbage like "infrared sockets", but not this. There is the RtlDecompressBuffer API, which is absolutely stupid and incompatible with anything.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: jj2007 on April 29, 2023, 10:50:42 PM
Quote from: morgot on April 29, 2023, 09:56:47 PM
I also don't understand why WinAPI doesn't have any ZIP archiving function. There is a lot of garbage like "infrared sockets", but not this. There is the RtlDecompressBuffer API, which is absolutely stupid and incompatible with anything.

Yes, that's the point, Morgot :thumbsup:

Thanks for pointing me (indirectly) to RtlCompressBuffer (https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-rtlcompressbuffer) - looks interesting :rolleyes:

Marpon's code looks good (https://www.freebasic.net/forum/viewtopic.php?p=235436#p235436). However, RtlCompressBuffer doesn't seem that useful:
Quote
Craig_Barkhouse (https://community.osr.com/discussion/comment/181107/#Comment_181107)
April 2010
RtlCompressBuffer() implements LZNT1 compression. It's fast enough and good enough, though not great. You can only compress up to 64K at a time (hardcoded limit), which really limits the overall compression ratio you can achieve when compressing a large stream. This is the compression routine that NTFS uses for its compression feature. NTFS compresses a file in units of up to 64K at a time, which jives nicely with the hardcoded limit in RtlCompressBuffer().

Explorer's "Send to compressed (zipped) folder" feature simply creates a .zip file, using the public domain ZIP format.

CABARC creates .cab files, which use the Cabinet format. Cabinet is a proprietary Microsoft format which is most definitely not the same as the ZIP format. The codec options (MSZIP, LZX) are also proprietary Microsoft formats and not the same as those used in ZIP. MSZIP is based on Huffman, while LZX is based on Lempel-Ziv and is generally much better than MSZIP. Note that Cabinet takes advantage of cross-file compression (giving Cabinet a compression advantage over ZIP), whereas ZIP compresses each file individually (giving ZIP a speed advantage when extracting a single file).
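
The cost of compressing in independent 64K units, as described in the quote, can be made visible with a small experiment. This is only a sketch under loud assumptions: Python's `lzma` module stands in for the codec (it is not LZNT1, and it is not what NTFS uses), and the input is synthetic, one pseudo-random 64K block repeated four times so that all redundancy lies across unit boundaries. The helper names are hypothetical.

```python
import lzma
import random

BLOCK = 64 * 1024  # the 64K unit size mentioned in the quote

def compressed_size_whole(data: bytes) -> int:
    """Size when the whole stream is compressed in one go."""
    return len(lzma.compress(data))

def compressed_size_blocks(data: bytes, block: int = BLOCK) -> int:
    """Total size when each block is compressed on its own, sharing no history."""
    return sum(len(lzma.compress(data[i:i + block]))
               for i in range(0, len(data), block))

# Synthetic input (assumption): one incompressible 64K block, repeated 4 times.
rng = random.Random(0)
unit = bytes(rng.getrandbits(8) for _ in range(BLOCK))
data = unit * 4

whole = compressed_size_whole(data)
blocked = compressed_size_blocks(data)
print(f"whole stream: {whole} bytes, 64K blocks: {blocked} bytes")
```

Compressed as one stream, the three repeats cost almost nothing; compressed block by block, every block is as incompressible as the first. The same reasoning applies to Cabinet's cross-file compression versus ZIP's per-file compression.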
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: HSE on April 29, 2023, 11:19:47 PM
There is a Compression API beginning with Windows 8.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: jj2007 on April 29, 2023, 11:44:31 PM
Interesting (https://learn.microsoft.com/en-us/windows/win32/api/compressapi/nf-compressapi-compress), but apparently you still need to handle the composition of the zip archive, which is not trivial.
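
To give an idea of what "composing the archive" involves once a codec is available, here is a minimal sketch in Python. It assumes a single file, no ZIP64, no encryption and a fixed timestamp; `make_zip` is a hypothetical name, and `zlib` producing a raw deflate stream stands in for whatever codec is actually used. It says nothing about whether the output of the Windows Compression API can be dropped in here.

```python
import io
import struct
import zipfile
import zlib

def make_zip(name: str, payload: bytes, comment: bytes = b"") -> bytes:
    """Build a one-file .zip by hand: local header, data, central directory, end record."""
    raw_name = name.encode("ascii")
    deflater = zlib.compressobj(9, zlib.DEFLATED, -15)   # negative wbits = raw deflate
    body = deflater.compress(payload) + deflater.flush()
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    dos_time = 0                                         # 00:00:00
    dos_date = (0 << 9) | (1 << 5) | 1                   # 1980-01-01

    # Local file header (30 bytes) followed by the file name.
    local = struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, 0, 8, dos_time, dos_date,
                        crc, len(body), len(payload), len(raw_name), 0) + raw_name
    # Central directory header (46 bytes) followed by the file name.
    central = struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20, 0, 8,
                          dos_time, dos_date, crc, len(body), len(payload),
                          len(raw_name), 0, 0, 0, 0, 0, 0) + raw_name
    # End of central directory record (22 bytes) plus optional archive comment.
    cd_offset = len(local) + len(body)
    eocd = struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, 1, 1,
                       len(central), cd_offset, len(comment)) + comment
    return local + body + central + eocd

text = b"In the beginning " * 100
blob = make_zip("Bible.txt", text)

# Cross-check with an independent reader.
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    names = zf.namelist()
    extracted = zf.read("Bible.txt")
print(names, len(blob), "bytes")
```

Every additional file adds one local header plus data and one central directory entry, and the end record must carry the updated entry count, directory size and directory offset. That bookkeeping is the non-trivial part.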
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: daydreamer on April 30, 2023, 02:54:28 PM
That NTFS compression post reminded me that an NTFS-compressed folder on my PC, when copied to my Android tablet, ended up zipped there.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: morgot on May 03, 2023, 09:34:30 PM
Hello, jj2007

This feature is useful when passing data from one Windows machine to another. But what if you need to decompress on Linux, the web, etc.? I didn't find an algorithm that decodes the output of the RtlCompressBuffer API. As far as I know, LZNT is a proprietary MS format, same as .CAB.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: jj2007 on May 11, 2023, 07:36:12 PM
Just stumbled over a strange phenomenon:
- there used to be an option to add a comment to a zip file, and it was stored with the file
- not so with 7-zip: it creates a file Descript.ion, which is a really ugly method
- however, comments to individual files are stored with the archive, and they crash Ibsen's tinf library*)

WinZip (https://kb.winzip.com/help/help_comment.htm):
A comment is optional text information that is embedded in a Zip file

*) P.S., under the hood:
Cß7âƒÙCommentPK?

The start of the decoding buffer should have the PK (Phil Katz) string, but it actually points at "Comment".
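
A plausible explanation, offered only as an assumption and not checked against tinf's source, is a directory walker that steps over the file name and extra field but not over the per-file comment. The sketch below, in Python with the hypothetical names `find_eocd` and `list_entries`, shows a walk that skips all three variable-length fields, so each step lands on the next PK signature:

```python
import io
import struct
import zipfile

def find_eocd(blob: bytes) -> int:
    """Offset of the end-of-central-directory record, even with an archive comment."""
    lowest = max(0, len(blob) - 22 - 0xFFFF)      # record is 22 bytes + up to 64K comment
    pos = blob.rfind(b"PK\x05\x06", lowest)
    while pos >= 0:
        if pos + 22 <= len(blob):
            (comment_len,) = struct.unpack_from("<H", blob, pos + 20)
            if pos + 22 + comment_len == len(blob):
                return pos
        pos = blob.rfind(b"PK\x05\x06", lowest, pos)
    raise ValueError("no end-of-central-directory record found")

def list_entries(blob: bytes) -> list:
    """Walk the central directory and return (name, comment) for every entry."""
    eocd = find_eocd(blob)
    total, _cd_size, pos = struct.unpack_from("<HII", blob, eocd + 10)
    entries = []
    for _ in range(total):
        if blob[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError(f"expected central directory header at offset {pos}")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", blob, pos + 28)
        name_end = pos + 46 + name_len
        comment_start = name_end + extra_len
        entries.append((blob[pos + 46:name_end].decode("cp437"),
                        blob[comment_start:comment_start + comment_len]))
        # Skip name, extra field AND comment; omitting the last term
        # would leave pos pointing at the comment text instead of "PK".
        pos = comment_start + comment_len
    return entries

# Test archive with one commented file, one plain file and an archive comment.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("a.txt")
    info.comment = b"Comment"
    zf.writestr(info, b"hello")
    zf.writestr("b.txt", b"world")
    zf.comment = b"archive note"
entries = list_entries(buf.getvalue())
print(entries)
```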
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: jj2007 on May 12, 2023, 10:03:45 AM
I've installed the WinZip trial now. It eats up gigabytes of disk space and has a horrible, overloaded ribbon interface, but it handles the per-archive comments correctly. The installation adds over one MB of crap to the registry, and adds or changes several hundred registry entries.
Title: Re: Raymond Chen: Why is Zip folders support stuck at the turn of the Century?
Post by: mineiro on June 01, 2023, 11:23:39 PM
I was reading about this:
https://blogs.windows.com/windowsdeveloper/2023/05/23/bringing-the-power-of-ai-to-windows-11-unlocking-a-new-era-of-productivity-for-customers-and-developers-with-windows-copilot-and-dev-home/

Windows 11:
Quote
We have added native support for additional archive formats, including tar, 7-zip, rar, gz and many others using the libarchive open-source project. You now can get improved performance of archive functionality during compression on Windows.

I think this will be the path to follow.
https://www.libarchive.org/
https://github.com/libarchive/libarchive/wiki/Examples