X86 ASM Programmer requested for compression algorithm

Started by SoraK05, June 09, 2014, 01:27:07 PM


hutch--

What you will tend to find is that you manually optimise a collection of techniques like this only after you have got the logic and the algorithm choice right. Some make the mistake of thinking that assembler is going to make everything faster, but unless you get it right you often end up with badly designed, slow code. At the design stage you will find that a C compiler is more than fast enough, and it's only when you have the collection of techniques up to an advanced stage that manual optimisation becomes worthwhile.

There are a couple of areas you can look at once you have the details right: instruction choice and design, AND seeing if there is a way to run the collection of techniques asynchronously in multiple threads. If you can get both going you may see big improvements in speed, which will in turn compensate for the additional complexity of the technique choices.
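For the threading idea, here is a minimal sketch in C using the Win32 API. The slice layout, thread count and the placeholder worker routine are assumptions for illustration only, not part of any particular compressor design.

#include <windows.h>
#include <stdio.h>

#define NUM_THREADS 4   /* arbitrary choice for illustration */

/* Each worker processes its own slice of the input independently,
   so the slices can run asynchronously on separate cores. */
typedef struct {
    const unsigned char *data;
    size_t               len;     /* length of this slice */
    size_t               result;  /* per-slice output, filled by the worker */
} SLICE;

static DWORD WINAPI worker(LPVOID param)
{
    SLICE *s = (SLICE *)param;
    /* placeholder work: count zero bytes in the slice */
    size_t zeros = 0, i;
    for (i = 0; i < s->len; i++)
        if (s->data[i] == 0) zeros++;
    s->result = zeros;
    return 0;
}

int process_async(const unsigned char *data, size_t len)
{
    HANDLE threads[NUM_THREADS];
    SLICE  slices[NUM_THREADS];
    size_t part = len / NUM_THREADS;
    int i;

    for (i = 0; i < NUM_THREADS; i++) {
        slices[i].data = data + (size_t)i * part;
        slices[i].len  = (i == NUM_THREADS - 1) ? len - (size_t)i * part : part;
        threads[i] = CreateThread(NULL, 0, worker, &slices[i], 0, NULL);
        if (!threads[i]) return -1;
    }
    WaitForMultipleObjects(NUM_THREADS, threads, TRUE, INFINITE);
    for (i = 0; i < NUM_THREADS; i++)
        CloseHandle(threads[i]);
    return 0;
}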

SoraK05

Thanks :)
I am aiming first for a definitively working algorithm in general, which should be doable in C.
From the overview it should function effectively where applicable, and be translatable into a language.

Once written in a higher-level language, its generally few basic algorithms can then be converted to assembly, which should speed things up in general (and reduce bloat and the total number of steps).

Gunther

What about formulating the algorithm as pseudo code? That leaves all options open; we're not locked into anything. It could be C (very portable), C++ or BASIC. After that's done, we can find the bottlenecks by profiling. Only then, and not earlier, comes the showdown, the assembly language hour.

Gunther
You have to know the facts before you can distort them.

SoraK05

Check page 33 for the general summary with tweaks and adjustment knobs for an idea/overview of the expected algorithm. :)

Gunther

Quote from: SoraK05 on June 13, 2014, 02:16:37 AM
Check page 33 for the general summary with tweaks and adjustment knobs for an idea/overview of the expected algorithm. :)

Okay, I'll do it.

Gunther
You have to know the facts before you can distort them.

SoraK05

The general info is there :)
Apart from a system to do things in chunks, to be more direct about writing the file (which DEFLATE already does for the most part), and some additional options for the string detection, it should be functional.
It may not have specific details for some options, but it should generally provide enough info to accompany what is available in the descriptions.
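Since DEFLATE is mentioned as already handling chunked output for the most part, here is a minimal sketch of chunked compression in C with zlib's compress2(). The chunk size is an arbitrary placeholder, and this is just one simple way to do it, not the scheme described above.

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#define CHUNK_SIZE (256 * 1024)   /* arbitrary chunk size for illustration */

/* Compress 'len' bytes from 'src' one chunk at a time with zlib.
   Each chunk is compressed independently, so it can be decoded on its own. */
static int compress_chunks(const unsigned char *src, size_t len)
{
    size_t off = 0;
    while (off < len) {
        size_t in_len  = (len - off < CHUNK_SIZE) ? (len - off) : CHUNK_SIZE;
        uLongf out_len = compressBound((uLong)in_len);
        unsigned char *out = malloc(out_len);
        if (!out) return -1;

        if (compress2(out, &out_len, src + off, (uLong)in_len,
                      Z_BEST_COMPRESSION) != Z_OK) {
            free(out);
            return -1;
        }
        printf("chunk at %zu: %zu -> %lu bytes\n",
               off, in_len, (unsigned long)out_len);
        free(out);
        off += in_len;
    }
    return 0;
}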


It will require temporary space for a temp offset-mapping file, and enough time to sort it and calculate what is required.
The purpose is to be more comprehensive than general-purpose tools like 7z/rar, with different settings for custom compression, and it should generally produce a smaller result at very decent speed depending on the settings. It is also meant to allow a more thorough compression for long-term storage and for general media where quick access is possible. (The BMP example in the previous zip shows how these general tools are quite cheap on time, and that an 'image only' tool <PNG> is, by nature, not treating the data as plain 0/1, where other 'general' tools produce a smaller result on the same data.)

Therefore some extra space and some time for the compression is acceptable. This also includes versatile settings for a comprehensive compression; even with general-purpose settings it should run at decent speed compared to other tools like 7z/rar, with smaller results, and using asm (where the others do not).
The bootable scheduled option, where it can run as a strict dedicated process, is a bonus for the higher compression settings on larger files such as lossless video / audio / assets that can still extract/decode quickly.

Anyone interested in assisting with an algorithm is appreciated :)

K_F

Coming in here a bit late.

The OP mentions a stream of 0/1's - have you thought of RLE (Run-Length Encoding)? It's a very simple and fast method.
IIRC this is probably the only method that will give a good lossless compression - if the data is in a 'decent' format.
On average I think you might be lucky to get a 20-30% compression.

It's generally used as the final stage of compression methods using DCT's - the DCT itself being a bit lossy.
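For illustration, a minimal byte-level RLE sketch in C; the 255-byte run cap and the (count, value) output layout are just one simple choice, not a specific recommendation.

#include <stddef.h>

/* Minimal byte-level RLE sketch: writes (count, value) pairs.
   Returns the number of bytes written to 'out'; the caller must ensure
   'out' has at least 2*len bytes, the worst case for this scheme. */
static size_t rle_encode(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < len) {
        unsigned char val = in[i];
        size_t run = 1;
        while (i + run < len && in[i + run] == val && run < 255)
            run++;
        out[o++] = (unsigned char)run;   /* run length, 1..255 */
        out[o++] = val;                  /* the repeated byte   */
        i += run;
    }
    return o;
}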
8)
'Sire, Sire!... the peasants are Revolting !!!'
'Yes, they are.. aren't they....'

SoraK05

This looks at strings in particular, similar to other encoders, but instead of fixed 8-bit symbols plus frequency it is a lot more flexible and dynamic.
I've also been discussing compression on another forum with some other tool programmers, some of whom are involved with the Hutter Prize.

This generally has media in mind: being able to stream while still being compact.
I am aware there is a general spectrum of data, roughly 50/50 between data with prevalence and data without, and depending on the strength of the encode it can capture the prevalent data and store it in a non-prevalent result. The algorithm in my description should capture what it can from strings; more could be done to shrink the file itself (such as looking at specific repeating data in the offsets and applying a layer to accommodate this), but that would not be as streamable.
There are methods looking at strings, at prevalence in gaps (counting runs of 0s or 1s and using that), nCr, which I know is useful for smaller results where one symbol is overwhelmingly prevalent, stacking two, then three, etc. of such methods (to be somewhat recursive), and so on.
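As a rough illustration of the nCr point (not part of the algorithm described here): if a block of n bits contains k ones, an enumerative code needs about log2(C(n,k)) bits for the positions plus a little overhead for k, which is far below n when one symbol overwhelmingly dominates. A small sketch computing that bound with lgamma(); the example block size and count are arbitrary.

#include <math.h>
#include <stdio.h>

/* Theoretical size in bits of an enumerative (nCr) code for a block of
   n bits containing k ones: log2(n choose k), computed via lgamma()
   to avoid overflow for large n. */
static double ncr_bits(double n, double k)
{
    return (lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0)) / log(2.0);
}

int main(void)
{
    /* Example: 8192 bits containing only 64 ones. */
    printf("%.1f bits vs %d raw bits\n", ncr_bits(8192, 64), 8192);
    return 0;
}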

I am aware that combining multiple methods can capture prevalence more effectively, but in general this tool should be a very effective string-detection tool, able to shrink common data a lot more than general tools like 7z, while also being useful on media where chunks are applied, and remaining streamable (and scalable). For a string method designed with streaming in mind it should be very compact and much more specific to a file than common tools/encoders.
It could be a much more useful and compact algorithm that scales for common compression uses :) Archives can be a lot smaller, and even video/audio depending on the content, while still streaming in decent time if some quick asm is used.