The MASM Forum

General => The Workshop => Topic started by: SoraK05 on June 09, 2014, 01:27:07 PM

Title: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 09, 2014, 01:27:07 PM
Hello :)
I have been writing a general outline to a compression algorithm to work on files in the data stream of 0/1 bits, and it is generally 90% and can have a general minor tweak included.
Otherwise, it is not in a structure resembling computer pseudocode steps. It is a general outline and details.

The reason I am looking for an assembly programmer is that I am not familiar with asm (or a lot of c++ otherwise) and it works on looking to search a file for its bits to have variable bits detected rather than bytes which is a generally slower restriction using higher level languages, as well as allow for the most step-by-step precision considering it is to scan the file quite thoroughly and often to build up the required data, as well as be able to decode faster and to the variable bit precision.
For something to bit precision and requiring multiple scans, it would be ideal to use the shortest steps with asm for this as the strongest method will require some constrained minimal brute forcing type processing.


From expectations on general data (as well as a personally hand crafted BMP to use for general expectation) and comparisons to other tools, it looks to do things quite thoroughly and a lot more than common tools such as 7z/rar.


If someone is interested in looking at it, and I can also include the final tweaks, I would appreciate an x86 ASM tool, as well as a general algorithm :)
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: Gunther on June 09, 2014, 08:02:03 PM
Hi SoraK05,

I'm not sure what you're searching. We've some funny, bad experiences with super compression algorithms (http://www.masmforum.com/board/index.php?topic=13454.0). I hope you're serious.

If you want to learn assembly language, download the MASM32 package and go forward. If you're searching a programmer, try it with Rent a Coder.

Gunther
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 09, 2014, 08:32:55 PM
I'm looking to contribute toward a more effective compression tool which ground up is programmed using less steps with asm.

7z/rar and other codecs in general are not very effective and compromise time/space.

I have a basic BMP example in a 7z with the outline where it is few colours with mostly repetition and should be able to fit into very few bytes yet PNG/RAR/7z result in sizes of KBs.

Compression in general in the spectrum involves repetition which can be captured where one version and its representations will be less space overall and that a program can restore it.
In general, in the data stream, 50% is 'compressible' and 50% is 'uncompressible' <generally resembling the coded results>, and the process can be implied as a conversion <where processing is rejected if the file size is determined to increase>.
Where repetition according to parameters is found there can be compression - this is in the 0/1 bit spectrum, and does not favour text/audio/image and so on.


Besides one general prepping process which can be considered, this is a more thorough attempt at finding repetition than the general loose approach of 7z/RAR and many other codecs, with asm to assist in speed.
Decoding should be very quick, with asm also, as once the file is constructed it is decoded with a code tree with prefixes which is straightforward.


I am looking at the notion of asm, yet first I would prefer looking at hardware and paths, opcode reasoning and so forth to be more flexible.
I am willing to discuss, and share the writeup with any interested asm programmers.
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: jj2007 on June 09, 2014, 08:46:10 PM
Hi,

Before investing heavily into a new algo, check if it has not been invented already. The bitstream thing you describe, for example, resembles strongly the Huffman coding process.

Check also if the final product will be able to compete with FreeArc.

And welcome to the forum :icon14:
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 09, 2014, 08:55:00 PM
Hello, and thank you :)
The closest thing as far as I know is DEFLATE and from what I read up about the way DEFLATE operates this should produce smaller files.
It does take use a variable code tree, however it is not 'huffman' per se. :)

From the math involved, besides moving data in advance there isn't necessarily a lot more room to compress, and should be lower than many <including FreeArc>..
I can see in the FreeArc writeup how there are algorithms to work with text, exes, and multiple algorithms.

This should work on any file in general without being particular to it being an executable, image etc and have low results.
It is a standard algorithm that can be left automated for data, which can be adjusted to skimp a little.

EDIT:
Read the 100 to 1 Compressor :) I get that it is not possible in the data stream :)
I can also mention that I get an algorithm is ideal direct on lossless data rather than after being processed with some kind of algorithm.
The BMP mentioned with few colours and a lot of repetition can compress to few KBs with 7z/RAR/PNG and much more with JPG for example, and is the same data that can end up being few B.

It may be considered intensive and require temporary storage, however with asm and time, the result should be compacted and accessible (including any offset within providing some advance calculation to directly access, like data for multiple file offsets on a calculation run).
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: jj2007 on June 09, 2014, 09:02:36 PM
DEFLATE = LZW + Huffman

Check MaxComp (http://www.maximumcompression.com/data/summary_mf2.php#data) for competitors; sort by efficiency to get an idea what are the traps (some algos have good compression but take DAYS, not hours).

Why don't you code your algo in plain C and post the exe here? We are good at translating slow C/C++ loops into fast assembler loops.
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 09, 2014, 09:05:58 PM
I have the overview already done for the most part.
Any programmer that is willing to transform it to what the computer can do, and more directly to asm is appreciated :)

It should generally provide the main details and algorithm function.
I'm not very fluent in computer language, with some basic c++ skills.

I am sure, and the writeup may demonstrate, this can have some low results with a focus on thorough scans.
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: jj2007 on June 09, 2014, 09:47:21 PM
Quote from: SoraK05 on June 09, 2014, 09:05:58 PM
I'm not very fluent in computer language, with some basic c++ skills.

Both BASIC and C++ are fine, but post it. Even pseudocode will be gladly accepted ;-)
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: nidud on June 09, 2014, 09:48:35 PM
deleted
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 09, 2014, 09:54:54 PM
OK here.
The notion is for being thorough. There could be a tweak / optimization, which is welcome.

It will generally bracket specific locations which are more compressible, look for strings (including mirror/inverse) within those bracket areas to be more specific for distribution as well as distinguish compressible/uncompressible areas, patterns of strings and in a row which can be repeated to layer+encapsulate where applicable, so a file can have compressed+layered+raw (separated with 'gap length info' to distinguish) using prefixes.
Layered data prefixes when met will require to decode in temp space and then be appended for example.

The notion is also to do a constrained scan to determine the minimal number of strings for the code tree as well as the most effective, described as 'heaviest string'. This means any variable length pattern which covers the most and also considers size for the code tree (rather than a cheaper byte pattern + frequency e.g.).

A temporary file for offset mapping is used to construct the final file from its data.
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: dedndave on June 09, 2014, 10:39:53 PM
sounds somewhat similar to something i worked on years ago - in 16-bit code days
i used several compression methods, and scanned regions to see which method provided the best compression
i didn't move sections around, however - your method is different in that respect

it seemed like a sound idea
but, scanning regions and estimating compression ratios for several methods proved very slow
de-compression, on the other hand, was very fast

at that point, it's a matter of "compression philosophy", which is slowly changing
people are beginning to realize that it isn't about disk space (as it used to be), but more about bandwidth
why is that different ?
well - because the time it takes to achieve better compression might be worth it
it may take a while to compress a file, but you only have to do it once
then, it gets sent over the internet several times (or several thousand) and de-compressed

but, until you can convince users that slow compression is ok, you might be spinning your wheels   :(
and, you have to provide more than 2 or 3 percent improvement in compression ratio
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: nidud on June 09, 2014, 11:55:16 PM
deleted
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 10, 2014, 12:07:50 AM
It doesn't move sections around.

It has 4 basic algorithms which run in sequence providing there is compression acknowledged:

1) Scan file for specific patterns to develop offset brackets of compressible areas, and use the heaviest of the filtered results
2) Scan for heaviest strings in each offset bracketed area to be specific to the file for distribution (by adjusting the code tree to suit the different areas than have a code tree for the data as a whole), as well as account for compressible/uncompressible areas (and this can include mirror/inverse for less entries overall after 3+4)
3) Scan for heaviest patterns of strings which are in code tree and offset bracketed areas (skipping raw) <i.e. in the temporary offset mapping file>
4) Scan for in a row <in the temporary offset mapping file>

I figure 'MODE 1' is 1+2 which will initially build the code tree and offsets to place them, and 'MODE 2' is 3+4 which will shrink data further and can be repeated to layer/encapsulate.

Decoding should be fast.

Considering it would be asm step-by-step for each part, it shouldn't be unreasonably long and the result is accessible and compact. I would opt to use something like this for video, assets in a game, archival data from it not being too long to encode where space is available (and is using least steps with asm) as well as decode quick (using asm).
I wouldn't generally use other archival tools beyond something to archive for immediate purposes, and this can also be adjusted for such a purpose and be much smaller than 7z/rar which is general purpose.


As for compression ratio, this is going for a 'thorough' approach, and wouldn't necessarily get much smaller using a string scanning method beyond moving data in advance to prep for MODE 1 which is an algorithm by itself.
Using asm is to use least steps for the speed.
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 10, 2014, 09:47:44 PM
I have written a shortcut to do something similar to above and should still end up compressing quite low as well as take far less time than the constrained brute forcing of MODE 1 for areas + strings.

If anyone is interested I can write the description of an alternative MODE 1 to save a lot more time and still compress quite low (however not as accurately). MODE 2 in principle for duplicate combinations of patterns and in a row (as a number) remains.


EDIT:
This alternative should (still) be lower than 7z/rar and be done in decent time. :)
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 13, 2014, 12:18:18 AM
Sorry for the triple post. Perhaps a mod can join posts :)

This is just to mention that I generally have a complete writeup done.
The idea behind this is to have a powerful and comprehensive compression tool that in its highest mode can just about squeeze what is available in a file.

I have all adjustments and tweaks to be able to change aspects of the algorithm for space/time.
All adjustments from fastest and weakest to slowest and highly compressed can be done for aspects of the algorithm, where all knobs can be adjusted to suit one's preference and with general details for what is going on and expected speed etc.


The idea is to have this tool also be able to work on multiple files appended and create offset data for each file for immediate decryption, and be able to schedule multiple files (whether together or as individual files).
When scheduled, one more separate tool is to be done which can allow for a bootable device like a usb to be able to boot when the pc starts and automatically perform the scheduled operation in a barebone environment without any video/sound/keyboard calculated and it is just doing this process. The point of this cold boot process is to be able to allow for more to the most thorough scheme to work in a direct environment with shortest asm steps.
When the process is done the PC can automatically shut down to imply completion.

Lossless audio and video can be very quickly decoded and with some minimal data to have for precalculated info for seconds for jumping to offsets.
It is like a one-off powerful tool with adjustment knobs which in general should be quick enough even in a Windows environment on some settings to be much smaller than 7z/rar and also be done in very decent time. There is also a very cheap method for quick encoding which is more ideal for a direct decoding.

The idea is a versatile tool which can perform quick with minimal asm steps to be used in general as a compression tool with quick decoding and also have more compact/thorough options that can do video/audio/asset data in a barebone boot environment, and also for more compact archival data.

The idea is that in general this compression tool can cater with its knob adjustments for expectations of compression and space/time, giving opportunity for a barebone processing possibility scheduled on a bootable device.


My only thing is I have minimal programming language knowledge. I have basic knowledge for accessing bits, general loops and functions etc in c++.

If anyone is interested with skills to be able to write some few generally short and basic algorithms which can be adjusted/tweaked for its knobs for a versatile and comprehensive compression tool written in minimal asm steps and with a scheduled bootable and direct processing feature, I can assist with a description and writeup.
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: hutch-- on June 13, 2014, 01:36:24 AM
What you will tend to find is that you try and manually optimise a collection of techniques like this after you have got the logic and algo choice right. Some make the mistake of thinking that assembler is going to make everything faster but unless you get it right you often end up with badly designed slow code. you will find at the design stage that a C compiler is more than fast enough and its only when you have the collection of techniques up to an advanced stage that manual optimisation becomes worthwhile.

There are a couple of areas that you can look at once you have the details right, instruction choice and design AND seeing if there is a way to run the collection of techniques asynchronously in multiple threads. If you can get both going you may see big improvements in speed which will in turn compensate for the additional complexity of the technique choices.
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 13, 2014, 01:41:14 AM
Thanks :)
I am optimizing for having a definitively functioning algorithm in general which should be able to be done in C.
From the overview it should function effectively where applicable, and be able to be transformed to a language.

Where it can be converted once written in a higher level language to assembly for its generally few basic algorithms, that should speed things up in general (and reduce bloat and reduce total steps).
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: Gunther on June 13, 2014, 01:59:12 AM
What about a formulation of the algorithm as pseudo code? That leaves all options open. We're then not fixed. It could be C (very portable), C++ or BASIC. After that's done, we can figure out the bottle necks by profiling. And now and not earlier is the showdown or the assembly language hour.

Gunther
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 13, 2014, 02:16:37 AM
Check page 33 for the general summary with tweaks and adjustment knobs for an idea/overview of the expected algorithm. :)
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: Gunther on June 13, 2014, 03:05:26 AM
Quote from: SoraK05 on June 13, 2014, 02:16:37 AM
Check page 33 for the general summary with tweaks and adjustment knobs for an idea/overview of the expected algorithm. :)

Okay, I'll do it.

Gunther
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on June 14, 2014, 01:29:28 AM
General infos is there :)
Besides a system to do things in chunks to perhaps be more direct in writing a file which is already done for the most part in DEFLATE, where some additional options can be done for the string detection, it should be functioning.
It may not have specific details for some options but should generally provide info to accompany what is available in descriptions.


It will require temp space for a temp offset mapping file, and enough time to perhaps sort it and calculate what is required.
The purpose is to be more comprehensive as a tool than general purpose tools like 7z/rar, having different settings for custom compression, and should generally have a lower result in very decent speed depending on the settings. It is also meant to allow for a more thorough compression for something long-term and general media where quick access is possible. (The BMP example in the previous zip and mentioned shows how these general tools are quite cheap for time, and that an 'image only' tool <PNG> is irrelevant in nature to data as 0/1 where other 'general' tools do a smaller result on the same data).

Therefore some extra space and some time to do the compression, and decent time is acceptable. This is also including versatile settings for a comprehensive compression, and should be done in decent speed even for general purpose settings compared to other tools like 7z/rar with smaller results and using asm (where the others are not).
The bootable scheduled option where it can do a strict process is a bonus for the higher compression options to do larger files such as lossless video / audio / assets that can extract/decode quick.

Anyone interested in assisting with an algorithm is appreciated :)
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: K_F on August 07, 2014, 06:25:51 PM
Coming in here a bit late.

The OP mentions a stream of 0/1's - have you thought of RLE (Run Length Encoding). It's a very simple and fast method.
IIRC this is probably the only method that will give a good lossless compression - if the data is in a 'decent' format.
On average I think you might be lucky to get a 20-30% compression.

It's generally used as the final stage of compression methods using DCT's - the DCT itself being a bit lossy.
8)
Title: Re: X86 ASM Programmer requested for compression algorithm
Post by: SoraK05 on August 19, 2014, 04:31:50 AM
This is looking at strings in particular, similar to other encoders, but instead of 8 bit + frequency this is a lot more flexible and dynamic.
I've also been looking at compression on another forum with some other programmers of tools and also involved with the hutter prize.

This does generally have media in mind and being able to stream while still being compact.
I am aware there is like a general spectrum of data that is like 50/50 with prevalence and without, and depending on the strength of the encode it can capture the prevalent data and generally store it in a non prevalent result. The algorithm in my description should generally capture what can be from strings, and although more can be done to reduce the file itself (like looking at specific repeating data in offsets and applying a layer to accommodate this) it will not be as streamable.
There is a method looking at strings, prevalence in gaps (counting gaps of 0/1 depending and using this), I know of nCr being useful for smaller results where there is an overwhelming prevalence, adding methods like 2, then 3 etc (to be somewhat recursive) and so on.

I am aware multiple methods are able to capture prevalence more effectively, but in general this tool should be a very effective string detection tool and be able to shrink common data a lot more than common tools like 7z and so on while also being useful on media too where chunks are applied and be streamable (and scalable). For a string method and being streamable it should be very compact and being a lot more specific to a file than common tools/encoders (designed with streaming in mind).
It could be a much more useful and compacted algorithm which can scale for common compression uses :) Archives can be a lot smaller as well as even video/audio depending while still streaming in decent time if using some quick asm.