=============================================================================
  CodePressor 1.03-JL                   BINARY AND SOURCE CODE DISTRIBUTION
=============================================================================
                                         (c) 2000-2001 Eli-Jean R. Leyssens
                                             Base algorithm by Tony Haines!


  This program will take either an Absolute or Untyped file and compress it
using a compression method targeted especially at ARM instruction code. The
compression ratios of CodePressor are pretty good although for larger
programs you can find better compressors that use algorithms like LZH. For
sub-16K programs though it will be hard to beat, which is not surprising
considering the first version of the compression method was developed during
(and for) CodeCraft#2: the coding competition with emphasis on small sized
programs, 1K, 2K and 8K.

  The first version of the compression method on which almost every method in
CodePressor is based was thought up and programmed by Tony Haines. He used it
to compress his 8K CodeCraft#2 entry called Flu. When I asked him whether
he'd mind if I turned the algorithm into a 2K entry for CC#2 he said he'd be
delighted if I did, so I set out to create the program which we now call
CodePressor, as suggested by Tony.

  That first version of CodePressor was written completely in ARM assembly,
which made it too hard to try out new things or add different methods to
CodePressor. That is, the original CodePressor really only had one
compression method whereas the new CodePressor has multiple ones, although
they're almost all based on Tony Haines' compression algorithm. At the
moment, though, almost any program >= 2K is best compressed using method 0,
with the other methods only being competitive for programs < 2K.


Installing CodePressor
~~~~~~~~~~~~~~~~~~~~~~
  
  Installation is just a matter of copying the CodePressr executable to your
Library directory. This is usually located on your boot (hard)disc as
$.!Boot.Library

  You should then be able to run the CodePressr program from the commandline.
Either press F12 to enter the commandline or CTRL-F12 to open a taskwindow.


Using CodePressor
~~~~~~~~~~~~~~~~~

  Run CodePressor either directly from the commandline or from an Obey file.
  
  
  Syntax: CodePressr [options] <inputfile> <outputfile> [<method>]
 
  Note that the executable name is CodePressr, without the 'o'. If you have
RISC OS 4 or a filing system that allows long filenames you could of course
rename it to CodePressor :)
  
  The supported options are:
  
    -h          Help, display the program's help text
    -v          Version, display version information
    -q          Quiet, suppress informational messages (not errors)
  
  Method: this version of CodePressor supports 9 methods, numbered 0, 1, 2,
3, 4, 5, 6, 7 and 8. You can specify ONE of these methods to force
CodePressor to use that method. If you do not specify a method then
CodePressor will first try out all methods and then use the best one to
actually compress your program with.

  All methods, except number 3, can compress Untyped files with any load
address as well as Absolute files. Method 3 can only compress Absolute files
or Untyped files with a load address of 0x8000.


Algorithm
~~~~~~~~~
  
  The following description has been copied from the ReadMe of the original
CodePressor, the one I wrote for CodeCraft#2. Currently only method 0 still
works "exactly" like this. The other methods have other maximum distances and
some work on bytes rather than nibbles etc.
  
  So, how does it work? Well, the basic algorithm is, for each word, to look
back over a range of 256 words and find the word which has the most nibbles
identical to the nibbles in the current word.

  You can then store a byte, called the distance, indicating which word had
the most identical nibbles, and another byte, the nibbles-mask, of which each
bit indicates whether the corresponding nibble is identical or not. After
that you only store the non-matching nibbles.
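
  The search described above can be sketched in Python as follows. This is
only an illustration, not the code CodePressor itself uses: the function
names are made up for this example, and two details are assumptions that the
text does not spell out, namely that a set mask bit means "this nibble
differs" and that the nearest word wins when several words match equally
well.

```python
def nibbles(word):
    """Split a 32-bit word into 8 nibbles, most significant first."""
    return [(word >> shift) & 0xF for shift in range(28, -4, -4)]


def best_match(words, index, window=256):
    """Return (distance, mask, matches) for words[index].

    distance is how many words back the best candidate lies (1..window).
    mask has one bit per nibble (top nibble = bit 7); a set bit means the
    nibbles DIFFER (assumed polarity). matches counts identical nibbles.
    Returns None when there is no earlier word to compare against.
    """
    current = nibbles(words[index])
    best = None
    for distance in range(1, min(window, index) + 1):
        candidate = nibbles(words[index - distance])
        mask = 0
        for a, b in zip(current, candidate):
            mask = (mask << 1) | (a != b)
        matches = 8 - bin(mask).count("1")
        # Strictly greater: on a tie the nearest word is kept (assumption).
        if best is None or matches > best[2]:
            best = (distance, mask, matches)
    return best


if __name__ == "__main__":
    code = [0xE3A00000, 0xE3A01001, 0xE1A0F00E, 0xE3A00002]
    print(best_match(code, 3))
```

  In the four-word example the last word differs from the first word only in
its lowest nibble, so the sketch picks the word three positions back.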

  So, if the best matching word has 6 identical nibbles, then that could be
stored as 8 + 8 + (8 - 6) * 4 bits = 16 + 8 = 24, instead of 32.
  If not enough nibbles match, then you store one byte indicating an
uncompressed word followed by the 8 nibbles making up that word.

  As you might have figured out by now the above only generates real
compression for words whose best matching word has at least 5 identical
nibbles. Still, this results in a good overall compression as a lot of "best"
matches have 5, 6, 7 or even 8 identical nibbles.
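
  To make the break-even point concrete, the little Python sketch below
tabulates the cost in bits for every possible number of identical nibbles,
using only the sizes given above (one distance byte, one mask byte, 4 bits
per non-matching nibble, and a marker byte plus 32 bits for an uncompressed
word). The function and constant names are made up for this example.

```python
def matched_cost_bits(identical):
    """Bits for a matched word: distance byte + mask byte + leftover nibbles."""
    return 8 + 8 + (8 - identical) * 4


# One marker byte followed by the 8 raw nibbles of the word.
UNCOMPRESSED_COST_BITS = 8 + 32

if __name__ == "__main__":
    for identical in range(9):
        cost = matched_cost_bits(identical)
        verdict = "saves bits" if cost < 32 else "no gain"
        print(identical, "identical nibbles:", cost, "bits,", verdict)
```

  With 4 identical nibbles the cost is exactly 32 bits, so nothing is gained;
5 is the first count for which the stored form is smaller than the word
itself.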
 
  However, on closer inspection of all the data that was being generated by
this algorithm I discovered a few things:

  - a lot of best matching words have a distance of 16 (words) or less
  - in lots of cases the identical nibbles included the 4 top nibbles
  
  So, instead of always using 8 bits to store the distance and nibbles-mask I
now encode both the distance and nibbles-mask as follows:

  %0000 0000 - %0000 1111    ->   %0 0000 - %0 1111
  %xxxx 0000 - %xxxx 1111    ->   %1 xxxx 0000 - %1 xxxx 1111
  
  Where xxxx <> 0. So, if the top 4 bits are all zero then a 0 bit followed
by the lower 4 bits is stored, otherwise a 1 bit followed by all 8 bits.
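
  The mapping in the table above can be written down as a small Python
sketch. It is only meant to illustrate the 5-bit and 9-bit forms: the bits
are kept in a text string of "0" and "1" characters for readability, and the
function names are made up for this example. How the real program packs the
bits into the output file is not described here.

```python
def encode_byte(value):
    """Return the bit string for one distance or nibbles-mask byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    if value <= 0x0F:
        # Top 4 bits are zero: flag bit 0, then only the lower 4 bits.
        return "0" + format(value, "04b")
    # Otherwise: flag bit 1, then all 8 bits.
    return "1" + format(value, "08b")


def decode_byte(bits, pos=0):
    """Decode one value starting at bits[pos]; return (value, next_pos)."""
    width = 8 if bits[pos] == "1" else 4
    start = pos + 1
    return int(bits[start:start + width], 2), start + width


if __name__ == "__main__":
    for value in (0x00, 0x0F, 0x10, 0xFF):
        print(format(value, "08b"), "->", encode_byte(value))
```

  A small value therefore costs 5 bits instead of 8, while a large one costs
9 bits, so the scheme pays off as long as small values are common enough.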

  This gave the compression ratio an extra boost :)

  Furthermore, the algorithm really only works well on code. So, if there are
one or more data blocks in the program that you wish to compress then it's
very likely that most words in those blocks cannot be compressed by the
algorithm and thus must be stored as "uncompressed words". However, as I
already said, an uncompressed word is stored by first storing a byte
indicating an uncompressed word and then the uncompressed word. If there are
lots of uncompressable words then this marker per word would seriously
increase the size of the resulting file.

  So, I added code to detect runs, or sequences, of uncompressable words. It
then stores a marker, followed by a count. After that all the uncompressed
words follow without any further markers.
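
  The effect of grouping can be illustrated with the Python sketch below. It
only compares the marker overhead of the two schemes and is not the actual
CodePressor code. The function names are made up for this example, and the
size of the count is an assumption: the text does not say how many bits it
takes, so one byte is assumed here.

```python
def group_runs(flags):
    """Collapse per-word flags into (is_raw, count) runs.

    flags[i] is True when word i could not be compressed.
    """
    runs = []
    for is_raw in flags:
        if runs and runs[-1][0] == is_raw:
            runs[-1] = (is_raw, runs[-1][1] + 1)
        else:
            runs.append((is_raw, 1))
    return runs


def marker_overhead_bits(flags):
    """Return (per_word, per_run) marker overhead in bits.

    per_word: one marker byte in front of every uncompressed word.
    per_run:  one marker byte plus one count byte (assumed size) per run.
    """
    per_word = 8 * sum(flags)
    per_run = sum(16 for is_raw, _ in group_runs(flags) if is_raw)
    return per_word, per_run


if __name__ == "__main__":
    flags = [False] + [True] * 40 + [False]
    print(marker_overhead_bits(flags))
```

  For a data block of 40 uncompressable words the per-word markers cost 320
bits whereas a single run costs 16 bits under the assumption above. Note that
for an isolated uncompressable word this form would be larger, not smaller;
how the real program deals with that case is not covered here.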


Credits
~~~~~~~
 
  Thanks to Tony Haines for his brilliant algorithm, for letting me improve
it and turn it into a 2K entry for CC#2, and for being very supportive during
the CC#2 competition in general.

  Most of the credit for method 3 should go to Alain Brobecker and Frederic
Elisei. It is largely based upon their excellent NPK program. In some cases
NPK version 3 outcompresses CodePressor so you'd be wise to check out that
program as well.

  Thanks to Tony Haines, Alain Brobecker and Frederic Elisei for supporting
the development of the new version of CodePressor with multiple methods.


Contact
~~~~~~~

  "-JL" releases by Jeffrey Lee, http://www.phlamethrower.co.uk

  If you want to contact Eli-Jean or want to know more about Topix or Topix
productions:

    Pervect :     pervect@topixweb.com

    TopixWEB:     http://www.topixweb.com


  Tony Haines can be reached at: a.s.haines@bham.ac.uk
