/*****************************************************************/
  GramDist: a multiple sequence relative distance program
  Version 1.00, 8/30/2012

  http://bioinfo.unl.edu/
  drussell2@unl.edu

  This program is freely available for academic use, without any 
  warranty.  Commercial distribution of this program, in whole or 
  in part, requires prior agreement with the authors. 
/*****************************************************************/

1. INSTALL

GramDist is written in ANSI-C, and so should build without error on any
platform with an ANSI-C compiler.  Use the following commands to build
the program.

cd src
make clean
make

At this point, the executable GramDist (on linux/unix/macosx) or
GramDist.exe (on Windows) will be in the src directory.  Copy the
executable to any directory in your PATH.

/*****************************************************************/

2. USAGE

GramDist [options]

Options:

 -i <filename>

Specify the input file, which needs to be in FASTA format.  If this
option is not used, the default file name is "infile".  The type of
input sequence (nucleotide or amino acid) is determined by the
input file format command line option (-F).

 -o <filename>

Specify the output file name.  If this option is not used, the default
file name is "outfile".

 -F <value>

Specify the input file format.  A value of 0 causes GramDist to
detect automatically whether the input file contains amino acid
sequences.  The auto-detection checks for any character other than
A, C, G, T, U, or X.  If any other character appears in any of the
input sequences, the program treats every sequence in the input file
as an amino acid sequence.  A value of 1 forces the program to treat
all sequences as DNA or RNA.  A value of 2 forces the program to
treat all sequences as amino acid sequences.  If this option is not
specified, the default is to detect the input file type automatically.
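
The auto-detection rule for -F 0 can be sketched in C as follows.
This is a minimal illustration and not the GramDist source: the
function names are hypothetical, and it assumes that the comparison is
case-insensitive and that each string holds only residue characters
(no whitespace or gap symbols).  The return values mirror the -F
values 1 and 2.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from the GramDist source).
 * Returns 1 if the sequence contains any character other than
 * A, C, G, T, U, or X, and 0 otherwise.
 * Assumption: upper and lower case are treated alike. */
int looks_like_amino_acid(const char *seq)
{
    for (; *seq != '\0'; ++seq) {
        switch (toupper((unsigned char)*seq)) {
        case 'A': case 'C': case 'G':
        case 'T': case 'U': case 'X':
            break;      /* allowed nucleotide symbol, keep scanning */
        default:
            return 1;   /* any other symbol marks an amino acid file */
        }
    }
    return 0;
}

/* Hypothetical helper: returns 2 (amino acid) if any one sequence
 * contains a non-nucleotide symbol, else 1 (DNA or RNA). */
int detect_format(const char **seqs, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i) {
        if (looks_like_amino_acid(seqs[i])) {
            return 2;
        }
    }
    return 1;
}
```

Under this rule a single ambiguous nucleotide code such as N is enough
to classify the whole file as amino acid, which is one reason to pass
-F 1 explicitly for nucleotide data.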

 -M <value>

Specify how to use the merged amino acid alphabet.  As discussed in
our paper on GramAlign, we developed a merged alphabet by grouping
amino acid characters that were found to have similar row scores
within the substitution matrices.  This reduced the original 23
characters to a set of 11 characters.  This option is particularly
useful for the grammar-based distance calculation.  A value of 0
disables the merged alphabet.  A value of 1 uses the merged alphabet
for the distance calculation.  If this option is not specified, the
default behavior uses the merged alphabet.  This option is ignored
for nucleotide sequences.

 -P

Force the generation of a partial distance matrix with a time
complexity on the order of N log N, where N is the number of input
sequences.  The default behavior creates a complete distance matrix,
which gives the most accurate grammar-based distances but requires a
time complexity on the order of N^2.
In creating the partial distance matrix, one initial column is
completely filled in and divided into two clusters (one with the
smallest distances and the other with the largest distances).
Each cluster is then processed recursively (i.e., one column is
filled in for the cluster, which is again divided into two clusters).
This approach relies on the transitivity of grammars (i.e., if the
initial sequence has a short relative grammar distance to two other
sequences, then those two sequences are likely to have a short
relative grammar distance to each other).
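
The recursive scheme described for -P can be sketched in C as follows.
This is an illustration under stated assumptions, not the GramDist
implementation: the first sequence of each cluster serves as the
initial sequence, the cluster is split at the median distance, and
toy_dist is a stand-in for the grammar-based distance.  All names are
hypothetical.  The caller is expected to pre-fill D with a marker
value such as -1.0 so that entries left uncomputed can be recognized.

```c
#include <stdlib.h>

typedef double (*dist_fn)(int a, int b);

typedef struct {
    int idx;     /* sequence index */
    double d;    /* distance to the pivot sequence */
} entry;

long g_calls = 0;    /* counts distance evaluations (demo only) */

/* Stand-in distance: |a - b|.  NOT the grammar-based distance. */
double toy_dist(int a, int b)
{
    ++g_calls;
    return (double)(a > b ? a - b : b - a);
}

static int cmp_entry(const void *x, const void *y)
{
    const entry *a = (const entry *)x;
    const entry *b = (const entry *)y;
    return (a->d > b->d) - (a->d < b->d);
}

/* Fill part of the n x n row-major matrix D for the sequences listed
 * in ids[0..count-1].  ids is reordered in place. */
void partial_fill(double *D, int n, int *ids, int count, dist_fn f)
{
    entry *e;
    int i, pivot, rest, half;

    if (count < 2) {
        return;
    }
    pivot = ids[0];              /* assumption: first member is pivot */
    rest = count - 1;
    e = (entry *)malloc((size_t)rest * sizeof *e);
    if (e == NULL) {
        return;
    }
    /* Fill one complete column (and its mirror row) for the pivot. */
    for (i = 0; i < rest; ++i) {
        e[i].idx = ids[i + 1];
        e[i].d = f(pivot, e[i].idx);
        D[pivot * n + e[i].idx] = e[i].d;
        D[e[i].idx * n + pivot] = e[i].d;
    }
    /* Order by distance, then split into near and far clusters. */
    qsort(e, (size_t)rest, sizeof *e, cmp_entry);
    for (i = 0; i < rest; ++i) {
        ids[i + 1] = e[i].idx;
    }
    free(e);
    half = rest / 2;             /* assumption: split at the median */
    partial_fill(D, n, ids + 1, half, f);
    partial_fill(D, n, ids + 1 + half, rest - half, f);
}
```

With this sketch, eight sequences need 13 distance evaluations, where
a complete matrix would need 28.  Because each level of recursion
halves the cluster size, the number of evaluations grows on the order
of N log N.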

 -q

Turn on "quiet mode", which prevents GramDist from displaying any
text.  The default is verbose mode, in which GramDist prints progress
messages while it runs.

/*****************************************************************/

3. CHANGES

Version 1.00 -- major release.

