/*****************************************************************/
This program is freely available for academic use, without any warranty. Commercial distribution of this program, in whole or in part, requires prior agreement with the authors.


/*****************************************************************/
RAIphy is a semi-supervised metagenomic fragment classification program. It uses genome signatures to characterize DNA sequences, and taxonomic classification is based on an information-theoretic measure referred to as the Relative Abundance Index (RAI).

A DNA sequence of unknown source is classified and taxonomically labeled based on the phylogenetic profiles of previously sequenced genomes. The profiles are iteratively updated using the unknown DNA sequences and their classification results. After a few cycles, the metagenome is classified into operational taxonomic units.



/*****************************************************************/
INSTALL
---------------------------------------
RAIphy is written in C++ using the Qt 4.6 application framework (http://qt.nokia.com). With the framework installed, the project can be built with "NMake" or "MinGW32-make" on Windows, or with "make" on Mac OS. The makefiles distributed with the project source code were generated by Qt's "qmake" tool.

For more information about installing the Qt framework, please visit "http://doc.qt.nokia.com/4.6/installation.html".  

For convenience, pre-built executables for Windows and Mac are available on our website (http://bioinfo.unl.edu/raiphy.php).

/*****************************************************************/

USAGE
DATABASE:
---------------------------------------
RAIphy is a semi-supervised algorithm that requires initial models of phylogenetically identified DNA sequences. These models are stored on disk as database files (e.g. NCBIRefSeq.db). Before classifying an unknown metagenome, the user must load a database file. By default, a database file named "NCBIRefSeq.db", containing the currently sequenced and annotated prokaryote genomes of the NCBI RefSeq database, is included in the working directory. RAIphy also allows the user to create custom database files for specific applications.

 - To load an existing database file:
   click "File -> Set Database File" (shortcut Ctrl+D) and select the preferred *.db file

 - To create a database:
   click "Tools -> Create Database File" (shortcut Ctrl+N) and select the FASTA files to be included in the database. 
   Then select a new database file to store the database in.

 - To add new taxa to the loaded database:
   click "Tools -> Add Items to Database" and select the FASTA files to be included in the database. 


CLASSIFYING METAGENOMES:
--------------------------------------
RAIphy accepts environmental DNA samples in FASTA format. The input DNA fragments can be contained in a single file or in multiple files. The classification results are written to an output file. Classification can be performed using the existing RAI profiles without refinement (supervised mode) or by updating the RAI profiles through iterative refinement (semi-supervised mode).

 - To classify a sequence without iterative refinement: 
   click "File -> Classify DNA fragments" (shortcut Ctrl+O) and select the FASTA file(s) containing the environmental samples. Then select an output file.

 - To classify a sequence with iterative refinement:
   click "File -> Classify DNA fragments with iterative refinement" and select the FASTA file(s) containing the environmental samples. Then select an output file.
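
Before classifying, it can help to check how many fragments each input FASTA file contains. The following is a minimal Python sketch for that purpose; it is not part of RAIphy, and the function names read_fasta_headers and count_fragments are hypothetical. It assumes that every record starts with a header line beginning with ">".

```python
from typing import Dict, Iterable, List


def read_fasta_headers(lines: Iterable[str]) -> List[str]:
    """Return the header text (without the leading '>') of every FASTA record."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def count_fragments(paths: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA file path to the number of records it contains."""
    counts = {}
    for path in paths:
        # errors="replace" keeps the scan going if a header holds odd bytes.
        with open(path, "r", errors="replace") as handle:
            counts[path] = len(read_fasta_headers(handle))
    return counts
```

Calling count_fragments with the list of files that will be selected in the dialog gives one count per file, which can later be compared with the number of entries in the output file.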
 

OUTPUT FORMAT:
--------------------------------------
For each DNA fragment, two lines are written to the output file, in order: the first line contains the header of the input sequence, and the second line contains the predicted taxonomic labels for that sequence. The format template is:

>SEQUENCE_1
species|phylum|class|order|family|genus
>SEQUENCE_2
species|phylum|class|order|family|genus
:
:


Note: the taxonomic information is gathered from the NCBI Taxonomy Browser and applies when the default database (NCBIRefSeq.db) is used. For custom databases created by the user, the second line instead contains the header of the FASTA file on which the matching database entry was built.
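
The two-line layout described above can be read back with a short script. The following is a minimal Python sketch, not part of RAIphy: the function name parse_raiphy_output and the default_db flag are hypothetical, and the field order is copied from the format template, so it is assumed to hold only for the default database. For custom databases the second line is kept as a single unparsed label.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Field order as given in the format template; assumed for NCBIRefSeq.db only.
RANKS = ("species", "phylum", "class", "order", "family", "genus")


def parse_raiphy_output(
    lines: Iterable[str], default_db: bool = True
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (sequence header, labels) for each two-line entry."""
    rows = [line.strip() for line in lines if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected an even number of non-empty lines")
    # Entries are consumed in pairs: header line first, label line second.
    for header, label in zip(rows[0::2], rows[1::2]):
        if not header.startswith(">"):
            raise ValueError(f"expected a '>' header line, got: {header!r}")
        fields = label.split("|")
        if default_db and len(fields) == len(RANKS):
            yield header[1:], dict(zip(RANKS, fields))
        else:
            # Custom database, or an unexpected field count: keep the raw label.
            yield header[1:], {"label": label}
```

For example, passing an open output file to parse_raiphy_output and collecting the "genus" value of each entry gives a simple per-genus tally of the classified fragments.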


/*****************************************************************/

