CODONOME_Guide

CODONOME collects the expression value of each codon (here simply called the "codonome") and of each aminoacyl-tRNA synthetase (aaRS). To do this, the software can count the total number of mRNA codons of any organism, and can import and integrate any mRNA expression data source in tab-delimited text format.

This guide provides detailed documentation for the CODONOME software.

It shows how to install the software, how to count mRNA codons and how to import expression values to study the codonome.

Download CODONOME for Mac OS X or for Windows from the following address:
http://apollo11.isto.unibo.it/software/

The software minimum requirements are:
Mac OS X 10.4.11 for PowerPC G4, G5 or Intel processors;
Windows XP Professional, Home Edition (Service Pack 3);
Windows Vista Ultimate, Business, Home (Service Pack 1);
Windows 7.

CODONOME is based on the FileMaker Pro 10 (FileMaker, Inc.)
database management software (www.filemaker.com/index.html),
and is released as a FileMaker Pro 10 template, along with a runtime application that runs the "FileMaker Pro" engine at the core of the software.
The runtime is freely distributed, in compliance with the license of the "FileMaker Pro 10 Advanced" developer package that was used to create the program.

Standard database commands (Find, Sort, Export records) are available within each layout of CODONOME (see 'GENERAL DEFINITIONS' and 'MENU AND COMMANDS' sections in this Guide).

Once decompressed, CODONOME is ready to be used.

Please do not rename any of the files or folders of the CODONOME software.

You may download multiple copies of CODONOME and run them simultaneously, provided that each "CODONOME" folder is located in a separate directory.

1 Creating a local RefSeq database with the RefSeq_Parser table

CODONOME is designed to parse RefSeq entries, to create a local RefSeq database and to calculate the transcriptome (total mRNA) codon count of an organism. By combining these data with expression values, you can estimate how many of each codon are actually used by the cells of a given tissue.
Moreover, you can study the consequences of this putative preferential codon usage on the usage of aminoacyl-tRNA synthetases.

1.1 Downloading and editing the RefSeq database

Download the RefSeq text file of the desired species at:
ftp://ftp.ncbi.nih.gov/refseq/

(open the "mRNA_Prot" folder, download the "gbff.gz" format file and decompress it as usual).

Then edit the downloaded file with the Unix commands "tr" and "awk".
The file must be placed in the directory from which the commands are launched.
These commands are included in Mac OS X and in most Unix-like systems, such as Linux.
Edit the file with this command:

tr -ds "\n" "[:space:]" < gbff.txt | awk '{gsub ("//LOCUS", "\rLOCUS"); print $0;}' | tr -d "//\n" > out.txt

where "gbff.txt" must be replaced by the name of the downloaded (decompressed) RefSeq file, and "out.txt" by the name of the edited output file.
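If "tr" and "awk" are not available (for example on Windows), the same editing can be approximated in Python. The following is a minimal sketch, not part of CODONOME: the function names flatten_gbff and flatten_file are hypothetical, and each step mirrors one part of the command above, assuming the usual "tr" behavior of deleting before squeezing.

```python
import re


def flatten_gbff(text: str) -> str:
    """Approximate the tr/awk pipeline: one RefSeq record per CR-terminated line."""
    # tr -d "\n": delete every newline, so the whole file becomes one line
    s = text.replace("\n", "")
    # tr -s "[:space:]": squeeze runs of the same whitespace character into one
    s = re.sub(r"(\s)\1+", r"\1", s)
    # awk gsub("//LOCUS", "\rLOCUS"): the record end "//" followed by the next
    # "LOCUS" becomes a carriage return, so every record starts on its own line
    s = s.replace("//LOCUS", "\rLOCUS")
    # tr -d "//\n": delete every remaining "/" (including the final "//")
    return s.replace("/", "")


def flatten_file(src: str, dst: str) -> None:
    """Read the decompressed RefSeq file src and write the edited file dst."""
    with open(src, encoding="latin-1") as fh:
        flat = flatten_gbff(fh.read())
    # newline="" writes the carriage returns exactly as produced
    with open(dst, "w", encoding="latin-1", newline="") as out:
        out.write(flat)
```

For example, flatten_file("gbff.txt", "out.txt"). The sketch reads the whole file into memory, so very large RefSeq files need a correspondingly large amount of RAM; Python's \s class is also slightly broader than [:space:].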

Open CODONOME and switch to the RefSeq_Parser table.
(To switch between database tables, use the "Layout" pop-up menu at the upper left corner.)
Choose the command "Import Records" from the "File" menu.
Select the edited file (e.g. "out.txt"), choosing "Tab-separated text" from the "Show" pop-up menu.

The software extracts and calculates the following information in specific calculation fields:

FIELD                       DESCRIPTION

"FASTA":                    the entry in FASTA format,
                            including accession number and
                            mRNA sequence;

"LOCUS":                    the entry accession number;

"bp":                       the length of the entry sequence
                            (in bp);

"CDS_start":                the position of the translational
                            start codon recorded in the entry;

"CDS_start_Prokaryotes":    the position of the translational
                            start codon recorded in the entry
                            (if the investigated organism is
                            a prokaryote);

"CDS_end":                  the position of the codon just
                            before the stop codon recorded in
                            the entry;

"CDS_end_Prokaryotes":      the position of the codon just
                            before the stop codon recorded in
                            the entry (if the investigated
                            organism is a prokaryote);

"UTR5'_length":             the length of the mRNA 5' UTR
                            sequence;

"Seq":                      the mRNA sequence;

"Seq_UTR5'":                the mRNA 5' UTR sequence;

"CDS":                      the CDS sequence;

"CDS_Prokaryotes":          the CDS sequence (if the
                            investigated organism is a
                            prokaryote);

"Complement":               if the investigated organism is a
                            prokaryote, shows "YES" if the
                            gene lies on the complementary
                            strand (otherwise shows "NO");

"SYMBOLUM":                 the gene symbol;

"SYMBOLUM_Prokaryotes":     the gene symbol (if the
                            investigated organism is a
                            prokaryote).
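As an illustration of what the CDS and Complement fields represent, the sketch below extracts a coding sequence from a sequence using GenBank-style 1-based, inclusive coordinates, and reverse-complements it when the gene lies on the complementary strand. It is a hypothetical Python helper (extract_cds is not a CODONOME function); exactly where CODONOME's CDS_end field places the end position relative to the stop codon is not assumed here, so the coordinates are simply taken as given.

```python
# Complement of each base; N (unknown base) stays N
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extract_cds(seq: str, start: int, end: int, complement: bool = False) -> str:
    """Extract the CDS between 1-based, inclusive positions start..end.

    If complement is True (gene on the complementary strand), the extracted
    stretch is reverse-complemented so that it reads 5'->3' from the start codon.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"invalid CDS coordinates {start}..{end} for length {len(seq)}")
    cds = seq[start - 1:end].upper()
    return reverse_complement(cds) if complement else cds


# Example: a 2-base 5' UTR, a CDS at positions 3..14 and a 2-base 3' UTR
print(extract_cds("ggatgaaattttaacc", 3, 14))  # ATGAAATTTTAA
```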

The AAA_N, AAC_N, etc. fields (one for each codon) are used in the next step.

The aim of this step is to obtain the RefSeq mRNA codon count. To do this, click on the 'Codon count' button (or 'Prokaryotes codon count' if your organism is a prokaryote).
First, the software deletes all entries except the "NM" entries (i.e. the protein-coding mRNAs); then it counts the codons, and at the end the number of each codon appears in the AAA_N, AAC_N, etc. fields.
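The codon count amounts to reading each CDS in consecutive, non-overlapping triplets from its first base. A minimal Python sketch, assuming sequences use the DNA alphabet (T rather than U, as in RefSeq) and that the CDS is already in frame; the helper names are hypothetical, and is_nm_entry only mirrors the "NM" filter described above. Incomplete trailing triplets and triplets containing ambiguous bases (e.g. N) are ignored here, which may differ from CODONOME's own handling.

```python
from collections import Counter
from itertools import product

# The 64 codons in the same order as the AAA_N, AAC_N, ... fields (AAA ... TTT)
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def is_nm_entry(accession: str) -> bool:
    """True for RefSeq "NM_" accessions (curated protein-coding mRNAs)."""
    return accession.startswith("NM_")


def count_codons(cds: str) -> dict:
    """Count non-overlapping codons of an in-frame CDS, read from its first base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing triplet
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    # Keep only the 64 standard codons; triplets such as "AAN" are ignored
    return {codon: counts[codon] for codon in CODONS}
```

Whether stop codons end up in the count depends on how the CDS is delimited before it is passed in.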

The software automatically enters the gene symbol and the codon counts of each mRNA into the Codonome table, then calculates the codon count sums for the whole transcriptome and the per mil frequency of each codon.
(See section 2.1 for a description of the Codonome table fields.)
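The per mil frequency can be read as the share of each codon, in parts per thousand, of all codons counted in the transcriptome. The Python sketch below assumes Codon_Frequency = 1000 × Codon_Bias_Sum / Tot_Codon_Bias_Sum; the function name and the gene symbols G1 and G2 are hypothetical.

```python
def per_mil_frequencies(per_gene_counts: dict) -> dict:
    """Per mil (parts per thousand) frequency of each codon over all mRNAs.

    per_gene_counts maps a gene symbol to {codon: count}.
    """
    codon_sum = {}  # analogous to Codon_Bias_Sum
    for counts in per_gene_counts.values():
        for codon, n in counts.items():
            codon_sum[codon] = codon_sum.get(codon, 0) + n
    total = sum(codon_sum.values())  # analogous to Tot_Codon_Bias_Sum
    if total == 0:
        raise ValueError("no codons counted")
    return {codon: 1000 * n / total for codon, n in codon_sum.items()}


freqs = per_mil_frequencies({"G1": {"AAA": 3, "AAC": 1}, "G2": {"AAA": 1, "AAC": 0}})
print(freqs)  # {'AAA': 800.0, 'AAC': 200.0}
```

By construction the frequencies of all codons add up to 1000.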

2 Importing expression value data

CODONOME parses expression data and integrates them with the codon counts to give the expression value of each codon, both for each mRNA and for the whole transcriptome.

First, you need an expression data text file with two columns separated by a tab character (TAB, ASCII 9): gene symbol and expression value.

[Column headers are not required]

[Gene symbol]   [Value]
PRY             119.8678872124652088
DRP2            48.5523996241932508
SERBP1          47.4984303755452469
...             ...

For example, you may first analyze your data with TRAM and then export the required information to a tab-delimited text file.

Name the file "expression.tab" and place it in the CODONOME folder.
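Before importing, it can be useful to check that expression.tab really has two tab-separated columns with numeric values. The following Python sketch is a hypothetical checker, not part of CODONOME: it treats a non-numeric first line as an optional header and lets later duplicate gene symbols overwrite earlier ones, which may not match how CODONOME resolves duplicates.

```python
def parse_expression_lines(lines) -> dict:
    """Parse two tab-separated columns: gene symbol, expression value.

    A non-numeric value on the first line is treated as an optional header.
    """
    expression = {}
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated columns, found {len(fields)}")
        symbol, value = fields[0].strip(), fields[1].strip()
        try:
            expression[symbol] = float(value)
        except ValueError:
            if line_no == 1:
                continue  # header row such as "Gene symbol<TAB>Value"
            raise ValueError(f"line {line_no}: {value!r} is not a number") from None
    return expression
```

Usage: with open("expression.tab", encoding="utf-8") as fh: expression = parse_expression_lines(fh). A file whose columns are separated by spaces instead of a tab is rejected.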

2.1 Obtaining codon expression values (codonome)

In the Codonome table, CODONOME gives the expression value of each codon, called the "codonome" in the following sections.

In the Codonome table, click on the 'Import expression values' button.
The software assigns each expression value to the corresponding mRNA (matched by gene symbol); then it automatically calculates the codonome count of each mRNA, the codonome count sum of the whole transcriptome and the per mil frequency of each codonome.
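The codonome calculation can be sketched as follows: each codon count of an mRNA is multiplied by that mRNA's expression value (the AAA_N_E fields), the products are summed over the transcriptome, and the sums are converted to per mil frequencies. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, expression values are matched by exact gene symbol, and mRNAs without an expression value are skipped, which may not be what CODONOME does.

```python
def codonome(per_gene_counts: dict, expression: dict):
    """Weight codon counts by mRNA expression and return per-gene values,
    transcriptome sums and per mil frequencies."""
    weighted = {}  # gene -> {codon: count * expression}, like the AAA_N_E fields
    for gene, counts in per_gene_counts.items():
        if gene not in expression:
            continue  # assumption: mRNAs without an expression value are skipped
        e = expression[gene]
        weighted[gene] = {codon: n * e for codon, n in counts.items()}

    codonome_sum = {}  # like Codonome_Bias_Sum
    for counts in weighted.values():
        for codon, v in counts.items():
            codonome_sum[codon] = codonome_sum.get(codon, 0.0) + v
    total = sum(codonome_sum.values())  # like Tot_Codonome_Bias_Sum
    if total == 0:
        raise ValueError("total codonome is zero")
    frequency = {codon: 1000 * v / total for codon, v in codonome_sum.items()}  # like Codonome_Frequency
    return weighted, codonome_sum, frequency
```

A highly expressed mRNA therefore contributes proportionally more of its codons to the codonome frequencies than a weakly expressed one.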

The fields of the Codonome table are:

FIELD                      DESCRIPTION

"Gene_Symbol":             the gene symbol;

"AAA_N":                   the count of each codon in each
                           mRNA (one field per codon);

"Expression":              the expression value of each mRNA;

"AAA_N_E":                 the count of each codon in each
                           mRNA multiplied by the expression
                           value (the codonome count of each
                           mRNA);

"Expression_Random":       the expression value of another,
                           randomly chosen mRNA;

"Random_Number":           a random number from 0 to 1, used
                           instead of the expression value;

"Random_Number_Int":       a random integer from 1 to 10^4,
                           used instead of the expression
                           value;

"Codon_Bias_Sum*":         the total count of each codon in
                           the whole transcriptome;

"Tot_Codon_Bias_Sum*":     the total count of all codons in
                           the whole transcriptome;

"Codon_Frequency*":        the per mil frequency of each
                           codon;

"Media*":                  the mean value of each codon;

"Standard_Dev*":           the standard deviation of each
                           codon;

"Codonome_Bias_Sum*":      the sum of each codonome over the
                           whole transcriptome;

"Tot_Codonome_Bias_Sum*":  the sum of all codonomes of the
                           whole transcriptome;

"Codonome_Frequency*":     the per mil frequency of each
                           codonome.

*: summary fields, shown at the bottom of the page.
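The three random fields provide control values to use in place of the real expression values. One possible way to generate comparable controls in Python is sketched below; it is hypothetical. In particular, Expression_Random is modeled as a random permutation of the existing expression values (each value is reused exactly once, and a permutation can occasionally leave a value in place), whereas CODONOME may instead draw values with replacement.

```python
import random


def random_controls(expression: dict, seed=None) -> dict:
    """Random control values for each gene, to use instead of its expression value."""
    rng = random.Random(seed)  # a fixed seed makes the controls reproducible
    genes = list(expression)
    permuted = [expression[g] for g in genes]
    rng.shuffle(permuted)  # another mRNA's value, via a random permutation
    controls = {}
    for gene, shuffled_value in zip(genes, permuted):
        controls[gene] = {
            "Expression_Random": shuffled_value,
            "Random_Number": rng.random(),                # float in [0, 1)
            "Random_Number_Int": rng.randint(1, 10_000),  # integer in 1..10^4
        }
    return controls
```

Computing the codonome with each of these columns instead of the real expression values gives a baseline against which the real codonome frequencies can be compared.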

2.2 Grouping codon and codonome values by aminoacyl-tRNA synthetase

The Codon_Synthetases table is preset for Homo sapiens, Danio rerio, Caenorhabditis elegans, Saccharomyces cerevisiae and Escherichia coli and contains the following fields:

FIELD           DESCRIPTION

"Amino acid":   the three-letter amino acid symbol;

"Codon":        a codon coding for that amino acid;

"aaRS_HUMAN":   the aminoacyl-tRNA synthetase gene symbol of
                Homo sapiens
                (for the stop codons the field is empty);

"aaRS_BRARE":   the aminoacyl-tRNA synthetase gene symbol of
                Danio rerio
                (for the stop codons the field is empty);

"aaRS_CAEEL":   the aminoacyl-tRNA synthetase gene symbol of
                Caenorhabditis elegans
                (for the stop codons the field is empty);

"aaRS_YEAST":   the aminoacyl-tRNA synthetase gene symbol of
                Saccharomyces cerevisiae
                (for the stop codons the field is empty);

"aaRS_ECOLI":   the aminoacyl-tRNA synthetase gene symbol of
                Escherichia coli
                (for the stop codons the field is empty);

"aaRS_Other":   if your organism uses different gene symbols,
                you can type them here
                (for the stop codons the field is empty).

From bacteria to Homo sapiens, the tetrameric subunit organization of cytoplasmic phenylalanyl-tRNA synthetase is markedly conserved: in each organism there are two different genes, each coding for one of the two subunits (alpha and beta).
For simplicity, the Codon_Synthetases table (and also the Synthetases table, see section 2.3) uses the incomplete gene symbol (e.g. FARS instead of FARSA or FARSB for Homo sapiens).

Then click on the 'Group' button in the Codon_Synthetases table and type the organism's Latin name (e.g. "Homo sapiens") in the window that appears.

CODONOME groups the codon and codonome per mil frequencies by aminoacyl-tRNA synthetase, sums them for each synthetase, and switches to the Synthetases table.

(See section 2.3 for a description of the Synthetases table fields.)
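Grouping by synthetase amounts to summing the per mil frequencies of all codons assigned to the same aaRS, which yields the Codon_Sum and Codonome_Sum values of the Synthetases table. In the hypothetical Python sketch below, the mapping is a small partial example in the style of the Codon_Synthetases table (human symbols, with the simplified FARS), codons are assumed to be written with T, and stop codons carry an empty symbol and are skipped.

```python
def group_by_synthetase(frequencies: dict, codon_to_aars: dict) -> dict:
    """Sum the per mil frequencies of all codons assigned to the same aaRS symbol."""
    sums = {}
    for codon, freq in frequencies.items():
        aars = codon_to_aars.get(codon, "")
        if not aars:
            continue  # stop codons (empty aaRS field) or unmapped codons
        sums[aars] = sums.get(aars, 0.0) + freq
    return sums


# Partial, illustrative mapping only (human, simplified FARS symbol)
CODON_TO_AARS_EXAMPLE = {"TTT": "FARS", "TTC": "FARS", "AAA": "KARS", "AAG": "KARS", "TAA": ""}

codon_sum = group_by_synthetase({"TTT": 10.0, "TTC": 5.0, "AAA": 20.0, "TAA": 1.0}, CODON_TO_AARS_EXAMPLE)
print(codon_sum)  # {'FARS': 15.0, 'KARS': 20.0}
```

The same function applied to the codonome frequencies gives the per-synthetase codonome sums.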

2.3 Obtaining aminoacyl-tRNA synthetase expression values

CODONOME software also gives the expression value of each aminoacyl-tRNA synthetase.

Before importing expression values, find the phenylalanyl-tRNA synthetase record and duplicate it (command "Duplicate Record" from the "Records" menu). Then replace the gene symbol with the alpha subunit gene symbol in the original record and with the beta subunit gene symbol in the duplicated record (e.g. FARSA and FARSB instead of FARS in Homo sapiens). Do not modify any other field.

To import your expression data file (the expression.tab file used in section 2.1) into the Synthetases table, click on the 'Import expression values' button.
The software automatically assigns each expression value to the corresponding aminoacyl-tRNA synthetase.

The fields of the Synthetases table are:

FIELD             DESCRIPTION

"Synthetase":     the aminoacyl-tRNA synthetase gene symbol;

"Codon_Sum":      the sum of the codon per mil frequencies
                  for each aminoacyl-tRNA synthetase;

"Codonome_Sum":   the sum of the codonome per mil frequencies
                  for each aminoacyl-tRNA synthetase;

"Expression":     the expression value of each aminoacyl-tRNA
                  synthetase.

For further statistical analysis, click on the 'Export data' button in the Synthetases table: the Codon_Frequency and Codonome_Frequency fields (for each codon) are automatically exported to a file named "biascodonome.tab", and all the fields of the Synthetases table to a file named "biascodonomepersynthetase.tab".
These text files will appear in the CODONOME folder.

Table
A set of records referring to the same subject type (e.g., the 'Codonome' table).

Record
A set of fields representing one entry (i.e. containing all the data for one subject, e.g. a gene).
The record browser is the small book icon at the top left of the window. You can also browse the records faster using the slider to the right of the small book icon.

Layout
A particular graphical arrangement of the fields of a table.
A table can be displayed in more than one layout.
A layout may display fields from its own table or related fields from other tables.
A file may show data in different layouts.
How a field is displayed is independent of how its data are stored.

You can switch between layouts with the 'Layout:' pop-up menu at the upper left corner.

You may browse the database by clicking on the small book pages at the top left of the window,
by using the slider to the right of the small book icon, or by
entering a record number and pressing the "Return" key.
The following information is always displayed in the window top bar (if not, select "Status Toolbar" from the "View" menu):
Records: total number of records in the table.
Found: number of records in the current found set. Clicking on the green circular button retrieves the complementary subset of currently omitted records.
Sorted: sorting status of the records (Sorted/Unsorted).

The FileMaker Pro-based database can be used in three main "modes":
'Browse', 'Find', and 'Preview'.
You can switch between modes from the 'View' menu or from the mode pop-up menu at the bottom left of the window.

'Browse' mode
This mode allows entering, viewing, browsing, sorting and manipulating data.
It may be selected from:
the 'View' menu, or
the mode pop-up Menu bar, at the bottom left of the window.

In the 'Browse' mode, the record sets can be browsed by clicking on the small book icon (with the arrows to move 'back' and 'forward') in the upper left corner.

Browsing among the tables can be done by clicking on the 'Layout' pop-up Menu at the upper left corner.

'Find' mode
An alternative mode to use the database.
It allows searching for specific content in the database fields, using any combination of criteria
(see the 'Search mode' section below for more details).
It may be selected from:
the 'View' menu, or
the mode pop-up Menu bar, at the bottom left of the window.

The user fills in a blank form to search specific fields.

In the "Find" mode, the small book icon in the upper left corner represents different "requests" that are made for searching the database.

In FileMaker Pro 'Find' mode, the logical operators "AND", "OR" and "NOT" can be implemented as follows:

"AND" by entering criteria in different fields
      of the same "Request";

"OR" by creating additional requests
      (from the "Requests" menu) in the same query;

"NOT" by creating additional requests
      (from the "Requests" menu) and clicking on the "Omit"
      button (located in the window top bar).
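The way requests combine can be modeled as set operations on the records. The Python sketch below is a simplified model, not FileMaker's implementation, and the field names in it are hypothetical: requests are processed in order, criteria within a request must all match (AND), each find request adds its matches (OR), and an Omit request removes its matches from the records found so far (NOT). Matching is exact equality here, whereas FileMaker's default text matching is looser, and in this model a leading Omit request starts from all records.

```python
def _matches(record: dict, criteria: dict) -> bool:
    # AND: every criterion of one request must match (exact equality here)
    return all(record.get(field) == value for field, value in criteria.items())


def perform_find(records: list, requests: list) -> list:
    """requests is a list of (criteria, omit) pairs, processed in order."""
    if not requests:
        return []
    # Simplification: a leading Omit request starts from all records
    found = set(range(len(records))) if requests[0][1] else set()
    for criteria, omit in requests:
        hits = {i for i, rec in enumerate(records) if _matches(rec, criteria)}
        if omit:
            found -= hits  # NOT: remove matches from the records found so far
        else:
            found |= hits  # OR: each find request adds its matches
    return [records[i] for i in sorted(found)]
```

For example, a request for Amino_acid "Phe" followed by an Omit request for Gene_Symbol "FARSB" returns the Phe records except FARSB.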

The 'Operators' pop-up menu appears when you click on a field while pressing the 'ctrl' key, allowing queries for:
exact matches, duplicate values, ranges, wild cards and more.

Click on the 'Perform Find' button at the top of the window to start the query.

The result of the search is the subset of entries matching the search criteria.

'Preview' mode
An alternative way to use the database.
It displays a print preview of the found records.
It may be selected from:
the "View" menu,
or the mode pop-up Menu bar, at the bottom left of the window.

In the "Preview" mode, the user can obtain a print preview of the data in the current table.
Browsing among the tables can be done by clicking on the 'Layout:' pop-up Menu at the upper left corner.

About FileMaker Pro Runtime...
Information about FileMaker Pro Runtime at the core of the software.

Preferences...
Standard preferences panel; the cache memory size can be set up to 256 MB.

Hide CODONOME
Hiding all CODONOME windows.

Quit CODONOME
Closing the program.

File Options...
It is possible to set only the "Spelling" options.

Change Password...
There is no default password set.

Page setup...
Standard page set up command.

Print...
Standard print command.
The appearance will match the layout currently displayed on the screen.

Import Records
This is the general "Import" function of FileMaker Pro.

Export Records...

Export command for the found set of a given table.
Records are exported in their current sort order.
The user can select the fields to be exported, their order,
and the separator character.

Save a Copy as...

Save a copy of the database: a complete copy, a compacted copy, or a clone (database structure with no records).

Browse Mode
Switch to the 'Browse Mode' (see "General Definitions" above).

Find Mode
Switch to the 'Find Mode' (see "General Definitions" above).

Preview Mode
Switch to the 'Preview Mode' (see "General Definitions" above).

Go to layout
A possible way to switch between different layouts.

View as Form
A possible way to individually display the current record of a found set of records.

View as List
A possible way to display all the records of a found set in the form of a list.

View as Table
A possible way to display all the records of a found set in the form of a spreadsheet-like table.

Toolbars
To switch on/off the toolbars of the application: "Standard"
and "Text Formatting".

Status Area
To switch on/off the "Status Area", the toolbar located at the top of the program window.

Text Ruler
To switch on/off the text ruler of the application.

Zoom in
Used to increase layout dimensions.

Zoom out
Used to decrease layout dimensions.

6.5 'Records' Menu

New Record
Creating a new empty record in the database.
The new record is added at the end of the current found set.

Duplicate Record
Duplicating the current record in the database.
The new record is added at the end of the current found set.

Delete Record...
Deleting the current record in the database.

Delete Found Records...
Deleting all currently found records in the database.

Go to Record
Moving to the selected record by number, previous or next.

Show All Records
Showing all the records in the database.

Show Omitted Only
Showing all the records in the database not included in the current 'found' set.

Omit Record
Removing the selected record from the current found set, without deleting it.

Omit Multiple...
Removing several records, selected by number, from the current found set, without deleting them.

Modify Last Find
Returning to the last performed search in order to edit it.

Saved Finds
Saving a set of search criteria.

Sort Records...
Sorting the current records set according to desired criteria.

Unsort
Display the current records set according to the order of creation of each record.

Replace Field Contents
Replacing the value of a field in all records of the found set with the value in the current record, or with a calculated result.

Relookup Field Contents...
Re-executing the lookup of a field value, reading the matching value from a related table (the relationship was established during database development using a 'key' field).

Revert Record...
Discarding the changes made to the current record and restoring its previous values; available only before clicking out of the record.

6.6 'Scripts' Menu

About

This opens the 'About' window containing information about the CODONOME software.

Guide
The page with the user Guide of the CODONOME software (this Guide).

6.7 'Help' Menu

Search
Searching the system Help for the general commands.

TROUBLESHOOTING

Sometimes, power failure, hardware problems, or other factors can damage a FileMaker Pro database file.
When the runtime application discovers a damaged file, a dialog box appears, telling the users to contact the creator.
Even if the dialog box does not appear, files can exhibit erratic behaviour.
If you have FileMaker Pro or FileMaker Pro Advanced installed, you can recover the file with the 'Recover' command.
Otherwise, to recover a damaged file:
- On Mac OS X machines, press Command + Option (cmd-alt) while double-clicking the runtime application icon. Hold the keys down until you see the 'Open Damaged File' dialog box.
- On Windows machines, press Ctrl+Shift while double-clicking the runtime application icon. Hold the keys down until you see the Open Damaged File dialog box.
During the recovery process, the runtime application:
1. Creates a new file;

2. Renames the damaged file by adding “Old” to the end of the file name;

3. Gives the repaired file the original name.

TECHNICAL NOTES

The scripts at the core of TRAM software are "FileMaker Pro" scripts.

TRAM is composed of a 137 MB database engine ('TRAM') and of a template ('TRAM.TMA') with 37 data tables, with 117 relationships among them and 434 script definitions.
Following set up including NCBI UniGene and UCSC EST localization data, the size becomes 3.6, 2.0 GB and 742 MB for human, mouse and zebrafish 'TRAM.TMA' file, respectively.
Importing the 28 human microarray sample data file for the test of biological model raised 'TRAM.TMA' file size to 4.3 GB.

Time required to import and process a typical microarray data file is about 10 minutes.
Typical execution time is 1-2 hours for a 'Map' analysis and 5-10 minutes for a 'Cluster' analysis, depending on the number of analyzed samples, which also heavily affects the time required to refresh data when the type of data normalization is changed.

Large file size and relative slowness of data processing are mainly due to systematic indexing of all data contained in TRAM, with the advantage of very fast data browsing, navigation and search at the end of data import and processing, which may be run in batch mode.

We encourage any creative use, modification and noncommercial redistribution of TRAM, as long as the original paper is cited and, if the program has been modified, a statement to that effect is provided.

7.1 Software known limits

Due to FileMaker Pro limits:
the maximum TRAM file size is 8 terabytes (8,192 gigabytes);
text fields can contain up to 2 GB of characters;
number fields can contain values of up to 800 digits.

Due to TRAM limits:
in order to generate consistent transcriptome maps, TRAM currently deletes all genes with ambiguous mapping; this causes data loss for a few genes that are biologically present in different locations, such as genes common to the X and Y chromosomes (e.g., CSF2RA). We are working to fix this problem.

The limit of 25 chromosomes for a genome is declared only for the possibility to display synthetic maps with all chromosomes shown horizontally aligned; however, it does not apply to the data import, standard visualization mode and all data analysis.

7.2 Bugs report

Please report any suggestions, bugs or problems to:
Pierluigi Strippoli
pierluigi.strippoli@unibo.it

ACKNOWLEDGEMENTS

Thanks to NCBI for the "Entrez" databases and to UCSC Genome Bioinformatics for the "UCSC Genome Browser".
Thanks to the FMPexperts List and FMForum for suggestions and tips about FileMaker Pro.