TRAM
(Transcriptome Mapper) 1.0.1
User Guide - August, 2011
Mac OS X and Windows versions
INDEX
INTRODUCTION
INSTALLATION
SET UP
1. Importing data about chromosomes and genes of an organism
1.1. Inserting data about the chromosome number and length (bp) of an organism
1.2. Importing localization data for known genes
1.3. Importing localization data for UniGene clusters
2. Importing gene identifiers conversion data tables
2.1. Conversion of Sequence accession numbers to Gene Symbols
2.2. Importing gene probe identifiers for a Platform
2.3. Importing Custom identifiers conversion table
USE
3. Importing the expression data files
4. Analyzing data
4.1. Creating transcriptome maps
4.2. Searching for clusters of over/under-expressed contiguous genes
GENERAL DEFINITIONS
5.1 File
5.2 Table
5.3 Record
5.4 Field
5.5 Layout
5.6 Browse Mode
5.7 Find Mode
5.8 Preview Mode
MENU AND COMMANDS
6.1 TRAM
6.2 File
6.3 Edit
6.4 View
6.5 Records
6.6 Scripts
6.7 Help
TROUBLESHOOTING
TECHNICAL NOTES
7.1 Software known limits
7.2 Bugs report
ACKNOWLEDGEMENTS
INTRODUCTION
TRAM generates and analyzes transcriptome maps. It can import and integrate gene expression data from any source in tabulated text format, and it maps expression values to the relevant genomic region, providing statistical analysis of over- or under-expressed regions compared to the whole genome or to the relevant chromosome.
This guide provides detailed documentation of the TRAM 1.0 software. It shows how to install the software and how to import expression data to create and analyze transcriptome maps.
Download TRAM 1.0 for Mac OS X or for Windows from the following address:
http://apollo11.isto.unibo.it/software/
The minimum system requirements are:
Mac OS X 10.4.11 for PowerPC G4, G5 or Intel processors;
Windows XP Professional, Home Edition (Service Pack 3);
Windows Vista Ultimate, Business, Home (Service Pack 1);
Windows 7.
If you are working on human gene expression values, download the file: TRAM_HUMAN.zip
If you are working on mouse gene expression values, download the file: TRAM_MOUSE.zip
If you are working on zebrafish gene expression values, download the file: TRAM_BRARE.zip
For all other cases, download the file: TRAM.zip
The downloaded file should be decompressed automatically, generating a "TRAM" folder. If it is not, double-click the file to launch the default decompression utility of your system.
The "TRAM" folder contains:
"TRAM" (Macintosh) or "TRAM.exe" (Windows) file (the runtime application);
"TRAM.TMA" (database file);
"Batch_Import_A" folder;
"Batch_Import_B" folder;
"Platform" folder;
"Results" folder;
"FMP Acknowledgments.pdf" file;
"Extensions" folder, containing a “Dictionaries” folder with the dictionary file for supported languages (and an “English” folder with 3 files, for Windows);
"Docs" folder, containing a copy of the documentation;
40 ".dll" files (for Windows).
TRAM is based on the FileMaker Pro 10 (FileMaker Pro, Inc.) database management software (www.filemaker.com/index.html). It is released as a FileMaker Pro 10 template, along with a runtime application that runs "FileMaker Pro" at the core of the software.
The runtime is freely distributed, in compliance with the license of the "FileMaker Pro 10 Advanced" developer package that was used to create the program.
Standard database commands (Find, Sort, Export records) are available within each layout of TRAM (see the 'GENERAL DEFINITIONS' and 'MENU AND COMMANDS' sections in this Guide).
INSTALLATION
Once
decompressed, TRAM is ready to be used.
Please do not rename any of the files or folders of the TRAM software.
You may download multiple copies of TRAM and run them simultaneously, provided that each "TRAM" folder is located in a different directory.
Use the buttons to navigate between the different sections of the software. The 'Back' button returns you to the last visited layout only (it does not step back through all previously visited layouts). The 'Home' button takes you to the main software screen, from which any layout may be reached.
The TRAM file in the species-specific versions TRAM_HUMAN, TRAM_MOUSE and TRAM_BRARE is pre-loaded with the latest human, mouse or zebrafish data, respectively, available in April, 2010 for genes, chromosomes, UniGene cluster IDs and all related GenBank accession numbers and ESTs.
In addition, the gene identifiers for common commercially available array Platforms, as deposited in Gene Expression Omnibus (GEO), are also available in these pre-setup versions (see section 2.2 for details). If you use any other gene identifier type, please read the "Set up" chapter, section 2.2.
The number of the current NCBI Genome Build may be obtained from the
site:
http://www.ncbi.nlm.nih.gov/mapview/
by clicking on the organism of interest.
The corresponding genome assembly version used by UCSC Genome Browser
to produce EST localization data may be chosen from the "assembly" menu
in the "Table Browser" web page:
http://genome.ucsc.edu/cgi-bin/hgTables?
Organism | Entrez Gene | NCBI Genome Build | UniGene Clusters Release | UCSC EST Localization (NCBI Build) |
HUMAN | Jan. 2011 | 37.2 | #228 (Dec. 2010) | Feb. 2009 (37) |
MOUSE | Jan. 2011 | 37.1 | #188 (Nov. 2010) | Jul. 2007 (37) |
BRARE | Jan. 2011 | Zv8 | #122 (Sep. 2010) | Dec. 2008 (Zv8) |
A set up process is required every time your experimental model organism is different from human, mouse or zebrafish, for which pre-setup versions are provided. The set up process is described in the following section.
SET UP
While the TRAM_HUMAN.zip, TRAM_MOUSE.zip and TRAM_BRARE.zip files contain pre-setup versions ready to analyze expression data from human, mouse and zebrafish, you may also download an empty TRAM template that can be prepared for the analysis of data from any organism.
Pre-setup versions may be used directly to import and analyze expression data without performing the 'Set up' process. However, you may need to perform 'Set up' section 2.2 to load additional Platform schemes, if these are necessary to interpret the gene identifiers listed in your expression data file (see below).
The empty TRAM template, instead, must always be prepared by performing the 'Set up' process from the beginning.
Download the "TRAM.zip" file from:
http://apollo11.isto.unibo.it/software/TRAM/
After decompressing the "TRAM.zip" file, open the "TRAM" file contained in the "TRAM" folder.
In the 'Main' window click 'Set Up'; this switches to the 'Set Up' layout, which contains the first main choices.
Set up is composed of two main parts.
1) Organism-specific Genomic Data
Guided import of data about the chromosomes and genes of the genome of interest.
2) Gene Identifiers conversion tables
Guided import of conversion tables, which allow each gene identifier used in the expression data file to be converted to the corresponding gene name.
Note: TRAM, like all FileMaker-based databases, automatically saves any change, so you will not find any 'save' option at the end of the import processes. After the import processes, any manual data change will therefore cause the loss of the originally imported data.
Set Up - Step definition
Step | Type | Execution Order [time to be completed] | Needed to... | If skipped... |
First part: GENES AND GENOME DATA | | | Collect chromosomes and genes data | |
I. (Section 1.1) Importing chromosome data | Needed | 1st [minutes] | Define number and length of chromosomes | The software does not work |
II. (Section 1.2) Importing gene data | Needed | 2nd [hours] | Define genomic coordinates for known genes | The software does not work |
IV. (Section 1.3) Importing EST localization data (if available in UCSC Browser) | Optional | After III. (UniGene identifiers, required) [hours] | Define genomic coordinates for unknown genes, i.e. EST (Expressed Sequence Tag) Clusters | Results can be based only on "known" genes |
Second part: GENES IDENTIFIERS | | | Assign your expression data to chromosomes and genes | Skip this if you use Official Gene Symbols as identifiers |
III. (Section 2.1) Importing GenBank/UniGene identifiers conversion table | Optional | Before IV and/or V if they are executed [hours, once executed UniGene Tabulator process (0.5 days)] | Analyze expression data labelled by any GenBank RNA sequence accession number or UniGene ClusterID | The expression data cannot be assigned to the corresponding genes via GenBank/UniGene sequence identifiers |
V. (Section 2.2) Importing Platform conversion table | Optional | After III. (UniGene identifiers) if it is executed [minutes] | Analyze expression data labelled by any Platform gene identifiers | The expression data cannot be assigned to the corresponding genes via Platform identifiers |
VI. (Section 2.3) Importing custom identifier conversion table | Optional | At any time before expression data import [minutes] | Analyze expression data labelled by your custom identifiers | The expression data can be assigned to the corresponding genes only via standard identifiers |
At the end of Set Up, the user may proceed with expression data file import.
1 Importing data about chromosomes and genes of your organism
TRAM is designed to create a chromosome set and to assign the gene expression values to the correct position within each chromosome.
The software is optimized to parse 'Entrez Gene' data to obtain the necessary localization information, because this database is updated frequently and provides accurate gene positions. You may use other data sources, provided that they follow the format described below (number and order of columns, file name), to ensure that TRAM works correctly.
TRAM cannot analyse mitochondrial chromosomes or non-chromosomal elements (such as plasmids).
1.1 Importing data about the chromosome number and length (bp) of a selected organism
Note: the maximum number of chromosomes accepted by the software is 25 (including autosomes and sex chromosomes). All previously imported data will be deleted.
When different types of deposited sequences (e.g., with NC_, AC_ or NT_ accession prefixes) are available for the studied organism, the NC_ (RefSeq) sequence is chosen by default for each chromosome; the AC_ sequence is chosen, if available, in the absence of an NC_ sequence; finally, the NT_ sequence is selected as a last choice.
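The prefix priority described above can be sketched in a few lines of Python. This is only an illustration of the selection rule, not TRAM's internal script: the function name `pick_default_sequence` and the accession strings used to try it out are hypothetical.

```python
# Prefix order taken from the rule above: NC_ first, then AC_, then NT_.
PREFIX_PRIORITY = ("NC_", "AC_", "NT_")


def pick_default_sequence(accessions):
    """Return the preferred accession for one chromosome, or None.

    `accessions` is a list of accession strings deposited for the same
    chromosome (hypothetical input format).
    """
    for prefix in PREFIX_PRIORITY:
        for accession in accessions:
            if accession.startswith(prefix):
                return accession
    return None  # no NC_, AC_ or NT_ sequence available


if __name__ == "__main__":
    # Made-up accession strings, used only to show the ordering.
    print(pick_default_sequence(["NT_000002", "AC_000001", "NC_000001"]))
    print(pick_default_sequence(["NT_000002", "AC_000001"]))
```

The outer loop runs over prefixes rather than over accessions, so the result does not depend on the order in which the sequences are listed.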
The following instructions are also available as a guided procedure within the software in the "Set Up" area.
From the TRAM Home, click on the "Set Up" button - then on "Genomic" - then on "Chromosomes".
a) Download the table containing data for each chromosome
If you did not set the ‘Organism’ field in the 'Setting Segment' or 'Setting Cluster' window, the software will ask you for the ‘Organism’; please enter its Latin name only (e.g. Human = Homo sapiens).
If appropriate, use the complete species/strain name given in square brackets by the "Entrez Gene" online database (e.g., Saccharomyces cerevisiae S288c).
Click on the ‘Open Entrez Gene Table site’ button.
In the displayed 'Entrez Gene' page, click on the 'Gene-Genomes' link on the right, then click on the 'Find related data' (Database:) menu on the right, select 'Genome' and click on the 'Find items' button.
Save the resulting data by choosing 'Send to file' from the 'Send to' pop-up menu (do not change the suggested file name).
(The web site may warn you that no more than 100 chromosome records will be downloaded: please confirm.)
Move the resulting ‘genome_result.txt’ file into the ‘TRAM’ folder.
b) Import the file
Click on the ‘Import genome_result.txt’ button to automatically import and parse the obtained chromosome data.
At the end, data fields in the table will appear as follows:
[Chromosome] [Length] [Organism] [Chr_ID]
chr1 249,250,621 Homo sapiens 1
...
where Chr_ID is a unique progressive number assigned by TRAM to each chromosome.
After importing the chromosome data, it is useful to check the "Chromosome" Table (click on the "Chromosomes" link in the TRAM page from which you launched the import of chromosome data, or on the "Chr." button in the TRAM Home).
You may manually edit the chromosome records if necessary, using the "Record" Menu and typing into the appropriate fields.
From the TRAM Table "Chromosomes" you may click on the "Chr_ALL" button, which takes you to a table with the full set of chromosome sequences available for the chosen organism. As explained above, for each chromosome only one sequence is automatically imported into the final "Chromosomes" TRAM Table, reflecting the priority order used by NCBI "Entrez Gene" to determine the default coordinates for the genes on that chromosome (i.e., sequences with NC_, AC_ or NT_ prefixes in the accession number, in that order). While it is not advisable to use a chromosome sequence set other than the one automatically selected by TRAM, the "Chr_All" table gives you a global picture of all chromosome sequences available for the organism.
Note: for organisms with only one chromosome (e.g., prokaryotes), insert the chromosome data manually, as follows:
from TRAM Home, click on the "Chr." button; then, in the "Chromosome" layout that appears, create a new record by selecting "New record" from the "Record" menu and enter these data manually in the corresponding fields:
Chromosome: Chromosome (exactly this word: chromosome).
Length: Chromosomal length in bp (it can be derived from the corresponding GenBank entry; e.g., 5,498,450 for NC_002695).
Organism: Organism Latin name (e.g., Escherichia coli). If appropriate, use the complete species/strain name given in square brackets by the "Entrez Gene" online database (e.g., Escherichia coli O157:H7 str. Sakai).
Chr_ID: 1 (exactly this digit: 1).
GenBank #: GenBank accession number (it can be derived from the 'Entrez Gene' entries for the investigated organism, e.g., NC_002695).
For organisms with only one chromosome do not use 'Special' functions to perform TRAM set up.
1.2 Importing localization data for known genes
The following instructions are also available as a guided procedure within the software in the "Set Up" area.
From the TRAM Home, click on the "Set Up" button - then on "Genomic" - then on "Genes".
a) Download the data for each gene from "Entrez Gene"
Note: All previously imported data will be deleted.
Note: if the software asks you for the name of the ‘Organism’, please enter its Latin name only (e.g. Human = Homo sapiens).
If appropriate, use the complete species/strain name given in square brackets by the "Entrez Gene" online database (e.g., Saccharomyces cerevisiae S288c).
Click on the 'Open "Entrez Gene" web site' button.
The 'Entrez Gene' web page will open, showing all gene data needed for the specified organism.
It is not necessary to select a gene subset, as all genes are collected by default; alternatively, a more specific search can be performed using the Entrez limits options.
Save the resulting data as follows: click on the 'Send to' link on the right, choose 'File' ('Summary (text)' format, default order) and click on the 'Create file' button (do not change the suggested file name).
Move the resulting 'gene_result.txt' file into the ‘TRAM’ folder.
b) Import the file
Click on the 'Import the 'gene_result.txt' file' button to automatically import and parse the downloaded gene data.
IMPORTANT - Do not import the same text file more than once into the TRAM database; download or decompress the file again if you need to repeat the import.
Gene entries without genomic coordinates, or with the word "Pseudogene" in the "Description" field, will be deleted.
At the end, data fields in the table for an RNA transcript will appear as follows:
[Chromosome] [start site] [end site] [Gene symbol]
chr1 67,278,568 67,390,570 WDR78
You may check and freely edit the data in the TRAM table "Genes".
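The deletion rule stated above (entries without genomic coordinates, or flagged as pseudogenes, are dropped) can be illustrated with a minimal Python sketch. It is not TRAM's actual FileMaker script: the `keep_gene` function and the dictionary keys are hypothetical, and matching "pseudogene" without regard to case is an assumption.

```python
def keep_gene(entry):
    """Return True if a parsed gene entry would be retained.

    `entry` is a dict with the hypothetical keys 'symbol', 'description',
    'chromosome', 'start' and 'end'.
    """
    has_coordinates = entry.get("start") is not None and entry.get("end") is not None
    description = entry.get("description") or ""
    # Assumption: the word is matched regardless of upper/lower case.
    is_pseudogene = "pseudogene" in description.lower()
    return has_coordinates and not is_pseudogene


if __name__ == "__main__":
    entries = [
        # Coordinates of WDR78 as shown in the example table above.
        {"symbol": "WDR78", "description": "", "chromosome": "chr1",
         "start": 67278568, "end": 67390570},
        # Made-up entries showing the two deletion cases.
        {"symbol": "EXAMPLE_P1", "description": "Pseudogene", "chromosome": "chr1",
         "start": 100, "end": 200},
        {"symbol": "EXAMPLE_2", "description": "", "chromosome": None,
         "start": None, "end": None},
    ]
    print([e["symbol"] for e in entries if keep_gene(e)])
```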
1.3 Importing localization data for EST Clusters, if these data are available in “UCSC Genome Browser”
Note: this step is necessary if you wish to analyze the expression data not only for known genes but also for genes so far identified only as a UniGene Cluster (cluster of ESTs, Expressed Sequence Tags).
The genomic coordinates for UniGene Clusters are available for several organisms in the "UCSC Genome Browser" (University of California at Santa Cruz).
The assembly (build) version of the investigated genome must be the same in UCSC and NCBI, in order to use the same reference genome coordinates and successfully integrate localization data from known genes and from ESTs.
The number of the current NCBI Genome Build may be obtained from the site:
http://www.ncbi.nlm.nih.gov/mapview/
by clicking on the organism of interest.
The corresponding genome assembly version used by UCSC Genome Browser to produce EST localization data may be chosen from the "assembly" menu in the "Table Browser" web page:
http://genome.ucsc.edu/cgi-bin/hgTables?
Note: All previously imported EST Clusters data will be deleted.
Note: this step must be performed after the previous "Set up Genes" process (section 1.2) and the UniGene identifiers conversion table import (section 2.1).
a) Download the EST localization data from UCSC "Genome Browser"
The following instructions are also available as a guided procedure within the software in the "Set Up" area.
From the TRAM Home, click on the "Set Up" button - then on "Genomic" - then on "EST Clusters".
Click on the ‘Open Genome Browser site’ button.
Then in the web browser page select:
clade: your investigated clade (e.g., Mammal)
genome: your investigated genome (e.g., Human)
group: "mRNA and EST Tracks"
track: "ESTs" (if available, otherwise current set up is not possible)
table: "all_est"
region: "genome"
output format: "selected fields from primary and related tables"
output file: EST.txt
file type returned: gzip compressed
Click on the 'get output' button and select the following fields in the table that appears:
qName
tName
tStart
tEnd
Click on the 'get output' button at the bottom of the page.
Once the download of the file 'EST.txt.gz' is complete, decompress it and put the resulting 'EST.txt' file into the ‘TRAM’ folder.
b) Import the file
Click on the ‘Import "EST.txt" file’ button to automatically import and parse the obtained UniGene clusters location data file.
At the end, data fields in the table for an RNA transcript will appear as follows:
[Chromosome] [start site] [end site] [ClusterID]
chr1 67,278,568 67,390,570 Hs.49421
You may check the processed data in the TRAM table "EST_Clusters" (from TRAM Home, click on the "ESTs" button, then on the orange "EST_Clusters" button).
EST entries are parsed via their relationship with the "UniGene_ID" table: ESTs belonging to UniGene Clusters are imported into the "EST_Clusters" table, where the location of each cluster is taken as the interval between the minimum start coordinate and the maximum end coordinate available for the ESTs of that cluster.
To avoid incongruent results, the parsing process subsequently imports into the "Genes" table only the unambiguously mapped UniGene clusters.
To this end, entries whose chromosome name does not match any chromosome name in the "Chromosomes" table are not considered, nor are those with ESTs mapping to very distant positions on the same chromosome. For the latter, we set a conservative limit of 250,000 bp in TRAM: in a set of 28,355 human genes in Entrez Gene (the largest known genes), the mean size was 43,698 bp and the standard deviation 102,616 bp, so the limit is approximately the mean plus 2 SD (43,698 + 2 x 102,616 = 248,930 bp), and a size range within the mean plus or minus 2 SD covers approximately 95% of values in a Gaussian distribution. This correction effectively removes approximately 3,000 transcripts erroneously mapped to regions of several Mb or tens of Mb. The user can still inspect the list of EST clusters with a genomic extension >250 kb that are present in a given chromosome segment, even though they are not considered in the creation of the transcriptome map. For this purpose, click "Go" under the title "Genes Table" in the "Map" result layouts, then click "EST Clusters - Go".
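The cluster localization and the 250,000 bp limit described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not TRAM's actual import script: the function names and the EST names are hypothetical, and treating each (cluster, chromosome) pair separately is an assumption, since the text does not say how a cluster with ESTs on several chromosomes is handled.

```python
MEAN_BP = 43_698       # mean gene size reported above
SD_BP = 102_616        # standard deviation reported above
MAX_SPAN_BP = 250_000  # conservative limit used by TRAM

# Mean plus 2 SD gives 248,930 bp, which the limit rounds up to 250,000 bp.
MEAN_PLUS_2SD = MEAN_BP + 2 * SD_BP


def cluster_spans(est_rows, est_to_cluster, known_chromosomes):
    """Compute (min start, max end) for each (cluster, chromosome) pair.

    est_rows: iterable of (qName, tName, tStart, tEnd) tuples.
    est_to_cluster: dict mapping an EST name to its UniGene Cluster ID.
    known_chromosomes: set of names present in the "Chromosomes" table.
    """
    spans = {}
    for q_name, t_name, t_start, t_end in est_rows:
        cluster = est_to_cluster.get(q_name)
        if cluster is None or t_name not in known_chromosomes:
            continue  # EST not in a cluster, or chromosome name unknown
        key = (cluster, t_name)
        if key in spans:
            start, end = spans[key]
            spans[key] = (min(start, t_start), max(end, t_end))
        else:
            spans[key] = (t_start, t_end)
    return spans


def split_by_span(spans, max_span=MAX_SPAN_BP):
    """Separate spans within the limit from those wider than the limit."""
    kept = {k: v for k, v in spans.items() if v[1] - v[0] <= max_span}
    too_wide = {k: v for k, v in spans.items() if v[1] - v[0] > max_span}
    return kept, too_wide


if __name__ == "__main__":
    # EST names are made up; coordinates reuse the Hs.49421 example above.
    rows = [("EST_A", "chr1", 67278568, 67300000),
            ("EST_B", "chr1", 67350000, 67390570)]
    mapping = {"EST_A": "Hs.49421", "EST_B": "Hs.49421"}
    kept, too_wide = split_by_span(cluster_spans(rows, mapping, {"chr1"}))
    print(kept, too_wide)
```

Clusters in `too_wide` correspond to those that remain available for inspection but are left out of the transcriptome map.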
2 Importing gene identifiers conversion data tables
TRAM is designed to collect expression data files in which genes are identified via specific symbols.
The default Gene Identifier used by TRAM is the Official Gene Symbol (or, in its absence, the "Entrez Gene" abbreviation in the entry header), e.g.:
[Columns Headers are not required]
[Gene] [Expression value]
HBB 160.03
FLJ39609 132.50
If you have a list of symbols of this type, with the corresponding expression values, you can go directly to "Home" and start to import expression data.
The "Gene name" in TRAM is the best name available for a gene (in decreasing order of preference: the Official Gene Symbol, the name in the "Entrez Gene" entry header, or the UniGene Cluster ID).
If the expression data are labelled with gene identifiers/symbols other than Official Gene Symbols or the names in the "Entrez Gene" entry header, TRAM tries to convert each user-provided gene identifier into an official Gene Symbol/Gene name.
For this purpose, the user has to import two-column conversion tables listing a gene identifier and the corresponding Gene Symbol.
It is possible to import more than one Identifier Conversion Table. TRAM integrates multiple alternative conversions of gene identifiers, searching them in a fixed priority order.
The following instructions are also available as a guided procedure within the software in the "Set Up" area.
From the TRAM Home, click on the "Set Up" button - then on "Gene ID".
NOTE: TRAM will try to convert the Gene identifiers present in the user expression data files to Gene Symbols/Gene names, following this priority order until a positive match is found:
1) if you set up the "Custom" table as described in section 2.3 of the chapter "Set up", the "Custom" table will be searched first to match Gene identifiers in your data to the corresponding Gene Symbols/Gene names, overriding all other conversions;
2) if no match has been found, the "Genes" table (mandatorily set up as described in section 1.2 of the chapter "Set up") will be searched to directly interpret Gene identifiers in your data as Gene Symbols/Gene names;
3) if you write a Platform ID code (e.g., GPL... for a GEO Platform) in (at least) the first line of the third column of your data (formatted as described in section 3 of this Guide), a corresponding list of gene identifiers (often a series of progressive numbers) is expected in your data, and each will be converted to the corresponding Gene Symbol/Gene name, provided that you previously set up the table for that platform as described in section 2.2 of the chapter "Set up"; for example:
1007_s_at 6.38 GPL96
1053_at 6.65
117_at 6.48
... ...
[Note - If the first expression value is not in the first row due to the presence of some header lines, please use the very first row of your file anyway to indicate the Platform code, making sure that you write it in the third column. If you have only one column in the first row, press the tabulator key twice and then write the Platform code. Do not insert blank spaces or other characters at the end of the text in a column].
4) if you write the word GeneID in the first line of the third column of your data (formatted as described in section 3 of this Guide), an "Entrez Gene" Identifier (a progressive number) is expected in your data, and it will be converted to the corresponding Gene Symbol/Gene name by searching the "Genes" Table (mandatorily set up as described in section 1.2 of the chapter "Set up"); for example:
780 6.38 GeneID
5982 6.65
3310 6.48
... ...
[Note - If the first expression value is not in the first row due to the presence of some header lines, please use the very first row of your file anyway to indicate the 'GeneID' option, making sure that you write it in the third column. If you have only one column in the first row, press the tabulator key twice and then write the word 'GeneID'. Do not insert blank spaces or other characters at the end of the text in a column].
5) if still no match has been found, the "Unigene" Table will be searched to directly interpret Gene identifiers in your data as GenBank Accession numbers (if you set up this table as described in section 2.1 of the chapter "Set up");
6) if still no match has been found, the "Unigene" Table will then be searched to directly interpret Gene identifiers in your data as UniGene Cluster identifiers (if you set up this table as described in section 2.1 of the chapter "Set up").
When a match is found, the software does not search the subsequent tables.
We suggest using recently released data for each table to be imported into the TRAM software.
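The priority order above amounts to a chain of lookups that stops at the first match. The following Python sketch illustrates that logic only; it is not TRAM's internal implementation. The function names, the `tables` dictionary and its keys are all hypothetical, and the tables are represented as plain in-memory dictionaries and sets.

```python
def detect_mode(first_row):
    """Return the text in the third column of the first row, or None.

    A Platform code (e.g. GPL96) or the word GeneID is expected there.
    """
    if len(first_row) >= 3 and first_row[2].strip():
        return first_row[2].strip()
    return None


def resolve(identifier, mode, tables):
    """Return a Gene Symbol/Gene name for `identifier`, or None."""
    # 1) The "Custom" table overrides every other conversion.
    name = tables.get("custom", {}).get(identifier)
    if name:
        return name
    # 2) The identifier is already a name in the "Genes" table.
    if identifier in tables.get("gene_names", set()):
        return identifier
    # 3) Platform probe identifier, if a Platform code was declared.
    if mode and mode != "GeneID":
        name = tables.get("platforms", {}).get(mode, {}).get(identifier)
        if name:
            return name
    # 4) Entrez Gene identifier, if the word GeneID was declared.
    if mode == "GeneID":
        name = tables.get("gene_ids", {}).get(identifier)
        if name:
            return name
    # 5) GenBank accession number, via the UniGene table.
    name = tables.get("accessions", {}).get(identifier)
    if name:
        return name
    # 6) UniGene Cluster identifier, via the UniGene table.
    return tables.get("clusters", {}).get(identifier)


if __name__ == "__main__":
    # Example values taken from elsewhere in this Guide.
    tables = {
        "gene_names": {"HBB", "DDR1"},
        "platforms": {"GPL96": {"1007_s_at": "DDR1"}},
        "accessions": {"AF117710": "HBB"},
        "clusters": {"Hs.523443": "HBB"},
    }
    mode = detect_mode(["1007_s_at", "6.38", "GPL96"])
    print(resolve("1007_s_at", mode, tables))
    print(resolve("AF117710", None, tables))
```

Because each step returns as soon as it finds a name, a match in an earlier table prevents any later table from being searched, as stated above.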
2.1 Conversion of Sequence accession numbers to Gene Symbols
If you have labelled your expression data values with sequence identifiers, you will have to generate and import the complete UniGene identifiers data table for your organism, which matches any GenBank accession number for a transcript (RNA, EST) to the known Gene Symbol, when available, or, as a second choice, to the corresponding UniGene Cluster ID, if one exists.
Note: this process has already been performed (update: Mar. 2010) for the provided Homo sapiens, Mus musculus and Danio rerio pre-setup versions of TRAM.
Note: In order to keep the data updated, all data previously imported into this table will be deleted during a new import.
This step must be done before the import of EST localization data (section 1.3) and/or Platform data (section 2.2), if either of these import processes is performed.
a) Prepare a table containing four columns, separated by tabs, relating each GenBank Accession number to the respective UniGene Cluster ID, Gene Symbol (when available) and GenBank Identifier (GI) (if desired), e.g.:
[Columns Headers are not required]
[GenBank Accession] [UniGene Cluster ID] [Gene Symbol] [GenBank GI Identifier]
AF117710 Hs.523443 HBB 4378803
To do this, we propose importing the default output file of the 'UniGene Tabulator' software (version 1.1 or later), a tool able to parse the whole UniGene database for an organism.
The following instructions are also available as a guided procedure within the software in the "Set Up" area.
From the TRAM Home, click on the "Set Up" button - then on "Gene ID" - then on "Sequence IDs".
Click on the 'Open 'UniGene Tabulator' site' button:
http://apollo11.isto.unibo.it/software/UniGene_Tabulator/
your default internet browser will show the software download page.
Download the current version of the software for your OS.
Please follow the instructions in the UniGene Tabulator User Tutorial to automatically parse UniGene data for the organism of interest. Note that, for TRAM purposes, it is not necessary to import the UniGene library data file into UniGene Tabulator.
At the end of the process, the file 'UniGene.tab' will be automatically created in the 'UniGene Tabulator' folder, and a message will alert the user that the file is available. This file contains a useful code conversion among: GenBank accession number, UniGene cluster ID and official Gene Symbol.
The parsing process may take several hours to complete, depending on the amount of data available for the selected organism.
At the end of the process the file 'UniGene.tab' will appear on your desktop.
b) How to import the UniGene tabulated data file in TRAM
Move the 'UniGene.tab' file into the TRAM folder.
Click on the 'Import the 'UniGene.tab' file' button to import the data into the respective "UniGene_ID" database table.
Note: All previously imported data will be deleted.
This step is necessary to use either GenBank Accession Numbers or UniGene Cluster IDs as gene identifiers.
The GenBank Accession Number must not include the sequence version, which, if present, is separated from the main number by a full stop (e.g., do not use AK125137.1).
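If your expression data file carries versioned accession numbers, the version suffix can be removed before import. A minimal Python sketch, assuming the identifiers have already been read as strings (the function name `strip_version` is hypothetical):

```python
def strip_version(accession):
    """Drop the '.version' suffix from a GenBank accession number."""
    # Everything after the first full stop is the sequence version.
    return accession.strip().split(".", 1)[0]


if __name__ == "__main__":
    print(strip_version("AK125137.1"))  # prints AK125137
    print(strip_version("AF117710"))    # unchanged: prints AF117710
```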
IMPORTANT: If you perform this step after importing the gene identifiers for a Platform (section 2.2), you have to run the import and analysis of sample expression data again, because the conversion of the identifiers to the matching gene symbols may have changed.
Quality control. All imported records should have a value in the fields UniGene_ID (UniGene cluster identifier) and GenBank_AN (GenBank accession number). At the end of the import of the UniGene.tab file into TRAM, you may search for records with an empty 'UniGene_ID' or 'GenBank_AN' field [to do this, go to the 'UniGene' table of TRAM, press "Find" on the window top bar and then type "=" (without quotes) in the 'UniGene_ID' or 'GenBank_AN' field].
If you find one or more records without a UniGene_ID or a GenBank_AN, you may fill in the missing values manually, after obtaining them by searching for the GenBank accession number with an empty UniGene_ID (or for the UniGene_ID with an empty GenBank accession number, respectively) at the address:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene
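The same quality control can also be run on the 'UniGene.tab' file before importing it. The sketch below is one possible approach in Python, assuming the four-column, tab-separated layout described in step a) (GenBank accession, UniGene Cluster ID, Gene Symbol, GI) and no header row; the function name `find_incomplete` is hypothetical.

```python
import csv


def find_incomplete(lines):
    """Return (line number, row) pairs lacking an accession or a cluster ID.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    incomplete = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for number, row in enumerate(reader, start=1):
        row = row + [""] * (4 - len(row))  # pad short rows to four columns
        accession, cluster_id = row[0].strip(), row[1].strip()
        if not accession or not cluster_id:
            incomplete.append((number, row))
    return incomplete


if __name__ == "__main__":
    sample = [
        "AF117710\tHs.523443\tHBB\t4378803",  # row from the example above
        "EXAMPLE_ACC\t\t\t",                   # made-up row without a cluster ID
    ]
    for number, row in find_incomplete(sample):
        print(number, row)
```

To check a real file, pass an open file object instead of the sample list, for instance `find_incomplete(open("UniGene.tab", newline=""))`.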
2.2 Importing gene probe identifiers for a Platform
This step is necessary to use gene probe IDs as gene identifiers for a particular array Platform registered in the GEO (Gene Expression Omnibus) online database or otherwise available.
In order to relate the expression data values to Platform identifiers, the corresponding identifiers data table(s) must be imported.
The following instructions are also available as a guided procedure within the software in the "Set Up" area.
From the TRAM Home, click on the "Set Up" button - then on "Gene ID" - then on "Platform IDs".
a) First option: if the expression data you are going to analyze are derived from the GEO database, locate the GEO Platform data for the platforms of interest by searching for a Platform (e.g., GPL96) in the 'accession' field of the web page "Accession Display":
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi
Details about the GEO database can be found at:
http://www.ncbi.nlm.nih.gov/geo/
Please read the GEO 'Overview' section at:
http://www.ncbi.nlm.nih.gov/geo/info/overview.html
At the bottom of the resulting Platform description Web page, click on the 'Download full table...' button and save the file, or click on the 'View full table' button and save the resulting Web page as a text file.
If neither of these two options is available, please click on the link 'SOFT formatted family file(s)' to download the Platform description file in .soft format. Manually change the file extension from ".soft" to ".txt".
You may also obtain platform (.adf) text files from the ArrayExpress database:
http://www.ebi.ac.uk/arrayexpress/
b) Alternatively, you may use a platform data file from any source, provided that it has at least two columns of data (in tabulated text format):
- the list of gene identifiers (ID), describing the genes included in that experimental platform; this column must have the header (first row): "ID" (without quotes);
- the corresponding GenBank Accession Number or Gene Symbol.
For example:
ID [GB_ACC] [Gene Symbol]
1007_s_at U48705 DDR1
[Columns Headers are not required, except for the ID header]
Further data in the column, e.g. Web addresses, will be automatically ignored by TRAM.
c) Import the Platform data file
(tabulated text) in TRAM.
From the TRAM Home, click on the "Set Up" button -
then on "Gene
ID" - then on "Platform
IDs".
Click on the 'Import
the Platform data file' button.
You will then be guided to locate
these columns:
* ID
(the Platform ID for the probe)
* GB_ACC
(the GenBank Accession Number for the probe
sequence, when available, or alternatively
the GenBank GI code)
* Gene symbol (the
official Gene Symbol, when available)
Further data in the
column, e.g. Web addresses or additional GenBank Accession Numbers
following the first one, will be automatically
ignored by TRAM.
NOTE - If the column type is not
clear by simple inspection of the column content within the first rows,
please scroll down the window to evaluate further records (rows) that
could clarify whether that column contains sequence accession numbers and/or
gene symbol identifiers.
Platform data will be imported into the "Platform_ID" TRAM database
table.
You will be requested to assign a unique code to each Platform after
its import.
At the end of the import, you may
delete the original Platform data file.
In addition, a text file with the processed Platform data is
automatically created in the 'Platform' folder within the 'TRAM'
folder. This file is automatically named 'GPL...', where (...) is the
code you assigned to the platform. The prefix 'GPL' is used
whether or not the platform originates from GEO. These files
are useful if you later need to run a batch
platform import (section 'Special' from the TRAM Home) in another copy
of the TRAM software, provided that you rename them as GPL1.txt,
GPL2.txt and so on.
If you use pre-set up versions,
or you ran the 'Import
GenBank/UniGene identifiers conversion table' section in
the Set up (Part 2 - section 2.1),
TRAM will first try to use the GenBank
accession to relate the
sequence to the corresponding updated Gene Symbol (when available) or
to a UniGene Cluster; alternatively, the ‘Gene Symbol’, as
provided in the data file, will be used.
In particular, TRAM will try to assign
a Gene name to each platform gene identifier in this priority
order (if all these steps give negative results, no name will be
assigned and the gene will not be analyzed further):
1. Gene symbol or name or UniGene ID obtained from UniGene via the sequence accession number
originally provided for the gene probe;
2. Gene symbol or name or UniGene ID obtained from UniGene via the UniGene ID originally
provided for the gene probe;
3. Gene symbol as provided by the Platform scheme available online;
4. GenBank accession number (or GenBank GI) as provided by the Platform
scheme available online.
Repeat the import process for any desired Platform.
From the Platform TRAM table, you may click on the 'Platforms Summary'
button, which will take you to a summary table of the data about each
Platform. The 'Show Identifiers' button associated with each Platform
record will show all Identifiers of the corresponding Platform.
A file with formatted platform data ready to be imported in TRAM is
also created at the end of each guided platform import. This is useful
for any subsequent use of the 'Special' batch
unsupervised platform import function described below.
Clicking on 'Special' in the TRAM
'Home' window will allow the user to start a
batch data import of large pools of Platform data without user
intervention.
To do this, prepare all the
files with Platform data as described, name them GPL1.txt, GPL2.txt,
... and put them
within the 'Platform' folder of the main directory of TRAM.
In this case, a fourth column
must be added, at least in the first
row, with the code
identifying the Platform whose data are present in the file
(e.g., GPL96):
[Column headers are not required]
[ID]         [GB_ACC]   [Gene Symbol]   [Platform]
1007_s_at    U48705     DDR1            GPL96
1053_at      M87338     RFC2
...          ...        ...
You will be asked to
choose whether or not to delete the previously
imported Platform data.
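The fourth-column requirement can be met by editing the first row by hand. As a hedged sketch (not part of TRAM; the function names are hypothetical), the same edit could be scripted in Python on rows that have already been split into cells:

```python
def add_platform_code(rows, platform_code, column_index=3):
    """Return a copy of rows with platform_code placed in the first row.

    rows is a list of lists of strings (one inner list per line).
    column_index is zero-based, so 3 means the fourth column.
    """
    if not rows:
        return []
    first = list(rows[0])
    # Pad with empty cells so the code lands in the intended column.
    while len(first) <= column_index:
        first.append("")
    first[column_index] = platform_code
    return [first] + [list(row) for row in rows[1:]]


def to_tab_delimited(rows):
    """Join cells with tabs, one line per row."""
    return "".join("\t".join(row) + "\n" for row in rows)
```

For example, add_platform_code(rows, "GPL96") leaves every row after the first unchanged, which matches the rule that the code is needed at least in the first row.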
---
IMPORTANT - To interpret the
identifiers in your gene expression data
file as Platform IDs for
the corresponding set-up Platform, remember to write the Platform code
(e.g., GPL... for a Platform) in
the third column of your
expression data file, at least in the first
row, so that your expression data file will contain three
columns separated by a single tab character, in this format:
Expression data file
[Column headers are not required]
[Gene ID]    [Value]   [Platform]
1007_s_at    6.38      GPL96
1053_at      6.65
117_at       6.48
...          ...
The Platform code in the third column
will allow TRAM to link the gene identifiers to the corresponding
Platform.
The following Platforms
(commercially available)
are already
loaded as default in the pre set-up versions for human, mouse
and zebrafish (the number of samples available in GEO for each Platform
was last updated on January 21, 2011):
HUMAN
(Platforms with > 1,000 Samples in GEO)
01) GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array (48,497 Samples)
02) GPL96 [HG-U133A] Affymetrix Human Genome U133A Array (26,013 Samples)
03) GPL97 [HG-U133B] Affymetrix Human Genome U133B Array (5,158 Samples)
04) GPL571 [HG-U133A_2] Affymetrix Human Genome U133A 2.0 Array (4,802 Samples)
05) GPL8300 [HG_U95Av2] Affymetrix Human Genome U95 Version 2 Array (4,752 Samples)
06) GPL6104 Illumina humanRef-8 v2.0 expression beadchip (3,673 Samples)
07) GPL6947 Illumina HumanHT-12 V3.0 expression beadchip (3,464 Samples)
08) GPL201 [HG-Focus] Affymetrix Human HG-Focus Target Array (3,278 Samples)
09) GPL1708 Agilent-012391 Whole Human Genome Oligo Microarray G4112A (Feature Number version) (2,717 Samples)
10) GPL6480 Agilent-014850 Whole Human Genome Microarray 4x44K G4112F (Probe Name version) (2,437 Samples)
11) GPL4133 Agilent-014850 Whole Human Genome Microarray 4x44K G4112F (Feature Number version) (2,325 Samples)
12) GPL6102 Illumina human-6 v2.0 expression beadchip (2,286 Samples)
13) GPL6244 [HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version] (1,985 Samples)
14) GPL887 Agilent-012097 Human 1A Microarray (V2) G4110B (Feature Number version) (1,950 Samples)
15) GPL5175 [HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array [transcript (gene) version] (1,834 Samples)
16) GPL3921 [HT_HG-U133A] Affymetrix HT Human Genome U133A Array (1,698 Samples)
17) GPL5188 [HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array [probe set (exon) version] (1,385 Samples)
18) GPL2986 ABI Human Genome Survey Microarray Version 2 (1,250 Samples)
19) GPL2507 Sentrix Human-6 Expression BeadChip (1,109 Samples)
20) GPL6884 Illumina HumanWG-6 v3.0 expression beadchip (1,037 Samples)
21) GPL2700 Sentrix HumanRef-8 Expression BeadChip (1,030 Samples)
22) GPL91 [HG_U95A] Affymetrix Human Genome U95A Array (1,019 Samples)
23) GPL7091 Agilent Human oligo 22k A (16 Samples)
(This rarely used platform has been set up because it was needed to parse
some samples of the biological model presented in the paper)
MOUSE
(Platforms with > 900 Samples in GEO)
01) GPL1261 [Mouse430_2] Affymetrix Mouse Genome 430 2.0 Array (20,405 Samples)
02) GPL81 [MG_U74Av2] Affymetrix Murine Genome U74 Version 2 Array (5,938 Samples)
03) GPL339 [MOE430A] Affymetrix Mouse Expression 430A Array (4,434 Samples)
04) GPL3677 Rosetta/Merck Mouse 44k 1.0 microarray (2,183 Samples)
05) GPL6246 [MoGene-1_0-st] Affymetrix Mouse Gene 1.0 ST Array [transcript (gene) version] (2,179 Samples)
06) GPL8321 [Mouse430A_2] Affymetrix Mouse Genome 430A 2.0 Array (2,109 Samples)
07) GPL891 Agilent-011978 Mouse Microarray G4121A (Feature Number version) (1,358 Samples)
08) GPL4134 Agilent-014868 Whole Mouse Genome Microarray 4x44K G4122F (Feature Number version) (1,060 Samples)
09) GPL340 [MOE430B] Affymetrix Mouse Expression 430B Array (972 Samples)
10) GPL3562 Rosetta/Merck Mouse TOE 75k Array 1 microarray (970 Samples)
11) GPL6466 Agilent-011978 Mouse Microarray G4121A (Probe Name version) (970 Samples)
ZEBRAFISH
(Platforms with > 10 Samples in GEO)
01) GPL1319 [Zebrafish] Affymetrix Zebrafish Genome Array (750 Samples)
02) GPL6457 Agilent-019161 D. rerio (Zebrafish) Oligo Microarray (V2) G2519F (Feature Number version) (157 Samples)
03) GPL7302 Agilent-015064 D. rerio (Zebrafish) Oligo Microarray 4x44K G2519F (Probe Name version) (111 Samples)
04) GPL6563 Agilent-015064 D. rerio (Zebrafish) Oligo Microarray 4x44K G2519F (Feature Number version) (110 Samples)
05) GPL2878 Agilent-013223 D. rerio (Zebrafish) Oligo Microarray G2518A Option 001 (Feature Number version) (62 Samples)
06) GPL7301 Agilent-019161 D. rerio (Zebrafish) Oligo Microarray (V2) G2519F (Probe Name version) (24 Samples)
2.3. Import Custom identifiers
conversion table (INDEX)
For expression
data values related to personal "custom" gene identifiers, with the
correspondence between gene/probe identifiers and gene symbols
established by the user, the user has to import the Custom identifiers
data table(s).
The
following
instructions are also available as a guided procedure within the
software in the "Set Up" area.
From the TRAM Home, click on the "Set Up" button - then on "Gene
ID" - then on "Custom IDs".
a) Prepare a
table containing 2 columns, separated by a tab,
with one row for each of your custom identifiers, e.g.:
[Column headers are not required]
[ID]    [Official Gene Symbol/Gene name]
My_1    HBB
My_2    FLJ39609
Save the data in text format.
b) Import the table
Click on the 'Import the custom
file' button to import the custom table into the 'Custom_ID'
TRAM
database table.
It is possible to subsequently import additional custom tables for
conversion of other identifiers. The conversion specified in the 'Custom_ID' TRAM database table will override any other conversion.
USE
(Back
to Index)
While the
TRAM_HUMAN.zip, TRAM_MOUSE.zip and TRAM_BRARE.zip files contain
pre-setup versions ready to analyze expression data from human, mouse
and zebrafish organisms, you may also download an empty TRAM template
that may be prepared for the analysis of data from any organism.
Pre-setup versions may be directly used to import and analyze
expression data without performing the 'Set up' process. However, the
user might need to perform the 'Set up' section 2.2 to load additional
Platform schemes, if necessary to interpret the gene identifiers listed
in their expression data file (see below).
Conversely, the empty TRAM template must always be prepared by
performing
the 'Set up' process from the beginning (section 2).
Note - Data saving
TRAM, as any
FileMaker-based database, automatically saves
any
changes, so you
will not find any save options at the end of the import processes.
After the import processes, avoid any manual data change that may cause
the
loss of the original imported data.
Note - Advanced
use
You may open the program files using your
copy of FileMaker Pro 10 or later, thus becoming fully able to make any
modification to
the software.
In this case, do not open the program using
the "TRAM" file,
but open, within FileMaker Pro, the file "TRAM.TMA" instead.
After any modification, the program works correctly only if it is
relaunched through the "TRAM" runtime, because of the
data path structure stored in the “TRAM” Scripts.
To cancel a TRAM operation before it is completed (not recommended):
Press Command-period (Mac OS X) or Esc (Windows).
It is possible to compare two different biological conditions,
importing one as the 'A' sample
(or samples pool), and the other as the 'B' sample (or samples pool) to be
compared to 'A'.
You may switch between TRAM database tables by clicking on the
corresponding buttons present in each layout.
3 Import the expression data
files
(Back
to Index)
The user is responsible for the homogeneity or comparability of
the data to be
imported in terms of: biological sample, microarray platform (although
inter-sample normalization methods are provided), and spot
quality filtering/data preprocessing.
The software will
map the imported values along the chromosomes, but it cannot check the
validity of the experimental design.
Each series of data related to a "Sample" is defined as a
'distinct biological sample'.
For example, in a two-channel experiment a sample should be a single
channel, with each channel's data imported as a distinct data file.
Be sure that your system default
format uses
"."
(full stop mark)
as a decimal separator (English
standard).
See below how to
check and change the setting if necessary.
IMPORTANT.
The expression data file must be a tabulated
(tab-delimited) text file containing two columns separated by a TAB
character (tabulator key, ASCII9).
First ("left") column: Gene probe identifier:
Official Gene Symbols/"Entrez Gene" names (default);
or, if the corresponding conversions have been set up:
Custom identifiers, or
Platform Identifiers, or
GenBank Accession numbers.
Second column:
numerical expression value.
Use "." as a
decimal separator
(and do not use a thousand separator).
Be sure that your system default
format uses "." (full stop mark)
as a decimal separator (English standard).
If this is not the case, you must change the system setting.
Mac OS X: in "System Preferences" (from the
"Apple" Menu),
click on "International", then on "Formats",
then choose as "Region" a country with the English standard format for
numbers (full stop mark as a decimal separator).
System
restart or user logout is not required to make the change effective.
Windows: in "Control Panel" (from the "Start"
Menu),
click on "International options" then modify the format of numbers
choosing a country with the English standard format for numbers (full
stop
mark as a
decimal separator).
System
restart or user logout is not required to make the change effective.
The expression value is usually the pre-processed intensity value, i.e.
the value
assigned to the spot as it has been processed by the software of the
specific experimental platform used (for instance following background
subtraction for a microarray spot).
Scientific notation is supported in
the format, for example, 20E-2.
TRAM considers the expression values
as linear data, and not
logarithm-transformed data. If necessary, data should be
back-transformed to linear values before importing them in TRAM. TRAM can
back-transform log-transformed values (in base 2, 10 or e) if the user
prepares data using the
'Help with data' utility (see below).
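If you prefer to back-transform the values yourself before import, the operation is simply raising the base to the logged value. The following Python sketch is illustrative only (the function name is hypothetical and it does not reproduce the 'Help with data' utility):

```python
import math


def back_transform(log_value, base=2):
    """Convert a log-transformed expression value back to linear scale.

    base may be 2, 10 or the string "e" (natural logarithm).
    """
    if base == "e":
        return math.exp(log_value)
    return float(base) ** log_value


if __name__ == "__main__":
    # A log2 value of 3 corresponds to a linear intensity of 8.
    print(back_transform(3, base=2))
```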
Ratio values (e.g., ratio between two microarray channels) are
not admitted in TRAM.
When the pre-processed expression values are not available, the
user may consider the background (BKD) median as the median of the
pixel intensities in the area surrounding the spot, and the feature
(spot) median as the median of the pixel intensities in the area inside
the spot. The spot intensity may be then calculated by subtracting the
background median value from the feature median value, and used as the
expression value for the corresponding gene. Clicking
on the 'Help
with data' button in the TRAM 'Home' window will allow the
user to be interactively assisted in the preparation of text files of
the required format, including calculation of the spot intensity by
subtracting the background value from the spot value (see below for details).
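The background subtraction described above amounts to one subtraction between two medians. As a minimal sketch, assuming you have the raw pixel intensities for one spot and for its surrounding area (the function name and the numbers in the usage line are hypothetical):

```python
from statistics import median


def spot_intensity(feature_pixels, background_pixels):
    """Spot intensity = feature (spot) median minus background median."""
    return median(feature_pixels) - median(background_pixels)


if __name__ == "__main__":
    # Invented pixel values, for illustration only.
    print(spot_intensity([100, 120, 110], [10, 12, 11]))
```

A result equal to or lower than zero is possible when the background is brighter than the spot.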
Third column
[optional]: Platform code (GPL),
needed at least in the first row.
IMPORTANT - To interpret the identifiers
in your gene expression data file as ID for
the relative Platform, you must previously
have set up the
corresponding Platform as explained in section
2.2 of this Guide. Some Platforms are pre-setup as described in
the same section.
Example:
[Column headers are not required]
[Probe ID]   [Value]   [Platform]
1007_s_at    6.38      GPL96
1053_at      6.65
117_at       6.48
...          ...
Note:
If the first expression value is not in the first row due to the
presence of some header lines, please use the very first row
anyway in your file to indicate the Platform code, making sure that you
are
writing it in the third column.
If you have only one column in the
first row, please press the tabulator key twice then write the
Platform code.
Do not insert blank spaces or other characters at the end of the text in
a column.
If you use the GenBank Accession
Numbers as identifiers, please do
not append the version of the sequence to the GenBank identifier, i.e.
use AB123456 and not
AB123456.1.
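The formatting rules above (tab-separated columns, "." as decimal separator, Platform code in the third column of the first row, no trailing blanks, no version suffix on GenBank accessions) can be applied with a short script. This is a hedged sketch and not part of TRAM: the function names are hypothetical, and the pattern used to recognize a versioned accession is an assumption that should only be applied to columns known to contain GenBank accessions.

```python
import re

# Assumed shape of a versioned accession, e.g. AB123456.1 or NM_000518.5.
_VERSIONED = re.compile(r"^([A-Za-z]{1,6}_?\d+)\.\d+$")


def strip_accession_version(identifier):
    """Return AB123456 for AB123456.1; other identifiers are unchanged."""
    identifier = identifier.strip()
    match = _VERSIONED.match(identifier)
    return match.group(1) if match else identifier


def format_expression_file(pairs, platform_code=None, genbank_ids=False):
    """Build the text of a TRAM-style expression data file.

    pairs: iterable of (identifier, value); value is a number or a string
    that uses "." as decimal separator (float() rejects "6,38").
    """
    lines = []
    for index, (identifier, value) in enumerate(pairs):
        identifier = identifier.strip()
        if genbank_ids:
            identifier = strip_accession_version(identifier)
        value_text = str(value).strip()
        float(value_text)  # raises ValueError if the value is not numeric
        fields = [identifier, value_text]
        if index == 0 and platform_code:
            fields.append(platform_code)  # third column, first row only
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"
```

Very small or very large numbers passed as Python floats are written in scientific notation (for example 1e-07), so pass them as strings if you need a specific notation.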
Batch
processing of a sample series
It is possible to prepare in batch mode a series of sample data files related
to the same work, obtained with the same Platform and formatted in
an identical way.
Put all the files to be processed in the 'Series' folder
located in the "TRAM" folder, naming them S1.txt, S2.txt and
so on.
From the TRAM "Home", click on the 'Help
with data' button and then on the Data file batch processing button.
Locate the "ID" and "Value" columns when requested for the first
sample.
Insert the name of the Platform when requested.
TRAM will then automatically process
all the files located in the 'Series' folder using the same criteria,
generating a series of uniformly processed data files with names such
as P1.txt,
P2.txt
and so on. These files may be transferred in the 'Batch_Import_A' or
'Batch_Import_B' folders to be automatically imported by TRAM using the
'Batch mode' import buttons in the TRAM 'Home', after renaming them with names such as A1.txt, A2.txt ... or
B1.txt, B2.txt ...,
respectively.
Management of
absent/negative/zero values
Probes whose expression value is
absent (i.e. empty, not available) will not be further
considered by TRAM for the construction and analysis of the maps,
assuming that an expression level has not been measured.
Sample expression values equal to or lower than "0" (≤0)
will be
thresholded to 95% of the minimum positive value present in that
sample, in order to obtain meaningful numbers when dividing "Samples
Pool A" values by "Sample Pool B" values.
Assuming that in these cases an expression level is too low to be
detected under the experimental conditions used, this transformation
still makes it possible to obtain a ratio between values in the pool 'A' and
values in the pool 'B', which is useful to highlight differential gene
expression.
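The thresholding rule can be illustrated with a small Python sketch. It is not TRAM's own code; the function name is hypothetical, and absent values are represented here by None and left untouched, mirroring the rule that they are not considered further.

```python
def threshold_non_positive(values, fraction=0.95):
    """Replace values <= 0 with fraction * (minimum positive value).

    values: list of numbers for one sample; None marks an absent value.
    """
    positives = [v for v in values if v is not None and v > 0]
    if not positives:
        raise ValueError("the sample contains no positive value")
    floor = fraction * min(positives)
    return [v if v is None or v > 0 else floor for v in values]


if __name__ == "__main__":
    # The minimum positive value is 2.0, so 0.0 and -1.5 are both
    # replaced by 95% of 2.0.
    print(threshold_non_positive([0.0, 2.0, -1.5, 4.0]))
```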
Expression
values assigned to unmapped genes (without
known genome coordinates) will be
normalized and it will be possible to browse through them in the 'Values_A_B_All'
layout, but they will not be used in the construction and analysis of
the maps.
From the 'Values_A_B'
layout, the 'A/B (unmapped)' button opens the 'Values_A_B_All' layout.
Import utilities
The
user must provide TRAM with one or more expression data files with at
least two columns: Gene/Probe ID and its corresponding numerical
expression value. To prepare the files in this format, you may use any
word processor or spreadsheet program and save the file in tabulated
text format.
To simplify the extraction of the relevant columns from any available
tabulated text file providing expression data, generated by the user's
experimental
platform or publicly available from any online source, the TRAM
internal utility "Help with data"
can
be used by
pressing the
relative button in the TRAM Home.
IMPORTANT - To interpret the identifiers
in your gene expression data file as ID for
the relative Platform, you must have previously set up the
corresponding Platform as explained in section
2.2 of this Guide. Some Platforms are pre-setup as described in
the same section.
Clicking
on the 'Help
with data' button in the TRAM 'Home' window will allow the
user to be interactively assisted in the preparation of text files of
the required format. The user will be guided to import their data file,
and to select the two columns containing gene identifiers and
expression values. Finally, a Platform code must be indicated if the
gene/probe identifiers are not the standard gene symbols and they need
to be converted into gene symbols using Platform data loaded in TRAM (see
section 2.2).
Finally, the software asks the user to save the data, generating a
text file suitable to be imported in TRAM. The user may choose the
desired file name.
If the user plans to import
expression data files using 'Batch Import'
mode
of feeding the database, the
text files must be saved with a name of the type A1.txt,
A2.txt ... (in the TRAM folder
'Batch_Import_A') or B1.txt, B2.txt ... (in the TRAM folder 'Batch_Import_B').
Clicking on the 'Special'
button in the TRAM 'Home' window will allow the user to
automatically perform
batch data import of large pools of samples for both 'A' and 'B' Pools
in succession, provided that the expression data files have been
prepared in the required format (possibly using the 'Help with data'
utility) and have been saved
in the TRAM folder 'Batch_Import_A' (with names such as A1.txt,
A2.txt ...) and in the TRAM folder 'Batch_Import_B'
(with names such as B1.txt, B2.txt).
Clicking on the 'Export' button in the TRAM 'Home' window will
assist the user in the export of the (raw or normalized) imported data.
The
following instructions are also available as a guided procedure within
the software in the appropriate "Set Up" area ("Set up - Part 2 - Gene Identifiers
conversion tables").
NOTE:
TRAM will try to convert the Gene identifiers present in your
expression data files to Gene Symbols/Gene names until a positive
match is found, with the following
priority order:
1) if you set up the "Custom" table as described in section 2.3 of the chapter "Set up",
the "Custom"
Table will be first searched to match Gene identifiers in your data to
the corresponding Gene Symbols/Gene names, overriding all other
conversions;
2) if no
match has been found, then the "Genes" table
(mandatorily setup as described in section 1.2 of the chapter "Set up") will be searched to directly
interpret Gene identifiers in your data as Gene Symbols/Gene names;
3) if
you write a Platform ID code (e.g., GPL... for a GEO Platform) in (at
least) the first line of the third
column of your data (formatted as described in section 3 of
this Guide), a corresponding list of gene Identifiers
(often a series of progressive numbers) is expected in your data
and each will
be converted into the corresponding Gene Symbol/Gene name, if you previously
set up the table for the corresponding platform as
described in section 2.2
of the chapter "Set up";
for example:
1007_s_at    6.38    GPL96
1053_at      6.65
117_at       6.48
...          ...
[Note
- If the first expression value is not in the first row due to the
presence of some header lines, please use the very first row
anyway in your file to indicate the Platform code, making sure that you
are
writing it in the third column.
If you have only one column in the
first row, please press the tabulator key twice then write the
Platform code.
Do not insert blank spaces or other characters at the end of the text
in
a column].
4) if
you write the word GeneID in
the first line of the third
column of your data (formatted
as described in section 3 of this Guide), an "Entrez Gene"
Identifier (a progressive number) is expected in your data and
it will be converted into the corresponding Gene Symbol/Gene name
by searching in the "Genes" Table
(this has been mandatorily set up as described in
section 1.2 of the
chapter "Set up");
for example:
780     6.38    GeneID
5982    6.65
3310    6.48
...     ...
[Note
- If the first expression value is not in the first row due to the
presence of some header lines, please use the very first row
anyway in your file to indicate the 'GeneID' option, making sure that
you are
writing it in the third column.
If you have only one column in the
first row, please press the tabulator key twice then write the 'GeneID'
word.
Do not insert blank spaces or other characters at the end of the text
in
a column].
5) if
no match has still been found, the "Unigene"
Table will be searched to directly interpret Gene identifiers in your
data as GenBank
Accession numbers (if you set
up this table as described in
section 2.1 of the
chapter "Set up");
6) if
no match has still been found, the "Unigene"
Table will then be searched to directly interpret Gene identifiers in
your data as UniGene Cluster identifiers (if
you set
up this table as described in
section 2.1 of the
chapter "Set up").
When a match is found, the software stops and does not search for the
symbol in the subsequent tables.
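The priority order listed above can be read as a chain of lookups that stops at the first match. The Python sketch below is only a model of that order: TRAM itself is a FileMaker-based database, and the function name, the dictionary keys and the table contents are all hypothetical.

```python
def resolve_gene_name(identifier, tables, third_column=None):
    """Return the Gene Symbol/Gene name for an identifier, or None.

    tables is a dict of hypothetical lookup tables; third_column is the
    value written in the first row of the third column (a Platform code,
    the word "GeneID", or None).
    """
    # 1) The Custom table overrides every other conversion.
    custom = tables.get("custom", {})
    if identifier in custom:
        return custom[identifier]
    # 2) The identifier is already a symbol listed in the Genes table.
    if identifier in tables.get("gene_symbols", set()):
        return identifier
    # 3) A Platform code was given: use the table of that Platform.
    if third_column and third_column != "GeneID":
        platform = tables.get("platforms", {}).get(third_column, {})
        if identifier in platform:
            return platform[identifier]
    # 4) "GeneID" was given: treat the identifier as an Entrez Gene ID.
    if third_column == "GeneID":
        entrez = tables.get("entrez_gene_ids", {})
        if identifier in entrez:
            return entrez[identifier]
    # 5) GenBank accessions, then 6) UniGene cluster identifiers.
    for key in ("genbank_accessions", "unigene_clusters"):
        table = tables.get(key, {})
        if identifier in table:
            return table[identifier]
    return None
```

Because each step returns as soon as it finds a match, an identifier present in the Custom table is never looked up anywhere else.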
We suggest using recently
released data for each table to be imported in the TRAM software.
If you have a list of Gene Symbols as probe identifiers,
with the corresponding
expression values, you can directly go to "Home" and start to Import
expression data, otherwise go to the "Set Up" chapter, Part 2.
Import start
In
the 'Main' ('Home') window there are two button series designed to
rapidly begin the import processes.
The first import button series
('Import A'
and 'Import B')
imports one expression data file into 'Values_A'
table or 'Values_B'
TRAM database table,
respectively.
At the start of the import process,
the
user must choose whether to retain or delete
all previously imported data. Clicking on 'No' in the first dialog box
will let
the user add to the previously imported data one or more other
datasets. The user may subsequently select any sample subset which must
be subjected to analysis.
The second dialog box asks for the selection of the file containing the
data table.
All data imported from a file will be labelled by the software with a
progressive order number (Sample_ID) to easily track (or delete from
the analyzed set by the 'Remove Sample' function) all data belonging
to a
specific set.
In addition, the 'Samples_A'
and 'Samples_B'
tables let you view
and annotate the list of imported samples, and view summary
data for each sample.
The 'Go' buttons open a window
in your default browser displaying the entry for Platform,
Series, Sample, Dataset and PubMed record if you annotated
(at any time) the
corresponding fields with codes for GPL,
GSE, GSM, GDS and PMID, respectively.
At the start of an analysis, the user can also select which samples are
to be excluded or included (default) from the current analysis, without
removing them from the TRAM database; alternatively the user may even
remove any sample from the
database.
Please note that changing the set of
samples to be analyzed causes restarting of normalization (see below),
which may take several minutes or hours, depending on the number of
loaded samples.
The software will ask
for the import of another set at the end of
the process.
As final step, the user can check the results of the import
process.
When requested by the software, click
on the 'Continue'
blue button at the top and on the right of
the program window, to ensure a correct
functioning of the software.
The
second import button series ('Batch mode'
buttons) works
in the same way, but it is optimized to perform a batch import that is
not supervised by the user.
By clicking on 'Batch mode' (A or B) all
files (formatted as just described for the manual import) contained
in the
"Batch_Import_A" folder or in
the
"Batch_Import_B"
folder, respectively, will
be imported.
In these
folders the files must be named
A1.txt, A2.txt, ... and
B1.txt, B2.txt, ...,
respectively (without interruption in the series of progressive
numbers).
If you would like to perform a batch import while
keeping the previously imported dataset, the first file name should
be numbered with the first unused Sample_ID number (e.g. if the last
imported set has Sample_ID = 5, the first file must be A6) and that
number will correspond to the Sample_ID of that dataset. The software
will alert you about this. You may check for the currently used
Sample_TRAM_IDs by clicking on the 'Samples A' and 'Sample B' buttons,
respectively, in the TRAM Home.
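Renaming many files by hand is error-prone, so the numbering rule above can be scripted. The sketch below is not part of TRAM: it assumes that your prepared files are named P1.txt, P2.txt, ... in a source folder of your choice, and copies them into a batch import folder as A6.txt, A7.txt, ... (or B...) starting from the first unused Sample_ID. The function name and folder arguments are hypothetical.

```python
import re
import shutil
from pathlib import Path


def stage_for_batch_import(processed_dir, batch_dir, pool="A", first_id=1):
    """Copy P<n>.txt files to batch_dir as <pool><first_id>.txt, ...

    Files are taken in numeric order (P2 before P10) and the target
    numbers are consecutive, with no gap in the series.
    Returns the list of target file names.
    """
    numbered = []
    for path in Path(processed_dir).iterdir():
        match = re.fullmatch(r"P(\d+)\.txt", path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    numbered.sort(key=lambda item: item[0])
    targets = []
    for offset, (_, path) in enumerate(numbered):
        target = Path(batch_dir) / f"{pool}{first_id + offset}.txt"
        shutil.copyfile(path, target)
        targets.append(target.name)
    return targets
```

For example, with previously imported samples up to Sample_ID = 5, calling the function with pool="A" and first_id=6 names the first copied file A6.txt.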
Clicking on the 'Special'
button in the TRAM 'Home' window will allow the user to
automatically perform
batch expression data import of large pools of samples in succession
for both 'A' and 'B' Pools. Batch import may be followed
automatically by data analysis using the 'Batch Import + Analysis'
button in the 'Special' section.
After the import process,
expression
data are visualized in the 'Values_A' and 'Values_B' tables, which can be
displayed by
clicking on the buttons 'A' and 'B',
respectively, from TRAM 'Home' (opening
window).
These are the data fields for the 'Values' tables:
Identifier (the original probe identifier in your data)
Intensity value (the original numerical expression value)
Sample_ID (A1, A2... or B1, B2...)
Platform (filled if you indicated a Platform code 'GPL...' in your expression data file)
Exclude (state of inclusion/exclusion of the data for the analysis)
Gene_name (Gene Symbol/Gene name following conversion of Identifiers)
Chr (chromosome name)
txStart (start position of the gene transcript on the chromosome)
txEnd (end position of the gene transcript on the chromosome)
IMPORTANT - The conversion of gene or probe
identifiers to Gene Symbols/Gene names is performed during expression
data import. To keep the database indexed and fast, changes to the
set up of the software are not dynamically reflected in the
gene assignment of probe identifiers that were already imported.
Therefore, any change to a table related to the
"Set Up" chapter
('Chromosomes', 'Genes', 'EST_Clusters', 'UniGene_ID',
'Platforms ID', 'Custom ID') should be followed by reimport and reanalysis of the
expression data to make the changes effective. An exception to this rule is the set up
of new Platforms or new Custom ID sets that have to be applied
only
to new, subsequently loaded samples and not to previously imported
samples: in this case reimport of all samples is not needed.
Clicking on the 'Special'
button in the TRAM 'Home' window will allow the user to
automatically perform
batch data import of large pools of samples.
Interpretation and Normalization
of the imported data
The user provides TRAM with an "Intensity value" for each spot, which
is
intended to be the pre-processed intensity value, i.e. the numerical
value assigned
to the spot as it has been processed by the software of the specific
experimental platform used (e.g. following background subtraction for a
microarray spot).
To allow comparison of gene expression data obtained from different
biological samples and/or from different experimental platforms, TRAM
provides several data
normalization methods.
The
normalization type may be changed by a pop-up Menu from the 'Values' or 'Samples'
data
tables.
Intra-sample (intra-array) normalization works within each distinct
sample data, while
inter-sample (inter-array) normalization is simultaneously applied to
the desired samples set.
You may select different combinations between these types of
normalization.
Please note that the normalization
process may require several hours for databases in which tens of arrays
were imported.
Clicking
on the 'Special'
button in the TRAM 'Home' window will allow the user to perform
automatically normalization changes of large pools of samples.
The normalization may also be set
when starting an analysis, so that
normalization and analysis will be performed in sequence without the
user's intervention.
Intra-sample
normalization
These
methods rescale values within each data set using a standard internal
reference for each sample.
None
No Intra-sample normalization is performed.
Mean [DEFAULT AFTER INSTALLATION]
Each
value is expressed as the percentage
of the corresponding sample mean value. This is equivalent to
the classic "global normalization" in the microarray data analysis.
Median
Each
value is expressed as the percentage
of the corresponding sample median value. This is equivalent to
the classic "global normalization" in the microarray data analysis.
Max
Each
value is expressed as the percentage
of the corresponding sample maximum value. This is equivalent to
the classic "scale normalization" in the microarray data analysis.
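The three intra-sample options differ only in the reference value used as 100%. A minimal Python sketch of the arithmetic (not TRAM's own code; the function name is hypothetical):

```python
from statistics import mean, median


def intra_sample_normalize(values, method="mean"):
    """Express each value as a percentage of the sample reference.

    method: "none", "mean", "median" or "max".
    """
    if method == "none":
        return list(values)
    references = {"mean": mean, "median": median, "max": max}
    reference = references[method](values)
    return [100.0 * value / reference for value in values]


if __name__ == "__main__":
    # The sample mean is 2.0, so the values become 50%, 100% and 150%.
    print(intra_sample_normalize([1.0, 2.0, 3.0], "mean"))
```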
Inter-sample normalization
These
methods rescale values within each samples set.
None
No Inter-sample normalization is performed.
Quantile
For
the implementation in the database structure at the core of TRAM, each
intra-sample normalized value is given a rank after sorting the sample
data in ascending order; then the mean of all the values with
the same rank across all samples is calculated. This mean value is
assigned as the expression value to each gene with that rank in
each sample (Bolstad et al., 2003). An original variant of this method
implemented in TRAM is described below.
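The rank-and-average procedure just described can be sketched in Python as follows. This is a simplified illustration, not TRAM's implementation: it assumes that all samples contain the same number of values, and it breaks ties by position rather than averaging them. The function name is hypothetical.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of equal-length samples.

    samples: list of lists of numbers (one inner list per sample).
    Returns new lists in which each value is replaced by the mean of
    the values holding the same rank across all samples.
    """
    size = len(samples[0])
    if any(len(sample) != size for sample in samples):
        raise ValueError("this sketch requires samples of equal length")
    sorted_samples = [sorted(sample) for sample in samples]
    # Mean of the values sharing each rank (rank 0 = lowest value).
    rank_means = [
        sum(sample[rank] for sample in sorted_samples) / len(samples)
        for rank in range(size)
    ]
    normalized = []
    for sample in samples:
        order = sorted(range(size), key=lambda index: sample[index])
        result = [0.0] * size
        for rank, index in enumerate(order):
            result[index] = rank_means[rank]
        normalized.append(result)
    return normalized
```

After normalization every sample contains exactly the same set of values, only assigned to different genes according to their original rank.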
Scaled_Q (Scaled Quantile) [DEFAULT AFTER INSTALLATION]
Derived from the Quantile method, except that the rank for each array is rescaled according to the array with the maximum number of probes. This original method compensates for differences when comparing arrays with very different numbers of probes, because the highest values in arrays with a low number of probes are given ranks comparable to those assigned in arrays with a high number of probes (see the article).
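The rank rescaling idea can be illustrated with a short Python sketch. The linear formula used here (rank multiplied by the ratio between the maximum number of probes and the number of probes of the array) is an assumption made for illustration; the exact procedure used by TRAM is the one described in the article. The function name is hypothetical.

```python
def scaled_ranks(values, max_probes):
    """Return one rescaled rank per value (the lowest value has the lowest rank).

    max_probes: number of probes in the largest array of the sample set.
    Assumption of this sketch: ranks are rescaled linearly.
    """
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0.0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank * max_probes / n
    return ranks


# A hypothetical 3-probe array compared with a 6-probe array
print(scaled_ranks([10, 30, 20], max_probes=6))  # [2.0, 6.0, 4.0]
```

In this example the highest value of the small array receives rank 6, the same rank as the highest value of the large array.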
DATA SUMMARY - Values_A_B Layout
The summary of gene expression values, under
the current mode of
normalization, may be viewed in the 'Values_A_B'
layout.
This is an indexed database table summarizing all data points available
in the sample pool for each gene.
Along with the Mean value and the Standard Deviation (SD) value, the SD
value is also shown as a percentage of the expression value.
The 'Mean' value of the data points
available for each locus is considered the expression value for the
respective gene and it is used in the subsequent analysis.
The number of 'Data Points' from which the summary data are obtained is
also displayed.
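The summary fields described above can be reproduced for a single gene with a few lines of Python. This is a sketch only: the guide does not state whether TRAM uses the sample or the population standard deviation, so the sample SD is assumed here, and the function and key names are hypothetical.

```python
from statistics import mean, stdev


def summarize(data_points):
    """Summarize the data points available for one gene in a sample pool."""
    m = mean(data_points)
    # Assumption: sample standard deviation; undefined for a single data point
    sd = stdev(data_points) if len(data_points) > 1 else 0.0
    return {
        "mean": m,                        # used as the gene expression value
        "sd": sd,
        "sd_percent": 100.0 * sd / m if m else None,
        "data_points": len(data_points),
    }


# Three hypothetical normalized values for the same gene
print(summarize([80.0, 100.0, 120.0]))
```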
The yellow 'A/B (unmapped)' button opens the 'Values_A_B_All' layout, which also includes unmapped loci that are not listed in the 'Values_A_B' table used for the creation and analysis of the transcriptome maps.
Clicking on the 'Export' button exports the data for the genes listed in the 'Values_A_B' table as a tabulated text file.
The file contains by default the following columns, from left to right:
01) Gene_name
02) Chromosome name
03) Chromosome Identifier (progressive number)
04) Gene mean expression value for pool 'A' samples
05) Gene mean expression value for pool 'B' samples
06) Ratio between the gene mean expression values from pool 'A' samples and from pool 'B' samples ('A'/'B' ratio)
4 Analyzing data
Different TRAM databases may be obtained by duplicating the fresh "TRAM" folder and starting a new analysis session.
Please do not change the name of any file or folder of the TRAM software.
You may download multiple copies of TRAM and run them simultaneously, provided that each "TRAM" folder is located in a different directory, so that the original names of the TRAM folder and files are maintained.
For the analysis
of a pool of expression data arrays, the expression value for each gene symbol
will be the mean expression value
among all its
corresponding identifiers available in that sample pool.
Basically, TRAM software performs two types of analysis: creation of transcriptome maps ('Map' mode), or search for clusters of over- or under-expressed neighbouring/contiguous genes ('Cluster' mode).
Clicking on the 'Special' button in the TRAM 'Home' window allows the user to automatically perform all available analyses in sequence, after an initial choice of the settings required for the analysis.
You may start the analysis by clicking on the red 'Analysis' buttons in the 'Home' layout. You will then be asked to insert the analysis settings of your choice.
The two settings common to both types of analysis are:
Pool choice: A, B, or A vs. B (to compare two series of samples using the A/B ratio);
Statistics calculation: may be performed with respect to all genome segments (or genes), or to the set of segments (or genes) located in the same chromosome. This affects both descriptive statistics (calculation of percentile thresholds to select over/under-expressed genes) and statistical analysis (parameters for the calculation of the hypergeometric distribution used to determine the significance of the identified over/under-expressed segments or clusters).
4.1 Creating and analyzing transcriptome maps
Click on the 'Chromosomal Segments'
button in the TRAM 'Home' (TRAM main window).
The software will generate a graphical map of the transcriptome showing a vertical line representing each chromosome. An expression value is associated with each segment of the line, whose size is determined by a window (in bp) set by the user.
This value is the mean of all available expression data related to the genes included in each segment.
Information about "Location" is derived from Entrez Gene imported data, and in the 'Map' mode is obtained from the first gene listed in each chromosomal segment.
In the 'Map' mode, results are always generated for both types of analysis (the one based on all genes in the genome and the one based on the genes located in the same chromosome as the segment). You are required to select one type of analysis ('genome' or 'chromosome') so that, at the end of the process, you are directed to the corresponding results layout, but the results for the other type are also available.
This is because TRAM spends much of the time during 'Map' analysis creating chromosomal segments, so it is convenient to calculate both statistics once the segments are created.
SETTINGS
The available settings for this analysis are:
Window: defines the length of a segment.
If the coordinates of a gene span the window boundaries, the gene is included in each window in which a part of it lies.
Each segment on the map shows only those genes having an available expression value in the corresponding sample or pool of samples.
Sliding window shift: defines the offset between the start of a segment and the start of the next one, so that consecutive segments overlap.
A shift equal to zero results in non-overlapping segments.
For example, if the window is 1.000.000 bp and the shift equals 200.000 bp, the successive segments will be created with coordinates:
1 - 1.000.000 bp
200.000 - 1.200.000 bp
400.000 - 1.400.000 bp, and so on.
This function could be useful to increase the sensitivity of the search for over/under-expressed segments.
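The 'Window' and 'Sliding window shift' settings can be illustrated with the following Python sketch, which reproduces the coordinates of the example above and the rule that a gene is included in every window in which a part of it lies. It is not TRAM's implementation: the function names are hypothetical, and the truncation of the last segments at the chromosome end and the boundary convention used when the shift is zero are assumptions.

```python
def make_segments(chrom_length, window, shift):
    """Return (start, end) coordinates of the segments along one chromosome."""
    # A shift of zero means non-overlapping segments (step = window)
    step = shift if shift > 0 else window
    segments = []
    offset = 0
    while offset < chrom_length:
        start = max(1, offset)
        end = min(offset + window, chrom_length)  # assumption: clip at chromosome end
        segments.append((start, end))
        offset += step
    return segments


def genes_in_segment(genes, start, end):
    """genes: list of (symbol, gene_start, gene_end) tuples.

    A gene belongs to the segment if any part of it lies inside the segment.
    """
    return [g[0] for g in genes if g[1] <= end and g[2] >= start]


segments = make_segments(1_400_000, window=1_000_000, shift=200_000)
print(segments[:3])
# [(1, 1000000), (200000, 1200000), (400000, 1400000)]

# Hypothetical genes: 'g1' lies in both of the first two windows
genes = [("g1", 150_000, 250_000), ("g2", 1_300_000, 1_350_000)]
print(genes_in_segment(genes, 200_000, 1_200_000))  # ['g1']
```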
Percent (segment): defines the threshold required to consider a segment as 'Over- (or Under-) expressed' (i.e. to be marked in red or blue in the expression bar).
A segment whose mean expression value (calculated as the mean of all known genes included in it) falls within the highest (n) percent or within the lowest (n) percent of values, where n=Percent (segment), will be highlighted in red or blue, respectively, thus displaying genomic regions that are globally over- or under-expressed with respect to the desired threshold.
Percent (gene): defines the threshold expression value required to consider a gene as 'Over- (or Under-) expressed' (i.e. to be marked in red or blue in the segment gene list).
A gene whose mean expression value falls within the highest (n) percent or within the lowest (n) percent of values, where n=Percent (gene), will be listed in red (over-expressed) or blue (under-expressed) font, respectively.
The number of over/under-expressed genes in the segment is calculated with respect to Percent (gene).
Using two different parameters for segments and genes allows the user to perform a more refined analysis.
Number of genes in the window: defines the minimum number of over/under-expressed genes required to mark the segment with the tag 'Over' (or 'Under').
An over-expressed segment listing a number of over-expressed genes equal to or greater than the 'Number of genes in the window' will be marked as 'Over' in the 'Map' layouts.
An under-expressed segment listing a number of under-expressed genes equal to or greater than the 'Number of genes in the window' will be marked as 'Under' in the 'Map' layouts.
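The interplay of the three settings above can be sketched in Python. The way the percentile cut-offs are derived from a sorted list is an assumption of this example (TRAM may handle ties and rounding differently), and the function names are hypothetical.

```python
def percent_thresholds(values, percent):
    """Return (low_cut, high_cut) for the lowest/highest `percent` percent of values."""
    ordered = sorted(values)
    k = max(1, int(len(ordered) * percent / 100))
    return ordered[k - 1], ordered[-k]


def tag_segment(segment_value, gene_values, segment_cuts, gene_cuts, min_genes):
    """Return 'Over', 'Under' or '' for one segment."""
    seg_low, seg_high = segment_cuts
    gene_low, gene_high = gene_cuts
    n_over = sum(v >= gene_high for v in gene_values)
    n_under = sum(v <= gene_low for v in gene_values)
    # Both conditions are needed: segment beyond its threshold AND enough genes
    if segment_value >= seg_high and n_over >= min_genes:
        return "Over"
    if segment_value <= seg_low and n_under >= min_genes:
        return "Under"
    return ""


# Hypothetical distribution of 100 values and a 10 percent threshold
cuts = percent_thresholds(list(range(1, 101)), percent=10)
print(cuts)                                                 # (10, 91)
print(tag_segment(95, [92, 96, 99, 50], cuts, cuts, 3))     # Over
print(repr(tag_segment(95, [92, 96, 99, 50], cuts, cuts, 4)))  # ''
```

In the last call the segment value is above the segment threshold, but only three genes are over-expressed, so the segment is not tagged when four genes are required.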
RESULTS
The results of the analysis are displayed within 30-90 minutes, depending on the number of arrays analyzed. Changing the data normalization type during the analysis requires additional time for the task to be completed.
The results of the
analysis are displayed in the 'Chromosomal Segments' layouts (i.e., 'Map' layouts).
Each chromosomal segment is actually a record of the database. You can
find and sort segments using desired criteria.
The 'P'
field displays the p-value
resulting from the hypergeometric distribution calculation for the
"Over/Under"-expressed segments. This is the statistical significance,
i.e. the probability that the result (presence of n
over/under-expressed genes within the same segment) could have been
obtained by chance.
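As an illustration of a hypergeometric p-value of this kind, the following Python sketch computes the probability of observing at least a given number of over/under-expressed genes in a segment. The parameterization (population = all genes considered, successes = over/under-expressed genes among them, draws = genes in the segment) is an assumption of this example; the parameters actually used by TRAM are described in the article. The function name is hypothetical.

```python
from math import comb


def hypergeom_p_value(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric distribution.

    population: total number of genes considered (genome or chromosome)
    successes:  number of over/under-expressed genes in the population
    draws:      number of genes in the segment
    observed:   number of over/under-expressed genes found in the segment
    """
    upper = min(draws, successes)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes, 3 of them over-expressed, segment of 2 genes
print(hypergeom_p_value(10, 3, 2, 2))  # 3/45, about 0.067
```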
Due to the high number of segments in a genome, the 'P' value needs to be corrected for multiple testing by controlling the False Discovery Rate (FDR). The 'Q' field displays the FDR-corrected p-value.
'P' and 'Q' values are displayed only for the segments fulfilling the criteria to be tagged as over/under-expressed. If Q≤0.05, the over/under-expression is considered to be statistically significant.
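This guide does not name the FDR procedure used by TRAM (see the article for details). As an illustration of how FDR-corrected values can be obtained from a list of p-values, the sketch below applies the widely used Benjamini-Hochberg procedure; its use here is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return FDR-adjusted values (q-values), in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping q-values monotone
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical p-values for four tagged segments
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [value <= 0.05 for value in q]
print(significant)  # [True, False, False, False]
```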
For details and references
about the statistical analysis, see the
article describing 'TRAM'.
The user may also produce a graphical output showing the series of chromosome transcriptome maps aligned horizontally, and may choose to display specific chromosomes or sets of chromosomes.
In addition, specific buttons help retrieve online database entries for the desired genes.
In the "Map" layouts based on all gene values, segments that are significantly over/under-expressed only in this type of analysis, but not in the corresponding one based on the values of the pertinent chromosome, will be marked by a "G" and the intensity bar will be highlighted in yellow. The button "Show only"->"Genome Specific" will retrieve only these "G" segments.
In the "Map" layouts based on chromosome-specific values, segments that are significantly over/under-expressed only in this type of analysis, but not in the corresponding one based on all gene values, will be marked by a "C" and the intensity bar will be highlighted in yellow. The button "Show only"->"Chromos. Specific" will retrieve only these "C" segments.
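The "G" and "C" marks correspond to two set differences between the significant results of the two types of analysis. A minimal Python sketch, with hypothetical segment identifiers and a hypothetical function name:

```python
def specific_marks(genome_significant, chromosome_significant):
    """Both arguments are sets of identifiers of significant segments."""
    return {
        # Significant only in the analysis based on all gene values
        "G": genome_significant - chromosome_significant,
        # Significant only in the analysis based on the same chromosome
        "C": chromosome_significant - genome_significant,
    }


marks = specific_marks({"seg1", "seg2", "seg3"}, {"seg2", "seg4"})
print(sorted(marks["G"]))  # ['seg1', 'seg3']
print(sorted(marks["C"]))  # ['seg4']
```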
Clicking on the 'Export Results Data' button exports the results as a tabulated text file, which is saved in the 'Results' folder in the main 'TRAM' directory.
The file contains the following columns, from left to right:
01) Chromosome name
02) Chromosomal location
03) Segment Start genomic position
04) Segment End genomic position
05) Segment expression value
06) Label of segment Over/Under-expression ('Over', 'Under')
07) P value
08) Q value
09) List of genes (symbols) included in the segment
10) Number of Over-expressed genes in the segment
11) Number of Under-expressed genes in the segment
12) Total number of genes in the segment
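The exported file can be post-processed outside TRAM. The following Python sketch reads a tab-delimited text with the twelve columns listed above and keeps the segments with Q≤0.05. The column keys, the absence of a header row, the use of a dot as decimal separator and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical keys for the twelve columns, in the order listed above
MAP_COLUMNS = [
    "chromosome", "location", "start", "end", "value", "label",
    "p_value", "q_value", "genes", "n_over", "n_under", "n_genes",
]


def read_map_results(handle, has_header=False):
    """Read tab-delimited rows into dictionaries keyed by MAP_COLUMNS."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)
    return [dict(zip(MAP_COLUMNS, row)) for row in reader if row]


def significant_segments(rows, q_cutoff=0.05):
    """Keep rows with a Q value not greater than the cut-off."""
    selected = []
    for row in rows:
        q = row.get("q_value", "").strip()
        if q and float(q) <= q_cutoff:
            selected.append(row)
    return selected


# Two made-up rows, standing in for the content of an exported file
SAMPLE = (
    "chr21\t21q22.3\t1\t500000\t250.5\tOver\t0.0001\t0.004\tGENE1 GENE2\t2\t0\t2\n"
    "chr21\t21q22.3\t500001\t1000000\t90.0\t\t\t\tGENE3\t0\t0\t1\n"
)
rows = read_map_results(io.StringIO(SAMPLE))
hits = significant_segments(rows)
print([(r["chromosome"], r["start"], r["end"]) for r in hits])
# [('chr21', '1', '500000')]
```

To read a real export, replace `io.StringIO(SAMPLE)` with a file opened in text mode, after checking the actual layout of the file.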
A second file with the label 'Set' in the file name is generated, containing the summary of the analysis settings, which are also displayed at the top of all TRAM results layouts.
The user can also export results data in different formats (e.g., Excel) using the "Export Records..." command from the "File" Menu.
4.2 Searching for clusters of neighbouring over/under-expressed genes
In the "Cluster" mode, the software will search for sets of contiguous/neighbouring genes all expressed beyond a defined 'n' threshold, i.e. with expression values higher than the (100 - 'n') percentile or lower than the 'n' percentile.
In this mode, results are centered on individual differentially expressed loci; they are complementary to, and more sensitive than, those of the "Map" mode of analysis, which requires the definition of an arbitrary window length within which genes must be contained.
SETTINGS
Click on the 'Gene Clusters' button in the TRAM
'Home' (TRAM main window).
The available settings for this analysis are:
Percent (gene): defines the thresholds required to consider a gene as 'Over- (or Under-) expressed' (i.e. to be marked in red or blue in the expression bar).
Genes whose mean expression value falls within the highest (n) percent or within the lowest (n) percent of values, where n=Percent (gene), will be listed in red (over-expressed) or blue (under-expressed) font, respectively:
Over-Expressed gene (marked as 'CLUST-O' in the results layout)
Under-Expressed gene (marked as 'CLUST-U' in the results layout)
Gap: defines the maximum number of non-over-expressed (or non-under-expressed) genes allowed between two over-expressed (or under-expressed) genes in a cluster.
Setting a gap equal to 1 means that two over- (under-) expressed genes will be included in the cluster even when they are separated along the chromosome by one gene not fulfilling the conditions to be considered over/under-expressed. For example, a cluster composed of the over-expressed genes A, B, and C could contain no more than 2 non-over-expressed genes: one between genes A and B, and the other between genes B and C.
Genes with this feature will be marked as 'GAP' in the results layouts.
If Gap=0, only contiguous genes will be considered to be in a cluster.
Genes with no expression data in the analyzed sample set will not be considered as 'GAP' and will instead be marked 'EMPTY' in the results layouts. They are displayed, but they are ignored by the cluster search process.
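The role of the 'Gap' setting and of 'EMPTY' genes can be illustrated with the following Python sketch, which scans the genes of one chromosome in their linear order. It is not TRAM's algorithm: the state labels, the function name and the minimum cluster size of two genes are assumptions of this example.

```python
def find_clusters(states, gap=0, min_genes=2):
    """Return clusters as lists of indices of the genes beyond the threshold.

    states: one label per gene, in chromosomal order:
        'HIT'   gene beyond the threshold (over- or under-expressed)
        'MISS'  gene with data, but not beyond the threshold
        'EMPTY' gene with no expression data (ignored by the search)
    gap: maximum number of 'MISS' genes allowed between two 'HIT' genes
    """
    clusters = []
    current = []
    misses = 0
    for index, state in enumerate(states):
        if state == "EMPTY":
            continue  # displayed in the results, but ignored by the search
        if state == "HIT":
            current.append(index)
            misses = 0
        else:
            misses += 1
            if misses > gap:
                # Too many non-hit genes in a row: close the current cluster
                if len(current) >= min_genes:
                    clusters.append(current)
                current = []
                misses = 0
    if len(current) >= min_genes:
        clusters.append(current)
    return clusters


states = ["HIT", "MISS", "HIT", "EMPTY", "HIT", "MISS", "MISS", "HIT"]
print(find_clusters(states, gap=1))  # [[0, 2, 4]]
print(find_clusters(states, gap=0))  # [[2, 4]]
```

With Gap=1 the gene at position 1 is tolerated inside the cluster, while with Gap=0 only the genes at positions 2 and 4 form a cluster, because the 'EMPTY' gene between them is ignored.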
Gene Type: the software will construct a scheme of the linear succession of genes present in the table 'Genes', filled during the set up process.
The user can set TRAM to use any of the following, while constructing the linear map of genes:
1) Official symbols only (only genes with an Official Gene Symbol assigned); or
2) Entrez symbols only (genes as at point 1, plus genes with an Entrez Gene identifier assigned); or
3) All symbols and UniGene clusters (genes as at points 1 and 2, plus sequences having a UniGene (EST) cluster identifier).
RESULTS
The Results of the 'Cluster'
analysis
are typically displayed within a few minutes.
Changing the data normalization type
during the analysis takes additional time for the task to be completed.
The
results of the analysis are displayed in the 'Cluster' layouts.
Each gene is actually a record
(row) of the database. You can find and sort genes using desired
criteria.
The 'P' field displays the p-value resulting from the hypergeometric distribution calculation for the "Over/Under"-expressed Clusters. This is the statistical significance, i.e. the probability that the result (presence of n over/under-expressed clusters within the transcriptome) could have been obtained by chance.
Due to the high number of genes in a genome, the 'P' value needs to be corrected for multiple testing by controlling the False Discovery Rate (FDR).
The 'Q' field displays the FDR-corrected p-value.
'P' and 'Q' values are displayed only for the clusters fulfilling the criteria to be tagged as over/under-expressed. If Q≤0.05, the over/under-expression is considered to be statistically significant.
For details and references about
the statistical analysis, see the article describing 'TRAM'.
The number (#) of over/under-expressed genes in the cluster, the Length (in bp) of the chromosomal region covered by the cluster, and the number of individual Data Points (e.g., array spots) from which summary data for each gene are obtained are also displayed.
Specific buttons help retrieve online database entries for the desired genes.
In the "Cluster" layouts based on all gene values, genes that are significantly over/under-expressed only in this type of analysis, but not in the corresponding one based on the values of the pertinent chromosome, will be marked by a "G" and the intensity bar will be highlighted in yellow. The button "Show only"->"Genome Specific" will retrieve only these "G" genes.
In the "Cluster" layouts based on chromosome-specific values, genes that are significantly over/under-expressed only in this type of analysis, but not in the corresponding one based on all gene values, will be marked by a "C" and the intensity bar will be highlighted in yellow. The button "Show only"->"Chromos. Specific" will retrieve only these "C" genes.
Clicking on the 'Export Results Data' button exports the results as a tabulated text file, which is saved in the 'Results' folder in the main 'TRAM' directory.
The file contains the following columns, from left to right:
01) Cluster ID (a unique number used as cluster identifier)
02) Type of Cluster (CLUST-O: over-, CLUST-U: under-expressed)
03) Count of over/under-expressed genes in the Cluster
04) Length (bp) of the region covered by the cluster
05) Chromosome name
06) Chromosomal location
07) Gene symbol/name
08) Gene Start genomic position
09) Gene End genomic position
10) Gene expression value (mean among all pool samples)
11) Label of gene Over/Under-expression ('Over', 'Under')
12) Number of individual Data Points processed for each gene
13) P value
14) Q value
15) Gene description
A second file with the label 'Set' in the file name is generated, containing the summary of the analysis settings, which are also displayed at the top of all TRAM results layouts.
The user can also export results data in different formats (e.g., Excel) using the "Export Records..." command from the "File" Menu.
GENERAL DEFINITIONS
5.1 File
A set of database tables.
5.2 Table
A set of records referring to the same subject type (e.g., the 'Genes' table).
5.3 Record
One set of fields which represents one entry (i.e. containing all requested data for a subject, e.g. a gene probe).
The record browser is a small book icon at the top left of the window. You may also browse the records faster using the cursor at the right of the small book icon.
5.4 Field
The database unit containing a specific data type (e.g., 'Gene_name').
5.5 Layout
A particular graphical organization of the fields of a table.
A table can be displayed in more than one layout.
A layout may display fields from a table or related fields from other tables.
A file may show data within different layouts.
Visualization of a field is independent from the storage of the contained data.
Browsing among the layouts can be done by clicking on the 'Layout:' pop-up Menu at the upper left corner.
You may browse the database by clicking on the small book pages at the top left of the window, by using the cursor at the right of the small book icon, or by entering a record number and pressing the "Return" key.
The
following information is constantly displayed in the window top bar (if not, select "Status
Toolbar" from the "View" Menu):
Records:
total number of Records in the table.
Found:
total number of the subset of Records currently selected. Clicking on the green circular button will
retrieve the complementary subset of currently omitted records.
Sorted:
sorting status of the Records (Sorted/Unsorted).
The FileMaker Pro-based database may be used in three basic "modes": 'Browse', 'Find', and 'Preview'.
Switching among the different modes can be done from the 'View' Menu or from the pop-up Menu bar at the bottom left of the window.
5.6 Browse Mode
One way to use the database.
It allows data entry, viewing, browsing, sorting, and manipulation.
It may be selected from:
the 'View' menu, or
the mode pop-up Menu bar, at the bottom left of the window.
In the 'Browse' mode, the record sets can be browsed by clicking on the small book icon (with the arrows to move 'back' and 'forward') in the upper left corner.
Browsing among the tables can be done by clicking on the 'Layout' pop-up Menu at the upper left corner.
5.7 Find Mode
An alternative mode to use the database.
It allows searching for specific content in the database fields, using any combination of criteria (see the 'Search mode' section below for more details).
It may be selected from:
the 'View' menu, or
the mode pop-up Menu bar, at the bottom left of the window.
The user can fill in a blank form to search in specific fields.
In the "Find" mode, the small book icon in the upper left corner represents the different "requests" that are made for searching the database.
In FileMaker Pro 'Find' mode, the "AND" - "OR" - "NOT" operators may be implemented in this way:
"AND" by filling criteria in different fields located in the same "Request",
"OR" by generating additional requests (from "Requests" Menu) in the same query,
"NOT" by generating additional requests (from "Requests" Menu) and clicking on the "Omit" button (located in the window top bar).
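The logic of combining requests can be modelled with a short Python sketch. This is a simplified model for illustration, not FileMaker Pro code: criteria are compared by exact equality, omit requests are applied to the whole result regardless of their position, and the field names and function name are hypothetical.

```python
def perform_find(records, requests):
    """Return the records matching a list of find requests.

    records:  list of dictionaries (field name -> value)
    requests: list of (criteria, omit) pairs, where criteria is a dictionary
              of field name -> required value and omit is True for a request
              made with the "Omit" option.
    """
    def matches(record, criteria):
        # All criteria of one request must be satisfied ("AND")
        return all(record.get(field) == value for field, value in criteria.items())

    find_requests = [c for c, omit in requests if not omit]
    omit_requests = [c for c, omit in requests if omit]
    found = []
    for record in records:
        # Any of the find requests may match ("OR")
        included = (
            any(matches(record, c) for c in find_requests) if find_requests else True
        )
        # Records matching an omit request are excluded ("NOT")
        omitted = any(matches(record, c) for c in omit_requests)
        if included and not omitted:
            found.append(record)
    return found


records = [
    {"Chromosome": "21", "Label": "Over"},
    {"Chromosome": "21", "Label": "Under"},
    {"Chromosome": "X", "Label": "Over"},
]
# AND: both criteria in the same request
print(len(perform_find(records, [({"Chromosome": "21", "Label": "Over"}, False)])))  # 1
# OR: two requests
print(len(perform_find(records, [({"Chromosome": "21"}, False), ({"Label": "Over"}, False)])))  # 3
# NOT: a second request with the omit option
print(len(perform_find(records, [({"Label": "Over"}, False), ({"Chromosome": "X"}, True)])))  # 1
```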
The 'Operators' pop-up Menu appears by clicking on a field while pressing the 'ctrl' key, allowing queries for exact matches, duplicate values, ranges, wild cards and more.
Click on the 'Perform Find' button at the top of the window to start the query.
The result of the search is the subset of the entries matching the search criteria that were set.
5.8 Preview Mode
An alternative way to use the database.
It displays a print preview of the found records.
It may be selected from:
the "View" menu, or
the pop-up Menu bar, at the bottom left of the window.
In the "Preview" mode, the user can obtain a print preview of the data in the current table.
Browsing among the tables can be done by clicking on the 'Layout:' pop-up Menu at the upper left corner.
MENU AND COMMANDS
6.1 'TRAM' Menu
About FileMaker Pro Runtime...
Information about the FileMaker Pro Runtime at the core of the software.
Preferences...
Standard preferences panel; cache memory size can be set up to 256 MB.
Hide TRAM
Hiding all TRAM windows.
Quit TRAM
Closing the program.
6.2 'File' Menu
File Options...
It is possible to set only the "Spelling" options.
Change Password...
There is no default password set.
Page setup...
Standard page set up command.
Print...
Standard print command.
The appearance will match the layout currently displayed on the screen.
Import Records
This is the general "Import" function of FileMaker Pro.
Export Records...
Export command for the found record set in a given table.
Records are exported in their current sort order.
The user can select the fields to be exported, their relative order, and the separation character.
Save a Copy as...
Save a copy of the database: complete, compressed, or as a clone (database structure with no records).
6.3 'Edit' Menu
Undo
Standard "Undo" command.
Cut
Standard "Cut" text command.
Copy
Standard "Copy" text command.
Paste
Standard "Paste" text command.
Select all
Selection of all text present within a selected field (to select a field, click into the field).
Find/Replace
Utility for searching/replacing text strings within fields.
Note: Use 'Find' mode (from 'View' Menu) for full search and selection of a record set.
Spelling
Utility for checking the spelling of text strings within fields.
Export Field Contents...
Utility to export the contents of the selected field to a file.
6.4 'View' Menu
Browse Mode
Switch to the 'Browse Mode' (see
"General Definitions" above).
Find Mode
Switch to the 'Find Mode' (see
"General Definitions" above).
Preview Mode
Switch to the 'Preview Mode' (see
"General Definitions" above).
Go to layout
A possible way to switch between
different layouts.
View as Form
A possible way to individually display the current record of a found set of records.
View as List
A possible way to display all the
records of a found set in the form of a list.
View as Table
A possible way to display all the
records of a found set in the form of a spreadsheet-like table.
Toolbars
To switch on/off the toolbars of
the application: "Standard"
and "Text Formatting".
Status Area
To switch on/off the "Status
Area", the toolbar located at the top of the program window.
Text Ruler
To switch on/off the text ruler
of the application.
Zoom in
Used to increase layout
dimensions.
Zoom out
Used to decrease layout
dimensions.
6.5 'Records' Menu
New Record
Creating a new empty record in the database.
The new Record will be the last of the current record set.
Duplicate Record
Duplicating the current record in the database.
The new Record will be the last of the current record set.
Delete Record...
Deleting the current record in the database.
Delete Found Records...
Deleting all currently found records in the database.
Go to Record
Moving to the selected record by number, previous or next.
Show All Records
Showing all the records in the database.
Show Omitted Only
Showing all the records in the database not included in the current 'found' set.
Omit Record
Removing the selected record from the current found set, without deleting it.
Omit Multiple...
Removing more than one record, selected by number, from the current found set, without deleting them.
Modify Last Find
Returning to the last performed search in order to edit it.
Saved Finds
Saving a set of search criteria.
Sort Records...
Sorting the current record set according to desired criteria.
Unsort
Displaying the current record set according to the order of creation of each record.
Replace Field Contents
Replacing the value of a field in all records of the found set with the value specified in the current record, or with the result of a calculation.
Relookup Field Contents...
This command executes a relookup of the value of a field by reading the matched value in a related table (the relationship has been established during database development using a 'key' field).
Revert Record...
Restoring the value of a field, discarding any change, before clicking out of that field.
6.6 'Scripts' Menu
About
This opens the 'About' window containing information about the TRAM software.
Guide
This opens the User Guide of the TRAM software (this Guide).
6.7 'Help' Menu
Search
Search the system 'Help' for the general commands.
TROUBLESHOOTING
Sometimes, power failure, hardware problems, or other factors can damage a FileMaker Pro database file.
When the runtime application discovers a damaged file, a dialog box appears, prompting the user to contact the creator.
Even if the dialog box does not appear, files can exhibit erratic behaviour.
If you have FileMaker Pro or FileMaker Pro Advanced installed, you can recover the file using the 'Recover' command.
Otherwise, to recover a damaged file:
- On Mac OS X machines, press Command + Option (cmd-alt) while double-clicking the runtime application icon. Hold the keys down until you see the 'Open Damaged File' dialog box.
- On Windows machines, press Ctrl+Shift while double-clicking the runtime application icon. Hold the keys down until you see the 'Open Damaged File' dialog box.
During the recovery process, the runtime application:
1. Creates a new file;
2. Renames any damaged file by adding "Old" to the end of the file name;
3. Gives the repaired file the original name.
TECHNICAL NOTES
The software minimum requirements
are:
Mac OS X 10.4.11 for PowerPC G4, G5 or Intel processors;
Windows XP Professional, Home Edition (Service Pack 3);
Windows Vista Ultimate, Business, Home (Service Pack 1);
Windows 7.
Other specifications may be found here.
The scripts at the core of TRAM software are "FileMaker Pro" scripts.
TRAM is composed of a 137 MB database engine ('TRAM') and of a template ('TRAM.TMA') with 37 data tables, 117 relationships among them and 434 script definitions.
Following set up including NCBI UniGene and UCSC EST localization data, the size of the 'TRAM.TMA' file becomes 3.6 GB, 2.0 GB and 742 MB for human, mouse and zebrafish, respectively.
Importing the 28 human microarray sample data files used for the test of the biological model raised the 'TRAM.TMA' file size to 4.3 GB.
Time required to import and process a typical microarray data file is about 10 minutes.
Typical execution time is 1-2 hours for a 'Map' analysis and 5-10 minutes for a 'Cluster' analysis, depending on the number of analyzed samples, which also heavily affects the time required to refresh data when the type of data normalization is changed.
Large file size and relative slowness of data processing are mainly due to systematic indexing of all data contained in TRAM, with the advantage of very fast data browsing, navigation and search at the end of data import and processing, which may be run in batch mode.
We encourage any creative use, modification and noncommercial redistribution of TRAM, as long as the original paper is cited and, in the case of modification, a statement that the original program has been modified is provided.
7.1 Software known limits
Due to FileMaker Pro limits:
maximum TRAM file size is 8 terabytes (8192 gigabytes);
a text field can contain up to 2 GB of characters;
a number field can contain values of up to 800 digits.
Due to TRAM limits:
in order to generate consistent transcriptome maps, TRAM currently deletes all genes with ambiguous mapping, but this involves data loss for the few genes that are biologically present in different locations, i.e. genes common to the X and Y chromosomes (e.g., CSF2RA). We are working to fix this problem.
The limit of 25 chromosomes for a genome applies only to the display of synthetic maps with all chromosomes horizontally aligned; it does not apply to data import, the standard visualization mode or any data analysis.
7.2 Bugs report
Please report any suggestions, bugs or problems to:
Pierluigi Strippoli
pierluigi.strippoli@unibo.it
Luca Lenzi
l.lenzi@unibo.it
ACKNOWLEDGEMENTS
Thanks to NCBI for the "Entrez" databases and to UCSC Genome Bioinformatics for the "UCSC Genome Browser".
Thanks to the FMPexperts List and FMForum for suggestions and tips about FileMaker Pro.