TRAM generates and analyzes transcriptome maps. It can import and integrate any gene expression data source in tab-delimited text format, map expression values to the relevant genomic region, and provide statistical analysis of over- or under-expressed regions compared with the whole genome or with the corresponding chromosome.
This guide provides detailed documentation of the TRAM 1.3 software. It shows how to install the software and how to import expression data to create and analyze transcriptome maps.
The minimum software requirements are:
- Macintosh: Mac OS X 10.6, OS X Lion 10.7, OS X Mountain Lion 10.8;
- Windows: Windows XP Professional, Home Edition (Service Pack 3); Windows Vista Ultimate, Business, Home Premium (Service Pack 2); Windows 7 Ultimate, Professional, Home Premium; Windows 8 Standard and Pro edition.
An Internet connection is required to display the software Guide and to download data for set up, but not to run the tool.
If you are working on human gene expression values, download the file:
- TRAM_1.3_HUMAN_2017.zip (Macintosh)
- TRAM_1.3_HUMAN_2017_Win.zip (Windows)
In all other cases, download the file:
- TRAM_1.3.zip (Macintosh)
- TRAM_1.3_Win.zip (Windows)
The downloaded file should be decompressed automatically, generating a "TRAM" folder. If it is not, double-click the file to launch the default decompression utility of your system.
The TRAM Folder contains: "TRAM" (Macintosh) or
"TRAM.exe" (Windows) file
(the runtime application); "TRAM.TMA" (database
file); "Batch_Import_A" folder; "Batch_Import_B" folder;
"Platform" folder; "Results" folder; "FMP Acknowledgments.pdf" file; "Extensions" folder, containing a
"Dictionaries" folder, with the
dictionary file for supported languages; (and
an
"English" folder with 3 files, for Windows); 40 ".dll" files (for Windows).
TRAM 1.3 is based on the FileMaker Pro 12 database management software (FileMaker, Inc.; http://www.filemaker.com/). It is released as a FileMaker Pro 12 template, together with a runtime application that runs the "FileMaker Pro" engine at the core of the software. The runtime is freely distributed in compliance with the license of the "FileMaker Pro 12 Advanced" developer package that was used to create the program.
Standard database commands (Find, Sort, Export records) are available within each layout of TRAM (see the "GENERAL DEFINITIONS" and "MENU AND COMMANDS" sections of this Guide).
Once decompressed, TRAM is ready to be used.
Macintosh: open the "TRAM" application ("TRAM" or "TRAM.app" file) contained in the "TRAM" folder.
Windows: open the "TRAM" application ("TRAM" Runtime file) contained in the "TRAM" folder.
Do not rename any of the files or folders of the TRAM software.
You may download multiple
copies of TRAM and run them simultaneously, provided that each
"TRAM" folder is located in a different directory. Do not move the "TRAM" folder
while the software is open.
Run the "TRAM" software from a local hard disk.
Do not run the software from a network drive.
If a TRAM analysis
aborts unexpectedly, it is advisable to restart it in a
fresh TRAM copy.
Use the buttons to navigate among the different sections of the software. The "Back" button takes the user to the last visited layout only (not to all previously visited layouts). The "Home" button takes the user to the main software screen, from which any layout may be reached.
The TRAM file in the TRAM_HUMAN (or TRAM_MOUSE, TRAM_BRARE, if available) species-specific versions is pre-loaded with the latest human (or mouse, or zebrafish, respectively) data for genes, chromosomes, UniGene cluster IDs and all related GenBank Accession Numbers and Expressed Sequence Tags (ESTs).
In addition, these pre-setup versions include the gene identifiers for common commercially available array Platforms as deposited in Gene Expression Omnibus (GEO) (see section 2.2 for details). If you use any other type of gene identifier, read the "Set up" chapter, section 2.2.
The number of the current National Center for Biotechnology
Information (NCBI) Genome Build may be obtained from the site:
http://www.ncbi.nlm.nih.gov/mapview/
by clicking on the organism of interest.
The corresponding genome assembly version used by University of
California, Santa Cruz (UCSC) Genome Browser to produce EST
localization data may be chosen from the "assembly" menu in the
"Table Browser" web page: http://genome.ucsc.edu/cgi-bin/hgTables?
TRAM_1.3_HUMAN_2017 is the TRAM 1.3 version provided pre-loaded, after a complete Set Up process, with 2017 data for H. sapiens. It replaces any previous version of TRAM_HUMAN.
It includes a pre-loaded "gene_aliases.txt" file:
- the 38 gene aliases were manually curated;
- the 22,454 clone names were extracted from the "Clone Names" section of each "NCBI Gene" record;
- the 230,702 GenBank Accessions related to a Gene Symbol were extracted from the "Related sequences" section of each "NCBI Gene" record;
- the 185,647 RefSeq Accessions related to a Gene Symbol were extracted from the "NCBI Reference Sequences (RefSeq)" section of each "NCBI Gene" record.
The clone names, GenBank Accessions and RefSeq Accessions were each extracted using an awk script followed by a FileMaker Pro script (the parsing procedure is described in section 1.2, c) of this Guide).
Data for TRAM_HUMAN 2017 are derived from: NCBI Gene; NCBI Genome Build; UniGene Clusters Release; UCSC EST Localization (NCBI Build).
A set up process is required whenever your experimental model organism is not human, for which pre-setup versions may be provided. The set up process is described in the following section.
- GEO_GSM_Download, a tool to automatically download data matching a list of NCBI Gene Expression Omnibus (GEO) samples (GSM) from the GEO database;
- GEO_GPL_Download, a tool to automatically download data matching a list of GEO platforms (GPL) from the GEO database;
- a "Protocol" with practical advice for running a meta-analysis with TRAM.
While the TRAM_HUMAN.zip file contains a pre-setup version ready to analyze expression data from the human organism, you may also download an empty TRAM template that can be prepared for the analysis of data from any organism.
Pre-setup versions may be used directly to import and analyze expression data without performing the "Set up" process. However, the user may need to perform the "Set up" step described in section 2.2 to load additional Platform schemes, if these are necessary to interpret the gene identifiers listed in the expression data file (see below).
The empty TRAM template, instead, must always be prepared by performing the "Set up" process from the beginning.
Download the "TRAM.zip" file from: http://apollo11.isto.unibo.it/software/TRAM/ Following decompression of
the "TRAM_1.3.zip" (or TRAM_1.3_Win.zip) file, open the "TRAM" file contained
in the "TRAM" folder.
In the "Main" window click "Set Up", this will
change to the "Set Up" layout which contains the first main
choices.
Set up is composed of two main parts:
1) Organism-specific Genomic Data: guided feeding of the software with data about the chromosomes and genes of the genome of your interest.
2) Gene Identifiers conversion tables: guided feeding of the software with conversion tables, which allow each gene identifier used in the expression data file to be converted to the corresponding gene name.
Note: like all FileMaker databases, TRAM automatically saves any change, so you will not find any "save" option at the end of the import processes. After the import processes, any manual data change will cause the loss of the originally imported data.
Set Up - Step
definition
Note: if you re-execute a
step previously executed, you need to re-execute all
subsequent steps maintaining the execution order below.
Analyzes expression data labeled by any Platform gene
identifiers
The expression data cannot be assigned to the corresponding
genes via Platform identifiers
At the end of Set Up, the user may proceed with
expression data file import.
1 Importing data about chromosomes and genes of your organism
TRAM software is designed to create a chromosome set and to assign the gene expression values to the right position within each chromosome.
The software is optimized to parse "NCBI Gene" data to obtain the necessary localization information, owing to both the short update period and the gene position accuracy of this database. You may use other sources of data, provided that they are in the format described below (column number and order, file name), to ensure that TRAM works correctly.
TRAM cannot analyze non-chromosomal elements (such as plasmids), while it has been able to map mitochondrial chromosome genes since version 1.2, although mitochondrial genes have not been included in the TRAM_HUMAN pre-loaded versions.
1.1 Importing data
about the chromosome number and length (bp) of a selected
organism
Note: the maximum number of chromosomes accepted by the software is 25 (including autosomes and sex chromosomes) only for the purpose of "horizontal" viewing; the number is unlimited for all other purposes. When different types of deposited sequences (e.g., with NC_, AC_ or NT_ reference codes) are available for the studied organism, the NC_ (RefSeq) sequence should be chosen by default for each chromosome; the AC_ sequence should be chosen, if available, in the absence of an NC_ sequence; and the NT_ sequence should be selected as a last resort.
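The NC_ > AC_ > NT_ preference above can be expressed as a small selection function, for example when preparing genome.txt from a list of candidate accessions for one chromosome. This is a minimal Python sketch, not part of TRAM; the helper name `preferred_accession` and the example accessions are hypothetical.

```python
from typing import Iterable, Optional

# Preference order from the guide: NC_ (RefSeq) first, then AC_, then NT_.
PREFIX_PRIORITY = ("NC_", "AC_", "NT_")


def preferred_accession(candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate with the most preferred prefix, or None.

    Accessions with any other prefix are ignored.
    """
    candidates = list(candidates)  # allow several passes over the input
    for prefix in PREFIX_PRIORITY:
        for accession in candidates:
            if accession.startswith(prefix):
                return accession
    return None


# Illustrative (hypothetical) accessions for one chromosome.
print(preferred_accession(["NT_000001", "AC_000001", "NC_000001"]))  # NC_000001
```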
The following instructions
are also available as a guided procedure within the software in
the "Set Up" area. From the TRAM Home, click on the "Set Up" button - then on "Genomic" - then on "Chromosomes".
a) Prepare the table containing data for each chromosome
For example, you may obtain the data for each chromosome from "NCBI Genome". Click on the "Open "NCBI Genome" web site" button.
If you have not already set the "Organism" field in the "Setting Segment" or "Setting Cluster" window, the software will ask you for the "Organism" to search; enter its Latin name only (e.g., Human = Homo sapiens). If appropriate, use the complete species/strain name given in square brackets by the "Entrez Gene" online database (e.g., Saccharomyces cerevisiae S288c).
In the "NCBI Genome"
organism-specific displayed page, locate the "Representative"
genome information, then click on individual
(RefSeq) chromosome entries at the bottom of the page.
Write the resulting data in a standard tab-delimited text file (.txt), one chromosome per line, separating the columns with a tab, in the format below [without column headers; use the "Name" reported in the Reference genome as the identifier for each chromosome]:
[Chromosome] [Length] [Organism] [RefSeq/Genbank#]
1	248,956,422	Homo sapiens	NC_000001
2	242,193,529	Homo sapiens	NC_000002
...
Move the obtained file, named genome.txt, into the "TRAM" folder.
b) Import the file Click on the "Import genome.txt" button, to
automatically import and parse the obtained chromosome data.
At the end, data fields in the table "Chromosomes" will appear as
follows:
[Chromosome] [Length]
[Organism] [Chr_ID]
chr1
248,956,422 Homo sapiens 1
...
where Chr_ID is a unique progressive number assigned by TRAM
to each chromosome.
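To check a genome.txt file before importing it, you can reproduce the documented transformation outside TRAM. The Python sketch below is a hypothetical re-implementation, not TRAM's own parser: it adds the "chr" prefix, strips thousands separators from the length, and assumes that Chr_ID simply follows line order.

```python
def parse_genome_txt(text):
    """Parse genome.txt into records shaped like TRAM's "Chromosomes" table."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"expected 4 tab-separated columns: {line!r}")
        name, length, organism, accession = fields
        records.append({
            "Chromosome": f"chr{name}",              # "1" -> "chr1"
            "Length": int(length.replace(",", "")),  # accept 248,956,422
            "Organism": organism,
            "Chr_ID": len(records) + 1,              # assumed: follows line order
            "GenBank": accession,
        })
    return records
```

A `ValueError` here usually means that spaces were used instead of tabs, or that a column is missing.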
Following chromosome data import, it is useful to check the "Chromosomes" table (click on the "Chromosomes" link in the TRAM page from which you launched the import of chromosome data, or on the "Chr." button in the TRAM Home).
You may manually edit the chromosome records if necessary, using the "Records" menu and typing into the appropriate fields.
Note: for organisms with only one chromosome (e.g., prokaryotes), enter the chromosome data manually as follows: from the TRAM Home, click on the "Chr." button, then, in the "Chromosome" layout that appears, create a new record by selecting "New Record" from the "Records" menu, and enter these data in the corresponding fields:
- Chromosome: chromosome (exactly this word).
- Length: chromosomal length in bp (it can be derived from the corresponding GenBank entry; e.g., 5,498,450 for NC_002695).
- Organism: Latin name of the organism (e.g., Escherichia coli). If appropriate, use the complete species/strain name given in square brackets by the "Entrez Gene" online database (e.g., Escherichia coli O157:H7 str. Sakai).
- Chr_ID: 1 (exactly this digit).
- GenBank #: GenBank Accession Number (it can be derived from the "Entrez Gene" entries for the investigated organism; e.g., NC_002695).
For organisms with only
one chromosome do not use "Special" functions to perform
TRAM set up.
The following
instructions are also available as a guided procedure within
the software in the "Set Up" area. From the TRAM Home, click on
the "Set Up"
button - then on "Genomic"
- then on "Genes".
a) Download the data for each gene from "NCBI Gene" Note: All previously
imported data will be deleted. Note: if the software asks you for the name of the
"Organism", please
insert this only as Latin name (e.g. Human = Homo sapiens). If appropriate, use complete species/strain name given
in square brackets by "Entrez Gene" online database (e.g.,
Saccharomyces cerevisiae S288c).
Click on the "Open "NCBI
Gene" web site" button.
The "Entrez Gene" web page will be opened showing all gene data
needed for the specified organism.
In the "NCBI Gene" displayed page (in the web browser),
check that "Current" is selected as "Status" on
the left column.
Save the resulting data as follows: click on the "Send to" link on the right, choose "File" (and select the "Tabular (text)" format, default order), then click on the "Create file" button (do not change the suggested file name). Move the obtained "gene_result.txt" file into the "TRAM" folder.
b) Import the file
Click on the "Import the "gene_result.txt" file" button to automatically import and parse the downloaded gene data.
IMPORTANT - Do not import the same text file more than once into the TRAM database; download or decompress the file again if you need to repeat the import.
Gene entries without genomic
coordinates, or with the word "Pseudogene" in the
"Description" field (except when
in the context "readthrough transcribed pseudogenes" or
"gene/pseudogene") will be deleted.
At the end, data fields in the table for an RNA transcript will
appear as follows:
You may check and freely edit the data in the TRAM
table "Genes".
c) Optional - Recommended - Resolve Clone names and RNA Accessions
"NCBI Gene" entries include information about Clone identifiers as well as GenBank Accession Numbers for RNA sequences that are related to a specific locus. Importing this information into TRAM makes it possible to resolve gene identifiers of these types. This section must be executed before step III (Section 2.1).
1. Download the data for each gene from "NCBI Gene" in asn.1
format Note: if the software asks you
for the name of the "Organism", please insert this only as
Latin name (e.g. Human = Homo sapiens). If appropriate, use complete species/strain
name given in square brackets by "Entrez Gene" online
database (e.g., Saccharomyces
cerevisiae S288c).
Click on the "Open
"NCBI Gene" web site" button.
The "Entrez Gene" web page will be opened showing all gene
data needed for the specified organism.
In the "Entrez Gene" displayed page (in the web browser),
click on the "Current Only" link on the right.
Save the resulting data as follows: click on the "Send to" link on the right, choose "File" (and select the "ASN.1" format), then click on the "Create file" button (do not change the suggested file name). Rename the obtained "gene_result.txt" file as "gene_result.asn1.txt" and move it into the "TRAM" folder.
2. Process the "gene_result.asn1.txt" file using the awk command available in
UNIX systems:
- in UNIX, use shell;
- in Mac OS X, use the "Terminal" application;
- in Windows, use a UNIX emulator.
Seek advice from a UNIX user if needed. There is
currently no alternative way to effectively process
asn.1 files which include the complete information
about Clone names and GenBank Accession Numbers.
Change directory in the UNIX shell to the directory in which the "gene_result.asn1.txt" file is located. Copy the text of the following command exactly (including spaces), paste it into the shell, then press "Enter":
NOTE: in some lower organisms,
RefSeq gene entries have "NC_" prefix. If you see in the
"NCBI Reference Sequences" section of the "NCBI Gene"
entries for your organism that RefSeq entries have "NC_"
codes, please use this alternative command:
In either case, the resulting "gene.result.Clones.RNAs.txt" file will be created in the "TRAM" folder.
3. Import the files
Click on the "Import the "gene.result.Clones.RNAs.txt" file" button to automatically import and parse the downloaded and awk-processed Gene data.
IMPORTANT - Do not import the same text file more than once into the TRAM database; download or decompress the file again if you need to repeat the import.
You may check and freely edit the data in the TRAM table "Gene Aliases", which lists all Clone Names and/or GenBank Accession Numbers related to a specific locus according to "NCBI Gene" entries.
If the same Gene Alias (Gene_Alias) is assigned to more than one Gene Symbol (Gene_Symbol), the field "Discrepancy Alias vs Symbols" will display "Yes" in all records with that Gene Alias.
1.3 Importing localization data for EST Clusters, if these data are available in "UCSC Genome Browser"
Note: this step is necessary if you wish to analyze the expression data not only for known genes but also for genes so far identified only as UniGene Clusters (clusters of ESTs, Expressed Sequence Tags). The genomic coordinates for UniGene Clusters are available for several organisms in the "UCSC Genome Browser" (University of California, Santa Cruz).
Assembly (build) version for the investigated genome in
UCSC
and NCBI
must be the same, in order to use the same reference
genome coordinates and successfully integrate localization data
from known genes and from ESTs.
The number of the
current NCBI Genome
Build may be obtained from the site: https://www.ncbi.nlm.nih.gov/genome/gdv/
by clicking on the organism of interest.
The corresponding genome assembly version used by UCSC Genome Browser to
produce EST localization data may be chosen from the "assembly"
menu in the "Table Browser" web page: http://genome.ucsc.edu/cgi-bin/hgTables?
Note: all previously imported EST Clusters data will be deleted.
Note: this step must be performed after the "Set up Genes" process (section 1.2) and the UniGene identifiers conversion table import (section 2.1).
a) Download the EST localization
data from UCSC
"Genome Browser"
The following instructions
are also available as a guided procedure within the software
in the "Set Up" area. From the TRAM Home, click on the
"Set Up" button -
then on "Genomic" -
then on "EST Clusters".
Then, in the web browser page, select:
- clade: your investigated clade (e.g., Mammal)
- genome: your investigated genome (e.g., Human)
- group: "mRNA and EST Tracks"
- track: "ESTs" (if not available, the current set up is not possible)
- table: "all_est"
- region: "genome"
- output format: "selected fields from primary and related tables"
- output file: EST.txt
- file type returned: gzip compressed
Click on the "get output" button and, in the table that appears, select the following fields: qName, tName, tStart, tEnd.
Then click on the "get output" button at the bottom of the page.
Once the download of the file "EST.txt.gz" is complete,
decompress it and put the resulting "EST.txt" file into the "TRAM" folder.
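If your system does not decompress "EST.txt.gz" automatically, the Python standard library can do it. This is a minimal sketch using `gzip` and `shutil`; the function name `gunzip_est` and the default paths are assumptions (run it from the download folder, or pass full paths).

```python
import gzip
import shutil
from pathlib import Path


def gunzip_est(archive="EST.txt.gz", tram_folder="TRAM"):
    """Decompress the UCSC Table Browser download to <tram_folder>/EST.txt."""
    target = Path(tram_folder) / "EST.txt"
    with gzip.open(archive, "rb") as source, open(target, "wb") as destination:
        shutil.copyfileobj(source, destination)  # stream; avoids loading the whole file
    return target
```

For example, `gunzip_est("/path/to/Downloads/EST.txt.gz", "/path/to/TRAM")` writes EST.txt directly into the TRAM folder.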
b) Import the file Click on the "Import "EST.txt" file"
button, to automatically import and
parse the obtained UniGene clusters location data file.
At the end, data
fields in the table for an RNA transcript will appear as
follows:
You may check the processed data in the TRAM table "EST_Clusters" (from the TRAM Home, click on the "ESTs" button, then on the "EST_Clusters" orange button).
EST entries are parsed via their relationship with "UniGene_ID"
table: ESTs belonging to UniGene Clusters are imported in the
"EST_Clusters" table, where localization for each cluster is
calculated between the minimum start coordinate and the maximum
end coordinate available for each EST cluster. To omit incongruent
results, the parsing process will subsequently import in the
"Genes" table only the unambiguously mapped UniGene clusters. To
this aim, entries with a chromosome name not equal to one in the
chromosome names in the "Chromosomes" table will not be
considered, as well as those with ESTs mapping on very distant
positions on the same chromosome. To this aim, we set a
conservative limit to 250,000 bp in TRAM, considering that in Entrez
Gene the set was of 28,355 human genes (the largest known
genes), the mean size was 43,698 and the standard deviation
102,616, so this is equivalent to consider a size range within
mean plus or minus 2 SD (approximately 95% of values in a
Gaussian distribution). This correction effectively removes
approximately 3,000 transcripts erroneously mapped to regions of
several Mb or tens of Mb. The user retains the possibility to
inspect the list of EST clusters with a genomic extension
>250 kb that are present in a given chromosome segment, even
if they are not considered in the creation of the transcriptome
map. For this purpose, click "Go" under the title "Genes Table"
in the "Map" result layouts, then click "EST Clusters - Go".
2 Importing gene identifiers conversion data tables
TRAM software is designed to collect expression data files
where genes are identified via specific symbols.
Default Gene Identifier used by TRAM is the Gene Symbol
(Official or not) found in the "NCBI Gene"
database
(or, in its absence, the "Gene" abbreviation in the entry
header), e.g.:
If you have a
list of symbols of this type, with the corresponding
expression values, you can directly go to "Home" and start
to import expression data.
"Gene Name" in TRAM is
the best name available for a gene (represented by, in
decreasing order: Official Gene Symbol, or the symbol in the
"Gene" entry header, or the UniGene Cluster ID).
If the expression data are labeled with gene identifiers/symbols different from Official Gene Symbols or from the names in the "Gene" entry header, TRAM tries to convert each user-provided gene identifier into a Gene Symbol/Gene name. For this purpose, the user has to import two-column conversion tables listing a gene identifier and the corresponding Gene Symbol.
It is possible to import more than one Identifier Conversion
Table. TRAM has an original, powerful system to integrate
multiple alternative conversions of gene identifiers. The following instructions are also
available as a guided procedure within the software in the
"Set Up" area. From the TRAM Home, click on
the "Set Up"
button - then on "Gene ID". NOTE:
TRAM will try to convert the Gene identifiers present in
the user expression data files to Gene Symbols/Gene names,
following this priority order until a positive match is
found:
1)if you set up the "Custom" table as described
in section 2.3
of the chapter "Set up", the "Custom" table will be first searched to
match Gene identifiers in your data to the corresponding
Gene Symbols/Gene names, overriding all other conversions;
2)
if no match has been found, then the "Genes" table (mandatorily set upas
described in section 1.2 of the chapter "Set up") will be searched to
directly interpret Gene identifiers in your data as Gene
Symbols/Gene names;
3) if you write a Platform ID code (e.g., GPL... for a GEO Platform) in (at least) the first line of the third column of your data (formatted as described in section 3 of this Guide), a corresponding list of gene Identifiers (often a series of progressive numbers) is expected in your data, and each will be converted to the corresponding Gene Symbol/Gene name, provided that you previously set up the table for that Platform as described in section 2.2 of the chapter "Set up"; for example:
1007_s_at	6.38	GPL96
1053_at	6.65
117_at	6.48
...
[Note - If the first expression value is not in the first row because of some header lines, use the very first row of your file anyway to indicate the Platform code, making sure that you write it in the third column. If you have only one column in the first row, press the tab key twice, then write the Platform code. Do not insert blank spaces or other characters at the end of the text in a column.]
4) if you write the word GeneID in the first line of the third column of your data (formatted as described in section 3 of this Guide), an "Entrez Gene" Identifier (a progressive number) is expected in your data and will be converted to the corresponding Gene Symbol/Gene name by searching the "Genes" table (which must be set up as described in section 1.2 of the chapter "Set up"); for example:
780	6.38	GeneID
5982	6.65
3310	6.48
...
[Note - If the first expression value is not in the first row because of some header lines, use the very first row of your file anyway to indicate the "GeneID" option, making sure that you write it in the third column. If you have only one column in the first row, press the tab key twice, then write the word "GeneID". Do not insert blank spaces or other characters at the end of the text in a column.]
5) if still no match has been found, the "Unigene" table will be searched to directly interpret Gene identifiers in your data as GenBank Accession Numbers (if you set up this table as described in section 2.1 of the chapter "Set up");
6) if still no match has been found, the "Unigene" table will then be searched to directly interpret Gene identifiers in your data as UniGene Cluster identifiers (if you set up this table as described in section 2.1 of the chapter "Set up").
Once a match is found, the software does not search the subsequent tables.
We suggest using recently released data for each table to be imported into TRAM.
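The six-step priority order listed above, together with the first-line convention for Platform codes and the GeneID keyword, can be sketched as a lookup cascade. This Python sketch only illustrates the documented order and is not TRAM's implementation: the `tables` dictionary and its keys ("custom", "gene_names", "platforms", "gene_ids", "accessions", "clusters") are hypothetical stand-ins for the TRAM tables, and the probe-to-symbol mapping in the example is invented.

```python
from typing import Dict, Optional, Tuple


def detect_mode(first_line: str) -> Tuple[str, Optional[str]]:
    """Inspect the third tab-separated column of the file's first line.

    Returns ("geneid", None) for the word GeneID, ("platform", code) for any
    other non-empty value (assumed to be a Platform code such as GPL96),
    and ("plain", None) when the third column is absent or empty.
    """
    fields = first_line.rstrip("\r\n").split("\t")
    third = fields[2] if len(fields) >= 3 else ""
    if third == "GeneID":
        return "geneid", None
    if third:
        return "platform", third
    return "plain", None


def resolve_identifier(identifier: str, mode: str, platform: Optional[str],
                       tables: Dict) -> Optional[str]:
    """Return the Gene Symbol/Gene name for one identifier, or None if unresolved."""
    # 1) Custom table: highest priority, overrides every other conversion.
    if identifier in tables["custom"]:
        return tables["custom"][identifier]
    # 2) Genes table: the identifier already is a Gene Symbol/Gene name.
    if identifier in tables["gene_names"]:
        return identifier
    # 3) Platform table, only when a Platform code was given in the first line.
    if mode == "platform":
        symbol = tables["platforms"].get(platform, {}).get(identifier)
        if symbol is not None:
            return symbol
    # 4) Entrez Gene ID, only when the word GeneID was given in the first line.
    if mode == "geneid":
        symbol = tables["gene_ids"].get(identifier)
        if symbol is not None:
            return symbol
    # 5) UniGene table, read as GenBank Accession Numbers.
    if identifier in tables["accessions"]:
        return tables["accessions"][identifier]
    # 6) UniGene table, read as UniGene Cluster IDs (None if still unresolved).
    return tables["clusters"].get(identifier)


tables = {
    "custom": {"my_probe_7": "HBB"},
    "gene_names": {"HBB", "TP53"},
    "platforms": {"GPL96": {"1007_s_at": "GENE_A"}},  # invented mapping
    "gene_ids": {"7157": "TP53"},
    "accessions": {"AF117710": "HBB"},
    "clusters": {"Hs.523443": "HBB"},
}
mode, platform = detect_mode("1007_s_at\t6.38\tGPL96")
print(resolve_identifier("1007_s_at", mode, platform, tables))  # GENE_A
```

Because `detect_mode` compares the third column exactly, a trailing space after "GPL96" or "GeneID" would defeat the match. This is one reason the guide warns against extra characters at the end of a column.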
2.1 Conversion of Sequence accession numbers to Gene Symbols
If you have labeled your expression data values with sequence identifiers, you will have to generate and import the complete UniGene identifiers data table for your organism, which matches each GenBank Accession Number for a transcript (RNA, EST) to the known Gene Symbol when available or, as a second choice, to the corresponding UniGene Cluster ID, if existing.
Note: this process has already been performed (update: Dec. 2017) for the Homo sapiens pre-setup versions of TRAM.
Note: in order to keep the data updated, all data previously imported in this table will be deleted during a new import. This step must be done before the import of EST localization data (section 1.3) and/or Platform data (section 2.2), if either of these import processes is performed.
a) Prepare a table containing four columns, separated by a tab, relating each GenBank Accession Number to the respective UniGene Cluster ID, Gene Symbol (when available) and GenBank Identifier (GI) (if desired), e.g.:
[Column Headers are not required]
[GenBank Accession] [UniGene Cluster ID] [Gene Symbol] [GenBank GI Identifier]
AF117710	Hs.523443	HBB	4378803
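The "Gene Symbol first, UniGene Cluster ID as second choice" rule described above can be sketched on a table in this four-column layout. This is a minimal Python example, assuming header-less, tab-separated rows; `accession_to_name` is a hypothetical helper, and the second sample row in the usage below is invented.

```python
def accession_to_name(lines):
    """Map each GenBank Accession to its Gene Symbol, else to its UniGene Cluster ID.

    Assumed layout (no header): GenBank Accession, UniGene Cluster ID,
    Gene Symbol (may be empty), GenBank GI (optional), tab-separated.
    """
    mapping = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        fields += [""] * (4 - len(fields))  # pad rows lacking optional columns
        accession, cluster, symbol = fields[0], fields[1], fields[2]
        mapping[accession] = symbol or cluster  # symbol first, cluster as fallback
    return mapping


print(accession_to_name(["AF117710\tHs.523443\tHBB\t4378803"]))  # {'AF117710': 'HBB'}
```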
To do this, we propose importing the default output file of the "UniGene Tabulator" software (version 1.1 or later), a tool able to parse the whole UniGene database for an organism.
The following instructions are also available as a guided procedure within the software in the "Set Up" area. From the TRAM Home, click on the "Set Up" button - then on "Gene ID" - then on "Sequence IDs". Click on the "Open "UniGene Tabulator" site" button (http://apollo11.isto.unibo.it/software/UniGene_Tabulator/); your default internet browser will show the software download page. Download the current version of the software for your OS. Follow the instructions in the UniGene Tabulator User Tutorial to automatically parse UniGene data for the organism of your interest. Note that, for TRAM purposes, it is not necessary to import the UniGene library data file into UniGene Tabulator.
At the end of the process, the file "UniGene.tab" will be
automatically created into the "UniGene Tabulator"
folder. A message will alert the user about the
availability of the "UniGene.tab" file at the end of the
process. This file contains a useful code conversion among GenBank Accession Number, UniGene cluster ID and Gene Symbol.
The parsing process may take several hours to complete, depending on the amount of data available for the selected organism.
At
the end of the process the file "UniGene.tab"
will appear in your desktop.
b)
How to import the UniGene tabulated data file in TRAM
Move the "UniGene.tab"
file into the TRAM folder.
Click on the "Import the
"UniGene.tab" file" button
to import the data into the respective "UniGene_ID" database
table.
Note: All
previously imported data will be deleted.
This step is necessary to use either GenBank Accession Numbers
or UniGene Cluster IDs
as gene identifiers.
The GenBank Accession Number must lack the sequence version, which, if present, is separated from the main number by a full stop (e.g., use AK125137, not AK125137.1).
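If your accessions carry version suffixes, they can be removed before building the table. This small Python sketch is a hypothetical helper, not part of TRAM or UniGene Tabulator. It strips only a trailing ".N" made of digits and should be applied to GenBank accessions only, because UniGene cluster IDs such as Hs.523443 have the same shape.

```python
def strip_version(accession: str) -> str:
    """Remove a trailing ".N" sequence version from a GenBank accession.

    Apply only to GenBank Accession Numbers: UniGene cluster IDs such as
    Hs.523443 also contain a dot followed by digits.
    """
    accession = accession.strip()
    base, dot, version = accession.rpartition(".")
    if dot and base and version.isdigit():
        return base
    return accession


print(strip_version("AK125137.1"))  # AK125137
```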
NOTE:
the field "Is_in_NCBI Gene" is filled with the
"Gene Symbol" itself if the "Gene_Symbol" for that
UniGene Cluster is found in the "NCBI Gene" database. By
default this information is not displayed.
You can display it by first clicking into the "Gene_Symbol" field of any record of the TRAM "UniGene" table, then executing these two commands in succession from the "Records" menu:
- Show All Records
- Relookup Field Contents...
The Relookup may take some time.
The complete list of the "Gene Symbols" imported from the
"NCBI Gene" database during the "Set Up - Genes" process may
be accessed by going to the TRAM "Genes" table (e.g.,
clicking on the "Genes" button in the TRAM Home),
then clicking on the "All Symbols" button.
IMPORTANT: if you perform this step after importing the gene identifiers for a Platform (section 2.2), you have to run the import and analysis of Platform data, as well as of sample expression data, again, because the conversion of the identifiers to the matching gene symbols may have changed.
Quality control. All
imported records should have a value in the fields
UniGene_ID (UniGene cluster identifier) and GenBank_AN
(GenBank Accession Number). At the end of the import of
UniGene.tab file into TRAM, you may search for records with
empty "UniGene_ID" or "GenBank_AN" field [to do this, go to
the "UniGene" table of TRAM, press "Find" on the window top
bar and then type "=" (without quotes) in the "UniGene_ID"
or "GenBank_AN" field]. If you find one or more records
without a UniGene_ID or a GenBank_AN, you may manually fill
in the missing values, after obtaining them by searching for
the GenBank Accession Number with an empty UniGene_ID (or
for the UniGene_ID with an empty GenBank Accession Number,
respectively) at the address:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene
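The same quality check can be run outside TRAM on an exported copy of the "UniGene" table, for example one read with `csv.DictReader` into dictionaries keyed by field name. This Python sketch is an assumption-based equivalent of the "=" Find described above, applied to either field; `incomplete_records` is a hypothetical name, and the sample rows in the usage below are invented.

```python
def incomplete_records(rows):
    """Return rows whose UniGene_ID or GenBank_AN field is missing, empty or blank."""
    def blank(value):
        return not (value or "").strip()

    return [row for row in rows
            if blank(row.get("UniGene_ID")) or blank(row.get("GenBank_AN"))]


sample = [
    {"UniGene_ID": "Hs.523443", "GenBank_AN": "AF117710"},
    {"UniGene_ID": "", "GenBank_AN": "AA000001"},
]
print(incomplete_records(sample))  # [{'UniGene_ID': '', 'GenBank_AN': 'AA000001'}]
```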
For expression
data values related to personal "custom" gene identifiers,
with the correspondence between gene/probe identifiers and
gene symbols established by the user, the user has to import
the Custom identifiers data table(s).
The following instructions are also available as a
guided procedure within the software in the "Set Up" area. From the TRAM Home, click on
the "Set Up" button - then on "Gene ID" - then on "Custom IDs".
a) Prepare a table containing 2 columns, separated by a tab,
with one row for each of your custom identifiers.
b) Click on the "Import the custom file" button to import the custom table into
the "Custom_ID" TRAM database table.
It is possible to subsequently import additional custom
tables for conversion of other identifiers. The conversion
specified in the "Custom_ID" TRAM database table will override any other
conversion.
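A minimal Python sketch for writing such a custom table. The function name and the column order (custom identifier first, gene symbol second) are assumptions for illustration; check the order requested by the guided procedure before importing.

```python
import io

def write_custom_table(mapping, stream):
    """Write one 'custom identifier<TAB>gene symbol' line per entry.

    The column order (identifier first, symbol second) is an assumption.
    """
    for identifier, symbol in mapping.items():
        # Remove leading/trailing blanks so values match exactly.
        identifier, symbol = identifier.strip(), symbol.strip()
        # A tab or line break inside a value would break the two-column layout.
        if any(ch in value for value in (identifier, symbol) for ch in "\t\r\n"):
            raise ValueError(f"tab or line break in entry {identifier!r}")
        stream.write(f"{identifier}\t{symbol}\n")

out = io.StringIO()
write_custom_table({"my_probe_001": "TP53", "my_probe_002": "BRCA1 "}, out)
print(out.getvalue(), end="")
```

To produce a file for import, pass a file opened for writing in text mode instead of the StringIO object.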
NOTE: the "Custom" table will be used
with maximum priority to resolve both gene identifiers
listed in expression data files (section 3) and gene
identifiers listed in Platform data files (section 2.3).
2.3
Importing gene probe identifiers for a Platform
This step is
necessary to use gene probe IDs as gene identifiers for a
particular array Platform registered in the GEO (Gene
Expression Omnibus) online database or otherwise
available.
In order to relate the expression data values to Platform
identifiers, the corresponding identifiers data table(s)
must be imported.
The following instructions are also available as
a guided procedure within the software in the "Set Up"
area. From the TRAM Home, click on
the "Set Up" button - then on "Gene ID" - then on "Platform IDs".
a) First alternative: if the expression data you are going to analyze are derived
from the GEO database, locate the GEO Platform data for the platforms of your
interest by searching for a Platform (e.g., GPL96) in the
"accession" field of the "Accession Display" web page: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi
At the bottom of the resulting Platform description web
page, click on the "Download full table..." button
and save the file, or click on the "View full table" button
and save the resulting web page as a text file.
If neither of these two options is available, click
on the link "SOFT formatted family file(s)" to download the Platform description
file in .soft format.
Manually change the file extension from ".soft" to
".txt".
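If you prefer to extract only the Platform table from a downloaded SOFT file, the sketch below keeps the rows between the table markers. It assumes the GEO SOFT convention of "!platform_table_begin" / "!platform_table_end" lines (matched case-insensitively) and stops at the first table found; it is an optional pre-processing aid, not part of TRAM.

```python
def extract_platform_table(lines):
    """Return the tab-delimited rows between the platform table markers.

    Assumes the GEO SOFT markers '!platform_table_begin' and
    '!platform_table_end'; only the first platform table is returned.
    """
    table, inside = [], False
    for raw in lines:
        line = raw.rstrip("\r\n")
        marker = line.strip().lower()
        if marker == "!platform_table_begin":
            inside = True
        elif marker == "!platform_table_end":
            break
        elif inside:
            table.append(line)
    return table

soft = [
    "^PLATFORM = GPL96\n",
    "!Platform_title = example\n",
    "!platform_table_begin\n",
    "ID\tGB_ACC\tGene Symbol\n",
    "1007_s_at\tU48705\tDDR1\n",
    "!platform_table_end\n",
]
print("\n".join(extract_platform_table(soft)))
```

With a real file, pass the open file object as `lines` and write the returned rows, joined by newlines, to a ".txt" file.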
GEO_GPL_Download is a
useful tool to automatically
download data matching a
list of GEO platforms (GPL) from the GEO
database (Gene Expression Omnibus).
It is distributed along with TRAM in the
directory "TRAM_Utilities" at: http://apollo11.isto.unibo.it/software/TRAM/.
Requirements: any operating system (Linux, Mac
OS X, Windows, ...) with Python 2 or Python 3
and IDLE.
b) Alternatively, you may use a platform data
file from any source, provided that it has at least two
columns of data (in tabulated text format):
- the list of gene identifiers (ID) describing the genes included in that
experimental platform; this column must have the header (first row) "ID"
(without quotes);
- the corresponding GenBank Accession Number or Gene Symbol.
For example:
ID          [GB_ACC]    [Gene Symbol]
1007_s_at   U48705      DDR1
[Column headers are not required, except for the ID header]
Further data in the column, e.g. Web addresses, will be automatically
ignored by TRAM.
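The two requirements above (a column headed "ID" in the first row, and at least two tab-separated columns) can be checked before import with a small hypothetical helper:

```python
def check_platform_header(first_line):
    """Check the first row of a platform file.

    Returns a list of problems (empty if none were found).
    """
    columns = first_line.rstrip("\r\n").split("\t")
    problems = []
    # The header must be exactly "ID"; a trailing blank would not match.
    if "ID" not in columns:
        problems.append("no column headed 'ID' in the first row")
    if len(columns) < 2:
        problems.append("fewer than two tab-separated columns")
    return problems

print(check_platform_header("ID\tGB_ACC\tGene Symbol\n"))
print(check_platform_header("Probe ID,GB_ACC\n"))
```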
c) Import the Platform data file (tabulated text) into TRAM.
From the TRAM Home, click on the "Set Up" button - then on "Gene ID"
- then on "Platform IDs". Click on the "Import the Platform
data file" button. You will then be guided to locate these columns:
- ID (the Platform ID for the probe)
- GB_ACC (the GenBank Accession Number for the probe sequence, when
available; alternatively the GenBank GI code or, as a last option,
the RefSeq (NM_) code)
- Gene symbol (the official Gene Symbol, when available)
Further data in the column, e.g. Web addresses or additional GenBank
Accession Numbers following the first one, will be
automatically ignored by TRAM.
NOTE - If the column type is not clear from the content of the first
rows, scroll down the window to inspect further records (rows) that
may clarify whether that column contains sequence accession numbers
and/or gene symbol identifiers.
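When the column type is unclear, a rough pattern-based guess over many rows (including rows further down the file, as the NOTE suggests) may help. The patterns below are an assumption covering common GenBank accessions (e.g. U48705), RefSeq RNA accessions (e.g. NM_000546) and purely numeric codes such as GI numbers; they are a heuristic, not the rule TRAM applies.

```python
import re

# Heuristic patterns (an assumption, not TRAM's own rule).
GENBANK = re.compile(r"^[A-Z]{1,2}\d{5,8}(\.\d+)?$")  # e.g. U48705, AB123456
REFSEQ = re.compile(r"^[NX][MR]_\d+(\.\d+)?$")        # e.g. NM_000546
NUMERIC = re.compile(r"^\d+$")                        # e.g. GenBank GI codes

def guess_column_type(values, min_fraction=0.8):
    """Guess whether a column holds accessions, numeric codes or symbols."""
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return "empty"

    def share(pattern):
        # Fraction of non-empty values matching the pattern.
        return sum(1 for v in values if pattern.match(v)) / len(values)

    if share(GENBANK) + share(REFSEQ) >= min_fraction:
        return "accession"
    if share(NUMERIC) >= min_fraction:
        return "numeric code"
    return "symbol or other"

print(guess_column_type(["U48705", "M87338", "", "NM_000546"]))
print(guess_column_type(["DDR1", "RFC2", "HSPA6"]))
```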
Platform data will be imported into the "Platform_ID" TRAM
database table. You will be asked to assign a unique code
to each Platform after its import. At the end of the
import, you may delete the original Platform data file.
In addition, a text file with the processed Platform data
is automatically created in the "Platform" folder
within the "TRAM" folder. This file is automatically named
"GPL...", where "..." is the code you assigned to the platform.
The prefix "GPL" is used whether or not the platform comes from GEO.
These files can be useful if you later need to run a batch platform
import (section "Special" from the TRAM Home) in another copy of the
TRAM software, provided that you rename them GPL1.txt, GPL2.txt and so on.
TRAM will first try to use the GenBank Accession to relate the sequence
to the corresponding updated Gene Symbol (when available); otherwise, the
"Gene Symbol" provided in the data file will be used.
If the same GenBank Accession Number (GenBank_AN) happens to be assigned
to different Gene Names, the field "Discrepancy GenBank vs Gene_Names"
will display "Yes" in all records with that GenBank Accession Number.
In particular, TRAM 1.3 tries to assign a Gene Name to each
platform gene identifier using the following sources, in
this priority order (the source actually used for each
Probe ID is shown in the "Gene_Name Source" field of
the "Platform Identifiers conversion table", briefly
"Platform"):
01. Gene symbol or name obtained from Custom
Table via the "ID"
Probe Identifier (all data provided by the user
and loaded in the "Custom" TRAM table);
02. Gene symbol or name obtained from "NCBI Gene"
via the GenBank
Accession (GenBank_AN field) originally provided
by the Platform scheme
available online for the gene probe and assigned
by "NCBI Gene" (whose data are processed and imported
during TRAM Set Up process in the TRAM "Gene Aliases and
RNA data table") to a specific locus;
03. Gene symbol or name
obtained from "gene_aliases.txt" file (a file
manually created by the user with two columns linking an
alias to a Gene Name and imported in TRAM during the Set
Up process) via the
Gene Symbol (Gene_Symbol field) originally
provided by the Platform scheme
available online for the gene probe;
04. Gene symbol or name obtained from "NCBI
Gene" Aliases (whose data are
processed and imported during TRAM Set Up process in
the TRAM "Gene Aliases and RNA data table") via
the Gene Symbol (Gene_Symbol field) originally
provided for the gene probe;
05. Gene symbol or name obtained from "NCBI
Gene" Clone Names (whose data are
processed and imported during TRAM Set Up process in the TRAM "Gene
Aliases and RNA data table") via
the Gene Symbol (Gene_Symbol field)
originally provided by the Platform
scheme available online for the gene probe;
06. Gene symbol as provided by the Platform
scheme available online, as long as it is present in "NCBI
Gene";
07. UniGene
Identifier obtained from "NCBI UniGene" (whose data are
processed and imported during TRAM Set Up process in the TRAM
"UniGene Identifiers conversion table", in brief:
"UniGene" table) via the GenBank
Accession (GenBank_AN field) originally
provided by
the Platform scheme available online for
the gene probe;
08. UniGene
Identifier obtained from "NCBI UniGene" (whose
data are processed and imported during TRAM Set
Up process in the TRAM
"UniGene Identifiers conversion table", in
brief: "UniGene" table) via the
GenBank Identifier (GenBank_GI
field) originally provided by the
Platform scheme available online for
the gene probe;
09. Gene symbol
just as provided by the Platform scheme available
online;
10. UniGene
Identifier related to the GenBank Accession, if it is not
matched to a locus;
11. GenBank
Accession Number (or GenBank GI) as provided by the
Platform scheme available online.
In brief, sources for Gene Name are:
01. Custom Table,
via Probe Identifier
02. "NCBI Gene",
via GenBank Accession in "Aliases and
RNA Table"
03. Custom file "gene_aliases.txt",
via Platform Gene
Symbol
04. "NCBI Gene" Aliases,
via Platform Gene Symbol in "Aliases
and RNA Table"
05. "NCBI Gene" Clone Names,
via Platform Gene Symbol in "Aliases
and RNA Table"
06. Gene Symbol provided by Platform
and present in "NCBI
Gene"
07. UniGene,
via GenBank Accession
08. UniGene,
via GenBank
Identifier
09. Gene Symbol as provided by Platform
10. UniGene Identifier used as Symbol
11. GenBank Accession used as Symbol
12. [None]
If all of the first 11 steps give negative results, no name will be
assigned and the gene will not be analyzed further. Please note that the
priority order in TRAM 1.3 differs from that of previous versions of TRAM:
UniGene for Homo sapiens, the previous priority source, is no longer
updated by NCBI, so "NCBI Gene" is now used as a reliable, up-to-date
source of Gene Names related to a GenBank Accession Number.
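The "first source that answers wins" logic of the priority list can be sketched as follows. Only four of the eleven sources are modeled, and the lookup tables are toy dictionaries standing in for the TRAM tables; the labels mirror the numbering of the brief list above.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each source is (label, lookup); a lookup returns a gene name or None.
Source = Tuple[str, Callable[[dict], Optional[str]]]

def assign_gene_name(probe: dict, sources: Sequence[Source]) -> Tuple[Optional[str], str]:
    """Return (gene_name, source_label) from the first source that answers."""
    for label, lookup in sources:
        name = lookup(probe)
        if name:
            return name, label
    return None, "[None]"  # no name: the probe is not analyzed further

# Hypothetical stand-ins for the "Custom" and "NCBI Gene" tables.
custom = {"my_probe": "TP53"}
gene_by_accession = {"U48705": "DDR1"}

sources = [
    ("01 Custom Table", lambda p: custom.get(p["ID"])),
    ("02 NCBI Gene via GenBank Accession", lambda p: gene_by_accession.get(p.get("GB_ACC", ""))),
    ("09 Gene Symbol as provided by Platform", lambda p: p.get("Gene Symbol") or None),
    ("11 GenBank Accession used as Symbol", lambda p: p.get("GB_ACC") or None),
]

print(assign_gene_name({"ID": "1007_s_at", "GB_ACC": "U48705", "Gene Symbol": "DDR1"}, sources))
```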
From the TRAM Home, click on the "Genes" button - then on the "Alias"
button. This will lead to the "Gene Aliases" table. Since version 1.1 (2013),
TRAM has been able to resolve any Gene Alias resulting from the process
described above, converting each alternative gene symbol (alias), if it is
included in the "Other Aliases" section of the "NCBI Gene" (formerly
"Entrez Gene") record for that gene, to the corresponding "Gene Symbol"
(for eukaryotes only). The user can also place a file named
"gene_aliases.txt" in the TRAM folder listing additional aliases to be
resolved that are not included in the "NCBI Gene" record, in tabulated text
format:
first column, gene symbol alias; second column, gene symbol. If this file is
found in the TRAM folder during the execution of the "Set Up" - "Genes"
process, it will be automatically imported and processed.
Clone Names
The user can place a file named "gene_clone_names.txt" in the TRAM folder
listing Clone Names to be resolved, in tabulated text format:
first column, clone name; second column, gene symbol.
In TRAM 1.3 this file is generated automatically by parsing "NCBI Gene"
entries during the "Set Up - Genes" process (see section 1.2 c in this
Guide). Whenever this file is found in the TRAM folder during the execution
of the "Set Up" - "Genes" process, it will be automatically imported and processed.
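Both gene_aliases.txt and gene_clone_names.txt use the same two-column layout (first column, alias or clone name; second column, gene symbol). A minimal sketch of reading such a file and resolving a name follows; the function names are hypothetical, and letting the last line win when an alias appears twice is an assumption, not documented TRAM behavior.

```python
import csv
import io

def load_two_column_map(stream):
    """Read 'alias<TAB>gene symbol' lines into a dict, skipping malformed lines.

    If an alias appears twice, the last line wins (an assumption).
    """
    mapping = {}
    for row in csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            mapping[row[0].strip()] = row[1].strip()
    return mapping

def resolve(symbol, aliases):
    """Return the official symbol for an alias, or the input unchanged."""
    return aliases.get(symbol, symbol)

aliases = load_two_column_map(io.StringIO("p53\tTP53\nHER2\tERBB2\n"))
print(resolve("HER2", aliases), resolve("DDR1", aliases))
```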
Repeat the Platform import process for any desired
Platform.
From the "Platform" TRAM table, you may click on the
"Platforms Summary" button, which will take you to a
summary table of the data about each Platform. The
"Show Identifiers" button associated with each Platform record
shows all Identifiers of that Platform.
A file with formatted platform data ready to be imported
in TRAM will also be created at the end of each guided
platform import. This is useful for any later use of the
"Special" unsupervised batch platform import function described below.
Clicking on "Special" in the TRAM "Home" window allows the user to start
a batch import of large pools of Platform data without user intervention.
To do so, prepare all the files with Platform data as described, name them
GPL1.txt, GPL2.txt, ... and put them in the "Platform" folder of the main
TRAM directory.
In this case, a fourth column must be added, at least in the first row,
with the code identifying the Platform whose data are present in the file
(e.g., GPL96):
[Column headers are not required]
[ID]        [GB_ACC]    [Gene Symbol]   [Platform]
1007_s_at   U48705      DDR1            GPL96
1053_at     M87338      RFC2
...         ...         ...
You will be asked whether or not to delete the previously
imported Platform data.
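A minimal sketch that writes a batch platform file in the layout shown above, with the Platform code as a fourth column in the first row only. The function name and arguments are hypothetical; the file name pattern GPL1.txt, GPL2.txt, ... follows the Guide.

```python
import tempfile
from pathlib import Path

def write_batch_platform_file(rows, platform_code, folder, index):
    """Write (ID, GB_ACC, Gene Symbol) rows to <folder>/GPL<index>.txt.

    The platform code is appended as a fourth column in the first row only.
    """
    lines = []
    for i, (probe_id, accession, symbol) in enumerate(rows):
        fields = [probe_id, accession, symbol]
        if i == 0:
            fields.append(platform_code)
        lines.append("\t".join(fields))
    path = Path(folder) / f"GPL{index}.txt"
    path.write_text("\n".join(lines) + "\n")
    return path

rows = [("1007_s_at", "U48705", "DDR1"), ("1053_at", "M87338", "RFC2")]
with tempfile.TemporaryDirectory() as folder:  # stands in for the "Platform" folder
    path = write_batch_platform_file(rows, "GPL96", folder, 1)
    print(path.name)
    print(path.read_text(), end="")
```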
IMPORTANT - For the identifiers in your gene expression data file to be
interpreted as Platform IDs of a Platform that has been set up, remember to
write the Platform code (e.g., GPL... for a GEO Platform) in the third
column of your expression data file, at least in the first row, so that
your expression data file contains three columns separated by a tab character.
A protocol for the execution of meta-analysis by the TRAM software is
available along with the TRAM 1.3 version ("TRAM_Meta_Analysis_Protocol_2017"
file, located in the "TRAM_Utilities" directory of the TRAM web site).
While the TRAM_HUMAN.zip file contains a pre-setup version ready to analyze
expression data from the human organism, you may also download an empty TRAM
template that can be prepared for the analysis of data from any organism.
Pre-setup versions may be used directly to import and analyze expression data
without performing the "Set up" process. However, the user may still need to
perform "Set up" section 2.3 to load additional Platform schemes, if these are
necessary to interpret the gene identifiers listed in the expression data file
(see below).
Conversely, the empty TRAM template must always be prepared by
performing the "Set up" process from the beginning (section 1).
Note - Data saving: TRAM, like any FileMaker-based database, automatically
saves any changes, so you will not find any save options at the end of the
import processes.
After the import processes, avoid any manual data change that
may cause the loss of the original imported data.
Note - Advanced use: You may open the program files using your copy of
FileMaker Pro 12 or later, which gives you full ability to modify the
software. In this case, do not open the program using the "TRAM" file;
instead, open the file "TRAM.TMA" from within FileMaker Pro. After any
modification, the program must be relaunched with the "TRAM" runtime to work
correctly, because of the data path structure stored in the "TRAM" scripts.
To cancel a TRAM operation before it is completed (not
recommended):
Press Command-period keys (Mac OS X) or Esc (Windows).
It is possible to compare two different biological conditions,
importing one as the A sample (or sample pool) and the other as the
B sample (or sample pool) to be compared to A.
You can switch between TRAM database tables by clicking on the
corresponding buttons present in each layout.
The user is responsible for the homogeneity or comparability of the data to
be imported in terms of: biological sample, microarray platform (although
inter-sample normalization methods are provided), and spot quality
filtering/data pre-processing. The software will map the imported values
along the chromosomes, but it cannot check the validity of the
experimental design.
Each series of data related to a "Sample" is defined as a "distinct
biological sample"; for example, in a two-channel experiment each channel
is a sample, and the data for each channel are imported as a distinct data file.
Be sure that your system default format uses "." (full stop) as the
decimal separator (English standard). See below how to check and change
the setting if necessary.
IMPORTANT. The expression data file must be a tabulated (tab-delimited)
text file containing two columns separated by a TAB character (tabulator
key, ASCII 9).
First ("left") column: gene probe identifier: Official Gene
Symbols/"NCBI Gene" names (default); or, if the corresponding conversions
have been set up: Custom identifiers, Platform Identifiers or GenBank
Accession Numbers.
Second column: numerical expression value. Use "." as the decimal separator
(and do not use a thousands separator). Be sure that your system default
format uses "." (full stop) as the decimal separator (English standard).
If this is not the case, you must change the system setting:
Mac OS X: in "System Preferences" (from the "Apple" menu), click on
"International", then on "Formats", then choose as "Region" a country with
the English standard format for numbers (full stop as decimal separator).
Windows: in "Control Panel" (from the "Start" menu), click on
"International options", then modify the format of numbers by choosing a
country with the English standard format for numbers (full stop as decimal
separator).
In both cases, a system restart or user logout is not required to make the
change effective.
The expression value is usually the pre-processed intensity value,
i.e. the value assigned to the spot as processed by the software of the
specific experimental platform used (for instance, following background
subtraction for a microarray spot).
Scientific notation is supported, in the format 20E-2, for example. TRAM
considers the expression values as linear data, not logarithm-transformed
data. If necessary, data should be back-transformed to linear values before
importing them into TRAM. TRAM can back-transform log-transformed values (in
base 2, 10 or e) if the user prepares the data with the "Help with data"
utility (see below). Ratio values (e.g., the ratio between two microarray
channels) are not accepted by TRAM.
GEO_GSM_Download is a useful tool to automatically download data matching a
list of GEO samples (GSM) from the GEO database (Gene Expression Omnibus).
It is distributed along with TRAM in the directory
"TRAM_Utilities" at: http://apollo11.isto.unibo.it/software/TRAM/.
Requirements: any operating system (Linux, Mac OS X,
Windows, ...) with Python 2 or Python 3 and IDLE.
When pre-processed expression values are not available, the user may take
the background (BKD) median as the median of the pixel intensities in the
area surrounding the spot, and the feature (spot) median as the median of the
pixel intensities in the area inside the spot. The spot intensity may then be
calculated by subtracting the background median value from the feature median
value and used as the expression value for the corresponding gene.
Clicking on the "Help with data" button in the TRAM "Home" window allows the
user to be interactively assisted in the preparation of text files of the
required format, including calculation of the spot intensity by subtracting
the background value from the spot value (see below for details).
Third column [optional]: Platform code (e.g., GPL96); it is needed only in
the first row.
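The two value-preparation steps described for the second column, back-transforming log-scale data and subtracting the background median from the feature median, can be sketched in Python as follows. The function names are hypothetical and the code is illustrative only; the "Help with data" utility is the documented way to do this inside TRAM. Note that the subtraction can give a value ≤0; how TRAM treats such values is described under "Management of absent/negative/zero values".

```python
import math
from statistics import median

def back_transform(value, base):
    """Return the linear value for a log-transformed one (base 2, 10 or "e")."""
    if base == "e":
        return math.exp(value)
    if base in (2, 10):
        return float(base) ** value
    raise ValueError(f"unsupported log base: {base!r}")

def spot_intensity(feature_pixels, background_pixels):
    """Feature (spot) median minus background (BKD) median."""
    return median(feature_pixels) - median(background_pixels)

print(back_transform(3, 2), back_transform(0.5, 10))
print(spot_intensity([120, 130, 125, 500], [20, 22, 18]))
```

Using medians rather than means keeps a few saturated or dust-affected pixels (such as the 500 above) from dominating the result.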
IMPORTANT - For the identifiers in your gene expression data file to be
interpreted as IDs of the corresponding Platform, you must first have set up
that Platform as explained in section 2.3 of this Guide.
Some Platforms are pre-setup, as described in the same section.
Note: If the first
expression value is not in the first row due to the
presence of some header lines, please use the very first row anyway in
your file to indicate the Platform code, making sure that
you are writing it in the third column. If you have only one column
in the first row, please press the tabulator key twice
then write the Platform code.
Do not insert blank spaces or other characters at the end
of the text in a column.
If you
use the GenBank Accession Numbers as identifiers, please do not
append the version of the sequence to the GenBank identifier,
i.e. use AB123456 and not AB123456.1.
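The formatting rules above (tab-separated columns, "." as decimal separator, no thousands separator, Platform code in the third column of the first row only, no trailing blanks, no GenBank version suffix) can be applied with a sketch like the one below. The function name is hypothetical, and the version-suffix pattern is an assumption that only targets GenBank/RefSeq-like accessions, so identifiers such as UniGene clusters (e.g. Hs.123) keep their dot.

```python
import re

# Assumption: a GenBank/RefSeq-like accession followed by ".<version>".
VERSIONED_ACCESSION = re.compile(r"^([A-Z]{1,2}\d{5,8}|[NX][MR]_\d+)\.\d+$")

def format_expression_file(records, platform_code=None):
    """Build the text of an expression data file from (identifier, value) pairs."""
    lines = []
    for i, (identifier, value) in enumerate(records):
        identifier = identifier.strip()  # no blanks at the end of a column
        match = VERSIONED_ACCESSION.match(identifier)
        if match:
            identifier = match.group(1)  # AB123456.1 -> AB123456
        # repr() of a float uses "." regardless of the system locale
        # and never inserts a thousands separator; None stays empty.
        text_value = "" if value is None else repr(float(value))
        fields = [identifier, text_value]
        if i == 0 and platform_code:
            fields.append(platform_code)  # third column, first row only
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"

print(format_expression_file([("1007_s_at", 6.38), ("1053_at", 6.65)], "GPL96"), end="")
```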
Management of
absent/negative/zero values
Probes whose expression value is absent (i.e. empty, not available) will
not be considered further by TRAM for the construction and analysis of the
maps, on the assumption that their expression level has not been measured.
Sample expression values equal to or lower than "0" (≤0) will be
thresholded to 95% of the minimum positive value present in that sample,
in order to obtain meaningful numbers when dividing "Samples Pool A" values
by "Sample Pool B" values.
Assuming that in these cases the expression level is too low to be detected
under the experimental conditions used, this transformation still yields a
ratio between values in pool A and values in pool B, which is useful to
highlight differential gene expression.
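The thresholding rule for values ≤0 can be written as follows. This is a sketch of the rule as stated in the Guide (95% of the sample's minimum positive value), not TRAM's actual code; empty values (None) are passed through because TRAM skips them.

```python
def threshold_nonpositive(values, fraction=0.95):
    """Replace values <= 0 with `fraction` times the minimum positive value.

    None (absent values) is passed through unchanged.
    """
    positives = [v for v in values if v is not None and v > 0]
    if not positives:
        raise ValueError("sample has no positive values to threshold against")
    floor = fraction * min(positives)
    return [None if v is None else (v if v > 0 else floor) for v in values]

print(threshold_nonpositive([20.0, 0.0, -3.5, 40.0, None]))
```

Because the floor is strictly positive, every A/B ratio computed afterwards is finite and non-zero.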
Expression values assigned to unmapped genes (without known genome
coordinates) will be normalized, and you can browse them in the
"Values_A_B_All" layout, but they will not be used in the construction and
analysis of the maps.
From the "Values_A_B" layout, the "A/B (unmapped)" button opens the
"Values_A_B_All" layout.
Import
utilities
The user must provide TRAM
with one or more expression data files with at least two
columns: Gene/Probe ID and its corresponding numerical
expression value. To prepare the files in this format, you
may use any word processor or spreadsheet program and save
the file in tabulated text format.
To simplify the extraction of the relevant columns from any available
tabulated text file providing expression data, whether generated by the
user's experimental platform or publicly available from any online source,
the internal TRAM utility "Help with data" can be used by pressing the
corresponding button in the TRAM Home.
IMPORTANT - For the identifiers in your gene expression data file to be
interpreted as IDs of the corresponding Platform, you must first have set
up that Platform as explained in section 2.3 of this Guide. Some Platforms
are pre-setup, as described in the same section.
Clicking on the "Help with data" button in the TRAM "Home" window allows the
user to be interactively assisted in the preparation of text files of the
required format. The user will be guided to import the data file and to
select the two columns containing gene identifiers and expression values.
A Platform code must be indicated if the gene/probe identifiers are not the
standard gene symbols and need to be converted into gene symbols using
Platform data loaded in TRAM (see section 2.3).
Finally, the software asks the user to save the data, generating a text file
suitable to be imported into TRAM. The user may choose the desired file name.
If the user plans to import expression data files in "Batch Import" mode,
the text files must be saved with names of the type A1.txt, A2.txt ... (in
the TRAM folder "Batch_Import_A") or B1.txt, B2.txt ... (in the TRAM folder
"Batch_Import_B").
Batch processing of a sample series: it is possible to prepare, in batch
mode, a series of sample data files related to the same work, obtained with
the same Platform and formatted in an identical way.
Put all the files to be processed in the "Series" folder located in the
"TRAM" folder, naming them S1.txt, S2.txt and so on.
From the TRAM "Home", click on the "Help with data" button and then on the
Data file batch processing button. Locate the "ID" and "Value" columns when
requested for the first sample. Enter the name of the Platform when
requested. TRAM will then automatically process all the files located in the
"Series" folder using the same criteria, generating a series of uniformly
processed data files with names such as P1.txt, P2.txt and so on. These files
may be transferred to the "Batch_Import_A" or "Batch_Import_B" folders, after
renaming them A1.txt, A2.txt ... or B1.txt, B2.txt ..., respectively, to be
automatically imported by TRAM using the "Batch mode" import buttons in the
TRAM "Home".
Clicking on the "Special" button in the TRAM "Home" window allows the user to
automatically perform batch data import of large pools of samples for both A
and B Pools in succession, provided that the expression data files have been
prepared in the required format (possibly using the "Help with data" utility)
and saved in the TRAM folders "Batch_Import_A" (with names such as A1.txt,
A2.txt ...) and "Batch_Import_B" (with names such as B1.txt, B2.txt ...).
Clicking on the "Export" button in the TRAM "Home" window assists the user in
exporting the (raw or normalized) imported data.
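Renaming the processed P1.txt, P2.txt ... files for batch import can be scripted. The sketch below copies them, in numerical order, to A<n>.txt or B<n>.txt. The function name is hypothetical, and the folder locations are parameters because the Guide does not state where the P files are written. The `first_number` parameter lets the series start at the first unused Sample_ID when previous data are kept.

```python
import re
import shutil
import tempfile
from pathlib import Path

def copy_series_to_batch(series_dir, batch_dir, pool="A", first_number=1):
    """Copy P<n>.txt files, in numerical order, to <pool><k>.txt in batch_dir.

    Files are copied, not moved, so the processed originals are kept.
    k starts at first_number and grows by one per file, so gaps in the
    P numbering are closed up.
    """
    found = []
    for path in Path(series_dir).iterdir():
        match = re.fullmatch(r"P(\d+)\.txt", path.name)
        if match:
            found.append((int(match.group(1)), path))
    copied = []
    for offset, (_, path) in enumerate(sorted(found)):
        target = Path(batch_dir) / f"{pool}{first_number + offset}.txt"
        shutil.copy(path, target)
        copied.append(target.name)
    return copied

# Example with throwaway folders standing in for the TRAM folders.
with tempfile.TemporaryDirectory() as series, tempfile.TemporaryDirectory() as batch:
    for n in (1, 2, 10):
        (Path(series) / f"P{n}.txt").write_text("DDR1\t6.38\n")
    print(copy_series_to_batch(series, batch, pool="A", first_number=6))
```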
The following instructions
are also available as a guided procedure within the software in
the appropriate "Set Up" area ("Set
up - Part 2 - Gene Identifiers conversion tables").
NOTE: TRAM will try to convert the Gene identifiers present in your
expression data files to Gene Symbols/Gene names, in the following priority
order, until a positive match is found:
1) if you set up the "Custom" table as described in section 2.2 of the
chapter "Set up", the "Custom" table will be searched first to match Gene
identifiers in your data to the corresponding Gene Symbols/Gene names,
overriding all other conversions;
2) if no match has been found, the "Genes" table (mandatorily set up as
described in section 1.2 of the chapter "Set up") will be searched to
interpret Gene identifiers in your data directly as Gene Symbols/Gene names;
3) if you write a Platform ID code (e.g., GPL... for a GEO Platform) in (at
least) the first line of the third column of your data (formatted as
described in section 3 of this Guide), a corresponding list of gene
Identifiers (often a series of progressive numbers) is expected in your data,
and each will be converted into the corresponding Gene Symbol/Gene name,
provided that you previously set up the table for that platform as described
in section 2.3 of the chapter "Set up"; for example:
1007_s_at   6.38   GPL96
1053_at     6.65
117_at      6.48
...         ...
[Note - If the first
expression value is not in the first row due to the
presence of some header lines, please use the very first row anyway in
your file to indicate the Platform code, making sure that
you are writing it in the third column. If you have only one column
in the first row, please press the tabulator key twice
then write the Platform code.
Do not insert blank spaces or other characters at the end
of the text in a column].
4) if you write the word GeneID in the first line of the third column of your
data (formatted as described in section 3 of this Guide), an "Entrez Gene"
Identifier (a progressive number) is expected in your data, and it will be
converted into the corresponding Gene Symbol/Gene name by searching the
"Genes" table (which must have been set up as described in section 1.2 of
the chapter "Set up"); for example:
780    6.38   GeneID
5982   6.65
3310   6.48
...    ...
[Note - If the first
expression value is not in the first row due to the
presence of some header lines, please use the very first row anyway in
your file to indicate the "GeneID" option, making sure
that you are writing it in the third column. If you have only one column
in the first row, please press the tabulator key twice
then write the "GeneID" word.
Do not insert blank spaces or other characters at the end
of the text in a column].
5) if no match has been found yet, the "Unigene" table will be searched to
interpret Gene identifiers in your data directly as GenBank Accession Numbers
(if you set up this table as described in section 2.1 of the chapter "Set up");
6) if there is still no match, the "Unigene" table will then be searched to
interpret Gene identifiers in your data directly as UniGene Cluster
identifiers (if you set up this table as described in section 2.1 of the
chapter "Set up").
Once a match is found, the software does not search for a symbol in the
subsequent tables.
We suggest using recently released data for each table to be
imported into the TRAM software.
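The conversion cascade 1) to 6) can be modeled as a sequence of lookups where the first non-empty answer wins, with steps 3) and 4) enabled only by the third column of the first row. The function names, table names and contents below are hypothetical stand-ins for the TRAM tables, not TRAM's implementation.

```python
def conversion_mode(first_line):
    """Read the optional third column of the first row.

    Returns ("platform", code), ("geneid", None) or ("none", None).
    """
    fields = first_line.rstrip("\r\n").split("\t")
    flag = fields[2].strip() if len(fields) >= 3 else ""
    if not flag:
        return ("none", None)
    if flag == "GeneID":
        return ("geneid", None)
    return ("platform", flag)

def convert_identifier(identifier, mode, tables):
    """Return the first gene name found following steps 1) to 6), or None."""
    kind, platform = mode
    steps = [
        lambda: tables["custom"].get(identifier),                        # 1) Custom
        lambda: identifier if identifier in tables["genes"] else None,  # 2) Genes
        lambda: (tables["platforms"].get(platform, {}).get(identifier)
                 if kind == "platform" else None),                       # 3) Platform
        lambda: (tables["gene_ids"].get(identifier)
                 if kind == "geneid" else None),                         # 4) GeneID
        lambda: tables["unigene_by_accession"].get(identifier),         # 5) GenBank
        lambda: tables["unigene_by_cluster"].get(identifier),           # 6) UniGene
    ]
    for step in steps:
        name = step()
        if name:
            return name  # later tables are not searched
    return None

tables = {
    "custom": {"my_probe": "TP53"},
    "genes": {"DDR1", "RFC2", "HSPA6", "TP53"},
    "platforms": {"GPL96": {"1007_s_at": "DDR1", "1053_at": "RFC2"}},
    "gene_ids": {"780": "DDR1", "5982": "RFC2"},
    "unigene_by_accession": {"U48705": "DDR1"},
    "unigene_by_cluster": {},
}
mode = conversion_mode("1007_s_at\t6.38\tGPL96\n")
print(mode, convert_identifier("1053_at", mode, tables))
```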
If you have a list of Gene Symbols as probe
identifiers, with the corresponding expression values, you can
directly go to "Home" and start to Import expression data,
otherwise go to the "Set Up" chapter, Part 2.
Import start
In the "Main" ("Home") window there are two series of buttons designed to
quickly start the import processes.
Don't worry if the progress bars seem to advance too slowly; they sometimes
do not reflect the actual progress of the task.
The first series of import buttons ("Import A" and "Import B") imports one
expression data file into the "Values_A" or "Values_B" TRAM database table,
respectively. At the start of the import process, the user must choose
whether to retain or delete all previously imported data.
Clicking on "No" in the first dialog box lets the user add one or more other
datasets to the previously imported data. The user may subsequently select
any subset of samples to be analyzed.
The second dialog box asks for the selection of the file
containing the data table.
All data imported from a file will be labeled by the software with a
progressive number (Sample_ID), making it easy to track all data belonging to
a specific set (or to delete them from the analyzed set with the "Remove
Sample" function).
In addition, the "Samples_A" and "Samples_B" tables allow the user to view
and annotate the list of imported samples and to view summary data for each
sample.
The "Go" buttons open a
window in your default browser displaying the entry for
Platform, Series, Sample, Dataset and PubMed record
if you annotated (at any time) the corresponding fields with
codes for GPL, GSE, GSM, GDS and PMID, respectively.
At the start of an analysis, the user can also select which samples are to be
excluded from or included in (default) the current analysis, without removing
them from the TRAM database; alternatively, the user may remove any samples
from the database. Please note that changing the set of samples to be
analyzed restarts normalization (see below), which may take from several
minutes to hours, depending on the number of loaded samples.
At the end of the process, the software asks whether to import another set.
As a final step, the user can check the results of the import process.
When requested by the software, click on the blue "Continue" button at the
top right of the program window, to ensure correct functioning of the
software.
The second series of import buttons ("Batch mode" buttons) works in the same
way, but it is optimized to perform a batch, non-supervised import.
By clicking on "Batch mode" (A or B), all files (formatted as just described
for the manual import) contained in the "Batch_Import_A" or "Batch_Import_B"
folder, respectively, will be imported. In these folders the files must be
named A1.txt, A2.txt, ... and B1.txt, B2.txt, ..., respectively (without gaps
in the series of progressive numbers).
If you want to perform a batch import while keeping the previously imported
dataset, the first file should be numbered with the first unused Sample_ID
number (e.g., if the last imported set has Sample_ID = 5, the first file must
be A6), and that number will become the Sample_ID of that dataset. The
software will alert you about this. You may check the currently used
Sample_TRAM_IDs by clicking on the "Samples A" and "Samples B" buttons,
respectively, in the TRAM Home.
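Before clicking "Batch mode", the naming rules (no gaps, first number equal to the first unused Sample_ID) can be checked with a small hypothetical helper that works on a list of file names:

```python
import re

def check_batch_names(names, pool="A", first_number=1):
    """Return naming problems for batch files <pool>1.txt, <pool>2.txt, ...

    first_number should be the first unused Sample_ID when previously
    imported data are kept (e.g. 6 if the last Sample_ID is 5).
    """
    pattern = re.compile(rf"^{re.escape(pool)}(\d+)\.txt$")
    numbers = sorted(int(m.group(1)) for m in map(pattern.match, names) if m)
    if not numbers:
        return [f"no {pool}<n>.txt files found"]
    problems = []
    if numbers[0] != first_number:
        problems.append(
            f"first file is {pool}{numbers[0]}.txt, expected {pool}{first_number}.txt"
        )
    missing = sorted(set(range(numbers[0], numbers[-1] + 1)) - set(numbers))
    if missing:
        problems.append("missing numbers: " + ", ".join(map(str, missing)))
    return problems

print(check_batch_names(["A6.txt", "A7.txt", "A9.txt", "notes.txt"], "A", 6))
```

For a real folder, pass the list `[p.name for p in Path("Batch_Import_A").iterdir()]` (with `from pathlib import Path`).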
Clicking on the "Special"
button in the TRAM "Home" window will allow the user to
automatically perform batch
expression data import of large pools of samples in succession
for both A and B Pools. Batch import may be followed
automatically by data analysis using the "Batch Import +
Analysis" button in the "Special" section.
After the import process, expression data are visualized in the "Values_A"
and "Values_B" tables, which can be displayed by clicking on the A and B
buttons, respectively, in the TRAM "Home" (opening window).
These are the data fields of the "Values" tables:
- Identifier (the original probe identifier in your data)
- Intensity value (the original numerical expression value)
- Sample_ID (A1, A2... or B1, B2...)
- Platform (filled if you indicated a Platform code "GPL..." in your
expression data file)
- Exclude (state of inclusion/exclusion of the data for the analysis)
- Gene_name (Gene Symbol/Gene name following conversion of Identifiers)
- Chr (chromosome name)
- txStart (start position of the gene transcript on the chromosome)
- txEnd (end position of the gene transcript on the chromosome)
IMPORTANT - The conversion of gene or probe identifiers to Gene Symbols/Gene
names is performed during expression data import. To keep the database
indexed and fast, changes to the software set up are not dynamically
reflected in the gene assignment of probe identifiers. Therefore, any change
to a table related to the "Set Up" chapter ("Chromosomes", "Genes",
"EST_Clusters" and "UniGene_ID", "Platforms ID", "Custom ID") should be
followed by reimport and reanalysis of the expression data to make the
changes effective. An exception to this rule is the set up of new Platforms
or new Custom ID sets that are to be applied only to new, subsequently loaded
samples and not to previously imported samples; in this case, reimport of all
samples is not needed. Clicking on the "Special" button in the TRAM "Home"
window allows the user to automatically perform batch data import of large
pools of samples.
Interpretation and Normalization of the imported data
The user provides TRAM with an "Intensity value" for each spot, which is intended to be the pre-processed intensity value, i.e. the numerical value assigned to the spot by the software of the specific experimental platform used (e.g. following background subtraction for a microarray spot).
To allow comparison of gene expression data obtained from different biological samples and/or different experimental platforms, TRAM can apply several data normalization methods.
The normalization type may be changed from a pop-up Menu in the "Values" or "Samples" data tables.
Intra-sample (intra-array) normalization works within each distinct sample, while inter-sample (inter-array) normalization is applied simultaneously to the desired sample sets.
You may combine the two types of normalization, choosing one intra-sample and one inter-sample method.
Please note that the normalization process may require several hours for databases in which tens of arrays were imported. Clicking on the "Special" button in the TRAM "Home" window allows the user to automatically change the normalization of large pools of samples.
The normalization may also be set when starting an analysis, so that normalization and analysis are performed in sequence without the user's intervention.
Intra-sample normalization
These methods
rescale values within each data set using a standard
internal reference for each sample.
None: no intra-sample normalization is performed.
Mean [DEFAULT AFTER INSTALLATION]: each value is expressed as a percentage of the corresponding sample mean value. This is equivalent to the classic "global normalization" in microarray data analysis.
Median: each value is expressed as a percentage of the corresponding sample median value. This is equivalent to the classic "global normalization" in microarray data analysis.
Max: each value is expressed as a percentage of the corresponding sample maximum value. This is equivalent to the classic "scale normalization" in microarray data analysis.
Inter-sample normalization
These methods rescale values across the samples of each sample set.
None: no inter-sample normalization is performed.
Quantile: in the implementation within the database structure at the core of TRAM, each intra-sample normalized value is given a rank after sorting the sample data in ascending order; then the mean of all the values with the same rank across all samples is calculated. This mean value is assigned as the expression value to each gene with the same rank in each sample (Bolstad et al., 2003). An original variant of this method implemented in TRAM is described below.
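The intra-sample methods and the Quantile method described above can be sketched in Python as follows. This is an illustration, not TRAM's FileMaker implementation: the function names are hypothetical, ties are broken by position, and samples of different length are handled by averaging each rank over the samples that have it (both are assumptions of the sketch). The Scaled_Q variant is not reproduced, because its rescaling details are given in the TRAM article rather than here.

```python
from statistics import mean, median


def intra_normalize(values, method="mean"):
    """Express each value as a percentage of the sample mean, median or maximum.

    method "none" returns the values unchanged.
    """
    if method == "none":
        return list(values)
    reference = {"mean": mean, "median": median, "max": max}[method](values)
    return [100.0 * v / reference for v in values]


def quantile_normalize(samples):
    """Rank-based quantile normalization across samples.

    samples: list of lists of (intra-sample normalized) values, one per sample.
    Each value is replaced by the mean of all values sharing its rank.
    """
    # indices of each sample's values in ascending order (stable for ties)
    orders = [sorted(range(len(s)), key=s.__getitem__) for s in samples]
    longest = max(len(s) for s in samples)
    rank_means = [
        mean(s[order[r]] for s, order in zip(samples, orders) if r < len(s))
        for r in range(longest)
    ]
    normalized = []
    for s, order in zip(samples, orders):
        out = [0.0] * len(s)
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Because TRAM ranks the intra-sample normalized values, the two steps chain naturally, e.g. `quantile_normalize([intra_normalize(s, "mean") for s in raw_samples])`.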
Scaled_Q (Scaled Quantile) [DEFAULT AFTER INSTALLATION]: derived from the Quantile method, except that the rank for each array is rescaled according to the array with the maximum number of probes. This original method compensates for differences when comparing arrays with very different numbers of probes, because the highest values of arrays with few probes are given ranks comparable to those assigned to arrays with many probes (see the article).
DATA SUMMARY - Values_A_B Layout
The summary of gene expression values, under the current normalization mode, may be viewed in the "Values_A_B" layout.
This is an indexed database table summarizing all data points available in the sample pool for each gene.
Along with the Mean and the Standard Deviation (SD), the SD is also shown as a percentage of the expression value. The "Mean" of the data points available for each locus is considered the expression value of the respective gene and is used in the subsequent analysis.
The number of "Data Points" from which the summary data are obtained is also displayed.
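The per-gene summary can be sketched as follows. Assumptions: the input is a flat list of (gene, normalized value) data points for one pool, and the SD uses the sample (n - 1) formula; the guide does not say which SD formula TRAM applies, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_gene(data_points):
    """Return (mean, sd, sd_percent, n_data_points) for one gene.

    SD uses the sample (n - 1) formula (an assumption) and is None for a
    single data point; sd_percent is the SD as a percentage of the mean.
    """
    n = len(data_points)
    m = mean(data_points)
    sd = stdev(data_points) if n > 1 else None
    sd_percent = 100.0 * sd / m if sd is not None and m != 0 else None
    return m, sd, sd_percent, n


def summarize_pool(points):
    """points: iterable of (gene_name, normalized_value) pairs from one pool."""
    by_gene = defaultdict(list)
    for gene, value in points:
        by_gene[gene].append(value)
    return {gene: summarize_gene(values) for gene, values in by_gene.items()}


print(summarize_pool([("TP53", 10.0), ("TP53", 20.0), ("TP53", 30.0)]))
# {'TP53': (20.0, 10.0, 50.0, 3)}
```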
The yellow button "A/B (unmapped)" opens the "Values_A_B_All" layout, which also includes unmapped loci that are not listed in the "Values_A_B" table used for the creation and analysis of the transcriptome maps.
Clicking on the "Export" button exports the data for the genes listed in the "Values_A_B" table as a tab-delimited text file.
The file contains by default the following columns, from left to right:
01) Gene_name
02) Chromosome name
03) Chromosome Identifier (progressive number)
04) Gene mean expression value for pool A samples
05) Gene mean expression value for pool B samples
06) Ratio between gene mean expression value from pool A samples and from pool B samples (A/B ratio)
Different TRAM databases may be obtained by duplicating the fresh "TRAM" folder and starting a new analysis session.
Please do not rename any file or folder of the TRAM software.
You may download multiple copies of TRAM and run them simultaneously, provided that each "TRAM" folder is located in a different directory, so you can keep the original names of the TRAM folder and files. Do not move the "TRAM" folder while the software is open.
Run the "TRAM" software from a local hard disk. Do not run the software from a network drive. Do not worry if the progress bars seem to advance too slowly; they do not always reflect the actual progress of the task.
If a TRAM analysis aborts unexpectedly, it is advisable to restart it in a fresh TRAM copy.
For the analysis of a pool of expression data arrays, the expression value for each gene symbol is the mean expression value among all its corresponding identifiers available in that sample pool.
Basically, TRAM performs two types of analysis: creation of transcriptome maps ("Map" mode) or search for clusters of over- or under-expressed neighbouring/contiguous genes ("Cluster" mode).
Clicking on the "Special" button in the TRAM
"Home" window will allow the user to automatically perform
all available analyses in sequence, after an initial choice
of the settings required for the analysis. Note: if Analysis is preceded by automated, batch
"Import A+B" of the expression data, the setting "Sample
Selection" will be ineffective, and all Samples in Pools A
and B will be imported and then will all be used for the
analysis.
You may start the analysis by clicking on one of the red "Analysis" buttons in the "Home" layout. You will then be asked to enter the analysis settings of your choice.
The two settings common to both types of analysis are:
Pool choice (A, B, or A vs. B to compare two series of samples using the A/B ratio);
Statistics: calculations may be performed with respect to all genome segments (or genes) or to the set of segments (or genes) located in the same chromosome. This applies both to descriptive statistics (calculation of percentile thresholds to select over/under-expressed genes) and to statistical analysis (parameters of the hypergeometric distribution used to determine the significance of the identified over/under-expressed segments or clusters).
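The genome/chromosome choice changes the reference set from which the percentile thresholds are computed. A minimal sketch, with hypothetical function names and two assumptions: percentiles are computed by linear interpolation between ranks (NumPy's default method; TRAM's own percentile method is not specified here), and a gene counts as over- or under-expressed when its value is strictly above the (100 - n) percentile or strictly below the n percentile, as the "Cluster" section phrases it.

```python
import math
from collections import defaultdict


def percentile(sorted_values, p):
    """p-th percentile of an ascending list, by linear interpolation between ranks."""
    k = (len(sorted_values) - 1) * p / 100.0
    lower = math.floor(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (k - lower)


def label_genes(genes, n_percent, scope="genome"):
    """Label genes "over", "under" or "none".

    genes: list of (gene_name, chromosome, value) tuples.
    scope: "genome" -> one pair of thresholds from all genes;
           "chromosome" -> thresholds computed separately for each chromosome.
    """
    def key(chrom):
        return chrom if scope == "chromosome" else None

    groups = defaultdict(list)
    for _, chrom, value in genes:
        groups[key(chrom)].append(value)

    limits = {}
    for group, values in groups.items():
        values.sort()
        limits[group] = (percentile(values, n_percent), percentile(values, 100 - n_percent))

    labels = {}
    for name, chrom, value in genes:
        low, high = limits[key(chrom)]
        labels[name] = "over" if value > high else "under" if value < low else "none"
    return labels
```

With scope="genome" a chromosome that is globally highly expressed can contribute most of the "over" genes, whereas scope="chromosome" always selects the top n percent of each chromosome separately.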
4.1 Creating and analyzing maps of the transcriptome
Click on the "Chromosomal
Segments" button in the TRAM "Home" (TRAM main window).
The software will generate a graphical map of the transcriptome, showing a vertical line for each chromosome. An expression value is associated with each segment of the line, whose size is determined by a window (in bp) set by the user. This value is the mean of all available expression data for the genes included in the segment.
Information about "Location" is derived from imported "NCBI Gene" data; in "Map" mode it is taken from the first gene listed in each chromosomal segment.
In "Map" mode, results are always generated by calculating both types of analysis (the one based on all genes in the genome and the one based on the genes located in the same chromosome as the segment). You are required to select one type of analysis ("genome" or "chromosome") in order to be directed, at the end of the process, to the results layout you selected, but the results for the other layout are also available.
This is because TRAM spends much of the "Map" analysis time creating chromosomal segments, so it is convenient to calculate both statistics once the segments are created.
SETTINGS
The available settings for this analysis are:
Window: defines the length of a segment.
If the coordinates of a gene span the window boundaries, the
gene is included in each window in which a part of it lies.
Each segment on the map shows only those genes having an
available expression value in the corresponding sample or pool
of samples.
Sliding window shift: defines the offset between the start of a segment and the start of the next one, and therefore how much successive segments overlap.
A shift equal to zero results in non-overlapping segments.
For example, if the window is 1,000,000 bp and the shift equals 200,000 bp, successive segments will be created with coordinates:
1 - 1,000,000 bp
200,000 - 1,200,000 bp
400,000 - 1,400,000 bp, and so on.
This function can be useful to increase the sensitivity of the search for over/under-expressed segments.
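The segmentation controlled by the Window and Sliding window shift settings can be sketched as follows. The sketch follows the worked example (segment starts advance by the shift, and a shift of 0 gives adjacent segments). It uses 0-based, half-open coordinates instead of the 1-based ones shown above and truncates the last segments at the chromosome end. These choices and the function names are assumptions, not TRAM's code.

```python
from statistics import mean


def segment_bounds(chrom_length, window, shift):
    """(start, end) pairs, 0-based and half-open, covering one chromosome.

    Successive starts advance by `shift` bp, as in the guide's example
    (window 1,000,000, shift 200,000); shift == 0 gives adjacent segments.
    """
    if window <= 0 or shift < 0:
        raise ValueError("window must be > 0 and shift >= 0")
    step = shift if shift > 0 else window
    bounds = []
    start = 0
    while start < chrom_length:
        bounds.append((start, min(start + window, chrom_length)))
        start += step
    return bounds


def segment_values(bounds, genes):
    """Mean expression per segment.

    genes: (name, tx_start, tx_end, value) tuples; value is None when the gene
    has no expression data. A gene spanning a boundary counts in every segment
    it overlaps.
    """
    results = []
    for seg_start, seg_end in bounds:
        inside = [g for g in genes
                  if g[3] is not None and g[1] < seg_end and g[2] > seg_start]
        value = mean(g[3] for g in inside) if inside else None
        results.append((seg_start, seg_end, [g[0] for g in inside], value))
    return results


print(segment_bounds(1_400_000, 1_000_000, 200_000)[:3])
# [(0, 1000000), (200000, 1200000), (400000, 1400000)]
```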
Percent (segment): defines the threshold required to consider a segment as "Over- (or Under-) expressed" (i.e. to be marked in red or blue in the expression bar).
A segment whose mean expression value (calculated as the mean of all known genes included in it) is within the highest (n) percent or within the lowest (n) percent of values, where n = Percent (segment), will be highlighted (in red or blue, respectively), thus displaying genomic regions globally over- or under-expressed with respect to the desired threshold.
Percent (gene): defines the threshold expression value to consider a gene as "Over- (or Under-) expressed" (i.e. to be marked in red or blue in the segment gene list).
A gene whose mean expression value is within the highest (n) percent or within the lowest (n) percent of values, where n = Percent (gene), will be highlighted, being listed in red (over-expressed) or blue (under-expressed) font, respectively.
The number of over/under-expressed genes in the segment is calculated with respect to Percent (gene). Using two different parameters for segments and genes allows the user to perform a more refined analysis.
Number of genes in the window: defines the minimum number of over/under-expressed genes required to mark the segment with the tag "Over" (or "Under").
An over-expressed segment listing a number of over-expressed genes equal to or greater than the "Number of genes in the window" will be marked as "Over" in the "Map" layouts.
An under-expressed segment listing a number of under-expressed genes equal to or greater than the "Number of genes in the window" will be marked as "Under" in the "Map" layouts.
SAMPLE NUMBER: defines the minimum number of Samples for which an expression value must be available for a gene in order to include that gene in the analysis.
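How SAMPLE NUMBER, Percent (segment), Percent (gene) and Number of genes in the window combine into the "Over"/"Under" tag can be sketched with two small functions. This is a hypothetical sketch: it assumes the segment thresholds were computed beforehand (the n and 100 - n percentiles of all segment values) and that comparisons are strict.

```python
def include_gene(sample_count, sample_number):
    """SAMPLE NUMBER: keep a gene only if enough samples have a value for it."""
    return sample_count >= sample_number


def tag_segment(segment_value, seg_low, seg_high, gene_labels, min_genes):
    """Return "Over", "Under" or "" for one chromosomal segment.

    segment_value: mean expression of the genes in the segment (None if no data)
    seg_low, seg_high: n and (100 - n) percentiles of all segment values,
        with n = Percent (segment)
    gene_labels: "over" / "under" / "none" for each gene, from Percent (gene)
    min_genes: the "Number of genes in the window" setting
    """
    if segment_value is None:
        return ""
    n_over = gene_labels.count("over")
    n_under = gene_labels.count("under")
    if segment_value > seg_high and n_over >= min_genes:
        return "Over"
    if segment_value < seg_low and n_under >= min_genes:
        return "Under"
    return ""


print(tag_segment(50.0, 10.0, 40.0, ["over", "over", "none"], 2))  # Over
```

Both conditions must hold: a segment can be highlighted by its mean value alone, but it is tagged only if it also contains enough individually over- (or under-) expressed genes.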
RESULTS
The results of the
analysis are displayed within 30-90 minutes, depending on the
number of arrays analyzed. Changing the data normalization
type during the analysis requires additional time for the task
to be completed.
The results
of the analysis are displayed in the "Chromosomal Segments" layouts (i.e., "Map"
layouts).
Each chromosomal segment is a record of the database. You can find and sort segments using the desired criteria.
The "P" field displays the p-value resulting from the hypergeometric distribution calculation for the "Over/Under"-expressed segments. This is the statistical significance, i.e. the probability that the result (presence of n over/under-expressed genes within the same segment) could have been obtained by chance.
Due to the high number of segments in a genome, the "P" value needs to be corrected for multiple testing by controlling the False Discovery Rate (FDR). The "Q" field displays the p-value corrected for FDR.
"P" and "Q" values are displayed only for the segments fulfilling the criteria to be tagged as over/under-expressed. If Q ≤ 0.05, the over/under-expression is considered statistically significant. For details and references about the statistical analysis, see the article describing "TRAM".
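The P and Q steps can be sketched with the Python standard library. Two assumptions should be checked against the TRAM article, since this section names neither. First, the hypergeometric test is taken as the upper tail P(X ≥ k), with N genes in the reference set (genome or chromosome), K over-expressed genes in it, n genes in the segment and k over-expressed genes in the segment. Second, the FDR correction is Benjamini-Hochberg, a common choice.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    top = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For example, `hypergeom_upper_tail(5, 20000, 1000, 12)` would give the chance of finding at least 5 over-expressed genes in a 12-gene segment when 1,000 of 20,000 genes are over-expressed. `benjamini_hochberg` would then be applied to the p-values of all tagged segments together.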
The user may also
produce a graphical output showing the series of chromosome
transcriptome maps aligned horizontally and may choose to select
representation of specific chromosomes or set of chromosomes.
In addition, specific buttons help retrieve online database
entries for the desired genes.
In the "Map" layouts based on all gene values, segments that are significantly over/under-expressed only in this type of analysis, but not in the corresponding one based on the pertinent chromosome values, will be marked by a "G" and the intensity bar will be highlighted in yellow. The button "Show only"->"Genome Specific" will retrieve only these "G" segments. In the "Map" layouts based on chromosome-specific values, segments that are significantly over/under-expressed only in this type of analysis, but not in the corresponding one based on all gene values, will be marked by a "C" and the intensity bar will be highlighted in yellow. The button "Show only"->"Chromos. Specific" will retrieve only these "C" segments.
Clicking on the "Export Results Data" button exports the results as a tab-delimited text file, saved in the "Results" folder of the main "TRAM" directory.
The file contains the following columns, from left to right:
01) Chromosome name
02) Chromosomal location
03) Segment Start genomic position
04) Segment End genomic position
05) Segment expression value
06) Label of segment Over/Under-expression ("Over", "Under")
07) P value
08) Q value
09) List of genes (symbols) included in the segment
10) Number of Over-expressed genes in the segment
11) Number of Under-expressed genes in the segment
12) Total number of genes in the segment
A second file, with the label "Set" in the file name, is generated, containing the summary of the analysis settings, which are also displayed at the top of all TRAM results layouts.
The user can also export result data in different formats (e.g.,
Excel) using the "Export Records..." command from the "File"
Menu.
4.2 Searching for clusters of neighbouring over/under-expressed genes
In "Cluster" mode, the software searches for sets of contiguous/neighbouring genes all expressed beyond a defined "n" threshold, i.e. with expression values higher than the (100 - "n") percentile or lower than the "n" percentile.
In this mode, results are centered on individual differentially expressed loci; they are complementary to, and more sensitive than, the "Map" mode of analysis, which requires the definition of an arbitrary window length within which genes must be comprised.
SETTINGS
Click on the "Gene Clusters" button in the TRAM "Home" (TRAM main window).
The available settings for this analysis are:
Percent (gene): defines the thresholds required to consider a gene as "Over- (or Under-) expressed" (i.e. to be marked in red or blue in the expression bar). Genes whose mean expression value is within the highest (n) percent or within the lowest (n) percent of values, where n = Percent (gene), will be highlighted, being listed in red (over-expressed) or blue (under-expressed) font, respectively. In the results layout, over-expressed genes are marked as "CLUST-O" and under-expressed genes as "CLUST-U".
Gap: defines the maximum number of non-"Over" or non-"Under"-expressed genes allowed between two "Over" or "Under"-expressed genes in a cluster.
Setting a gap equal to 1 means that two over- (under-) expressed genes will be included in the "Cluster" even when they are separated along the chromosome by one gene not fulfilling the conditions to be considered over/under-expressed. For example, a cluster composed of the over-expressed genes A, B, and C could contain no more than 2 non-over-expressed genes: one between genes A and B and the other between genes B and C. Genes with this feature will be marked as "GAP" in the results layouts. If Gap = 0, only contiguous genes will be considered to be in a cluster. Genes with no expression data in the analyzed sample set will not be considered as "GAP" and will instead be marked "EMPTY" in the results layouts. They are visualized, but they are ignored by the cluster search process.
Gene Type: the software will construct a scheme of the linear succession of genes present in the "Genes" table, filled during the set up process.
The user can set TRAM to use either of the following when constructing the linear map of genes:
1) Gene symbols only (genes with an "NCBI" Gene symbol/identifier assigned); or
2) All symbols and UniGene clusters (genes as in point 1, plus sequences having a UniGene (EST) cluster identifier).
SAMPLE NUMBER: defines the minimum number of Samples for which an expression value must be available for a gene in order to include that gene in the analysis.
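The Gap rule can be sketched as a single pass over the genes of one chromosome in map order. This is a sketch under stated assumptions, with hypothetical function names. The labels are precomputed: "over", "under", "none", or "empty" for genes without data. A cluster needs at least two over- (or under-) expressed genes. Any non-matching gene with data, including a gene expressed in the opposite direction, counts toward the gap.

```python
def find_clusters(labels, gap, direction="over", min_genes=2):
    """Return clusters as lists of indices of the matching genes.

    labels: per-gene labels in chromosomal order: "over", "under", "none" or "empty".
    gap: maximum number of non-matching genes allowed between two matching genes.
    "empty" genes (no expression data) are skipped and never count as gap genes.
    """
    clusters = []
    current = []      # matching genes of the cluster being built
    since_last = 0    # non-matching genes (with data) since the last match
    for index, label in enumerate(labels):
        if label == "empty":
            continue
        if label == direction:
            if current and since_last > gap:
                # too many gap genes: close the previous cluster
                if len(current) >= min_genes:
                    clusters.append(current)
                current = []
            current.append(index)
            since_last = 0
        else:
            since_last += 1
    if len(current) >= min_genes:
        clusters.append(current)
    return clusters


def gap_genes(labels, cluster, direction="over"):
    """Indices of the genes that would be marked "GAP" inside one cluster."""
    first, last = cluster[0], cluster[-1]
    return [i for i in range(first, last + 1) if labels[i] not in (direction, "empty")]


labels = ["over", "none", "over", "none", "over", "none", "none", "over", "over"]
print(find_clusters(labels, gap=1))  # [[0, 2, 4], [7, 8]]
```

With `gap=0` the same labels yield only `[[7, 8]]`, the single run of contiguous over-expressed genes.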
RESULTS
The Results of the "Cluster" analysis are typically
displayed within a few minutes.
Changing the data
normalization type during the analysis takes additional time
for the task to be completed.
The results of the analysis are displayed in the "Cluster" layouts. Each gene is actually a record (row) of the database. You can find and sort genes using desired criteria.
The "P" field displays the p-value resulting from the hypergeometric distribution calculation for the "Over/Under"-expressed Clusters. This is the statistical significance, i.e. the probability that the result (presence of n over/under-expressed clusters within the transcriptome) could have been obtained by chance. Due to the high number of genes in a genome, the "P" value needs to be corrected for multiple testing by controlling the False Discovery Rate (FDR). The "Q" field displays the p-value corrected for FDR. "P" and "Q" values are displayed only for the clusters fulfilling the criteria to be tagged as over/under-expressed. If Q ≤ 0.05, the over/under-expression is considered statistically significant. For details and references about the statistical analysis, see the article describing "TRAM".
The number (#) of over/under-expressed genes in the cluster, the Length (in bp) of the chromosomal region covered by the cluster, and the number of individual Data Points (e.g., array spots) from which the summary data for each gene are obtained are also displayed.
Specific buttons help retrieve
online database entries for the desired genes.
In the "Cluster" layouts based on all gene values, genes that are significantly over/under-expressed only in this type of analysis, but not in the corresponding one based on the pertinent chromosome values, will be marked by a "G" and the intensity bar will be highlighted in yellow. The button "Show only"->"Genome Specific" will retrieve only these "G" genes. In the "Cluster" layouts based on chromosome-specific values, genes that are significantly over/under-expressed only in this type of analysis, but not in the corresponding one based on all gene values, will be marked by a "C" and the intensity bar will be highlighted in yellow. The button "Show only"->"Chromos. Specific" will retrieve only these "C" genes.
Clicking on the "Export Results Data" button exports the results as a tab-delimited text file, saved in the "Results" folder of the main "TRAM" directory. The file contains the following columns, from left to right:
01) Cluster ID (a unique number used as cluster identifier)
02) Type of Cluster (CLUST-O: over-, CLUST-U: under-expressed)
03) Count of over/under-expressed genes in the Cluster
04) Length (bp) of the region covered by the cluster
05) Chromosome name
06) Chromosomal location
07) Gene symbol/name
08) Gene Start genomic position
09) Gene End genomic position
10) Gene expression value (mean among all pool samples)
11) Sample Count (number of Samples with an Expression value for that Gene; two values in the A/B analysis, for A and B, respectively)
12) Cluster Mean Expression (mean expression value of the genes in the cluster)
13) Label of gene Over/Under-expression ("Over", "Under")
14) Number of individual Data Points processed for each gene
15) P value
16) Q value
17) Gene description
A second file, with the label "Set" in the file name, is generated, containing the summary of the analysis settings, which are also displayed at the top of all TRAM results layouts.
The user can also
export result data in different formats (e.g., Excel) using the
"Export Records..." command from the "File" Menu.
4.3 Use of TRAM as "TRAM Results Viewer" (TRV)
Since version 1.2, an empty copy of TRAM itself may be used as a "TRAM Results Viewer" (TRV) to regenerate a graphical view of the results obtained by a copy of TRAM filled with species-specific data Tables and with the Results of its analyses.
1. Choose "! Export Main Tables" from the "Script" Menu of the copy of TRAM where the analyses were executed. This will export all main Tables (all fields) from TRAM in .fmp12 format into the "Results" folder of TRAM ("Settings", "Chromosomes", "Genes", "Samples", "Values_A_B", "Map" and "Cluster" Results Tables). Platforms, UniGene/ESTs and sample "Values" Tables data will not be exported.
2. Copy the resulting 16 Tables into the "Results" folder of a distinct, empty copy of TRAM 1.2 itself (TRV).
3. Choose "! Import Main Tables" from the "Script" Menu of TRV. This will import the Tables into TRV. The source "Table" files can then be deleted from the TRV "Results" folder.
TRV is intended to allow the distribution of a set of data and results from a particular TRAM analysis without distributing the original whole file (which can reach ~25 GB when approximately 700 samples are analyzed). Due to the lack of Platform data, as well as of individual sample "Values" data, some types of analysis cannot be run in TRAM used as a TRV (adding / deleting / including / excluding Samples), and some functions and buttons do not work (inspecting individual sample Values, inspecting Platform / UniGene / ESTs tables).
A record is one set of fields representing one entry (i.e. containing all requested data for a subject, e.g. a gene probe). The record browser is a small book icon at the top left of the window. You may also browse the records faster using the cursor at the right of the small book icon.
5.4 Field
The database unit containing a specific data type (e.g., "Gene_name").
A layout is a particular graphical organization of the fields of a table. A table can be visualized in more than one layout. A layout may display fields from a table or its related fields from other tables. A file may show data within different layouts. Visualization of a field is independent of the storage of the contained data.
Browsing among the layouts can be done by clicking on the "Layout:" pop-up Menu at the upper left corner.
You may browse the database by clicking on the small book pages at the top left of the window, by using the cursor at the right of the small book icon, or by entering a record number and pressing the "Return" key. The following information is constantly displayed in the top bar of the window (if not, select "Status Toolbar" from the "View" Menu):
Records: total number of Records in the table.
Found: total number of Records in the currently selected subset. Clicking on the green circular button will retrieve the complementary subset of currently omitted records.
Sorted: sorting status of the Records (Sorted/Unsorted).
The FileMaker Pro-based database may be used in three main "modes": "Browse", "Find", and "Preview". Switching among modes can be done from the "View" Menu or from the pop-up Menu bar at the bottom left of the window.
5.6 Browse Mode
One way to use the database. It allows entering, viewing, browsing, sorting, and manipulating data. It may be selected from the "View" menu or from the mode pop-up Menu bar at the bottom left of the window.
In "Browse" mode, the record set can be browsed by clicking on the small book icon (with the arrows to move "back" and "forward") in the upper left corner.
Browsing among the tables can be done by clicking on the "Layout" pop-up Menu at the upper left corner.
5.7 Find Mode
An alternative mode to use the database. It allows searching for specific content in the database fields, using any combination of criteria (see the "Search mode" section below for more details). It may be selected from the "View" menu or from the mode pop-up Menu bar at the bottom left of the window.
The user fills in a blank form to search in specific fields.
In "Find" mode, the small book icon in the upper left corner represents the different "requests" that are made to search the database.
In FileMaker Pro "Find" mode, the "AND", "OR" and "NOT" operators may be implemented as follows:
"AND": by filling in criteria in different fields of the same "Request";
"OR": by generating additional requests (from the "Requests" Menu) in the same query;
"NOT": by generating additional requests (from the "Requests" Menu) and clicking on the "Omit" button (located in the window top bar).
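The AND/OR/NOT logic above can be modelled in a few lines. This is a conceptual sketch in Python, not FileMaker code. It assumes exact-value matching (FileMaker's own matching rules, operators and wildcards are richer), requests processed in order, and a first request that is not an Omit request. The function name and the record fields in the example are hypothetical.

```python
def run_find(records, requests):
    """Conceptual model of FileMaker find requests (not FileMaker code).

    records:  list of dicts, one per record (field name -> value)
    requests: list of (criteria, omit) pairs, processed in order;
              criteria is a dict of field -> required value (exact match here)
    Returns the sorted indices of the records in the found set.
    """
    found = set()
    for criteria, omit in requests:
        # "AND": every criterion in one request must match
        matches = {i for i, rec in enumerate(records)
                   if all(rec.get(field) == value for field, value in criteria.items())}
        if omit:
            found -= matches  # "NOT": an Omit request removes its matches
        else:
            found |= matches  # "OR": each additional request adds its matches
    return sorted(found)


records = [
    {"Gene_name": "TP53", "Chr": "chr17"},
    {"Gene_name": "BRCA1", "Chr": "chr17"},
    {"Gene_name": "MYC", "Chr": "chr8"},
    {"Gene_name": "KRAS", "Chr": "chr12"},
]
# chr17 OR chr8, NOT TP53
print(run_find(records, [({"Chr": "chr17"}, False), ({"Chr": "chr8"}, False),
                         ({"Gene_name": "TP53"}, True)]))  # [1, 2]
```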
The "Operators" pop-up Menu appears when clicking on a field while pressing the "ctrl" key, allowing queries for exact matches, duplicate values, ranges, wildcards and more.
Click on the "Perform Find" button at the top of the window to start the query.
The result of the search is the subset of entries matching the search criteria.
5.8 Preview Mode
An alternative way to use the database. It shows a print preview of the found records. It may be selected from the "View" menu or from the pop-up Menu bar at the bottom left of the window.
In "Preview" mode, the user can obtain a print preview of the data in the current table. Browsing among the tables can be done by clicking on the "Layout:" pop-up Menu at the upper left corner.
File Options...: it is possible to set only the "Spelling" options.
Change Password...: there is no default password set.
Page setup...: standard page setup command.
Print...: standard print command. The appearance will match the layout currently displayed on the screen.
Import Records: this is the general "Import" function of FileMaker Pro.
Export Records...: export command for the found record set in a given table. Records are exported in their current sort order. The user can select the fields to be exported, their relative order, and the separator character.
Save a Copy as...: save a copy of the database, complete, compressed, or as a clone (database structure with no records).
Select all: select all the text present within a selected field (to select a field, click into the field).
Find/Replace: utility for searching/replacing text strings within fields. Note: use "Find" mode (from the "View" Menu) for full search and selection of a record set.
Spelling: utility to check the spelling of text strings within fields.
Export Field Contents...: utility to export the contents of the selected field to a file.
New Record: create a new empty record in the database. The new record will be the last of the current record set.
Duplicate Record: duplicate the current record in the database. The new record will be the last of the current record set.
Delete Record...: delete the current record from the database.
Delete Found Records...: delete all currently found records from the database.
Go to Record: move to a record selected by number, or to the previous or next record.
Show All Records: show all the records in the database.
Show Omitted Only: show all the records in the database not included in the current "found" set.
Omit Record: remove the selected record from the current found set, without deleting it.
Omit Multiple...: remove several records, selected by number, from the current found set, without deleting them.
Modify Last Find: return to the last performed search in order to edit it.
Saved Finds: save a set of search criteria.
Sort Records...: sort the current record set according to the desired criteria.
Unsort: display the current record set in the order in which the records were created.
Replace Field Contents: replace the value of a field in all found records with the value specified in the current record, or with a calculated value.
Relookup Field Contents...: look up the value of a field again, reading the matching value in a related table (the relationship was established during database development using a "key" field).
Revert Record...: restore the value of a field, discarding any changes, as long as you have not yet clicked out of that field.
Search: search a "Help" system for the general commands.
TROUBLESHOOTING
Sometimes power failures, hardware problems, or other factors can damage a FileMaker Pro database file. When the runtime application discovers a damaged file, a dialog box appears, prompting the user to contact the creator. Even if the dialog box does not appear, files can exhibit erratic behaviour. If you have FileMaker Pro or FileMaker Pro Advanced installed, you can recover the file using the "Recover" command. Otherwise, to recover a damaged file:
- On Mac OS X machines, press Command + Option (cmd-alt) while double-clicking the runtime application icon. Hold the keys down until you see the "Open Damaged File" dialog box.
- On Windows machines, press Ctrl+Shift while double-clicking the runtime application icon. Hold the keys down until you see the "Open Damaged File" dialog box.
During the recovery process, the runtime application:
1. Creates a new file;
2. Renames any damaged files by adding "Old" to the end of the file name;
3. Gives the repaired file the original name.
TECHNICAL NOTES
The minimum software
requirements are: Mac OS X 10.6, OS X Lion
10.7, OS X Mountain Lion 10.8;
Windows XP Professional, Home Edition (Service Pack 3); Windows Vista Ultimate,
Business, Home Premium (Service Pack 2);
Windows 7 Ultimate, Professional, Home Premium;
Windows 8 Standard and Pro edition.
A connection to the Internet is
required to display the software Guide and to download data
for set up, but not to run the tool.
Please do not rename any file or folder of the TRAM software.
You may download multiple
copies of TRAM and run them simultaneously, provided that each
"TRAM" folder is located in a different directory. Do
not move the "TRAM" folder while the software is open.
Run the "TRAM" software from a local hard disk.
Do not run the software from a network drive.
If a TRAM analysis aborts unexpectedly, it is
advisable to restart it in a fresh TRAM copy.
The
scripts at the core of TRAM software are "FileMaker Pro"
scripts.
TRAM 1.3 is composed of a 228 MB database engine
("TRAM.app") and a template ("TRAM.TMA") with 43 data tables,
134 relationships among them and 489 script definitions.
After set up, including NCBI UniGene and UCSC EST localization data, the human "TRAM.TMA" file grows to about 5 GB.
Time required to import and process a typical microarray data
file is about 10 minutes.
Typical execution time is 1-2 hours for a "Map" analysis and 5-10 minutes for a "Cluster"
analysis, depending on the number of analyzed samples, which
also heavily affects the time required to refresh data when the
type of data normalization is changed.
Large file size and relative slowness of data processing are
mainly due to systematic indexing of all data contained in TRAM,
with the advantage of very fast data browsing, navigation and
search at the end of data import and processing, which may be
run in batch mode.
We encourage any creative use, modification and non-commercial redistribution of TRAM, as long as the original paper is cited and, where applicable, a statement is provided that the original program has been modified.
7.1 Known software limits
Due to FileMaker Pro limits, the maximum TRAM file size is
8 terabytes (8,192 gigabytes); a text field can contain up to
1 billion characters; a number field can contain
values from 10^-400 up to 10^400.
At present, TRAM requires an unambiguous mapping, i.e. a single
chromosomal location per gene. Genes common to the X and Y
chromosomes (e.g., CSF2RA) are now mapped, but only on
chromosome X; their dual X-Y location is still recorded in the
"Location" field of the "Genes" table.
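As an illustration of the single-location rule above, the following Python sketch reduces a location string to one chromosome and sends X-Y genes to chromosome X. It is not TRAM's implementation (TRAM runs FileMaker Pro scripts), and several points are assumptions: the location format (parts separated by " and " or ";", each starting with a chromosome name followed by a cytoband, e.g. the hypothetical "Xp22.33 and Yp11.2"), the helper name `resolve_chromosome`, and the choice to leave any other multi-chromosome gene unmapped.

```python
import re
from typing import Optional

# Hypothetical sketch, not TRAM's FileMaker script. Assumed format: location
# parts separated by " and " or ";", each starting with a chromosome name
# followed by a cytoband (e.g. "17q21.31", "Xp22.33").
_SPLIT = re.compile(r"\s+and\s+|;")
_CHROM = re.compile(r"(\d{1,2}|X|Y)")


def resolve_chromosome(location: str) -> Optional[str]:
    """Return a single chromosome for a location, or None if still ambiguous."""
    chroms = []
    for part in _SPLIT.split(location):
        match = _CHROM.match(part.strip())
        if match and match.group(1) not in chroms:
            chroms.append(match.group(1))
    if len(chroms) == 1:
        return chroms[0]              # already unambiguous
    if sorted(chroms) == ["X", "Y"]:
        return "X"                    # X-Y genes are mapped on chromosome X only
    return None                       # assumption: other ambiguities stay unmapped


# The original string is left untouched (as in the "Location" field);
# only the mapped chromosome is derived from it.
for loc in ["17q21.31", "Xp22.33 and Yp11.2", "1p36; 5q31"]:
    print(loc, "->", resolve_chromosome(loc))
```

Keeping the raw location string separate from the derived chromosome mirrors the guide's note that the dual X-Y location remains visible even though the gene is plotted on X only.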
The limit of 25 chromosomes per genome applies
only to the display of synthetic maps with all
chromosomes aligned horizontally; it does not
apply to data import, the standard visualization mode or any
data analysis.
The
following 34 commercially available Platforms, each used
for the analysis of at least 2,000 GEO samples, are loaded
by default in the pre-set-up human version (2017); the
number of samples available for each Platform is updated
to November 08, 2017:
HUMAN (genome-wide expression arrays, Platforms with > 2,000 Samples in GEO, excluding exon arrays)
01) GPL570 [HG-U133_Plus_2] (134,861 Samples) Affymetrix Human Genome U133 Plus 2.0 Array
02) GPL10558 (67,677 Samples) Illumina HumanHT-12 V4.0 expression beadchip
03) GPL96 [HG-U133A] (40,122 Samples) Affymetrix Human Genome U133A Array
04) GPL6244 [HuGene-1_0-st] (32,630 Samples) Affymetrix Human Gene 1.0 ST Array [transcript (gene) version]
30) GPL1708 (3,292 Samples) Agilent-012391 Whole Human Genome Oligo Microarray G4112A (Feature Number version)
31) GPL10379 (2,859 Samples) Rosetta/Merck Human RSTA Custom Affymetrix 2.0 microarray
32) GPL13607 (2,436 Samples) Agilent-028004 SurePrint G3 Human GE 8x60K Microarray (Feature Number version)
33) GPL3991 (2,396 Samples) Rosetta/Merck Human 3.0 A1
34) GPL887 (2,170 Samples) Agilent-012097 Human 1A Microarray (V2) G4110B (Feature Number version)
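The selection rule stated in the list header (genome-wide expression Platforms with more than 2,000 GEO samples, excluding exon arrays), together with the apparent ordering by decreasing sample count, can be expressed as a filter followed by a sort. In this Python sketch the sample counts are copied from the guide (as of November 08, 2017); the `is_exon_array` flag, the `select_default_platforms` helper and the "HYPOTHETICAL_EXON" record are illustrative assumptions, not GEO data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    gpl: str
    samples: int          # GEO sample count, as listed in the guide
    is_exon_array: bool   # assumption: flag supplied by whoever curates the list


# Counts copied from the guide; "HYPOTHETICAL_EXON" is an invented record
# included only to show the exon-array exclusion.
PLATFORMS = [
    Platform("GPL887", 2170, False),
    Platform("GPL570", 134861, False),
    Platform("GPL80", 1490, False),
    Platform("GPL10558", 67677, False),
    Platform("HYPOTHETICAL_EXON", 5000, True),
    Platform("GPL1708", 3292, False),
    Platform("GPL91", 1095, False),
]


def select_default_platforms(platforms, min_samples=2000):
    """Keep non-exon Platforms with more than min_samples, largest first."""
    kept = [p for p in platforms if p.samples > min_samples and not p.is_exon_array]
    return sorted(kept, key=lambda p: p.samples, reverse=True)


for rank, p in enumerate(select_default_platforms(PLATFORMS), start=1):
    print(f"{rank:02d}) {p.gpl} ({p.samples:,} Samples)")
```

Because the input is only a subset of the 34 Platforms, the ranks printed here run from 01 to 04 and do not match the guide's numbering.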
The following 27 additional Platforms
(61 in total) are also loaded by default in the
pre-set-up human version (2017); the number of
samples available for each Platform is updated to
November 08, 2017. Each of them has been used
in at least one of the TRAM analyses published to date (May 2018),
and they are listed below in numerical/alphabetical order:
35) GPL80 [Hu6800] (1,490 Samples) Affymetrix Human Full Length HuGeneFL Array
36) GPL91 [HG_U95A] (1,095 Samples) Affymetrix Human Genome U95A Array (Platform in atypical format, not parsable by TRAM; the annotated file gnf1h.annot2007.tsv downloaded from http://biogps.org/downloads/ has been used instead).
(Platform in atypical format; only Gene Symbols present in "NCBI Gene" have been kept in the TRAM "Platform" table.)
60) GPL7091 (16 Samples) Agilent Human oligo 22k A (Platform in atypical format; the PT_ACC and GB_ACC data fields have been manually merged to maximize probe annotation).
61) GPL10665 (31 Samples) SMD Print_607 (Platform in atypical format; the Gene Symbol and Clusterid / UniGene data fields have been manually merged to maximize probe annotation).
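The notes on atypical Platforms describe two clean-up steps: merging two annotation fields so that more probes receive an annotation, and keeping only Gene Symbols found in NCBI Gene. The guide does not say how the manual merge was carried out, so this Python sketch assumes a simple fallback (use the first field, otherwise the second). The helper names, probe IDs, accession values and symbol set are all hypothetical; only the field names PT_ACC, GB_ACC and Gene Symbol come from the notes above.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[str]]  # one probe record: field name -> value


def merge_fields(rows: List[Row], primary: str, fallback: str, target: str) -> List[Row]:
    """Copy `primary` into `target`, falling back to `fallback` when it is empty.

    Assumption: "merged" is read as a fallback, not as string concatenation.
    """
    merged = []
    for row in rows:
        value = (row.get(primary) or "").strip() or (row.get(fallback) or "").strip()
        merged.append({**row, target: value or None})
    return merged


def keep_known_symbols(rows: List[Row], field: str, known: Set[str]) -> List[Row]:
    """Keep only rows whose `field` value is in `known` (e.g. NCBI Gene symbols)."""
    return [row for row in rows if row.get(field) in known]


# Hypothetical probe rows; only the field names come from the guide.
probes = [
    {"ID": "p1", "PT_ACC": "PT_0001", "GB_ACC": None},
    {"ID": "p2", "PT_ACC": "", "GB_ACC": "GB_0002"},
    {"ID": "p3", "PT_ACC": None, "GB_ACC": None},
]
merged = merge_fields(probes, "PT_ACC", "GB_ACC", "ACC")
print([r["ACC"] for r in merged])

symbols = [{"ID": "q1", "Gene Symbol": "GENE_A"}, {"ID": "q2", "Gene Symbol": "OLD_B"}]
kept = keep_known_symbols(symbols, "Gene Symbol", known={"GENE_A"})
print([r["ID"] for r in kept])
```

A fallback keeps one identifier per probe, which suits a table keyed on a single annotation; if the original merge concatenated the fields instead, only `merge_fields` would need to change.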