UniGene Tabulator 1.1 Guide (Windows version)
Citation:
Lenzi L, Frabetti F, Facchin F, Casadei R, Vitale L, Canaider S, Carinci P, Zannotti M, Strippoli P.
UniGene Tabulator: a full parser for the UniGene format.
Bioinformatics. 2006 Oct 15;22(20):2570-1. Epub 2006 Aug 7.
INTRODUCTION
This online Guide provides detailed documentation of the UniGene Tabulator 1.1 software.
A quick illustrated Tutorial on how to install the software and import the desired UniGene clusters is also available.
--
UniGene Tabulator is a software solution designed to manage UniGene biological flat files.
It implements a structured representation of each UniGene format field, importing data into a common database management system that can be used on a local personal computer (Macintosh and Windows environments).
This database (a collection of related tables) enables one to index, retrieve or export UniGene information.
More sophisticated functions are possible if one uses FileMaker Pro 8 or later.
Minimal requirements are:
Macintosh OS X: 10.3.9
Windows OS: 2000 (Service Pack 4)
or XP (Service Pack 2)
Download UniGene_Tabulator 1.1 for Windows from the address:
http://apollo11.isto.unibo.it/software/
Choose the file: UGTabWin.zip
The downloaded file should be decompressed automatically, generating a "UniGene Tabulator" folder.
If this does not happen, decompress the file with an “Unzip” utility.
The "UniGene Tabulator" folder contains:
the "UniGene Tabulator.exe" file (runtime application);
the "UniGene.UGT" file (database file);
the “FMP Acknowledgements.pdf” file;
the “Extensions” folder, containing a "Dictionaries" folder with the dictionary file for supported languages and an “English” folder with 3 files;
40 ".dll" files;
the “Win_Tutorial” and “Win_Guide” folders, which contain a copy of the on-line documentation for local (off-line) use.
Please do not rename any of the files or folders of the "UniGene Tabulator" software.
You may download multiple copies
of "UniGene Tabulator"
and run them simultaneously,
provided that each "UniGene Tabulator" folder
is located
in a different directory.
UniGene Tabulator is based on the FileMaker Pro 8 (FileMaker, Inc.) database management software (www.filemaker.com/index.html), and is released as a FileMaker Pro 8 template, along with a free runtime application that runs the "FileMaker Pro" engine at the core of the software.
The UniGene Tabulator solution imports UniGene “.data” flat files, containing cluster information, and ".lib.info" flat files, containing library information, into the database file “UniGene.UGT”.
Choose the files for the desired organism from the UniGene ftp server:
ftp://ftp.ncbi.nih.gov/repository/UniGene/.
UniGene line types/qualifiers in the “.data” file
(ftp://ftp.ncbi.nih.gov/repository/UniGene/README):
ID            UniGene cluster ID
TITLE         Title for the cluster
GENE          Gene symbol
CYTOBAND      Cytological band
EXPRESS       Tissues of origin for ESTs in cluster
RESTR_EXPR    Single tissue or development stage contributes more than half the total EST frequency for this gene
GNM_TERMINUS  Genomic confirmation of the presence of a 3' terminus: T if a non-templated polyA tail is found among a cluster's sequences; otherwise I if templated As are found in genomic sequence, or S if a canonical polyA signal is found on the genomic sequence
LOCUSLINK     LocusLink/EntrezGene identifier associated with at least one sequence in this cluster (Hs only)
CHROMOSOME    Chromosome. For plants, CHROMOSOME refers to mapping on the Arabidopsis genome
STS           STS
  NAME=         Name of STS
  ACC=          GenBank/EMBL/DDBJ accession number of STS [optional field]
  DSEG=         GDB Dsegment number [optional field]
  UNISTS=       Identifier in NCBI's UNISTS database
TXMAP         Transcript map interval
  MARKER=       Marker found on at least one sequence in this cluster
  RHPANEL=      Radiation Hybrid panel used to place marker
PROTSIM       Protein similarity data for the sequence with highest-scoring protein similarity in this cluster
  ORG=          Organism
  PROTGI=       Sequence GI of protein
  PROTID=       Sequence ID of protein
  PCT=          Percent alignment
  ALN=          Length of aligned region (aa)
SCOUNT        Number of sequences in the cluster
SEQUENCE      Sequence
  ACC=          GenBank/EMBL/DDBJ accession number of sequence
  NID=          Unique nucleotide sequence identifier (gi)
  PID=          Unique protein sequence identifier (used for non-ESTs)
  CLONE=        Clone identifier (used for ESTs only)
  END=          End (5'/3') of clone insert read (used for ESTs only)
  LID=          Library ID; see Hs.lib.info for library name and tissue
  MGC=          5' CDS-completeness indicator; if present, the clone associated with this sequence is believed CDS-complete. A value greater than 511 is the gi of the CDS-complete mRNA matched by the EST, otherwise the value is an indicator of the reliability of the test indicating CDS completeness; higher values indicate more reliable CDS-completeness predictions.
  SEQTYPE=      Description of the nucleotide sequence. Possible values are mRNA, EST and HTC.
  TRACE=        The Trace ID of the EST sequence, as provided by NCBI Trace Archive
  PERIPHERAL=   Indicator that the sequence is a suboptimal representative of the gene represented by this cluster. Peripheral sequences are those that are in a cluster which represents a spliced gene without sharing a splice junction with any other sequence. In many cases, they are unspliced transcripts originating from the gene.
//            End of record
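The record layout listed above can be illustrated with a minimal Python sketch. This is not the FileMaker Pro script used by UniGene Tabulator. It assumes that each line starts with its line type followed by white space, that qualifiers on repeated lines are written as KEY=value pairs separated by semicolons, and that a "//" line closes the record. The function names and the sample record are made up for illustration.

```python
import io

# Line types that may occur several times per cluster (see the list above).
REPEATED = {"SEQUENCE", "STS", "TXMAP", "PROTSIM"}


def parse_qualifiers(value):
    """Turn 'ACC=X; NID=g1' into {'ACC': 'X', 'NID': 'g1'}.

    Assumption: qualifiers are separated by ';'. Parts without '=' are skipped.
    """
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:
            fields[key.strip()] = val.strip()
    return fields


def read_clusters(handle):
    """Yield one dict per cluster; a '//' line closes the current record."""
    record = {}
    for raw in handle:
        parts = raw.split(None, 1)
        if not parts:
            continue  # blank line
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if tag == "//":
            if record:
                yield record
            record = {}
        elif tag in REPEATED:
            record.setdefault(tag, []).append(parse_qualifiers(value))
        else:
            record[tag] = value
    if record:
        yield record


# Made-up sample record, for illustration only (not real UniGene data).
SAMPLE = """\
ID          Xx.1
TITLE       Example cluster
GENE        GENE1
SCOUNT      2
SEQUENCE    ACC=AA000001.1; NID=g111; SEQTYPE=EST
SEQUENCE    ACC=BB000002.1; NID=g222; PID=g333; SEQTYPE=mRNA
//
"""

clusters = list(read_clusters(io.StringIO(SAMPLE)))
print(clusters[0]["GENE"], len(clusters[0]["SEQUENCE"]))
```

Single-valued line types are kept as plain strings, while the four repeatable line types become lists of dictionaries, which mirrors the split between a master record and its related records.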
This software parses cluster data into 5 related tables.
1) “UniGene” is the master table. It collects the known information about the transcribed locus (e.g. UniGene cluster identifier, genome localization or total number of sequences in the cluster) in a single record.
The master table has a “one to many” relationship with each of the other tables.
2) “SEQUENCE” imports information about the nucleotide sequences.
By definition, UniGene clusters are sets of related nucleotide sequences, so there is at least one nucleotide sequence in a given cluster.
This table combines information about a sequence (obtained from both “.data” and “.lib.info” files) in a single record.
In this table, each cluster generates a number of records equal to its number of sequences.
3) “STS” parses each known sequence-tagged site located in the transcribed locus. Each cluster will generate one or more records in this table.
4) “TXMAP” collects the “transcript map interval” retrieved by radiation hybrid analysis.
5) “PROTSIM” retrieves information about proteins having a high similarity with the peptide product of the cluster.
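The “one to many” arrangement described above can be sketched with Python's built-in sqlite3 module. This is only an analogy: UniGene Tabulator stores its tables in a FileMaker Pro file, not in SQLite, and the reduced column sets, the lower-case table names and the sample rows below are assumptions chosen for illustration.

```python
import sqlite3


def build_schema(conn):
    """Create a reduced master table and two of its detail tables."""
    conn.executescript(
        """
        CREATE TABLE unigene (
            cluster TEXT PRIMARY KEY,
            title   TEXT,
            gene    TEXT,
            scount  INTEGER
        );
        CREATE TABLE sequence (
            nacc    TEXT,
            seqtype TEXT,
            cluster TEXT REFERENCES unigene(cluster)
        );
        CREATE TABLE protsim (
            protid  TEXT,
            pct     REAL,
            cluster TEXT REFERENCES unigene(cluster)
        );
        """
    )


def sequences_per_cluster(conn):
    """Count the related SEQUENCE rows for each master record."""
    rows = conn.execute(
        """
        SELECT u.cluster, COUNT(s.nacc)
        FROM unigene AS u
        LEFT JOIN sequence AS s ON s.cluster = u.cluster
        GROUP BY u.cluster
        ORDER BY u.cluster
        """
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
build_schema(conn)
# Made-up rows, for illustration only.
conn.execute("INSERT INTO unigene VALUES ('Xx.1', 'Example cluster', 'GENE1', 2)")
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?)",
    [("AA000001.1", "EST", "Xx.1"), ("BB000002.1", "mRNA", "Xx.1")],
)
counts = sequences_per_cluster(conn)
print(counts)
```

Each detail table carries the cluster identifier of its master record, so one row in the master table can be related to any number of rows in each detail table.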
Library field qualifiers in the “.lib.info” file:
ID=                            Library ID
TITLE=                         Title for the library
TISSUE=                        Tissue used to obtain library
VERBATIM_TISSUE=               Library tissue, details in vertebrates (optional)
DEVELOPMENTAL_STAGE=           Developmental stage of the library
CANSOURCE=                     Cancer type used to obtain library, “normal” if tissue is normal
VERBATIM_DEVELOPMENTAL_STAGE=  Developmental stage of the library, details in vertebrates (optional)
VECTOR=                        Vector used to obtain the library
The UniGene Tabulator software retains library data in the table “Lib.info_Entries”; the information is reformatted to make it available to the table “SEQUENCE” through relationships.
METHOD
First, the detailed description of the UniGene flat file format
(ftp://ftp.ncbi.nih.gov/repository/UniGene/README)
was carefully analyzed in order to:
1. identify characters usable as consistent delimiters for each data type;
2. convert the flat file format into a series of related tables, allowing the appropriate import for each data type.
Our strategy is based on importing the downloaded file.
At the beginning, the table “SEQUENCE” collects data from the selected “UniGene” file.
The lines of the UniGene data file are delimited by a line feed (“LF”), so each line results in a different record.
During this first step, each
UniGene line will be tagged,
according to its starting characters as containing data:
1 about a sequence;
2 about an STS;
3 about the transcript map
position;
4 about a similar protein;
5 about general
cluster information.
There will be 5 types of line.
The software will maintain data about sequences (line type 1) in the table "SEQUENCE"; sequence data will be parsed into the corresponding fields of the same record, and these will be related to their cluster within the main table "UniGene".
Information about sequence-tagged sites (line type 2) found in clusters will be parsed into the table "STS";
information about gene map positions from radiation hybrid mapping experiments (line type 3) will be parsed into the table “TXMAP”;
information about known ortholog proteins (line type 4) will be parsed into the table “PROTSIM”.
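The tagging step described above can be pictured with a short Python sketch. It is not the FileMaker Pro script shipped with the software; it assumes that the line type is the first white-space-delimited word of each line, and it treats blank lines and the "//" separator as a type 0, which is an addition made here for convenience. The names tag_line and route are hypothetical.

```python
# Starting word of a line -> line type, as numbered in the text above.
LINE_TYPES = {"SEQUENCE": 1, "STS": 2, "TXMAP": 3, "PROTSIM": 4}

# Line type -> table that receives the parsed data.
TARGET_TABLE = {1: "SEQUENCE", 2: "STS", 3: "TXMAP", 4: "PROTSIM", 5: "UniGene"}


def tag_line(line):
    """Return the line type (1-5), or 0 for blank lines and the '//' separator."""
    parts = line.split(None, 1)
    if not parts or parts[0] == "//":
        return 0
    # Anything that is not one of the four detail line types is general
    # cluster information (type 5).
    return LINE_TYPES.get(parts[0], 5)


def route(lines):
    """Group raw lines by the table that would receive them."""
    routed = {name: [] for name in TARGET_TABLE.values()}
    for line in lines:
        line_type = tag_line(line)
        if line_type:
            routed[TARGET_TABLE[line_type]].append(line.rstrip("\n"))
    return routed


# Made-up lines, for illustration only.
example = [
    "ID          Xx.1\n",
    "GENE        GENE1\n",
    "STS         ACC=ZZ000001 UNISTS=1\n",
    "SEQUENCE    ACC=AA000001.1; NID=g111; SEQTYPE=EST\n",
    "//\n",
]
routed = route(example)
print({table: len(rows) for table, rows in routed.items()})
```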
In the table “SEQUENCE”, which collects information about cluster sequences, there are some exceptions:
the fields “TISSUE”, “DEV_STAGE”, “CANCER_SOURCE”, “VERBATIM_TISSUE” and “VERBATIM_DEVELOPMENTAL_STAGE” are calculated from the table “Lib.info_Entries” through a relationship, using the key field “LID”;
the fields “Lib_TITLE” and “Lib_VECTOR” are displayed directly from the table “Lib.info_Entries” through the same relationship.
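The effect of the “LID” relationship can be approximated in Python as a dictionary lookup. This is a sketch under assumptions: the pairing between SEQUENCE field names and “.lib.info” qualifiers is inferred from their names, and the function name, record layout and sample values are hypothetical. The software itself resolves these fields through a FileMaker Pro relationship.

```python
# SEQUENCE field -> ".lib.info" qualifier. This pairing is an assumption
# inferred from the field names; it is not taken from the software itself.
LIBRARY_FIELDS = {
    "TISSUE": "TISSUE",
    "DEV_STAGE": "DEVELOPMENTAL_STAGE",
    "CANCER_SOURCE": "CANSOURCE",
    "VERBATIM_TISSUE": "VERBATIM_TISSUE",
    "VERBATIM_DEVELOPMENTAL_STAGE": "VERBATIM_DEVELOPMENTAL_STAGE",
    "Lib_TITLE": "TITLE",
    "Lib_VECTOR": "VECTOR",
}


def attach_library_fields(sequences, libraries):
    """Return copies of the sequence records with library fields filled in.

    Sequences are matched to libraries through the key field "LID";
    a sequence without a matching library gets empty strings.
    """
    by_lid = {library["ID"]: library for library in libraries}
    enriched = []
    for sequence in sequences:
        library = by_lid.get(sequence.get("LID"), {})
        row = dict(sequence)
        for target, source in LIBRARY_FIELDS.items():
            row[target] = library.get(source, "")
        enriched.append(row)
    return enriched


# Made-up records, for illustration only.
libraries = [{"ID": "100", "TITLE": "Example library", "TISSUE": "brain"}]
sequences = [
    {"ACC": "AA000001.1", "LID": "100"},
    {"ACC": "BB000002.1"},  # non-EST sequence without a library
]
rows = attach_library_fields(sequences, libraries)
print(rows[0]["TISSUE"], rows[0]["Lib_TITLE"])
```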
Lines tagged “5” are not parsed directly.
First, data about the same cluster are joined, and only the complete data are parsed into the “UniGene” table, where each piece of information is extracted into the appropriate field.
Thus each record in this table collects data from a single cluster.
The table “Lib.info_Data”, which is not visible, imports data from “.lib.info” files.
As above, information about a single library is spread across more than one record.
As with the “general cluster information” above, the library information is first joined, and only complete data are parsed into the table “Lib.info_Entries”.
Every step of this process
is
driven by a
specific FileMaker Pro script;
the software will ask the user
when a choice
is
needed.
The import processes for UniGene data and Library data are independent of each other: one can choose to perform both, or to import only UniGene data and import library data later.
The first choice offered is whether to import library information before importing the UniGene file.
The UniGene import process will clear ALL previously imported data (parsed and raw), while importing only Library data will delete only library data.
In the master file, the main layout is “UniGene”, which can be selected from the "Layout menu" (a pop-up Menu in the top left corner, above the small book icon).
Each record contains fields within a "portal", the FileMaker Pro tool for building relational databases.
In portals, each field displays a field of a related table, though not all are displayed (to see other fields, click on the “Table” buttons, or choose from the "Layout menu").
--
The free FM runtime included with the software allows record management and browsing; creating new fields for further elaboration, or defining further relationships, requires the FMP application to be installed.
We encourage any creative use, modification and non-commercial redistribution of UniGene Tabulator, as long as the original paper is cited and, where the original program has been modified, a statement to that effect is provided.
The availability of
complete
UniGene datasets
in relational database format
makes for easy integration with other
biological
databases available in
the same or similar format; for example: GenBank and EntrezGene.
Each field
in each table corresponds to a "Feature
Qualifier"
according to UniGene Format.
UniGene Tabulator
USEFUL FIELDS and field type descriptions
Table UniGene:
“CLUSTER” – Cluster ID (Text field)
“TITLE” – Cluster title (Text field)
“GENE” – Gene Symbol (Text field)
“CYTOBAND” – Cytological band related to the expressed locus (Text field)
“GeneID_LID” – Entrez Gene related identifier/LocusLink ID (Number field)
“HOMOL” – Presence of known homologue proteins (Text field)
“EXPRESS” – Tissue used to obtain ESTs (Text field)
“RESTR_EXP” – Tissue related to more than half of the ESTs (Text field)
“POLY_A” – Presence of at least one EST with Poly A sequence (Text field)
“CHRO” – Chromosome related to the expressed locus (Number field)
“SCOUNT” – Total number of sequences related to the cluster (Number field)
Table SEQUENCE:
“NACC” – Sequence accession number (Text field)
“CLON” – Clone identifier related to the sequence (Text field)
“END” – Position of the sequence referring to clone (Number field)
“NUID” – Unique nucleotide sequence identifier (gi) (Number field)
“LIBR” – Library ID used to obtain the sequence (Text field)
“PUID” – Unique protein sequence identifier (used for non-ESTs)
“SEQTYPE” – Description of the nucleotide sequence (Text field)
“TRACE” – The Trace ID of the EST sequence, as provided by NCBI Trace Archive (Text field)
“PERIPHERAL” – Indicator that the sequence is a suboptimal representative of the gene (Text field)
“TISSUE” – Tissue used to obtain library (Text field)
“DEV_STAGE” – Developmental stage of the tissue used to obtain library (Text field)
“CANCER_SOURCE” – Description of the tissue used (Text field)
“VERBATIM_TISSUE” – Detailed description of the tissue for vertebrate organisms (Text field)
“VERBATIM_DEVELOPMENTAL_STAGE” – Detailed description of the developmental stage for vertebrate organisms (Text field)
Table STS:
“NAME” – Name of the STS related to a sequence (Text field)
“ACC” – GenBank accession number of the STS (Text field)
“DSEG” – GDB Dsegment number (Text field)
“UNISTS” – Identifier in NCBI's UNISTS database (Text field)
Table TXMAP:
“MARKER” – Marker found on at least one sequence in this cluster (Text field)
“RHPANEL” – Radiation Hybrid panel used to place marker (Text field)
Table PROTSIM:
“ORG” – Organism of the ortholog protein (Text field)
“PROTGI” – Sequence GI of ortholog protein (Text field)
“PROTID” – Sequence ID of ortholog protein (Text field)
“PCT” – Percent alignment (Number field)
“ALN” – Length of aligned region (aa) (Number field)
INSTALLATION
Once decompressed,
UniGene Tabulator can
readily be used.
GENERAL DEFINITIONS
File
A set of database tables.
Table
A set of records pertaining
to the same subject.
Record
One set of fields which
constitute one
entry.
The record browser is a small book icon at the top left of the window.
You may browse the database by clicking on the book pages, or by entering a record number and pressing the "Return" key.
The following information is always displayed:
Records: total number of Records in the table
Found: total number of Records currently selected
Sorted: sorting status of the Records (Sorted/Unsorted)
Field
One area of the record
containing a specific
data type.
Browse Mode
One way to use the database.
It allows data entry, viewing, browsing, sorting and manipulation.
It may be selected from:
the "View" menu, or
the mode pop-up Menu bar, at the bottom left of the window.
Find Mode
An alternative mode of using the database.
It allows you to search for specific content in the database fields, using any combination of criteria (see the "Search mode" section below for details about searching).
It may be selected from:
the "View" menu, or
the mode pop-up Menu bar, at the bottom left of the window.
Preview Mode
An alternative way to use the database.
It displays a print preview of the records found.
It may be selected from:
the "View" menu, or
the mode pop-up Menu bar, at the bottom left of the window.
Layout
A particular graphical organization of the fields of a table.
A file may show data within different layouts.
A layout may display fields from a table or related fields from other tables.
The display of a field is independent of the storage of the data it contains.
USE
1. Download UniGene flat file
Download the
UniGene file with
the format ".data.gz"
for the
organism desired via ftp at:
ftp://ftp.ncbi.nih.gov/repository/UniGene/
(decompress the files when
appropriate).
Download the corresponding
library information file with the format
".lib.info.gz".
The UniGene page containing the ftp "UniGene" download link may also be reached from within the software using the “Download UniGene data” button. This opens the default browser on a page containing the “Download UniGene” link in the blue bar on the left side.
Should you be asked for user “Name” and “Password”,
type “anonymous” and your e-mail address, respectively.
At the end of this step, the user should have two text files containing cluster data and library information.
The import process requires that these files be located in the UniGene Tabulator folder and renamed as follows:
cluster data file -> cluster.data
library data file -> library.data
Be sure that the file extension is ".data" and not ".txt".
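For users who prefer to script this preparation step, the decompression and renaming can be done with the Python standard library. This is a minimal sketch and not part of UniGene Tabulator; the function name and the example paths are hypothetical, and the download file names depend on the organism chosen.

```python
import gzip
import shutil
from pathlib import Path


def prepare_import_files(cluster_gz, library_gz, tabulator_dir):
    """Decompress the two downloads into the file names the import expects.

    cluster_gz    -- path to the downloaded ".data.gz" file
    library_gz    -- path to the downloaded ".lib.info.gz" file
    tabulator_dir -- path to the "UniGene Tabulator" folder
    """
    target = Path(tabulator_dir)
    outputs = {}
    for source, name in ((cluster_gz, "cluster.data"), (library_gz, "library.data")):
        destination = target / name
        # Copy the decompressed bytes unchanged, so line endings are preserved.
        with gzip.open(source, "rb") as src, open(destination, "wb") as dst:
            shutil.copyfileobj(src, dst)
        outputs[name] = destination
    return outputs


# Example call (hypothetical paths):
# prepare_import_files("Hs.data.gz", "Hs.lib.info.gz", r"C:\UniGene Tabulator")
```

Writing the output under the fixed names "cluster.data" and "library.data" also avoids ending up with a ".txt" extension.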
2.
Import UniGene clusters and/or
library information.
Different
UniGene Tabulator databases may be obtained by duplicating
the fresh
"UniGene Tabulator" folder and starting new import sessions.
Records from different database tables may then be exchanged among
different .UGT databases.
IMPORTANT.
Do not import the same text file more than once into a UniGene Tabulator database; download or decompress the files again if you need to repeat the import.
The ".tab" text files provided along with the distribution are only illustrative outputs from the program, and are not intended to be reimported into UniGene Tabulator, which is designed to import and parse the original UniGene format data files.
Open the "UniGene Tabulator" file in the "UniGene Tabulator" folder.
Advanced use:
You may open the program files using your copy of FileMaker Pro 8 or later, which gives you full freedom to modify the software.
In this case, do not open the program using the "UniGene Tabulator" file, but open the file "UniGene.UGT" with your FileMaker.
After any modification, correct functioning of the program requires that you relaunch it through the "UniGene Tabulator" runtime, because of the data path structure stored in the “UniGene Tabulator” scripts.
Click on the "Import
UniGene"
button.
This starts both importing and parsing of the data.
Select options from the dialog boxes when required.
You may choose whether to import Library information too.
You can also import library information later by clicking on the “Import Library” button.
The time required to obtain a
completely parsed UniGene
database mainly
depends on the total cluster number and on the total number of
GenBank sequences composing the clusters. Complete parsing for large
data files
may require up to several days of
calculation.
Precomputed databases for Homo
sapiens and Danio rerio
are provided at:
http://apollo11.isto.unibo.it/software/UniGene_Tabulator/
Each field
in each table corresponds to a data type typical of the UniGene Format.
Since UniGene Tabulator version 1.1, after parsing the UniGene data files the software creates the UniGene.tab file within the 'UniGene Tabulator' folder.
3.
Use UniGene Tabulator as database.
The FileMaker Pro-based database may be used in three basic "modes": "Browse", "Find" and "Preview".
Switching among the different modes can be done from the "View" Menu or from the pop-up Menu bar at the bottom left of the window.
BROWSE
MODE (“NAVIGATION”)
In the "Browse" mode, one can
browse among
the
record sets by clicking on
the small book icon in the upper
left
corner, or move up and
down between
entries
using buttons
at the top left of the UniGene layout.
Browsing among
the tables can be done by clicking on the “Table” buttons
in the desired section
(Sequence,
Protein similarity, STS, Transcript
Map).
Alternatively, you can move among the
tables by clicking
on the "Layout" pop-up Menu at the
upper left corner.
SEARCH
MODE (“FIND”)
In the "Find" mode, the
small
book
icon in the upper left corner
represents different "requests"
that
are made for searching in the database.
In the "Find" mode, the user can
fill
in a blank form
allowing searching
in specific fields.
When searching in the master table, if one entry contains several occurrences of a feature, all related records of the respective feature are displayed.
In FileMaker Pro "Find" mode, the
"AND" - "OR" - "NOT" operators may be used
in
this way:
"AND" by filling in different fields located in the same
"request",
"OR" by
generating additional
requests
(from "Requests" Menu) in
the same query,
"NOT" by generating additional
requests (from "Requests" Menu)
and checking the "Omit"
box.
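The way requests combine can be pictured with a simplified Python model. It is not FileMaker Pro's actual implementation: it assumes exact matching of field values (FileMaker Pro offers richer matching through the "Symbols" Menu), it processes requests in order, and it assumes that a query starting with an "Omit" request starts from all records. The function names and the sample records are made up.

```python
def matches(record, criteria):
    """AND: every field filled in within one request must match."""
    return all(str(record.get(field, "")) == value for field, value in criteria.items())


def perform_find(records, requests):
    """Apply a list of (criteria, omit) requests in order.

    A normal request adds its matches to the found set (OR);
    a request with omit=True removes its matches from it (NOT).
    """
    # Assumption: if the first request is an "Omit" request, start from all records.
    found = list(records) if requests and requests[0][1] else []
    for criteria, omit in requests:
        if omit:
            found = [r for r in found if not matches(r, criteria)]
        else:
            for record in records:
                if matches(record, criteria) and record not in found:
                    found.append(record)
    return found


# Made-up records, for illustration only.
records = [
    {"CLUSTER": "Xx.1", "GENE": "AAA1", "CHRO": "1"},
    {"CLUSTER": "Xx.2", "GENE": "BBB2", "CHRO": "21"},
    {"CLUSTER": "Xx.3", "GENE": "CCC3", "CHRO": "21"},
]

and_query = perform_find(records, [({"GENE": "BBB2", "CHRO": "21"}, False)])
or_query = perform_find(records, [({"CHRO": "1"}, False), ({"GENE": "CCC3"}, False)])
not_query = perform_find(records, [({"CHRO": "21"}, False), ({"GENE": "CCC3"}, True)])
print([r["CLUSTER"] for r in not_query])
```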
The "Symbols" pop-up Menu in the
"Find" mode allows querying of
exact matches, ranges, duplicates,
wildcards and more.
The search results are the subsets of entries matching the desired criteria.
PREVIEW
MODE (“PRINT”)
In the "Preview" mode, one can obtain a print preview of the data in the current table.
Browsing among
the tables can be done by clicking on the “Table” buttons
in the desired section
(Sequence,
Protein similarity, STS, Transcript
Map).
Alternatively, you can move up and down among the
tables by clicking
on the "Layout" pop-up Menu at the
upper left corner.
“UniGene
Tabulator” FUNCTIONS AND MENU
COMMANDS
“UniGene Tabulator” MENU
About
FileMaker Pro RUNTIMES
Shows information about the
software in a
new window.
Preferences
Standard preferences panel; memory can be set up to 256 MB.
Quit UniGene Tabulator
Closes the program (same as clicking on the cross button in the upper right corner of the UniGene window).
FILE
MENU
File
Options
In this application it
is possible to set only the "Spelling" option.
Change Password
There is no default password.
Page
setup
Standard page set-up command.
Print
Standard print command; you can
choose to
print:
all records in the "Found" set, or
only the current record, or
a
"blank" mask of the record fields.
The appearance will be
that of
the layout
currently selected from the layout Menu.
Import
Records
This is the general "Import"
function
of FileMaker Pro.
Use only "Import UniGene"
function
for correct UniGene file import,
from the "Actions" Menu, or
clicking
on the "Import UniGene"
button in
"UniGene tabulator" file.
Export
records
Export command for the found
records set in a given table.
Records are exported in their current sorting.
Users can select fields to be
exported, their relative order,
and the separation character.
The option “ALL” will
export all
fields (from
all tables)
into a Unicode UTF-16 file (default parameters).
Save/Send
Record as
An alternative export function. It exports data from the current record, or from the found set of records, into an "Excel" file (.xls).
Send Mail
To send data from each record in
the found set by single e-mail.
Save
a Copy as
Save a copy of the database,
complete,
compressed or
as clone (database structure with
no record
present).
EDIT MENU
Undo
Standard "Undo" command.
Cut
Standard "Cut" text command.
Copy
Standard "Copy" text command.
Paste
Standard "Paste" text command.
Clear
Deletion of selected text.
Select
all
Selection of all the text
within
a selected
field
(to select a field, click on
the field).
Find/Replace
Utility for search/replace text
strings within
fields.
Note: Use "Find" mode (from
"View" Menu)
for full
search and selection of a
record set.
Spelling
Utility for checking the spelling of text strings within fields.
Export
Field Contents
Utility to export the contents of
the field selected to a file.
VIEW
MENU
Browse
Mode
Switch to the "Browse Mode" (see
"General Definitions" above).
Find
Mode
Switch to the "Find Mode" (see
"General Definitions" above).
Preview
Mode
Switch to the "Preview Mode" (see
"General Definitions" above).
View as Form
A possible way to display
individually the current record of a
found set of records.
Go to layout
A possible way to switch between different layouts:
UniGene, SEQUENCES, STS, TXMAP, PROTSIM.
View as List
A possible way to display
all the
records of a found set
in list form.
View as Table
A possible way to display all the
records of a found set as a
spreadsheet-like table.
Toolbars
To switch on/off the toolbars of
the application: "Standard"
and "Text Formatting".
Status
Area
To switch on/off the "Status
Area", the left column toolbar.
Text
Ruler
To switch on/off the text ruler
of the application.
Zoom
in
To increase layout dimensions,
same as "Zoom +" button.
Zoom
out
To decrease layout dimensions,
same as "Zoom -" button.
RECORD
MENU
New
Record
Create a new empty record in the
database.
The new Record will be the last in the current record set.
Duplicate
Record
Duplicate the current record in
the database.
The new Record will be the last in the current record set.
Delete
Record
Delete the current record in the
database.
Delete
All Records
Delete all currently found
records in the
database.
Go
to Record
To move to the selected record by
number, previous or next.
Show
All Records
Show all the records in the
database.
Show
Omitted Only
Show records in the
database outside of the current found set.
Omit Record
Remove the selected record from the current found set, without deleting it.
Omit Multiple
Remove more than one record, selected by number, from the current found set, without deleting them.
Modify
Last Find
Return to the last performed
search to edit it.
Sort
Records
Sort the current records set
according to criteria desired.
Unsort
Restore the current record set to the order of insertion.
Replace Field Contents
Replace the value of a field in all records of the found set with the value specified in the current record, or with the result of a calculation.
Relookup Field Contents
Look up again the value of a field by matching on a selected key field.
Revert Record
Restore the value of a field, discarding any changes, before clicking out of that field.
ACTIONS MENU
Import
UniGene data
Import data from a file in
UniGene
flat file format (.data).
(equivalent to the
"Import UniGene" button in the software
main window).
Since UniGene Tabulator version 1.1, after parsing the UniGene data files the software creates the UniGene.tab file within the 'UniGene Tabulator' folder.
This file is a tab-delimited text file and contains four columns:
NACC                        CLUSTER               GENE           NUID
[GenBank Accession number]  [UniGene Cluster ID]  [Gene Symbol]  [GenBank GI]
See also:
http://www.ncbi.nlm.nih.gov/Sitemap/sequenceIDs.html
This file can be used to readily convert GenBank Accession numbers into Gene Symbols, for meta-analysis purposes.
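A conversion of this kind can be sketched in Python with the standard csv module. The sketch assumes that UniGene.tab has no header row, that its four columns appear in the order NACC, CLUSTER, GENE, NUID, and that the text encoding is passed by the user, since the encoding written by the software is not stated here. The function name and the example path are hypothetical.

```python
import csv

# Column order of UniGene.tab as described above.
COLUMNS = ("NACC", "CLUSTER", "GENE", "NUID")


def load_accession_map(path, encoding="utf-8"):
    """Return a dict mapping GenBank accession number -> gene symbol.

    Assumptions: tab-delimited text, no header row, no quoting.
    Rows with fewer than four columns are skipped.
    """
    mapping = {}
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < len(COLUMNS):
                continue
            entry = dict(zip(COLUMNS, row))
            mapping[entry["NACC"]] = entry["GENE"]
    return mapping


# Example call (hypothetical path):
# symbols = load_accession_map(r"C:\UniGene Tabulator\UniGene.tab")
# symbols.get("AA000001")
```

If the accession numbers in the file differ in form from those being looked up (for example, with or without a version suffix), the keys would need to be normalized first.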
Import
Library info
Import data
from a file in “.lib.info”
flat file format.
(equivalent to the
"Import Library"
button in the software main window).
Export
From SEQUENCE
This action will export
data clustered by ACCESSION
NUMBER information,
from the current set of found sequence records
(from all sequence records if no record subset is currently found).
This action offers two options:
1. All – Each GENBANK ACCESSION NUMBER will be exported along with all the related information in a tabulated form (i.e. all fields present in the table "SEQUENCE", in this order:
CLUSTER, TITLE, GENE, CYTOBAND, GeneID_LID, HOMOL, EXPRESS, RESTR_EXPR, POLY_A, CHRO, SCOUNT, NACC, CLON, END, NUID, LIBR, PUID, MGC, SEQTYPE, TRACE, PERIPHERAL, TISSUE, DEV_STAGE, CANCER_SOURCE, VERBATIM_TISSUE, VERBATIM_DEVELOPMENTAL_STAGE);
2. Custom – The user can choose the fields to be exported (i.e. certain selected fields among those described above).
The user must choose the name and location of the output file.
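Once exported with the "All" option, the file can be summarized outside the database. The Python sketch below counts sequence records per value of one field, using the column order listed above. It assumes a tab separator, no header row and a UTF-16 encoding; the encoding is a guess based on the default of the general "Export records" command, and both may differ for this action. The function name is hypothetical.

```python
import csv
from collections import Counter

# Field order of the "All" export, as listed above.
EXPORT_COLUMNS = [
    "CLUSTER", "TITLE", "GENE", "CYTOBAND", "GeneID_LID", "HOMOL",
    "EXPRESS", "RESTR_EXPR", "POLY_A", "CHRO", "SCOUNT", "NACC", "CLON",
    "END", "NUID", "LIBR", "PUID", "MGC", "SEQTYPE", "TRACE", "PERIPHERAL",
    "TISSUE", "DEV_STAGE", "CANCER_SOURCE", "VERBATIM_TISSUE",
    "VERBATIM_DEVELOPMENTAL_STAGE",
]


def count_by_field(path, field="TISSUE", encoding="utf-16"):
    """Count exported sequence records per value of the given field.

    Assumptions: tab-delimited text, no header row, no quoting.
    Rows too short to contain the field are skipped.
    """
    index = EXPORT_COLUMNS.index(field)
    counts = Counter()
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) > index:
                counts[row[index]] += 1
    return counts


# Example call (hypothetical file name):
# count_by_field("sequence_export.tab", field="SEQTYPE").most_common(5)
```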
The same action starts if you click on the "Export Sequence" button in the main layout.
Exporting from other data tables
may
be easily performed by
choosing the layout of interest, then using the general "Export
Record..."
command in the "File" Menu.
Erase
Data
Two possibilities are shown:
1 "Delete raw data": delete only
original raw
data about library information.
It may be useful to "clean" the database following parsing.
Use this
option to reduce the file size.
2 "Delete ALL data": delete all
data in the database tables,
including original flat file raw data and parsed data.
HELP
MENU
About
UniGene Tabulator
This command shows information
about the
software in a
new window.
UniGene
Tabulator Help
This command shows the UniGene Tabulator tutorial in a new window.
OTHER
FUNCTIONS IN THE MAIN LAYOUT
The mouse pointer is shown as a hand over the buttons.
Clicking on the word “Cluster” in the title will open the actual UniGene record for the current cluster in the default browser.
Clicking on the arrow to the right of the “PROTGI” field in the "Protein similarity" section will open the corresponding record of the Entrez “Protein” database in the default browser.
Clicking on the tag of the “GeneID/LID” field will open the corresponding record of the Entrez “Gene” database in the default browser.
---
PROBLEMS
Sometimes, power failure,
hardware problems, or
other factors can damage a
FileMaker database file.
When the runtime application
discovers a
damaged file, a dialog box appears,
telling the users to contact the
creator.
Even
if the dialog box does not appear, files can exhibit erratic behavior.
If you have FileMaker Pro or Developer installed, you can recover the file by using the “Recover” command.
Otherwise, on Windows machines, press Ctrl + Shift while double-clicking the runtime application icon.
Hold the keys down until you see the Open Damaged File dialog box.
During the recovery process, the
runtime
application:
1. creates a new file;
2. renames any damaged file by
adding “Old” to
the end of the filename;
3. gives the repaired file the
original name.
---
Software limits
Due to FileMaker Pro 8 limits, the maximum UniGene Tabulator file size is 8 terabytes (8192 gigabytes).
Text fields can contain up to 2 GB of characters; number fields can contain values up to 800 digits.
UniGene Tabulator parsed UniGene build Hs.190, including library information, in about 3.5 days (on a Pentium 4 1.80 GHz).
Technical notes
The scripts at the core of
UniGene Tabulator
software are "FileMaker Pro" scripts.
Bug reports
Please report any suggestion, bug or problem to:
pierluigi.strippoli@unibo.it
l.lenzi@unibo.it