UniGene Tabulator 1.1 Tutorial (Windows version)
Citation:
Lenzi L, Frabetti F, Facchin F, Casadei R, Vitale L, Canaider S, Carinci P, Zannotti M, Strippoli P.
UniGene Tabulator: a full parser for the UniGene format.
Bioinformatics. 2006 Oct 15;22(20):2570-1. Epub 2006 Aug 7.
INTRODUCTION
This online tutorial documents the UniGene Tabulator 1.1 software.
It explains how to install the software and how to import the desired UniGene entries into the database.
Please refer to the user Guide for more detailed information.
Download UniGene Tabulator 1.1 for Windows from:
http://apollo11.isto.unibo.it/software/
Choose the file: UGTabWin.zip
The downloaded file should be decompressed automatically, generating a "UniGene Tabulator" folder.
If it is not, decompress it with an “Unzip” utility.
The "UniGene Tabulator" folder contains:
the "UniGene Tabulator.exe" file (runtime application);
the "UniGene.UGT" file (database file);
the “FMP Acknowledgements.pdf” file;
the “Extensions” folder, containing a "Dictionaries" folder with the dictionary files for supported languages and an “English” folder with 3 files;
40 ".dll" files;
the “Win_Tutorial” and “Win_Guide” folders, which contain a copy of the on-line documentation for local (off-line) use.
Please do not rename any of the files or folders of the "UniGene Tabulator" software.
You may download multiple copies of "UniGene Tabulator" and run them simultaneously, provided that each "UniGene Tabulator" folder is located in a different directory.
UniGene Tabulator is based on the FileMaker Pro 8 (FileMaker, Inc.) database management software (www.filemaker.com/index.html).
It is released as a FileMaker Pro 8 template, along with a free runtime application able to run the "FileMaker Pro" database at the core of the software.
INSTALLATION
Once decompressed, UniGene Tabulator is ready to use.
GENERAL DEFINITIONS
File
A set of database tables.
Table
A set of records pertaining to the same subject.
Record
One set of fields which constitutes one entry.
The record browser is a small book icon at the top left of the window.
You may browse the database by clicking on the book pages, or by entering a record number and pressing the "Return" key.
The following information is always displayed:
Records: total number of records in the table
Found: total number of records currently selected
Sorted: sorting status of the records (Sorted/Unsorted)
Field
One area of the record containing a specific data type.
Browse Mode
One way to use the database.
It allows data entry, viewing, browsing, sorting and manipulation.
It may be selected from:
the "View" menu, or
the mode pop-up Menu bar, at the bottom left of the window.
Find Mode
An alternative way to use the database.
It allows searching for specific content in the database fields, using any combination of criteria (see the "Search mode" section below for details about searching).
It may be selected from:
the "View" menu, or
the mode pop-up Menu bar, at the bottom left of the window.
Preview Mode
An alternative way to use the database.
It shows a print preview of the found records.
It may be selected from:
the "View" menu, or
the mode pop-up Menu bar, at the bottom left of the window.
Layout
A particular graphical organization of the fields of a table.
A file may show data within different layouts.
A layout may display fields from a table or related fields from other tables.
The visualization of a field is independent of the storage of the data it contains.
USE
1. Download UniGene flat files
Download the UniGene file in the ".data.gz" format for the desired organism via ftp at:
ftp://ftp.ncbi.nih.gov/repository/UniGene/
(decompress the files when appropriate).
Download the corresponding library information file in the ".lib.info.gz" format.
The UniGene page containing the ftp "UniGene" download link may also be reached from within the software using the “Download UniGene data” button.
This opens, in the default browser, a page containing the “Download UniGene” link on the blue bar on the left side.
Should you be asked for a user “Name” and “Password”, type “anonymous” and your e-mail address, respectively.
At the end of this step, you should have two text files containing cluster data and library information.
The import process requires these files to be located in the application folder and renamed as follows:
cluster data file -> cluster.data
library data file -> library.data
Be sure that the file extension is ".data" and not ".txt".
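The decompression and renaming step can also be scripted. The following is a minimal Python sketch, not part of UniGene Tabulator: the function name unpack_for_import is hypothetical, and the download names used in the note below are only assumed examples.

```python
import gzip
import shutil
from pathlib import Path


def unpack_for_import(gz_path, app_folder, target_name):
    """Decompress a gzip file into app_folder under the name the importer expects.

    target_name should be "cluster.data" or "library.data", as required above.
    """
    target = Path(app_folder) / target_name
    # Stream the decompressed bytes so that large files are not held in memory.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, assuming the downloaded files are called Hs.data.gz and Hs.lib.info.gz, unpack_for_import("Hs.data.gz", "UniGene Tabulator", "cluster.data") and unpack_for_import("Hs.lib.info.gz", "UniGene Tabulator", "library.data") would place both files in the application folder with the expected names.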
2. Import UniGene clusters and/or library information
Different UniGene Tabulator databases may be obtained by duplicating the fresh "UniGene Tabulator" folder and starting new import sessions.
Records from different database tables may then be exchanged among different .UGT databases.
IMPORTANT.
Do not import the same text file more than once into a UniGene Tabulator database; download or decompress the files again if you need to repeat the import.
The ".tab" text files provided with the distribution are only illustrative outputs from the program and are not intended to be reimported into UniGene Tabulator, which is designed to import and parse the original UniGene format data files.
Open the "UniGene tabulator" file in the "UniGene Tabulator" folder.
Click on the "Import UniGene" button.
This starts both the import and the parsing of the data.
Select options from the dialog boxes when required.
First, you may choose whether to import Library information too.
If you choose “Yes”, you will be asked to select the “.lib.info” file and then the “.data” file;
if you choose “No”, you will be asked to select only the “.data” file.
You can import library information later by clicking on the “Import Library” button.
The time required to obtain a completely parsed UniGene database mainly depends on the total number of clusters and on the total number of GenBank sequences composing the clusters.
Complete parsing of large data files may require up to several days of calculation.
Precomputed databases for Homo sapiens and Danio rerio are provided at:
http://apollo11.isto.unibo.it/software/UniGene_Tabulator
Since UniGene Tabulator version 1.1, after parsing the UniGene data files the software creates the UniGene.tab file within the 'UniGene Tabulator' folder.
The layout appearance may be adjusted using the "Zoom +"/"Zoom -" buttons, or by clicking on the small resizing buttons at the bottom left corner of any window.
Each field in each table corresponds to a data type typical of the UniGene format.
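To illustrate what parsing such a file involves, here is a minimal Python sketch. It is not the FileMaker Pro scripting used by UniGene Tabulator. It assumes (this tutorial does not spell out the layout) that each line starts with a field name followed by spaces and a value, that a line containing only "//" ends a cluster, and that sub-fields are written as NAME=value pairs separated by semicolons. The function names are hypothetical.

```python
def parse_unigene_records(lines):
    """Yield one dict per cluster; each field name maps to a list of values,
    so that repeated fields (for example one line per sequence) are all kept."""
    record = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "//":
            # Assumed record terminator: emit the cluster collected so far.
            if record:
                yield record
            record = {}
            continue
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        record.setdefault(key, []).append(value.strip())
    if record:
        yield record  # last cluster, if the file lacks a final terminator


def parse_subfields(value):
    """Split text such as 'A=1; B=2' into a dict {'A': '1', 'B': '2'}."""
    pairs = {}
    for part in value.split(";"):
        name, sep, val = part.strip().partition("=")
        if sep:
            pairs[name] = val
    return pairs
```

Keeping every field as a list is a deliberate simplification: it avoids having to know in advance which field names may repeat within a cluster.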
3. Use UniGene Tabulator as a database
The FileMaker Pro-based database may be used in three basic "modes": "Browse", "Find" and "Preview".
You can switch among the modes from the "View" Menu or from the pop-up Menu bar at the bottom left of the window.
BROWSE MODE (“NAVIGATION”)
In the "Browse" mode, you can browse among the record sets by clicking on the small book icon in the upper left corner.
Alternatively, you can move up and down among the entries using the buttons at the top left of the UniGene layout.
You can browse among the tables by clicking on the “Table” button in the desired section (Sequence, Protein similarity, STS, Transcript Map).
Alternatively, you can move up and down among the tables by clicking on the "Layout" pop-up Menu at the upper left corner.
SEARCH MODE (“FIND”)
In the "Find" mode, the small book icon in the upper left corner represents the different "requests" made for searching the database.
In the "Find" mode, you fill in a blank form to search in specific fields.
When searching in the master "UniGene" table, if one entry contains several occurrences of a feature, all related records of that feature are displayed.
In FileMaker Pro "Find" mode, the "AND", "OR" and "NOT" operators may be used in this way:
"AND" by filling in different fields located in the same "Request",
"OR" by generating additional requests (from the "Requests" Menu) in the same query,
"NOT" by generating additional requests (from the "Requests" Menu) and checking the "Omit" box.
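The request logic above can be sketched in Python as follows. This is only a simplified model, not FileMaker Pro code: it assumes case-insensitive substring matching (FileMaker's own matching rules are richer), it applies the requests in the order given, and the function names are hypothetical.

```python
def matches(record, criteria):
    """AND: every field filled in within one request must match."""
    return all(
        text.lower() in str(record.get(field, "")).lower()
        for field, text in criteria.items()
    )


def find(records, requests):
    """Apply a list of (criteria, omit) requests to a list of dict records.

    A normal request adds its matches to the found set (OR);
    a request with omit=True removes its matches from the found set (NOT).
    """
    found = set()
    for criteria, omit in requests:
        hits = {i for i, rec in enumerate(records) if matches(rec, criteria)}
        found = found - hits if omit else found | hits
    # Return the found records in their original order.
    return [records[i] for i in sorted(found)]
```

In this model, a query such as find(records, [({"TITLE": "kinase"}, False), ({"CHRO": "X"}, True)]) keeps the records whose TITLE contains "kinase" and then omits those whose CHRO contains "X".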
The "Symbols" pop-up Menu in the "Find" mode allows queries for exact matches, ranges, duplicates, wildcards and more.
The search results are the subsets of entries matching the desired criteria.
PREVIEW MODE (“PRINT”)
In the "Preview" mode, you can obtain a print preview of the data in the current table.
Browsing among the tables can be done by clicking on the “Table” buttons in the desired section (Sequence, Protein similarity, STS, Transcript Map).
Alternatively, you can move up and down among the tables by clicking on the "Layout" pop-up Menu at the upper left corner.
“UniGene Tabulator” FUNCTIONS AND MENU COMMANDS
UniGene Tabulator MENU
About FileMaker Pro RUNTIMES
Shows information about the software in a new window.
Preferences
Standard preferences panel; memory can be set up to 256 MB.
Quit UniGene Tabulator
Closes the program (same as clicking on the red button in the upper left corner of the UniGene window).
FILE MENU
File Options
In this application, only the "Spelling" options can be set.
Change Password
There is no default password.
Page setup
Standard page setup command.
Print
Standard print command; you can choose to print:
all records in the "Found" set, or
only the current record, or
a "blank" mask of the record fields.
The appearance will be that of the layout currently selected from the layout Menu.
Import Records
This is the general "Import" function of FileMaker Pro.
For correct UniGene file import, use only the "Import UniGene" function, from the "Actions" Menu or by clicking on the "Import UniGene" button in the "UniGene tabulator" file.
Export records
Export command for the found record set in a given table.
Records are exported in their current sort order.
Users can select the fields to be exported, their relative order, and the separator character.
The option “ALL” will export all fields (from all tables) into a Unicode UTF-16 file (default parameters).
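Some downstream tools expect UTF-8 rather than UTF-16 text. The following minimal Python sketch converts an exported file; it assumes the export starts with a byte order mark, which Python's "utf-16" codec uses to detect the byte order, and the function name is hypothetical.

```python
def convert_utf16_to_utf8(src_path, dst_path, chunk_size=65536):
    """Rewrite a UTF-16 text file as UTF-8, leaving line endings untouched."""
    # newline="" disables newline translation on both sides.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
```

The file is copied in chunks so that large exports do not need to fit in memory.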
Save/Send Record as
An alternative export function. It exports data from the current record, or from the found set of records, into an "Excel" file (.xls).
Send Mail
Send data from each record in the found set by individual e-mail.
Save a Copy as
Save a copy of the database: complete, compressed, or as a clone (database structure with no records).
EDIT MENU
Undo
Standard "Undo" command.
Cut
Standard "Cut" text command.
Copy
Standard "Copy" text command.
Paste
Standard "Paste" text command.
Clear
Deletion of selected text.
Select all
Selection of all the text within a selected field (to select a field, click into the field).
Find/Replace
Utility to search and replace text strings within fields.
Note: Use "Find" mode (from the "View" Menu) for full search and selection of a record set.
Spelling
Utility to check the spelling of text strings within fields.
Export Field Contents
Utility to export the contents of the selected field to a file.
VIEW MENU
Browse Mode
Switch to the "Browse Mode" (see "General Definitions" above).
Find Mode
Switch to the "Find Mode" (see "General Definitions" above).
Preview Mode
Switch to the "Preview Mode" (see "General Definitions" above).
View as Form
Display the current record of a found set individually.
Go to layout
Switch between the different layouts: UniGene, SEQUENCES, STS, TXMAP, PROTSIM.
View as List
Display all the records of a found set as a list.
View as Table
Display all the records of a found set as a spreadsheet-like table.
Toolbars
To switch on/off the toolbars of the application: "Standard" and "Text Formatting".
Status Area
To switch on/off the "Status Area", the left column toolbar.
Text Ruler
To switch on/off the text ruler of the application.
Zoom in
To increase layout dimensions, same as the "Zoom +" button.
Zoom out
To decrease layout dimensions, same as the "Zoom -" button.
RECORDS MENU
New Record
Create a new empty record in the database.
The new record will be the last of the current record set.
Duplicate Record
Duplicate the current record in the database.
The new record will be the last of the current record set.
Delete Record
Delete the current record in the database.
Delete All Records
Delete all currently found records in the database.
Go to Record
Move to the selected record by number, or to the previous or next record.
Show All Records
Show all the records in the database.
Show Omitted Only
Show the records in the database outside of the current found set.
Omit Record
Remove the selected record from the current found set, without deleting it.
Omit Multiple
Remove more than one record, selected by number, from the current found set, without deleting them.
Modify Last Find
Return to the last performed search to edit it.
Sort Records
Sort the current record set according to the desired criteria.
Unsort
Restore the current record set to the order of insertion.
Replace Field Contents
Replace the value of a field in all records of the found set with the value specified in the current record, or with the result of a calculation.
Relookup Field Contents
Look up again the value of a field by matching on a selected key field.
Revert Record
Restore the value of a field, discarding any changes, before clicking out of that field.
ACTIONS MENU
Import UniGene data
Import data from a file in UniGene flat file format (.data) (equivalent to the "Import UniGene" button in the software main window).
Since UniGene Tabulator version 1.1, after parsing the UniGene data files the software creates the UniGene.tab file within the 'UniGene Tabulator' folder.
This file is a tab-delimited text file containing four columns:
NACC [GenBank Accession number]
CLUSTER [UniGene Cluster ID]
GENE [Gene Symbol]
NUID [GenBank GI]
See also:
http://www.ncbi.nlm.nih.gov/Sitemap/sequenceIDs.html
The UniGene.tab file can be useful to readily convert GenBank Accession numbers into Gene Symbols, for meta-analysis purposes.
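As an illustration of such a conversion, the following minimal Python sketch builds a lookup table from the UniGene.tab file. It assumes the column order NACC, CLUSTER, GENE, NUID described above; the text encoding of the file and the presence of a header row are not stated in this tutorial, so the encoding is left as a parameter and a header row, if present, is skipped. The function name is hypothetical.

```python
import csv


def load_accession_to_gene(tab_path, encoding="utf-8"):
    """Map GenBank accession (NACC) to gene symbol (GENE) from a 4-column tab file."""
    mapping = {}
    with open(tab_path, newline="", encoding=encoding) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 4:
                continue  # skip blank or malformed lines
            nacc, cluster, gene, nuid = (cell.strip() for cell in row[:4])
            if nacc == "NACC":
                continue  # skip a header row, if the file has one
            if nacc and gene:
                mapping[nacc] = gene  # accessions without a gene symbol are left out
    return mapping
```

A lookup is then a plain dictionary access, for example load_accession_to_gene("UniGene.tab").get(accession), which returns None for accessions that are absent or have no gene symbol.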
Import Library info
Import data from a file in “.lib.info” flat file format (equivalent to the "Import Library" button in the software main window).
Export data from SEQUENCE
This command is also available as a button named "Export Sequence" on the main layout.
This action exports data clustered by ACCESSION NUMBER information, from the current set of found sequence records (from all sequence records if no record subset is currently found).
This action offers two possibilities:
1. All: each GENBANK ACCESSION NUMBER will be exported along with all the related information in tabulated form (i.e. all fields present in the "SEQUENCE" table, in this order: CLUSTER, TITLE, GENE, CYTOBAND, GeneID_LID, HOMOL, EXPRESS, RESTR_EXPR, POLY_A, CHRO, SCOUNT, NACC, CLON, END, NUID, LIBR, PUID, MGC, SEQTYPE, TRACE, PERIPHERAL, TISSUE, DEV_STAGE, CANCER_SOURCE, VERBATIM_TISSUE, VERBATIM_DEVELOPMENTAL_STAGE);
2. Custom: the user can choose the fields to be exported (i.e. only selected fields among those described above).
The user must choose the name and location of the output file.
Export from other data tables may be easily performed by choosing the layout of interest, then using the general "Export Record..." command in the "File" Menu.
Erase Data
Two possibilities are shown:
1. "Delete raw data": delete only the original raw data about library information.
It may be useful to "clean" the database following parsing.
Use this option to reduce the file size.
2. "Delete ALL data": delete all data in the database tables, including original flat file raw data and parsed data.
HELP MENU
Info about UniGene Tabulator
This command shows information about the software in a new window.
UniGene Tabulator Help
This command shows this tutorial about the software in a new window.
OTHER FUNCTIONS IN THE MAIN LAYOUT
The mouse pointer is shown as a hand over the buttons.
Clicking on the word “Cluster” in the title will open the actual UniGene record for the current cluster in the default browser.
Clicking on the arrow to the right of the “PROTGI” field in the "Protein similarity" section will open the corresponding record of the Entrez “Protein” database in the default browser.
Clicking on the tag of the “GeneID/LID” field will open the corresponding record of the Entrez “Gene” database in the default browser.
---
PROBLEMS
Sometimes a power failure, hardware problems, or other factors can damage a FileMaker database file.
When the runtime application discovers a damaged file, a dialog box appears, telling the user to contact the creator.
Even if the dialog box does not appear, files can exhibit erratic behavior.
If you have FileMaker Pro or FileMaker Developer installed, you can recover the file by using the “Recover” command.
Otherwise, on Windows machines, press Ctrl + Shift while double-clicking the runtime application icon.
Hold the keys down until you see the "Open Damaged File" dialog box.
During the recovery process, the runtime application:
1. creates a new file;
2. renames any damaged file by adding “Old” to the end of the filename;
3. gives the repaired file the original name.
--
Software limits
Due to FileMaker Pro 8 limits, the maximum UniGene Tabulator file size is 8 terabytes (8192 gigabytes).
Text fields can contain up to 2 GB of text; number fields can contain values up to 800 digits.
UniGene Tabulator parsed UniGene build Hs.190, including library information, in about 3.5 days (on a Pentium 4 1.80 GHz).
Technical notes
The scripts at the core of the UniGene Tabulator software are "FileMaker Pro" scripts.
Bug reports
Please report any suggestion, bug or problem to:
pierluigi.strippoli@unibo.it
l.lenzi@unibo.it