UniGene_Tabulator 1.1 Win Tutorial

UniGene Tabulator 1.1
Tutorial (Windows version)

Citation:
Lenzi L, Frabetti F, Facchin F, Casadei R, Vitale L, Canaider S, Carinci P, Zannotti M, Strippoli P.
UniGene Tabulator: a full parser for the UniGene format.
Bioinformatics. 2006 Oct 15;22(20):2570-1. Epub 2006 Aug 7

INTRODUCTION

This online tutorial documents the UniGene Tabulator 1.1 software.
It explains how to install the software and how to import the desired
UniGene entries into the database.
Please refer to the user Guide for more detailed information.

Download UniGene_Tabulator 1.1 for Windows from:
http://apollo11.isto.unibo.it/software/

Choose the file: UGTabWin.zip

The downloaded file should be automatically decompressed,
generating a "UniGene Tabulator" folder.
Failing this, decompress it with an “Unzip” utility.

The UniGene Tabulator Folder contains:
"UniGene Tabulator.exe" file (runtime application);
"UniGene.UGT" (database file);
the “FMP Acknowledgements.pdf” file;
the “Extensions” folder,
    containing a "Dictionaries" folder with the dictionary file for
    supported languages and an “English” folder with 3 files;
40 ".dll" files;
the “Win_Tutorial” and “Win_Guide” folders, containing
    a copy of the on-line documentation for local (off-line) use.

Please do not rename any of the files or folders
of the "UniGene Tabulator" software.

You may download multiple copies of "UniGene Tabulator"
and run them simultaneously,
provided that each "UniGene Tabulator" folder is located
in a different directory.

UniGene Tabulator is based on the FileMaker Pro 8 (FileMaker, Inc.)
database management software (www.filemaker.com/index.html),
and it is released as a FileMaker Pro 8 template,
along with a free runtime application able to run "FileMaker Pro"
at the core of the software.

INSTALLATION
Once decompressed, UniGene Tabulator can readily be used.

GENERAL DEFINITIONS

File
A set of database tables.

Table
A set of records pertaining to the same subject.

Record
One set of fields which constitute one entry.
The record browser is a small book icon at the top left of the window.

You may browse the database by clicking on the book pages,
or enter a record number and press the "Return" key.
The following information is always displayed:
Records: total number of Records in the table
Found: total number of Records currently selected
Sorted: sorting status of the Records (Sorted/Unsorted)

Field
One area of the record containing a specific data type.

Browse Mode
One way to use the database.
It allows data entry, viewing, browsing, sorting, manipulation.
It may be selected from:
                        the "View" menu, or
                        the mode pop-up Menu bar, at the bottom left of the window.

Find Mode
An alternative mode to use the database.
It allows searching for specific content in the database fields,
using any combination of criteria
    (see the "Search mode" section below for details about searching).
It may be selected from:
            the "View" menu, or
            the mode pop-up Menu bar, at the bottom left of the window.

Preview Mode
An alternative way to use the database.
It visualizes a print preview of the found records.
It may be selected from:
            the "View" menu,
            or the pop-up Menu bar, at the bottom left of the window.

Layout
A particular graphical organization of the fields of a table.
A file may show data within different layouts.
A layout may display fields from a table or
its related fields from other tables.
The way a field is displayed is independent of how its data are stored.

USE

1. Download UniGene flat files

Download the UniGene file in ".data.gz" format
for the desired organism via ftp from:
ftp://ftp.ncbi.nih.gov/repository/UniGene/
(decompress the files when appropriate).
Download the corresponding library information file in
".lib.info.gz" format.

The UniGene page containing the ftp "UniGene" download link may
also be reached from within the software using the
“Download UniGene data” button.
This opens the default browser on a page
containing the “Download UniGene” link in the blue bar on the left side.
Should you be asked for user “Name” and “Password”,
type “anonymous” and your e-mail address, respectively.

At the end of this step, you should have two text files
containing cluster data and library information.
The import process requires that these files be located in
the application folder and renamed as follows:
cluster data file -> cluster.data
library data file -> library.data

Be sure that the file extension is ".data" and not ".txt".
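
The decompression and renaming described above can also be scripted.
The following Python sketch is not part of UniGene Tabulator; the function
name prepare_import_file and its arguments are hypothetical, and it only
assumes that the download is a standard gzip file.

```python
import gzip
import shutil
from pathlib import Path


def prepare_import_file(gz_path, app_folder, target_name):
    """Decompress a downloaded .gz file into app_folder under target_name."""
    # The tutorial requires exactly these two names for the import.
    if target_name not in ("cluster.data", "library.data"):
        raise ValueError("target_name must be cluster.data or library.data")
    destination = Path(app_folder) / target_name
    with gzip.open(gz_path, "rb") as source, open(destination, "wb") as target:
        shutil.copyfileobj(source, target)  # stream, so large files are fine
    return destination
```

For example, prepare_import_file("Hs.data.gz", "UniGene Tabulator", "cluster.data")
would write the decompressed cluster file into the application folder
(the input file name here is only an example).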

2. Import UniGene clusters and/or library information

Different UniGene Tabulator databases may be obtained by duplicating
the fresh "UniGene Tabulator" folder and starting new import sessions.
Records from different database tables may then be exchanged among
different .UGT databases.

IMPORTANT. Do not import the same text file more than once into a
UniGene Tabulator database; download or decompress the files
again if you need to repeat the import.

The ".tab" text files provided along with the distribution are
only illustrative outputs of the program, and are not intended
to be reimported into UniGene Tabulator, which is designed to import
and parse the original UniGene format data files.

Open the "UniGene tabulator" file in the "UniGene Tabulator" folder.

Click on the "Import UniGene" button.
This starts both importing and parsing of the data.
Select options from the dialog boxes when required.

First, choose whether you want to import library information as well.
If you choose “Yes”, you will be asked to select the “.lib.info” file
and then the “.data” file;
if you choose “No”, you will be asked to select only the “.data” file.

You can import library information later by clicking on the “Import Library” button.

The time required to obtain a completely parsed UniGene database mainly
depends on the total cluster number and on the total number of
GenBank sequences composing the clusters. Complete parsing for large data files
may require up to several days of calculation.
Precomputed databases for Homo sapiens and Danio rerio are provided at:
http://apollo11.isto.unibo.it/software/UniGene_Tabulator

Since UniGene Tabulator version 1.1:
after parsing the UniGene data files, the software creates
the UniGene.tab file within the 'UniGene Tabulator' folder.

Layout appearance may be adjusted using the "Zoom +"/"Zoom -" buttons,
or by clicking on the small resizing buttons at the bottom left corner
of any window.

Each field in each table corresponds to a data type
typical of the UniGene Format.
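
To give an idea of what parsing a UniGene flat file involves, here is a
minimal Python sketch. It is not the FileMaker Pro script used by
UniGene Tabulator. It assumes the usual flat-file layout: one "TAG value"
line per feature, repeated SEQUENCE and PROTSIM lines holding
"KEY=value; KEY=value" pairs, and a "//" line closing each cluster.
The function name parse_unigene_records is hypothetical.

```python
# Tags assumed to repeat within a cluster and to hold "KEY=value;" pairs.
REPEATED_TAGS = ("SEQUENCE", "PROTSIM")


def parse_unigene_records(lines):
    """Yield one dict per cluster from the lines of a UniGene flat file."""
    record = {}
    for raw in lines:
        line = raw.strip()
        if line == "//":              # end-of-record marker
            if record:
                yield record
            record = {}
            continue
        if not line:
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag in REPEATED_TAGS:
            pairs = dict(
                item.strip().split("=", 1)
                for item in value.split(";")
                if "=" in item
            )
            record.setdefault(tag, []).append(pairs)
        else:
            record[tag] = value
    if record:                        # file without a final "//"
        yield record
```

Each yielded dictionary corresponds to one cluster; the list stored under
"SEQUENCE" holds one entry per sequence line of that cluster.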

3. Use UniGene Tabulator as database.

The FileMaker Pro-based database may be used in three "modes":
"Browse", "Find" and "Preview".
You can switch between modes from the "View" Menu
or from the pop-up Menu bar at the bottom left of the window.

BROWSE MODE (“NAVIGATION”)

In the "Browse" mode, one can browse among the record sets by
clicking on the small book icon in the upper left corner.

Alternatively, you can move up and down among the entries
using the buttons at the top left of the UniGene layout.

To browse among the tables, click on the “Table” button
in the desired section (Sequence, Protein similarity, STS, Transcript Map).

Alternatively, you can move up and down among the tables by clicking
on the "Layout" pop-up Menu at the upper left corner.

SEARCH MODE (“FIND”)

In the "Find" mode, the small book icon in the upper left corner
represents different "requests" that are made for searching the database.
In the "Find" mode, the user can fill a blank form allowing searching
in specific fields.

When searching in the master "UniGene" table,
if one entry contains various recurrences of a feature,
all related records of the respective feature are displayed.

In FileMaker Pro "Find" mode, the "AND" - "OR" - "NOT" operators may be used
in this way:

"AND" by filling in different fields located in the same "Request",

"OR" by generating additional requests
           (from "Requests" Menu) in the same query,

"NOT" by generating additional requests (from "Requests" Menu)
           and checking the "Omit" box.
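
The way requests combine can be illustrated with a small Python model.
This is only a simplified sketch of the logic, not FileMaker Pro code:
the function names are hypothetical, and matching is reduced to a
case-insensitive substring test, which is an assumption and not the
actual FileMaker matching rule.

```python
def matches(record, criteria):
    """AND: every field filled in within one request must match."""
    return all(
        str(wanted).lower() in str(record.get(field, "")).lower()
        for field, wanted in criteria.items()
    )


def find(records, requests):
    """Apply requests in order; each request is a (criteria, omit) pair."""
    found = []
    for criteria, omit in requests:
        if omit:
            # NOT: an "Omit" request removes matching records found so far.
            found = [r for r in found if not matches(r, criteria)]
        else:
            # OR: each additional request adds its own matching records.
            found += [
                r for r in records
                if matches(r, criteria) and r not in found
            ]
    return found
```

In this model, find(records, [({"CHRO": "8"}, False), ({"GENE": "NAT1"}, True)])
keeps the records on chromosome 8 and then drops those whose GENE field
contains "NAT1".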

The "Symbols" pop-up Menu in the "Find" mode allows query of
exact matches, ranges, duplicates, wildcards and more.

The search results are subsets of entries matching the desired criteria.

PREVIEW MODE (“PRINT”)

In the "Preview" mode, one can obtain a print preview of the data
in the current table.
Browsing among the tables can be done by clicking on the “Table” buttons
in the desired section (Sequence, Protein similarity, STS, Transcript Map).
Alternatively, you can move up and down among the tables by clicking
on the "Layout" pop-up Menu at the upper left corner.

“UniGene Tabulator” FUNCTIONS AND MENU COMMANDS

UniGene Tabulator MENU

About FileMaker Pro RUNTIMES
Shows information about the software in a new window.

Preferences
Standard preferences panel; memory can be set up to 256 MB.

Quit UniGene Tabulator
Close the program (same as clicking on the red button
in the upper left corner of the UniGene window).

FILE MENU

File Options
In this application it is possible to set only the "Spelling" options.

Change Password
There is no default password.

Page setup
Standard page setup command.

Print
Standard print command; you can choose to print:
                  all records in the "Found" set, or
                  only the current record, or
                  a "blank" mask of the record fields.
The appearance will be that of the layout
        currently selected from the layout Menu.

Import Records
This is the general "Import" function of FileMaker Pro.
For correct UniGene file import, use only the "Import UniGene" function,
from the "Actions" Menu or by clicking on the "Import UniGene" button in
the "UniGene tabulator" file.

Export records
Export command for the found records set in a given table.
Records are exported in their current sorting.
Users can select fields to be exported, their relative order,
and the separation character.
The option “ALL” will export all fields (from all tables)
into a Unicode UTF-16 file (default parameters).
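
Some text tools expect UTF-8 rather than UTF-16. As a minimal sketch
(not a feature of UniGene Tabulator), the exported file can be re-encoded
with a few lines of Python. The function name utf16_to_utf8 is hypothetical,
and the sketch assumes that the export starts with a byte-order mark so
that Python can detect the byte order.

```python
def utf16_to_utf8(source_path, target_path):
    """Re-encode a UTF-16 export as UTF-8 and return the number of lines."""
    count = 0
    # newline="" keeps the original line endings untouched.
    with open(source_path, encoding="utf-16", newline="") as source, \
         open(target_path, "w", encoding="utf-8", newline="") as target:
        for line in source:
            target.write(line)
            count += 1
    return count
```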

Save/Send Record as
An alternative export function. It exports data from the current record,
or from the found set of records, into an "Excel" file (.xls).

Send Mail
To send data from each record in the found set by single e-mail.

Save a Copy as
Save a copy of the database: complete, compressed, or
as a clone (database structure with no records).

EDIT MENU

Undo
Standard "Undo" command.

Cut
Standard "Cut" text command.

Copy
Standard "Copy" text command.

Paste
Standard "Paste" text command.

Clear
Deletion of selected text.

Select all
Selection of all the text within a selected field
(to select a field, click into the field).

Find/Replace
Utility to search and replace text strings within fields.
Note: Use "Find" mode (from "View" Menu)
      for full search and selection of a record set.

Spelling
Utility to check the spelling of text strings within fields.

Export Field Contents
Utility to export the contents of the selected field to a file.

VIEW MENU

Browse Mode
Switch to the "Browse Mode" (see "General Definitions" above).

Find Mode
Switch to the "Find Mode" (see "General Definitions" above).

Preview Mode
Switch to the "Preview Mode" (see "General Definitions" above).

View as Form
A possible way to display individually the current record of a
found set of records.

Go to Layout
A possible way to switch between the different layouts:
UniGene, SEQUENCES, STS, TXMAP, PROTSIM.

View as List
A possible way to display all the records of a found set as list.

View as Table
A possible way to display all the records of a found set as
spreadsheet-like table.

Toolbars
To switch on/off the toolbars of the application: "Standard"
and "Text Formatting".

Status Area
To switch on/off the "Status Area", the left column toolbar.

Text Ruler
To switch on/off the text ruler of the application.

Zoom in
To increase layout dimensions, same as "Zoom +" button.

Zoom out
To decrease layout dimensions, same as "Zoom -" button.

RECORDS MENU

New Record
Create a new empty record in the database.
The new record will be the last one in the current record set.

Duplicate Record
Duplicate the current record in the database.
The new record will be the last one in the current record set.

Delete Record
Delete the current record in the database.

Delete All Records
Delete all currently found records in the database.

Go to Record
To move to the selected record by number, previous or next.

Show All Records
Show all the records in the database.

Show Omitted Only
Show records in the database outside of the current found set.

Omit Record
Remove the selected record out of the current found set,
without deleting it.

Omit Multiple
Remove more than one record, selected by number,
from the current found set, without deleting them.

Modify Last Find
Return to the last performed search to edit it.

Sort Records
Sort the current records set according to desired criteria.

Unsort
Restore the current record set to the order in which the records were inserted.

Replace Field Contents
Replace the value of a field in all records of the found set with
the value specified in the current record, or with a calculated value.

Relookup Field Contents
Look up again the value of a field by matching on a selected key field.

Revert Record
Restore the value of a field, discarding any changes made
before clicking out of that field.

ACTIONS MENU

Import UniGene data
Import data from a file in UniGene flat file format (.data).
(equivalent to the "Import UniGene" button in the software main window).

Since UniGene Tabulator version 1.1:
after parsing the UniGene data files, the software creates
the UniGene.tab file within the 'UniGene Tabulator' folder.

This is a tab-delimited text file with four columns:

Column     Content
NACC       GenBank Accession number
CLUSTER    UniGene Cluster ID
GENE       Gene Symbol
NUID       GenBank GI

See also:
http://www.ncbi.nlm.nih.gov/Sitemap/sequenceIDs.html

This file can be used to readily convert GenBank Accession numbers into
Gene Symbols, for meta-analysis purposes.
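
As an illustration of such a conversion, the following Python sketch builds
an accession-to-symbol lookup from the lines of UniGene.tab. The function
name load_accession_to_gene is hypothetical. The sketch assumes the four
columns appear in the order NACC, CLUSTER, GENE, NUID; whether the file
starts with a header row is not stated here, so a first row beginning
with "NACC" is simply skipped.

```python
import csv


def load_accession_to_gene(lines):
    """Map each GenBank accession (NACC) to its gene symbol (GENE)."""
    mapping = {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip incomplete rows and a possible header row.
        if len(row) < 4 or row[0] == "NACC":
            continue
        nacc, cluster, gene, nuid = row[:4]
        if gene:                      # ignore sequences without a symbol
            mapping[nacc] = gene
    return mapping
```

The function accepts any iterable of text lines, for example an open file
handle on UniGene.tab; the text encoding of that file may need to be
specified when opening it.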

Import Library info
Import data from a file in “.lib.info” flat file format.
(equivalent to the "Import Library" button in the software main window).

Export data from SEQUENCE
This command is also available as a button named "Export Sequence"
on the main layout:

This action will export data clustered by ACCESSION NUMBER information,
from the current set of found sequence records
(from all sequence records if no record subset is currently found).

This action offers two options:
1. All – Each GENBANK ACCESSION NUMBER will be exported along with all
   the related information in a tabulated form
   (i.e. all fields present in the table "SEQUENCE", in this order:
   CLUSTER, TITLE, GENE, CYTOBAND, GeneID_LID, HOMOL, EXPRESS, RESTR_EXPR,
   POLY_A, CHRO, SCOUNT, NACC, CLON, END, NUID, LIBR, PUID, MGC, SEQTYPE,
   TRACE, PERIPHERAL, TISSUE, DEV_STAGE, CANCER_SOURCE, VERBATIM_TISSUE,
   VERBATIM_DEVELOPMENTAL_STAGE);
2. Custom – The user can choose the fields to be exported
   (i.e. only selected fields among those described above).

The user must choose the name and location of the output file.

The same action is started by clicking on the "Export Sequence" button in the main layout.
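
When the "All" option is used, the field order listed above can be attached
to the exported rows. The Python sketch below is an illustration only:
the function name read_sequence_export is hypothetical, and it assumes a
tab-separated file without a header row, with one row per sequence.

```python
import csv

# Field order of the "All" export, as listed in this tutorial.
SEQUENCE_FIELDS = [
    "CLUSTER", "TITLE", "GENE", "CYTOBAND", "GeneID_LID", "HOMOL",
    "EXPRESS", "RESTR_EXPR", "POLY_A", "CHRO", "SCOUNT", "NACC", "CLON",
    "END", "NUID", "LIBR", "PUID", "MGC", "SEQTYPE", "TRACE", "PERIPHERAL",
    "TISSUE", "DEV_STAGE", "CANCER_SOURCE", "VERBATIM_TISSUE",
    "VERBATIM_DEVELOPMENTAL_STAGE",
]


def read_sequence_export(lines, delimiter="\t"):
    """Yield one dict per exported row, keyed by SEQUENCE field name."""
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:                   # skip blank lines
            continue
        if len(row) != len(SEQUENCE_FIELDS):
            raise ValueError(
                f"expected {len(SEQUENCE_FIELDS)} columns, got {len(row)}"
            )
        yield dict(zip(SEQUENCE_FIELDS, row))
```

A row with a different number of columns raises an error instead of being
silently misaligned, which helps to detect a "Custom" export or a
different separation character.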

Export from other data tables may be easily performed by
choosing the layout of interest, then using the general "Export Records"
command in the "File" Menu.

Erase Data
Two possibilities are shown:
1 "Delete raw data": delete only original raw data about library information.
   It may be useful to "clean" the database following parsing.
   Use this option to reduce the file size.
2 "Delete ALL data": delete all data in the database tables,
   including original flat file raw data and parsed data.

HELP MENU

Info about UniGene Tabulator
This command shows information about the software in a new window.

UniGene Tabulator Help
This command shows this tutorial about the software in a new window.

OTHER FUNCTIONS IN THE MAIN LAYOUT

The mouse pointer is shown as a hand over the buttons.

Clicking on the word “Cluster” in the title will open
the actual UniGene record for the current cluster,
in the default browser.

Clicking on the arrow to the right of the “PROTGI” field
in the "Protein similarity" section will open
the corresponding record of the Entrez “Protein” database,
in the default browser.

Clicking on the tag of the “GeneID/LID” field will open
the corresponding record of the Entrez “Gene” database,
in the default browser.

---

PROBLEMS
Sometimes, power failure, hardware problems, or other factors can damage a
FileMaker database file.
When the runtime application discovers a damaged file, a dialog box appears,
telling the users to contact the creator.
Even if the dialog box does not appear, files can exhibit erratic behavior.
If you have FileMaker Pro or Developer installed, you can recover the file by
using the “Recover” command.
Otherwise, on Windows machines, press Ctrl + Shift while double-clicking the
runtime application icon.
Hold the keys down until you see the Open Damaged File dialog box.
During the recovery process, the runtime application:
1. creates a new file;
2. renames any damaged file by adding “Old” to the end of the filename;
3. gives the repaired file the original name.

--
Software limit
Due to FileMaker Pro 8 limits,
maximum UniGene Tabulator file size is 8 terabytes (8192 gigabytes).
Text fields can contain up to 2 GB of characters,
number fields can contain values up to 800 digits.
UniGene Tabulator parsed UniGene build Hs.190, including library information,
in about 3.5 days (on a 1.80 GHz Pentium 4).

Technical notes
The scripts at the core of UniGene Tabulator software are "FileMaker Pro" scripts.

Bug reports
Please report any suggestion, bug or problem to:
pierluigi.strippoli@unibo.it
l.lenzi@unibo.it