UniGene_Tabulator Win Guide

UniGene Tabulator 1.1
Guide (Windows version)

Citation:
Lenzi L, Frabetti F, Facchin F, Casadei R, Vitale L, Canaider S, Carinci P, Zannotti M, Strippoli P.
UniGene Tabulator: a full parser for the UniGene format.
Bioinformatics. 2006 Oct 15;22(20):2570-1. Epub 2006 Aug 7

INTRODUCTION

This online Guide is designed for detailed documentation of
UniGene Tabulator 1.1 software.
A quick illustrated Tutorial guide on how to install the software
and import the desired UniGene clusters is also available.

--

UniGene Tabulator is a software solution designed to
manage UniGene biological flat files.
It implements a structured representation of each field of the UniGene format,
importing data into a common database management system,
which can be used on a local personal computer
(Macintosh and Windows environments).

This database (collection of related tables) enables one to index, retrieve
or export UniGene information.
More sophisticated functions are possible if one uses FileMaker Pro 8 or later.

Minimal requirements are:
Macintosh OS X: 10.3.9
Windows OS: 2000 (Service Pack 4) or XP (Service Pack 2)

Download UniGene_Tabulator 1.1 for Windows from the address:
http://apollo11.isto.unibo.it/software/

Choose the file: UGTabWin.zip

The downloaded file should be automatically decompressed,
generating a "UniGene Tabulator" folder.
Failing this, the decompression needs an “Unzip” utility.

The UniGene Tabulator Folder contains:
"UniGene Tabulator.exe" file (runtime application);
"UniGene.UGT" (database file);
the “FMP Acknowledgements.pdf” file;
the “Extensions” folder,
    containing a "Dictionaries" folder with the dictionary file for
    supported languages and an “English” folder with 3 files;
40 ".dll" files;
the “Win_Tutorial” and “Win_Guide” folders contain
    a copy of the on-line documentation, for local (off-line) use.

Please do not rename any of the files or folders
of the "UniGene Tabulator" software.

You may download multiple copies of "UniGene Tabulator"
and run them simultaneously,
provided that each "UniGene Tabulator" folder is located
in a different directory.

UniGene Tabulator is based on FileMaker Pro 8 (FileMaker Pro, Inc.)
database management software (www.filemaker.com/index.html),
and is released as a FileMaker Pro 8 template,
along with a free runtime application able to run "FileMaker Pro"
at the core of the software.

The UniGene Tabulator solution imports UniGene “.data” flat files,
containing cluster information, and ".lib.info" flat files,
containing library information, into the database file “UniGene.UGT”.
Choose the file for the desired organism from the UniGene ftp server: ftp://ftp.ncbi.nih.gov/repository/UniGene/.

UniGene Line types/qualifiers in “.data” file
(ftp://ftp.ncbi.nih.gov/repository/UniGene/README):

       ID           UniGene cluster ID
       TITLE        Title for the cluster
       GENE         Gene symbol
       CYTOBAND     Cytological band
       EXPRESS      Tissues of origin for ESTs in cluster
       RESTR_EXPR   Single tissue or development stage contributes
                    more than half the total EST frequency for this gene.
       GNM_TERMINUS Genomic confirmation of the presence of a 3' terminus;
                    T if a non-templated polyA tail is found among a cluster's
                    sequences; otherwise I if templated As are found in
                    genomic sequence or S if a canonical polyA signal is found
                    on the genomic sequence.
       LOCUSLINK    LocusLink/EntrezGene identifier associated with at
                    least one sequence in this cluster (Hs only)
       CHROMOSOME   Chromosome. For plants, CHROMOSOME refers to mapping on
                    the arabidopsis genome.
       STS          STS
            NAME=   Name of STS
            ACC=    GenBank/EMBL/DDBJ accession number of STS
                    [optional field]
            DSEG=   GDB Dsegment number [optional field]
            UNISTS= identifier in NCBI's UNISTS database
       TXMAP        Transcript map interval
            MARKER=      Marker found on at least one sequence in this cluster
            RHPANEL=     Radiation Hybrid panel used to place marker
       PROTSIM      Protein Similarity data for the sequence with
                    highest-scoring protein similarity in this cluster
            ORG=         Organism
            PROTGI=      Sequence GI of protein
            PROTID=      Sequence ID of protein
            PCT=         Percent alignment
            ALN=         length of aligned region (aa)
       SCOUNT       Number of sequences in the cluster
       SEQUENCE     Sequence
            ACC=         GenBank/EMBL/DDBJ accession number of sequence
            NID=         Unique nucleotide sequence identifier (gi)
            PID=         Unique protein sequence identifier (used for non-ESTs)
            CLONE=       Clone identifier (used for ESTs only)
            END=         End (5'/3') of clone insert read (used for ESTs only)
            LID=         Library ID;
                         see Hs.lib.info for library name and tissue
            MGC=         5' CDS-completeness indicator; if present,
                         the clone associated with this sequence is believed
                         CDS-complete. A value greater than 511 is the gi of
                         the CDS-complete mRNA matched by the EST, otherwise
                         the value is an indicator of the reliability of the
                         test indicating CDS completeness; higher values
                         indicate more reliable CDS-completeness predictions.
            SEQTYPE=     Description of the nucleotide sequence.
                         Possible values are mRNA, EST and HTC.
            TRACE=       The Trace ID of the EST sequence,
                         as provided by NCBI Trace Archive
            PERIPHERAL=  Indicator that the sequence is a suboptimal
                         representative of the gene represented by this
                         cluster. Peripheral sequences are those that are in
                         a cluster which represents a spliced gene without
                         sharing a splice junction with any other sequence.
                         In many cases, they are unspliced transcripts
                         originating from the gene.

       //           End of record
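
As an illustration of the line layout listed above, the following minimal
Python sketch splits one line into its line type and its "KEY=value"
qualifiers. It is not the FileMaker Pro script used by UniGene Tabulator:
the function names, the assumption that qualifiers are separated by
semicolons, and the example line are all hypothetical.

```python
def split_line(line):
    """Split one UniGene line into (line_type, rest_of_line)."""
    parts = line.rstrip("\n").split(None, 1)
    if not parts:
        return "", ""
    rest = parts[1] if len(parts) > 1 else ""
    return parts[0], rest


def parse_qualifiers(rest):
    """Parse 'KEY=value; KEY=value' text into a dictionary.

    Assumption: qualifiers are separated by semicolons.
    """
    fields = {}
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if "=" in chunk:
            key, value = chunk.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical example line, made up for illustration.
example = "SEQUENCE    ACC=AB000001.1; NID=g1234567; SEQTYPE=mRNA"
line_type, rest = split_line(example)
qualifiers = parse_qualifiers(rest)
print(line_type, qualifiers)
```

Lines without qualifiers (such as ID or TITLE) simply yield an empty
dictionary, and their text stays available in the second returned value.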

This software parses cluster data into 5 related tables.

1) “UniGene” is the master table; it collects the known information about the
transcribed locus (e.g. UniGene cluster identifier, genome localization
or total number of sequences in the cluster) in a single record.
The master table has a “one to many” relationship with each of the other tables.

2) “SEQUENCE” imports information about the nucleotide sequences.
By definition, UniGene clusters are sets of related nucleotide sequences,
so there is at least one nucleotide sequence in a given cluster.
This table combines information about a sequence
(obtained from both “.data” and “.lib.info” files) in a single record.
Each cluster generates in this table a number of records equal to
the number of sequences it contains.

3) “STS” parses each known sequence tagged site located in the transcribed
locus. Each cluster will generate one or more records in this table.

4) “TXMAP” collects the “transcript map interval”
retrieved by radiation hybrid analysis.

5) “PROTSIM” retrieves information about proteins having a high similarity with
the peptidic product of the cluster.
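
The “one to many” relationship between the master table and the other
tables can be illustrated outside FileMaker Pro. The sketch below uses
Python's built-in sqlite3 module purely as a stand-in: UniGene Tabulator
itself does not use SQLite, only two of the five tables are shown, and the
column names and example values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE unigene (
        cluster TEXT PRIMARY KEY,  -- one record per cluster
        title   TEXT,
        scount  INTEGER
    );
    CREATE TABLE sequence (
        nacc    TEXT,              -- many records per cluster
        cluster TEXT REFERENCES unigene(cluster)
    );
    """
)

# Hypothetical cluster with two sequences.
conn.execute(
    "INSERT INTO unigene VALUES (?, ?, ?)", ("Xx.1", "Example cluster", 2)
)
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?)",
    [("AB000001.1", "Xx.1"), ("AB000002.1", "Xx.1")],
)

# One master record is related to several sequence records.
rows = conn.execute(
    "SELECT u.cluster, s.nacc FROM unigene u "
    "JOIN sequence s ON s.cluster = u.cluster ORDER BY s.nacc"
).fetchall()
print(rows)
```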

Library field qualifiers in “.lib.info” file:

       ID=                          Library ID
       TITLE=                       Title for the library
       TISSUE=                      Tissue used to obtain library
       VERBATIM_TISSUE=             Library tissue,
                                    details in vertebrates (optional)
       DEVELOPMENTAL_STAGE=         Developmental stage of the library
       CANSOURCE=                   Cancer type used to obtain library,
                                    “normal” if tissue is normal
       VERBATIM_DEVELOPMENTAL_STAGE=Developmental stage of the library,
                                    details in vertebrates (optional)
       VECTOR=                      Vector used to obtain the library

UniGene Tabulator retains library data in table “Lib.info_Entries”;
the information is reformatted to make it available to table “SEQUENCE”
through relationships.

METHOD

First, the detailed description of the UniGene flat file format
(ftp://ftp.ncbi.nih.gov/repository/UniGene/README)
was carefully analyzed to:
1. identify characters usable as consistent limits for each data type;
2. convert the flat file format into a multiple related table series,
   allowing the appropriate import for each data type.

Our strategy is based on importing the downloaded file.

At the beginning, table “SEQUENCE” collects data from the
selected “UniGene” file.

The lines of the UniGene data file are delimited by a line feed “LF”,
so each line will result in a different record.

During this first step, each UniGene line will be tagged,
according to its starting characters as containing data:
1 about a sequence;
2 about an STS;
3 about the transcript map position;
4 about a similar protein;
5 about general cluster information.

There will be 5 types of line.
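
The tagging step can be sketched in Python as follows. This is only an
illustration of the idea, not the FileMaker Pro script itself: the function
name and the rule of testing the first word of each line are assumptions
based on the line types listed in the README.

```python
# Tags follow the numbering used in the list above.
TAG_BY_KEYWORD = {"SEQUENCE": 1, "STS": 2, "TXMAP": 3, "PROTSIM": 4}


def tag_line(line):
    """Return the tag (1-5) for one line, or None for a blank line."""
    words = line.split(None, 1)
    if not words:
        return None
    # Anything that is not one of the four special types is treated as
    # general cluster information (tag 5); in this sketch that also
    # covers the "//" record terminator.
    return TAG_BY_KEYWORD.get(words[0], 5)


# Hypothetical lines, made up for illustration.
lines = [
    "ID          Xx.1",
    "SEQUENCE    ACC=AB000001.1; SEQTYPE=mRNA",
    "STS         ACC=G00001",
    "//",
]
tags = [tag_line(line) for line in lines]
print(tags)
```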
The software will maintain data about sequence
(line type 1) in table "SEQUENCE";
sequence data will be parsed into corresponding fields of the same record,
and these will be correlated to their cluster within the main table "UniGene".

Information about sequence tagged sites (line type 2) found in clusters
will be parsed in the table "STS";
information about gene map positions from radiation hybrid mapping experiments
(line type 3) will be parsed in the table “TXMAP”;
information about known ortholog proteins (line type 4)
will be parsed in the table “PROTSIM”.

In table “SEQUENCE”, which collects information about cluster sequences,
there are some exceptions:
fields “TISSUE”, “DEV_STAGE”, “CANCER_SOURCE”, “VERBATIM_TISSUE” and
“VERBATIM_DEVELOPMENTAL_STAGE” are calculated from table “Lib.info_Entries” by
a relationship, using the key field “LID”;
fields “Lib_TITLE” and “Lib_VECTOR” are directly visualized from table
“Lib.info_Entries” by the latter relation.
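
The lookup through the key field “LID” can be pictured with a small Python
sketch. It is a hedged illustration only: the function name, the example
values, and the pairing of “.lib.info” qualifiers with “SEQUENCE” field
names (for instance CANSOURCE with “CANCER_SOURCE”) are assumptions, not a
description of the FileMaker Pro relationship itself.

```python
def add_library_fields(sequence, libraries):
    """Return a copy of a sequence record enriched via its LID key."""
    library = libraries.get(sequence.get("LID"), {})
    enriched = dict(sequence)
    # Names on the left follow table "SEQUENCE"; names on the right
    # follow the ".lib.info" qualifiers (assumed correspondence).
    enriched["TISSUE"] = library.get("TISSUE", "")
    enriched["DEV_STAGE"] = library.get("DEVELOPMENTAL_STAGE", "")
    enriched["CANCER_SOURCE"] = library.get("CANSOURCE", "")
    return enriched


# Hypothetical data, made up for illustration.
libraries = {
    "1001": {
        "TISSUE": "brain",
        "DEVELOPMENTAL_STAGE": "adult",
        "CANSOURCE": "normal",
    }
}
sequence = {"ACC": "AB000001.1", "LID": "1001"}
enriched = add_library_fields(sequence, libraries)
print(enriched["TISSUE"], enriched["DEV_STAGE"], enriched["CANCER_SOURCE"])
```

A sequence whose library is unknown keeps empty library fields in this
sketch rather than raising an error.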

Lines tagged “5” are not parsed directly.
Firstly, data about the same cluster are joined and only the complete data
will be parsed in the “UniGene” table, where each bit of information is
extracted from the appropriate field.
Thus each record in this table collects data from a single cluster.
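
Joining the general lines of one cluster before parsing them can be
sketched in Python as below. This is an assumption-laden illustration, not
the actual FileMaker Pro script: it relies on the "//" end-of-record line
to close each cluster, and the function name and example lines are
hypothetical.

```python
# Line types handled by the other tables; everything else is "general".
OTHER_TABLES = {"SEQUENCE", "STS", "TXMAP", "PROTSIM"}


def join_cluster_records(lines):
    """Collect general cluster lines into one dictionary per cluster."""
    records = []
    current = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "//":  # end of record: close the current cluster
            if current:
                records.append(current)
            current = {}
            continue
        parts = line.split(None, 1)
        if parts[0] not in OTHER_TABLES:
            current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:  # tolerate a missing final terminator
        records.append(current)
    return records


# Hypothetical lines, made up for illustration.
lines = [
    "ID          Xx.1",
    "TITLE       Example gene one",
    "SCOUNT      1",
    "SEQUENCE    ACC=AB000001.1; SEQTYPE=mRNA",
    "//",
    "ID          Xx.2",
    "TITLE       Example gene two",
    "//",
]
clusters = join_cluster_records(lines)
print(clusters)
```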

Table “Lib.info_Data”, which is not visible,
imports data from “.lib.info” files.
As above, information about a single library
is spread into more than one record.
Like “general cluster information” above,
first library information will be joined,
and only complete data will be parsed
in table “Lib.info_Entries”.

Every step of this process is driven by a specific FileMaker Pro script;
the software will ask the user when a choice is needed.

The import processes for UniGene data and library data are independent
of each other: one can choose to perform both, or to import only UniGene data
and import library data subsequently.
The first option is to import library information before importing
the UniGene file.
The UniGene import process will clear ALL previous data (parsed and raw),
while importing only library data will delete only library data.

In the master file, the main layout is “UniGene”, which can be selected from the "Layout menu"
(a pop-up Menu in the top left corner, above the small book icon).
Each record contains fields within a "portal", the FileMaker Pro tool for
construction of relational databases.
In portals, each field displays a field of a related table,
though not all are shown (to see other fields, click on the “Table” buttons
or choose from the "Layout menu").

--

The included free FM runtime allows record management and browsing;
to create new fields for further processing or to define further relationships,
one needs to install the full FMP application.

We encourage any creative use, modification and non-commercial redistribution
of UniGene Tabulator, as long as the original paper is cited,
and a statement is provided that the original program has been modified
(in such case).

The availability of complete UniGene datasets in relational database format
makes for easy integration with other biological databases available in
the same or similar format; for example: GenBank and EntrezGene.

Each field in each table corresponds to a "Feature Qualifier"
according to UniGene Format.

UniGene Tabulator USEFUL FIELDS and field type descriptions

Table UniGene:
“CLUSTER”                      – Cluster ID (Text field)
“TITLE”                        – Cluster title (Text field)
“GENE”                         – Gene Symbol (Text field)
“CYTOBAND”                     – Cytological band related to
                                 the expressed locus (Text field)
“GeneID_LID”                   – Entrez Gene related identifier/LocusLink ID
                                 (Number field)
“HOMOL”                        – Presence of known homologue proteins
                                 (Text field)
“EXPRESS”                      – Tissue used to obtain ESTs (Text field)
“RESTR_EXP”                    – Tissue related to more than half of the ESTs
                                 (Text field)
“POLY_A”                       – Presence of at least one EST with
                                 Poly A sequence (Text field)
“CHRO”                         – Chromosome related to the expressed locus
                                 (Number field)
“SCOUNT”                       – Total number of sequences related to
                                 the cluster (Number field)

Table SEQUENCE:
“NACC”                         – Sequence accession number (Text field)
“CLON”                         – Clone identifier related to the sequence
                                 (Text field)
“END”                          – Position of the sequence referring to clone
                                 (Number field)
“NUID”                         – Unique nucleotide sequence identifier (gi)
                                 (Number Field)
“LIBR”                         – Library ID used to obtain the sequence
                                 (Text field)
“PUID”                         – Unique protein sequence identifier
                                 (used for non-ESTs)
“SEQTYPE”                      – Description of the nucleotide sequence
                                 (Text field)
“TRACE”                        – The Trace ID of the EST sequence,
                                 as provided by NCBI Trace Archive (Text field)
“PERIPHERAL”                   – Indicator that the sequence is a suboptimal
                                 representative of the gene (Text field)
“TISSUE”                       – Tissue used to obtain library (Text field)
“DEV_STAGE”                    – Developmental stage of the tissue used to
                                 obtain library (Text field)
“CANCER_SOURCE”                – Descriptions of the tissue used (Text field)
“VERBATIM_TISSUE”              – Detailed description of the tissue for
                                 vertebrate organisms (Text field)
“VERBATIM_DEVELOPMENTAL_STAGE” – Detailed description of the developmental
                                 stage for vertebrate organisms (Text field)

Table STS:
“NAME”                         – Name of the STS related to a sequence
                                 (Text field)
“ACC”                          – GenBank accession number of the STS
                                 (Text field)
“DSEG”                         – GDB Dsegment number (Text field)
“UNISTS”                       – Identifier in NCBI's UNISTS database
                                 (Text field)

Table TXMAP:
“MARKER”                       – Marker found on at least one sequence
                                 in this cluster (Text field)
“RHPANEL”                      – Radiation Hybrid panel used to place marker
                                 (Text field)

Table PROTSIM:
“ORG”                          – Organism of the ortholog protein
                                 (Text field)
“PROTGI”                       – Sequence GI of ortholog protein
                                 (Text field)
“PROTID”                       – Sequence ID of ortholog protein
                                 (Text field)
“PCT”                          – Percent alignment (Number field)
“ALN”                          – Length of aligned region (aa)
                                 (Number field)

INSTALLATION

Once decompressed, UniGene Tabulator can readily be used.

GENERAL DEFINITIONS

File
A set of database tables.

Table
A set of records pertaining to the same subject.

Record
One set of fields which constitute one entry.
The record browser is a small book icon
at the top left of the window.

You may browse the database by clicking on the book pages,
or enter a record number and press the "Return" key.
The following information is always displayed:
    Records: total number of Records in the table
    Found: total number of Records currently selected
    Sorted: sorting status of the Records (Sorted/Unsorted)

Field
One area of the record containing a specific data type.

Browse Mode
One way to use the database.
It allows data entry, viewing, browsing, sorting, manipulation.
It may be selected from:
            the "View" menu, or
            the mode pop-up Menu bar, at the bottom left of the window.

Find Mode
An alternative mode of using the database.
It allows you to search for specific content in the database fields,
using any combination of criteria
    (see the "Search mode" section below for details about searching).
It may be selected from:
            the "View" menu, or
            the mode pop-up Menu bar, at the bottom left of the window.

Preview Mode
An alternative way to use the database.
It visualizes a print preview of the records found.
It may be selected from:
            the "View" menu, or
            the mode pop-up Menu bar, at the bottom left of the window.

Layout
A particular graphical organization of the fields of a table.
A file may show data within different layouts.
A layout may display fields from a table or
its related fields from other tables.
Visualization of a field is independent of the storage of the data contained.

USE

1. Download UniGene flat file

Download the UniGene file with the format ".data.gz"
for the organism desired via ftp at:
ftp://ftp.ncbi.nih.gov/repository/UniGene/
(decompress the files when appropriate).
Download the corresponding library information file with the format
".lib.info.gz".

The UniGene page containing the ftp "UniGene" download link may
also be reached from within the software using the
“Download UniGene data” button. This invokes the default browser
and makes it open a page containing the “Download UniGene” link on
the left side blue bar.
Should you be asked for user “Name” and “Password”,
type “anonymous” and your e-mail address, respectively.

At the end of this step, the user should have two text files
containing cluster data and library information.
The import process requires that these files be located in
the UniGene Tabulator folder, renamed as follows:
cluster data file -> cluster.data
library data file -> library.data

Be sure that the file extension is ".data" and not ".txt".
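
For users who prefer to script this preparation step, the following Python
sketch decompresses a downloaded ".gz" file and writes it under the name
expected by the import process. It is a convenience illustration and not
part of UniGene Tabulator; the function name is hypothetical, and the
demonstration uses a made-up file in a temporary folder rather than a real
download.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def prepare_input(gz_path, folder, new_name):
    """Decompress gz_path into folder under the name new_name."""
    target = Path(folder) / new_name
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Self-contained demonstration with a made-up file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_gz = Path(tmp) / "Xx.data.gz"  # hypothetical download
    with gzip.open(demo_gz, "wb") as handle:
        handle.write(b"ID          Xx.1\n//\n")
    result = prepare_input(demo_gz, tmp, "cluster.data")
    demo_name = result.name
    demo_text = result.read_text()
print(demo_name)
```

With real downloads, the same function would be called once for the
cluster file (target name "cluster.data") and once for the library file
(target name "library.data"), with the "UniGene Tabulator" folder as the
destination.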

2. Import UniGene clusters and/or library information.

Different UniGene Tabulator databases may be obtained by duplicating
the fresh "UniGene Tabulator" folder and starting new import sessions.
Records from different database tables may then be exchanged among
different .UGT databases.

IMPORTANT. Do not import the same text file more than once into
a UniGene Tabulator database; download or decompress the files
again if you need to repeat the import.

The ".tab" text files provided along with the distribution are only
illustrative outputs from the program, and are not intended to be
reimported into UniGene Tabulator, which is designed to import and
parse the original UniGene format data files.

Open the "UniGene Tabulator" file in the "UniGene Tabulator" folder.

Advanced use:
You may open the program files using your copy of FileMaker Pro 8 or later,
which gives you full freedom to modify the software.
In this case, do not open the program using the "UniGene Tabulator" file,
but open the file "UniGene.UGT" with your FileMaker.
After any modification, correct functioning of the program requires that
you relaunch it with the "UniGene Tabulator" runtime,
because of the data path structure stored in the “UniGene Tabulator” scripts.

Click on the "Import UniGene" button.
This starts both importing and parsing of the data.
Select options from the dialog boxes when required.

You may choose if you want to import Library information too.

You can import library information later by clicking on the “Import Library” button.

The time required to obtain a completely parsed UniGene database mainly
depends on the total cluster number and on the total number of
GenBank sequences composing the clusters. Complete parsing for large data files
may require up to several days of calculation.
Precomputed databases for Homo sapiens and Danio rerio are provided at:
http://apollo11.isto.unibo.it/software/UniGene_Tabulator/

Each field in each table corresponds to a data type typical of the UniGene Format.

Since UniGene Tabulator version 1.1:
following parsing of UniGene data files, the software will create
the UniGene.tab file within the 'UniGene Tabulator' folder.

3. Use UniGene Tabulator as database.

The FileMaker Pro-based database may be used in three "modes":
"Browse", "Find" and "Preview".
Switching among different modes can be obtained from the "View" Menu
or from the pop-up Menu bar at the bottom left of the window.

BROWSE MODE (“NAVIGATION”)

In the "Browse" mode, one can browse among the record sets by clicking on
the small book icon in the upper left corner, or move up and down between
entries using buttons at the top left of the UniGene layout.

Browsing among the tables can be done by clicking on the “Table” buttons
in the desired section (Sequence, Protein similarity, STS, Transcript Map).
Alternatively, you can move among the tables by clicking
on the "Layout" pop-up Menu at the upper left corner.

SEARCH MODE (“FIND”)

In the "Find" mode, the small book icon in the upper left corner
represents different "requests" that are made for searching in the database.
In the "Find" mode, the user can fill in a blank form allowing searching
in specific fields.

When searching in the master table, if one entry contains various recurrences
of a feature, all related records of the respective feature are displayed.

In FileMaker Pro "Find" mode, the "AND" - "OR" - "NOT" operators may be used
in this way:

"AND" by filling in different fields located in the same "request",

"OR" by generating additional requests
     (from "Requests" Menu) in the same query,

"NOT" by generating additional requests (from "Requests" Menu)
     and checking the "Omit" box.
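
The way requests combine can be modelled with a short Python sketch. It is
a simplified illustration and not FileMaker Pro's actual matching engine:
matching is reduced to a case-insensitive substring test, requests are
assumed to be applied in the order given, and the records, field values and
function names are hypothetical.

```python
def matches(record, criteria):
    """AND: every filled-in field of one request must match."""
    return all(
        value.lower() in str(record.get(field, "")).lower()
        for field, value in criteria.items()
    )


def run_find(records, requests):
    """Apply requests in order; requests marked omit remove matches."""
    found = []
    for request in requests:
        hits = [r for r in records if matches(r, request["criteria"])]
        if request.get("omit"):
            found = [r for r in found if r not in hits]   # NOT
        else:
            found += [r for r in hits if r not in found]  # OR
    return found


# Hypothetical records, made up for illustration.
records = [
    {"GENE": "EXG1", "CHRO": "21", "EXPRESS": "brain"},
    {"GENE": "EXG2", "CHRO": "21", "EXPRESS": "liver"},
    {"GENE": "EXG3", "CHRO": "7", "EXPRESS": "brain"},
]
requests = [
    {"criteria": {"CHRO": "21"}},                  # request 1
    {"criteria": {"EXPRESS": "brain"}},            # request 2: OR
    {"criteria": {"GENE": "EXG2"}, "omit": True},  # request 3: NOT
]
found_genes = [r["GENE"] for r in run_find(records, requests)]
print(found_genes)
```

Putting two fields in the same criteria dictionary, for example CHRO and
EXPRESS together, corresponds to the "AND" case described above.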

The "Symbols" pop-up Menu in the "Find" mode allows querying of
exact matches, ranges, duplicates, wildcards and more.

Search results are subsets of entries matching the desired criteria.

PREVIEW MODE (“PRINT”)

In the "Preview" mode, one can obtain a print preview of the data
in the current table.
Browsing among the tables can be done by clicking on the “Table” buttons
in the desired section (Sequence, Protein similarity, STS, Transcript Map).
Alternatively, you can move up and down among the tables by clicking
on the "Layout" pop-up Menu at the upper left corner.

“UniGene Tabulator” FUNCTIONS AND MENU COMMANDS

“UniGene Tabulator” MENU

About FileMaker Pro RUNTIMES
Shows information about the software in a new window.

Preferences
Standard preferences panel; memory can be set up to 256 MB.

Quit UniGene Tabulator
Close the program (same as clicking on the cross button
in the upper right corner of the UniGene window).

FILE MENU

File Options
In this application it is possible to set only the "Spelling" option.

Change Password
There is no default password.

Page setup
Standard page set-up command.

Print
Standard print command; you can choose to print:
             all records in the "Found" set, or
             only the current record, or
             a "blank" mask of the record fields.

The appearance will be that of the layout
currently selected from the layout Menu.

Import Records
This is the general "Import" function of FileMaker Pro.
Use only the "Import UniGene" function for correct UniGene file import,
from the "Actions" Menu, or by clicking on the "Import UniGene" button in
the "UniGene Tabulator" file.

Export records
Export command for the found records set in a given table.
Records are exported in their current sorting.
Users can select fields to be exported, their relative order,
and the separation character.
The option “ALL” will export all fields (from all tables)
into a Unicode UTF-16 file (default parameters).
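
Because the “ALL” export is written as UTF-16, programs that expect plain
ASCII or UTF-8 may misread it. The Python sketch below shows one way to
read such a file; it is an illustration only, and the tab separator, the
file name and the example content are assumptions rather than documented
properties of the export.

```python
import csv
import tempfile
from pathlib import Path


def read_export(path, delimiter="\t"):
    """Read a UTF-16 encoded export file into a list of rows."""
    with open(path, encoding="utf-16", newline="") as handle:
        return list(csv.reader(handle, delimiter=delimiter))


# Self-contained demonstration with a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "export.tab"  # hypothetical file name
    demo.write_text("Xx.1\tExample cluster\tEXG1\n", encoding="utf-16")
    rows = read_export(demo)
print(rows)
```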

Save/Send Record as
An alternative export function. It exports data from the current record,
or the found set of records, into an "Excel" file (.xls).

Send Mail
To send data from each record in the found set by single e-mail.

Save a Copy as
Save a copy of the database, complete, compressed or
as clone (database structure with no record present).

EDIT MENU

Undo
Standard "Undo" command.

Cut
Standard "Cut" text command.

Copy
Standard "Copy" text command.

Paste
Standard "Paste" text command.

Clear
Deletion of selected text.

Select all
Selection of all the text within a selected field
(to select a field, click on the field).

Find/Replace
Utility for search/replace text strings within fields.
Note: Use "Find" mode (from "View" Menu)
           for full search and selection of a record set.

Spelling
Utility to check the spelling of text strings within fields.

Export Field Contents
Utility to export the contents of the field selected to a file.

VIEW MENU

Browse Mode
Switch to the "Browse Mode" (see "General Definitions" above).

Find Mode
Switch to the "Find Mode" (see "General Definitions" above).

Preview Mode
Switch to the "Preview Mode" (see "General Definitions" above).

View as Form
A possible way to display individually the current record of a
found set of records.

Go to Layout
A possible way to switch between different layouts:
UniGene, SEQUENCES, STS, TXMAP, PROTSIM.

View as List
A possible way to display all the records of a found set
in list form.

View as Table
A possible way to display all the records of a found set as a
spreadsheet-like table.

Toolbars
To switch on/off the toolbars of the application: "Standard"
and "Text Formatting".

Status Area
To switch on/off the "Status Area", the left column toolbar.

Text Ruler
To switch on/off the text ruler of the application.

Zoom in
To increase layout dimensions, same as "Zoom +" button.

Zoom out
To decrease layout dimensions, same as "Zoom -" button.

RECORD MENU

New Record
Create a new empty record in the database.
The new Record will be the latest in the current record set.

Duplicate Record
Duplicate the current record in the database.
The new Record will be the latest of the current record set.

Delete Record
Delete the current record in the database.

Delete All Records
Delete all currently found records in the database.

Go to Record
To move to the selected record by number, previous or next.

Show All Records
Show all the records in the database.

Show Omitted Only
Show records in the database outside of the current found set.

Omit Record
Remove the selected record out of the current found set,
without deleting it.

Omit Multiple
Remove more than one record, selected by numbers, out of the current
found set, without deleting them.

Modify Last Find
Return to the last performed search to edit it.

Sort Records
Sort the current records set according to criteria desired.

Unsort
Restore the current record set to the order of insertion.

Replace Field Contents
Replace the value of a field in all records of the found set with
the value specified in the current record, or by calculation.

Relookup Field Contents
Relookup the value of a field by matching on a selected key field.

Revert Record
Restore the value of a field, discarding any changes, before clicking out of
that field.

ACTIONS MENU

Import UniGene data
Import data from a file in UniGene flat file format (.data).
(equivalent to the "Import UniGene" button in the software main window).

Since UniGene Tabulator version 1.1:
following parsing of UniGene data files, the software will create
the UniGene.tab file within the 'UniGene Tabulator' folder.

This file is a text tabulated file and contains four columns:

NACC                 CLUSTER        GENE            NUID
[GenBank             [UniGene       [Gene Symbol]   [GenBank
Accession number]    Cluster ID]                    GI]

See also:
http://www.ncbi.nlm.nih.gov/Sitemap/sequenceIDs.html

This file is useful for readily converting GenBank accession numbers into
gene symbols, for meta-analysis purposes.
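
Such a conversion can be sketched in Python as follows. The sketch assumes
the four tab-separated columns described above, in the order NACC, CLUSTER,
GENE, NUID; whether the real file starts with a header row is not stated
here, so a row beginning with "NACC" is simply skipped. The function name
and the example rows are hypothetical.

```python
import csv
import io


def load_accession_to_gene(handle):
    """Map GenBank accession numbers to gene symbols."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4 or row[0] == "NACC":
            continue  # skip incomplete rows and a possible header row
        nacc, _cluster, gene, _nuid = row[:4]
        mapping[nacc] = gene
    return mapping


# Hypothetical content standing in for a UniGene.tab file.
demo = io.StringIO(
    "AB000001.1\tXx.1\tEXG1\t1234567\n"
    "AB000002.1\tXx.1\tEXG1\t1234568\n"
    "AB000003.1\tXx.2\tEXG2\t1234569\n"
)
accession_to_gene = load_accession_to_gene(demo)
print(accession_to_gene["AB000003.1"])
```

For a file on disk, the same function can be given an open file handle
instead of the in-memory text used in this demonstration.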

Import Library info
Import data from a file in “.lib.info” flat file format.
(equivalent to the "Import Library" button in the software main window).

Export From SEQUENCE
This action will export data clustered by ACCESSION NUMBER information,
from the current set of found sequence records
(from all sequence records if no record subset is currently found).

This action shows two possibilities:
1. All – Each GENBANK ACCESSION NUMBER will be exported along with all
   the related information in a tabulated form
   (i.e. all fields present in the table "SEQUENCE", in this order:
   CLUSTER, TITLE, GENE, CYTOBAND, GeneID_LID, HOMOL, EXPRESS, RESTR_EXPR,
   POLY_A, CHRO, SCOUNT, NACC, CLON, END, NUID, LIBR, PUID, MGC, SEQTYPE,
   TRACE, PERIPHERAL, TISSUE, DEV_STAGE, CANCER_SOURCE, VERBATIM_TISSUE,
   VERBATIM_DEVELOPMENTAL_STAGE);
2. Custom – User can choose the fields to be exported
   (i.e. certain selected fields among those described above).

The user must choose the name and location of the output file.

The same action starts if you click on the "Export Sequence" button in the main layout.

Exporting from other data tables may be easily performed by
choosing the layout of interest, then using the general "Export Record..."
command in the "File" Menu.

Erase Data
Two possibilities are shown:
1 "Delete raw data": delete only original raw data about library information.
   It may be useful to "clean" the database following parsing.
   Use this option to reduce the file size.
2 "Delete ALL data": delete all data in the database tables,
   including original flat file raw data and parsed data.

HELP MENU

About UniGene Tabulator
This command shows information about the software in a new window.

UniGene Tabulator Help
This command shows the UniGene tabulator tutorial in a new window.

OTHER FUNCTIONS IN THE MAIN LAYOUT

The mouse pointer is shown as a hand over the buttons.

Clicking on the word “Cluster” in the title will open
the current online UniGene record for that cluster
in the default browser.

Clicking on the arrow to the right of the “PROTGI” field
in the "Protein similarity" section will open
the corresponding record of the Entrez “Protein” database
in the default browser.

Clicking on the tag of the “GeneID/LID” field will open
the corresponding record of the Entrez “Gene” database
in the default browser.

---

PROBLEMS
Sometimes, power failure, hardware problems, or other factors can damage a
FileMaker database file.
When the runtime application discovers a damaged file, a dialog box appears,
telling the users to contact the creator.
Even if the dialog box does not appear, files can exhibit erratic behavior.
If you have FileMaker Pro or Developer installed, you can recover the file by
using the “Recover” command.
Otherwise, on Windows machines, press Ctrl + Shift while double-clicking the
runtime application icon.
Hold the keys down until you see the Open Damaged File dialog box.
During the recovery process, the runtime application:
1. creates a new file;
2. renames any damaged file by adding “Old” to the end of the filename;
3. gives the repaired file the original name.

---
Software limits
Due to FileMaker Pro 8 limits,
maximum UniGene Tabulator file size is 8 terabytes (8192 gigabytes).
Text fields can contain up to 2GB of characters,
numbers fields can contain values up to 800 digits.
UniGene Tabulator parsed UniGene build Hs.190, including library information,
in about 3.5 days (on a 1.80 GHz Pentium 4).

Technical notes
The scripts at the core of UniGene Tabulator software are "FileMaker Pro" scripts.

Bug reports
Please report any suggestion, bug or problem to:
pierluigi.strippoli@unibo.it
l.lenzi@unibo.it