GeneBase Database
Version 1.0 (2015)
Tutorial
INDEX
REQUIREMENTS
SET UP
1. Creation of a local Gene eukaryotic entries
database
1.1. Downloading the Gene eukaryotic entries
1.2. Parsing the Gene entries
1.3. Importing
the Gene entries
2. Creation of a local exon and intron
sequence database
2.1. Downloading and parsing the chromosome
sequenses
2.2. Calculating exon and intron
sequences
2.3. Importing exon and intron
sequences
2.4. Exporting
sequences in FASTA format
3. Additional tables
3.1. Transcripts table
3.2.
Reports table
GENERAL DEFINITIONS
4.1 File
4.2 Table
4.3 Record
4.4 Field
4.5 Layout
4.6 Browse Mode
4.7 Find Mode
4.8 Preview Mode
MENU AND COMMANDS
5.1 GeneBase
5.2 File
5.3 Edit
5.4 View
5.5 Records
5.6 Scripts
5.7 Help
TROUBLESHOOTING
REQUIREMENTS
(Back
to Index)
The
software minimum requirements are:
Mac OS X 10.6, OS X
Lion 10.7, OS X Mountain Lion 10.8;
Windows XP
Professional, Home Edition (Service Pack 3);
Windows Vista
Ultimate, Business, Home Premium (Service Pack 2);
Windows 7 Ultimate,
Professional, Home Premium;
Windows 8 Standard
and Pro edition.
Minimum system requirements are:
Mac OS X 10.6,
Intel-based Mac CPU (Central
Processing Unit), 1 GigaByte (GB) of RAM (Random
Access Memory), 1024x768 or higher resolution
video adapter and display.
Windows XP Professional, Home Edition (Service Pack
3), 700 MegaHertz (MHz) CPU or faster, 256 MegaBytes (MB)
of RAM, 1024x768 or higher resolution video adapter and
display.
A connection to the Internet
is required to display the software Guide and to
download data for set up, but not to run the tool.
The downloaded file
should be automatically decompressed, generating a "GeneBase"
folder.
Failing this, double
click on the file to activate the default decompression
utility of your system.
The
GeneBase Folder contains:
"GeneBase" (Macintosh)
or "GeneBase.exe" (Windows) file
(the runtime application);
"GeneBase.fmp12"
(database file);
"FMP Acknowledgments.pdf" file;
"Extensions" folder, containing a
"Dictionaries" folder,
with the
dictionary file for supported languages;
(and an "English" folder with 3 files, for
Windows);
40 ".dll" files (for
Windows);
"parse_entrezgene.py" file (only in the empty
template version);
"calculate_sequences.py" file (only in the empty
template version).
GeneBase
is based on FileMaker Pro 12 (FileMaker Pro, Inc.) database management
software (www.filemaker.com/index.html),
and is
released as a FileMaker Pro 12 template, along with a runtime
application able to run "FileMaker Pro" at the core of the
software.
The runtime is
freely distributed, in compliance with the license of
"FileMaker Pro 12 Advanced" developer package that was
used to create the program.
Standard database commands (Find, Sort, Export
records) are available within each layout of GeneBase
(see "GENERAL DEFINITIONS"
and "MENU AND COMMANDS" sections in this
Guide).
Please do not change the names of any files and in
the GeneBase folder.
NOTE - Be sure that
your system default format uses
"." (full stop)
as a decimal separator (English
standard).
If this is not the case, you must
change the system setting.
Mac OS X: in
"System Preferences" (from the "Apple" Menu), click on
"International", then on "Formats", then choose as
"Region" a country with the English standard format for
numbers (full stop mark as a decimal separator).
System
restart or user logout is not required to make the
change effective.
Windows: in
"Control Panel" (from the "Start" Menu), click on
"International options" then modify the format of
numbers choosing a country with the English standard
format for numbers (full stop mark as a decimal
separator).
System
restart or user logout is not required to make the
change effective.
Python 2.6 or 2.7 (https://www.python.org/)
is only required
to run the scripts useful for some set up steps.
SET UP
1. Creation of a local Gene
eukaryotic entries database
(Back
to Index)
In this section, the user is guided to
download, parse and import the National Center for
Biotechnology Information's (NCBI) Gene database entries
into the GeneBase tables software.
1.1)
Downloading
the Gene
eukaryotic entries
Go to
the website page:
http://www.ncbi.nlm.nih.gov/gene
Using the Entrez text
query, find the set of genes of interest.
Please note that
if there are millions of resulting entries, in order to
avoid downloading errors, it is highly recommended
that you split them through different searches and
subsequently download
different files instead of one over-sized file.
As many Prokaryota
lack a gene table listing the exon/intron structure
and thus are not supported, please add to your
search:
AND
"eukaryota"[organism]
We also recommend
you add the following restrictions, to avoid genes
without a gene table structure (causing errors
during parsing steps), by typing in the Search box:
AND
"source_genomic"[properties] AND alive[property] AND
"srcdb_refseq_known"[Properties]
These criteria
selects for gene entries from a genomic source
("source_genomic"), current and primary and not
obsolete ("alive"), and related to a known RefSeq
reference sequence ("srcdb_refseq_known").
For
further explanations, please see the
Gene Help book
(http://www.ncbi.nlm.nih.gov/books/NBK3841/)
You can
download the found entry
set by
selecting
from the "Send to" pop-up menu at the top
righthand corner of the web page:
"File", "Format ASN.1" and
"Sort by Relevance"; then click on the "Create
File" button.
In the default download
folder of your browser, you will obtain a file usually
named "gene_result.txt".
The download could take
some hours, depending on the number and the size of
retrieved genes.
Example
1
In order to obtain only current entries about
genomic genes from the Animalia, Plants and Fungi
kingdoms, excluding RefSeq models, download each kingdom
set separately using the following representative queries:
"Animalia"[Organism]
AND "source_genomic"[properties] AND alive[property] AND
"srcdb_refseq_known"[properties]
"Plants"[Organism] AND
"source_genomic"[properties] AND alive[property] AND
"srcdb_refseq_known"[properties]
"Fungi"[Organism] AND
"source_genomic"[properties] AND alive[property] AND
"srcdb_refseq_known"[properties]
In the default download
folder of your browser, you will obtain three files usually
named "gene_result.txt", "gene_result(1).txt" and "gene_result(2).txt". Please note that
you need to have these files in the Genebase folder.
Example
2
In order to obtain only current entries about
eukaryotic SOD1 genes, excluding RefSeq models, use the following
representative query:
"SOD1"[Gene] AND "eukaryota"[organism]
AND "source_genomic"[properties] AND alive[property] AND
"srcdb_refseq_known"[properties]
1.2) Parsing the Gene entries
The "parse_entrezgene.py"
Python script provided here extracts and parses all
the available information contained in the
"gene_result.txt" file(s) and creates three tab-delimited
files that can be imported into the GeneBase software.
The "gene_result.txt"
file(s) needs to be in the same directory from which the
command is launched. Execute the program by typing the
UNIX command "python parse_entrezgene.py" or by running the script
from the IDLE utility.
For those not used to
UNIX and Python languages, we recommend using the IDLE
utility to run Python scripts provided here. Please see
the following quick guide
(Section A).
The programme is
finished when the message "XXX gene results processed"
appears, where XXX is the number of NCBI's Gene entries downloaded in the first
step. In the
working folder you will obtain three files "gene_ontology.txt",
"gene_summary.txt" and gene_table.txt".
For the entire
eukaryotic set, this step will require three hours.
1.3) Importing the Gene
entries
Copy (if
not already there) the "gene_ontology.txt", "gene_summary.txt"
and "gene_table.txt" files into the folder where an
empty copy of the GeneBase.fmp12 database is available. Open
the GeneBase database and import the three files selecting the script "Import_Summary_Table_Ontology" from the "Scripts"
menu.
At the start of the import
process, the user needs to choose whether to retain or
delete all previously imported data. Clicking on "No" in
the first dialogue box will let the user add to the other
previously imported data records; otherwise, by clicking
on "Yes", all previously imported
data will be deleted and only new data will be
available.
A message will
appear warning that the import step is complete.
For the entire eukaryotic set, this step will
require five days.
Now you can navigate between four
different database tables containing the available
information. The violet
buttons will help to navigate among software
tables; the blue buttons will
open useful links about the gene or the gene product of the
current record; the orange buttons
will help to navigate between table records.
"Gene_Summary"
and "Gene_Table" tables have a box on the
right, showing useful related fields of other related software
tables, giving the opportunity to perform crossed searches.
Here
is the "Gene_Summary"
table:
The software
will calculate and extract information for each gene in
specific calculated
fields of the "Gene_Summary" table:
FIELD
DESCRIPTION
"Gene_ID":
the Entrez
gene unique identifier;
"Status":
the gene entry status;
"Genome_Annotation_Status":
shows "not in current annotation release" if the gene is not
annotated on the most recent genome
annotation (it may happen
also with "live" gene entries), otherwise
is empty;
"Gene_Type":
possible values are tRNA,
rRNA, snRNA, scRNA, snoRNA,
miscRNA, ncRNA, protein-coding, pseudo, other, and
unknown;
(for an efficient record retrieval we
recommend using
"protein coding" without the "-" symbol);
"Organism":
the binomial organism's Latin name (Genus species) and strain
when appropriate;
"Lineage":
the lineage;
"Taxonomy_ID":
the NCBI Taxonomy
database entry identifier;
"Gene_Locus":
the gene locus;
"Gene_Description":
the full descriptive name;
"Location":
the genomic location of the gene;
"Primary_Source":
the identifier of the major resource
outside of NCBI that
provided information about this gene. For
some taxa, this
resource may be the nomenclature authority; in other taxa it
may
be the group that defines genes and submits annotation to
public sequence databases;
"Ensembl":
the matching
Ensembl accession number;
"HPRD":
the Human
Protein Reference Database accession number (human
only);
"MIM":
the Mendelian Inheritance in Man (MIM)
number for the gene
(human only);
"Vega":
the
matching Vega (The Vertebrate Genome Annotation Database)
accession number;
"Alias":
unofficial symbols and descriptions that
have been used for
this
gene and its products. If there is no official symbol,
and no locus_tag, the symbol at the top of the
display is
repeated in this
section;
"Locus_Tag":
corresponds to the systematic feature qualifier used by the
international sequence collaboration (INSDC,
DDBJ/EMBL
/GenBank) and can be assigned by sequence submitters as a
unique, systematic gene descriptor. When such a value is
not
available from the submitted sequence, the identifier from
a
collaborating model organism database is
used. Locus tag is
also used as the preferred symbol if an
official symbol has
not been used to
identify a gene;
"Protein_Names":
the protein names as annotated on the
RefSeq protein;
"Protein_Description":
the protein descriptive text;
"Summary":
descriptive text about the gene, its
cellular localization,
its function, and its effect on
phenotype;
"Official_Symbol":
the gene symbol provided by the named external authority;
"Official_Full_Name":
the gene's full name provided by the named external
authority;
"RefSeq_Status":
any of the set of status descriptions
defined by RefSeq.
Here is the "Gene_Table" table:
The
software has calculated and
extracted information for each available exon
(including the corresponding intron if an intron
follows that exon); useful numbers specifically
calculated by GeneBase and not available in NCBI Gene
are highlighted in red. Each record represents one
exon with the following calculated fields:
FIELD
DESCRIPTION
"Gene_ID":
the Entrez
gene unique identifier;
"Gene_Type":
possible values
are tRNA, rRNA, snRNA, scRNA,
snoRNA,
miscRNA, ncRNA, protein-coding, pseudo,
other, and unknown; (for an efficient record
retrieval we recommend using "protein
coding"
without the "-" symbol);
"Organism":
the
binomial organism's Latin name (Genus
species) and strain when appropriate;
"Official_Symbol":
the gene symbol provided by
the named external
authority;
"Chr_Accession":
the chromosome entry GenBank
accession number;
(for an efficient record retrieval we recommend
using for example "NC" without the
"_" symbol);
"gi_Accession":
the GenInfo identifier
assigned to the
chromosome sequence;
"Chr_Start":
the
genomic coordinate of the gene start;
"Chr_End":
the
genomic coordinate of the gene end;
"Chr_Strand":
shows "plus"
or "minus" for genome strands
respectively;
"Gene_Size":
the length of the gene in base
pair (bp);
"UTR5'_Start":
the genomic coordinate of the 5´ untranslated
region start;
"UTR5'_End":
the genomic coordinate of the 5´ untranslated
region end;
"UTR3'_Start":
the genomic coordinate of the 3´ untranslated
region start;
"UTR3'_End":
the genomic coordinate of the 3´ untranslated
region end;
"CDS_Start":
the
genomic coordinate of the coding sequence
start (if present);
"CDS_End":
the
genomic coordinate of the coding sequence
end (if present);
"CDS_Size":
the length of the
coding sequence (if present);
"RefSeq_Status":
any of
the set of status descriptions defined
by RefSeq for each transcript
variant;
"RefSeq_RNA_Label":
the transcript variant number
(if present);
"RefSeq_RNA_Accession":
the RefSeq
RNA accession number; (for an
efficient
record retrieval we recommend using
for example "NP" without the "_"
symbol);
"Exon_Start":
the genomic coordinate of the exon start;
"Exon_End":
the genomic coordinate of the exon end;
"Exon_Length_bp":
the length of the exon in bp;
"Exon_Number":
the exon number;
"Last_Exon":
shows
"Yes" if it is the 3´ terminal exon;
"RefSeq_Protein_Label":
the isoform
number (if present);
"RefSeq_Protein_Accession":
the RefSeq protein accession
number; (for an
efficient
record retrieval we recommend using
for example "NP" without the "_"
symbol);
"Coding_Exon_Start":
the genomic coordinate of the coding exon
start;
"Coding_Exon_End":
the genomic coordinate of the coding exon end;
"Coding_Exon_Length_bp":
the length of the coding exon in bp;
"Coding_Exon_Number":
the coding
exon number;
"Last_Coding_Exon":
shows "Yes" if it is the last coding exon;
"Downstream_Intron_Start":
the genomic coordinate of the intron start;
"Downstream_Intron_End":
the genomic coordinate of the intron end;
"Downstream_Intron_Length_bp":
the length of the intron in bp;
"Intron_Donor"*:
please see section 2;
"Intron_Acceptor"*:
please see section 2;
"Canonical"*:
please
see section 2;
"Tag_Exon"*:
please see section
2;
"Tag_Coding_Exon"*:
please
see section 2;
"Tag_Intron"*:
please see section 2;
"Exon_Sequence"*:
please
see section 2;
"Coding_Exon_Sequence"*:
please see section 2;
"Downstream_Intron_Sequence"*:
please see section 2;
"Non_Redundant_Exon":
shows "Yes -
Unique" if this exon is unique;
shows "Yes - Merged" if this exon is the first
one of each group of exons
belonging to
transcript variants and thus
present multiple
times in the database; to
search for a non
redundant set of exons just type
"Yes" in this
field in the "Find Mode";
"Non_Redundant_Coding_Exon":
shows
"Yes - Unique" if this coding exon is
unique;
shows "Yes - Merged" if this coding exon
is the first one
of each group of coding exons
belonging
to transcript variants and thus
present
multiple times in the database; to
search for
a non redundant set of coding exons
just type
"Yes" in this field in the "Find
Mode";
"Non_Redundant_Intron":
shows
"Yes - Unique" if this intron is unique;
shows "Yes -
Merged" if this intron is the first
one of
each group of introns belonging to
transcript variants and thus present multiple
times in the
database; to search for a non
redundant set of introns just type "Yes" in this
field
in the "Find Mode";
"QC_Exon_Length"*:
please see section 2;
"QC_Coding_Exon_Length"*:
please see section 2;
"QC_Intron_Length"*:
please
see section 2.
* These fields are filled by
executing the steps described in section 2.
The orange button "View records as list" will change the
view mode showing all the records of that isoform as a
list, from 5' (top) to 3' (down).
The orange button "Find identical introns" will find
identical introns belonging to transcript variants based
on intron genomic coordinates.
Here is the "Gene_Ontology" table:
FIELDS
DESCRIPTION
"Gene_ID":
the
Entrez gene unique identifier;
"Gene_Type":
possible values are tRNA, rRNA, snRNA, scRNA, snoRNA, miscRNA,
ncRNA,
protein-coding, pseudo,
other, and unknown; (for an efficient data
retrieval we recommend
using "protein coding" without the "-"
symbol);
"Organism":
the binomial
organism's Latin name (Genus
species)
and strain when
appropriate;
"Official_Symbol":
the gene symbol
provided by the named external authority;
"GO_ID":
the Gene Ontology (GO) identifier number (for example
GO:51117);
"Evidence_Code":
the Evidence
information (explanations for these abbreviations are
provided by the Gene
Ontology website: click on the blue button
"Evidence Code Guide").
2. Creation of a local exon and
intron sequence database
(Back to Index)
The "Gene_Table"
table is designed to contain intron and exon sequences
(please see "Gene_Table"
field descriptions in section 1.3). In this section,
the user is guided to download and parse chromosome
sequences and to calculate exon and intron sequences to be
imported into GeneBase local database. The user can choose
a record subset from which to calculate sequences, for
example selecting only protein-coding genes.
2.1) Downloading and
parsing chromosome sequences
From "Gene_Table" table
select "Show all" if you
are interested in all available gene exons and
introns. Alternatively, the user can select a record subset from
which to calculate exon and intron sequences; please see sections 4 and
5 for
details regarding "Find Mode". Please select the script "Export_Selected_Records" from the
"Scripts" menu.
A
message will appear warning that the import step is
complete. The final output is the creation of two text
files in the GeneBase folder: one with the chromosome
accessions list (named "chr_accession.txt")
and one with exon and intron genomic coordinates (named
"exon_intron.txt"
and used in section 2.3);
both are related to all shown records or to the record
subset (depending on what the user is interested in).
On the web page
select:
Database:
Nucleotide;
File:
click on the "Browse" button and
select the "chr_accession.txt" created
in the previous step and located in the GeneBase folder.
By clicking on the "Retrieve" button, a window with
the description of the retrieved records will appear.
Click on the "Retrieve
records for XXX UID(s)" link (where XXX is
the number of the retrieved records that should be
equal to the number of chromosome accessions listed in
the uploaded file). You can download the found entry set
choosing from the "Send to" pop-up menu at the
right-top corner of the web page:
"File", "FASTA"
and "Default Order"; then clicking on the "Create File"
button.
The download could
take some hours, depending on the number and the
size of chromosomes.
Please
note that you should not exceed the download limit
of 10 GB to avoid errors in the output file. You can
divide the chromosome accession list file
(chr_accession.txt) into two or more files and
repeat the download step for each of them.
In the default
download folder of your browser, you will obtain one
or more files usually named "sequence.fasta",
"sequence(1).fasta", "sequence(2).fasta" and so forth.
We recommend
checking that the number of downloaded entries is
equal to initial retrieved chromosome accession number
(e.g.
using the "grep" and "wc" UNIX utilities: grep gi
sequence.fasta | wc -l).
Create a text file named exactly
"file_list.txt" with a list of the downloaded
FASTA file names (one row
each name, even if you have only one file with the
chromosome sequences, write only that name).
Example
If
you have only one file with the chromosome
sequences, write only: sequence.fasta.
If you have three files with the chromosome
sequences, write:
sequence.fasta
sequence(1).fasta
sequence(2).fasta
For Windows users
only: you need to convert the FASTA file(s) in tabular
format, for example by using Galaxy web tool.
2.2)
Calculating
exon and intron sequences
The "calculate_sequences.py" Python script
provided here automatically parses downloaded
chromosome sequences and extracts exon and intron
sequences using the genomic coordinates exported from
all or selected records of "Gene_Table" database table (please see
section 2.1).
You need to have
the following files in
the same folder:
1) the "calculate_sequences.py"
script;
2) the "exon_intron.txt"
file, containing the genomic coordinates and created
in
section 2.1;
3) the FASTA
file(s) with the chromosome sequences downloaded in section 2.1;
4) the text file "file_list.txt"
with the list of
the downloaded FASTA file names created
in section 2.1.
Execute the "calculate_sequences.py"
script by typing the UNIX command "python calculate_sequences.py"
or by running the script from the IDLE utility.
For those not used
to UNIX and Python languages, we recommend using the
IDLE utility to run Python scripts provided here.
Please see the following quick guide (Section
B).
The
programme is finished when the message "Exon and
intron sequences calculated" appears. In the working folder
you will obtain a file named "exon_intron_seq.tab".
For the "Validated"
and "Reviewed" eukatyotic gene set, this step will
require three hours.
2.3)
Importing
exon and intron sequences
If the file created
in the previous step ("exon_intron_seq.tab")
exceeds the size of 4 GB, please divide it in order
to create more files of less than 4 GB,
which is the size limit of text files to be imported
into a FileMaker database. Please repeat for each
file created
the following import step.
Import
the exon and intron calculated sequences by selecting the script "Import_Exon_Intron_Sequences" from the
GeneBase "Scripts" menu: a window will appear to find
the sequence file(s) ("exon_intron_seq.tab") location.
A
message will appear warning that the import step is
complete.
For the entire
eukaryotic gene set, this step will require two days.
The software will
calculate and extract sequences and other information
in specific calculated fields of the "Gene_Table" table:
FIELD
DESCRIPTION
"Intron_Donor":
the intron donor site (5'
end);
"Intron_Acceptor":
the intron acceptor site (3'
end);
"Canonical":
shows "Yes" if the donor
site is "GT" and the acceptor
site is "AG", otherwise shows "No";
"Tag_Exon"*:
it shows the single-line
identifier beginning with the ">"
symbol of the current exon
sequence;
"Tag_Coding_Exon"*:
it
shows the single-line identifier beginning with the
">"
symbol of the current coding
exon sequence;
"Tag_Intron"*:
it
shows the single-line identifier beginning with the
">"
symbol of the current intron
sequence;
"Exon_Sequence":
the exon sequence (each row has 70 bases);
"Coding_Exon_Sequence":
the coding exon sequence (each
row has 70 bases);
"Downstrean_Intron_Sequence":
the intron sequence (each
row has 70 bases);
"QC_Exon_Length":
shows "Yes"
if the exon calculated sequence is of the
correct length (length stated in "Exon_Length_bp"
field),
otherwise shows "No";
"QC_Coding_Exon_Length":
shows "Yes" if the coding exon calculated sequence is
of
the correct length (length
stated in
"Coding_Exon_Length_bp" field), otherwise shows
"No";
"QC_Intron_Length":
shows "Yes" if
the intron calculated sequence is of the
correct length
(length stated in "Intron_Length_bp"
field), otherwise shows "No".
2.4)
Exporting sequences in FASTA format
From "Gene_Table" table
select "Show all" if you
are interested in all available gene exons and
introns. Alternatively, the user can select a record subset from
which to export exon and intron sequences in FASTA
format. The omission
of records with empty "Exon_Sequence",
"Coding_Exon_Sequence"
or "Intron_Sequence"
fields is recommended, in order to speed up the
exporting step. Please see sections
4
and 5
for details regarding "Find Mode". The
set of current exon/coding exon/intron sequences can be
automatically exported in FASTA format by selecting, from the GeneBase "Scripts" menu,
the corresponding scripts:
"Export_Exon_Sequences_In_FASTA_Format", "Export_Coding_Exon_
Sequences_In_FASTA_Format" and "Export_Intron_Sequences_In_FASTA_Format".
A message
will appear warning that each selected export step is
complete. These scripts could
take some hours and the database file size could
temporarily increase, depending on the number and the
size of sequences to be exported. The final output is
the creation of a FASTA file
in the GeneBase folder for each desired set: "Exons_FASTA.txt"
and/or "Coding_Exons_FASTA.txt"
and/or "Introns_FASTA.txt". These
files contain the
single-line identifiers (as
indicated by the "Tag_Exon",
"Tag_Coding_Exon"
and "Tag_Intron"
"Gene_Table" fields) and the
corresponding sequences and
are suitable, for example, for the creation of a
local BLASTN database through the "makeblastbd"
executable available along with the BLAST local
suite.
Briefly, the user
needs to install the operating system specific
BLAST Command Line Applications from the following
website page:
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
The
following example command line creates a
BLAST local database from the exon
sequences exported from GeneBase ("Exons_FASTA.txt"):
$
makeblastdb -in Exons_FASTA.txt -dbtype nucl -out
Exons_db
Please note that
merging at text level Exons_FASTA.txt and
Intron_FASTA.txt files and processing this merged file
with megablastdb would result in a complete
exon/intron database.
The
following example command line
compares a file containing query
sequences to the created local
database:
$
blastn -query Query.txt -db
Exons_db -out Results.txt
3.
Additional tables
(Back to Index)
3.1)
Transcripts table
A table named "Transcripts"
is generated showing the RefSeq status provided for
each gene and for each of its transcript. This table
gives the possibilities to compare the gene-level
status with the status provided for all the gene
variant transcripts, which are not always the same.
FIELD
DESCRIPTION
"Gene_ID":
the Entrez gene unique
identifier;
"RNA_Accession":
the
RefSeq RNA accession number;
"RefSeq_Status":
any of the set of status
descriptions defined by RefSeq for each
transcript variant;
"Gene_RefSeq_Status": any
of the set of gene-level status descriptions defined by
RefSeq.
3.2) Reports table
In addition a fifth
table named "Reports"
is generated in order to calculate values of interest
for statistics. Values are updated depending on record
subsets found in "Gene_Table":
for example, if you search for "Gene_Table"
records belonging to Homo sapiens (please see sections 4 and 5
for details regarding "Find Mode"), switching to the "Reports" tables,
values will be updated considering the Homo sapiens
record subset. On the contrary, if you select "Show all" from "Gene_Table" table,
"Reports"
values will be calculated considering all available
records.
FIELD
DESCRIPTION
"Mean_Exon_Length":
the mean value of the length
in bp of all exons
depending on record subsets found
in
"Gene_Table";
"Standard_Deviation_Exon_Length":
the standard deviation value of the length in
bp of all exons depending on record
subsets
found in "Gene_Table";
"Mean_Coding_Exon_Length":
the mean value of the length
in bp of all
coding exons depending on record subsets found
in "Gene_Table";
"Standard_Deviation_Coding_Exon_Length": the standard deviation value of the length in
bp of all coding exons depending on record
subsets found in "Gene_Table";
"Mean_Intron_Length":
the mean value of the length
in bp of all
introns depending on record subsets found in
"Gene_Table";
"Standard_Deviation_Intron_Length":
the standard deviation value of the length in
bp of all introns depending on record subsets
found in "Gene_Table".
GeneBase contains now
information and sequences of genes of interest. Using general
FileMaker commands (please see following sections), the user
can perform original calculations and searches.
4. GENERAL DEFINITIONS
(Back to Index)
4.1
File
A set of database tables.
4.2 Table
A set of records referring to
the same subject type (e.g., the 'Gene_Summary' table).
One set of
fields which represents one entry (i.e. containing all requested
data for a subject, e.g. a gene probe).
The record browser is a
small book icon at the top left of the window.
You may also browse the records faster using the cursor to the
right of the small book icon.
4.4. Field
The database unit
containing a specific data type (e.g., 'Gene_ID').
4.5.
Layout
A particular graphical
organization of the field of a table.
A table can be visualized
in more than one layout.
A layout may display
fields from a table or its related fields from other tables.
A file may show data within
different layouts.
Visualization of a field
is independent from the storage of the contained data.
Browsing among the layouts
can be done by clicking on the 'Layout:'
pop-up Menu in the upper left corner.
You may browse the
database by clicking on the small book pages at the top left
of the window, by using the cursor to the right of the small book
icon or by
entering a record number
and clicking on the "Return" key.
The
following information is constantly displayed on the Status
toolbar (if not, select "Status
Toolbar" from the "View" Menu):
Records: total number of Records in the table.
Found: total number of the subset of Records
currently selected. Clicking
on the green circular button will retrieve the complementary
subset of currently omitted records.
Sorted: sorting status of the Records
(Sorted/Unsorted).
The FileMaker
Pro-based database may be used basically in these "modes":
'Browse', 'Find',
and 'Preview'.
Switching among different
modes can be done from the 'View'
Menu or from the
pop-up Menu bar at the bottom left of the window.
4.6
Browse Mode
One way to use the
database.
It allows entry, viewing, browsing,
sorting, and manipulation of data.
It may be selected from
the 'View' menu or
the pop-up mode Menu bar at the bottom left of the
window.
In the 'Browse' mode, the
record sets can be browsed by clicking on the small book icon (with the arrows
to move 'back' and 'forward') in the upper left corner.
Browsing among the tables can
be done by clicking on the 'Layout:' pop-up Menu in the upper
left corner.
An alternative mode for using
the database.
It allows you to search
for specific content in the database fields, using any different
combination of criteria
(see the 'Find mode'
section below for more details).
It may be selected from
the 'View' menu or
the mode pop-up Menu bar at the bottom left of the
window.
The user can fill in a blank
form allowing to searches in specific fields.
In the "Find" mode, the small
book icon in the upper left corner represents different
"requests" that are made for searching the database.
In FileMaker Pro 'Find'
mode, the "AND" - "OR" - "NOT" operators may be implemented in this way:
"AND" by filling criteria
in different fields
located in the same "Request",
"OR" by generating additional requests
(from "Requests"
Menu) in the same query,
"NOT" by generating
additional requests
(from
"Requests" Menu) and
clicking
on the "Omit"
button (located on the top bar
in the window).
The 'Operators' pop-up Menu
appears by clicking on a field while pressing the 'ctrl' key, allowing the
query of:
exact matches, duplicate
values, ranges, wild cards and more.
Click on the 'Perform Find'
button at the top of the window to start the query.
The result of the search
is the subset of the entries matching the set search criteria.
An alternative way of
using the database.
It visualizes a print
preview of the found records.
It may be selected from
the "View" menu or
the pop-up Menu bar at the bottom left of the window.
In the "Preview" mode, the
user can obtain a print preview of the data in the current table.
Browsing among the
tables can be done by clicking on the 'Layout:' pop-up
Menu in the upper left corner.
About
FileMaker Pro Runtime...
Information about
FileMaker Pro Runtime at the core of the software.
Preferences...
Standard preferences
panel; cache memory size can be set at up to 256 Mb.
Hide
GeneBase
Hides all GeneBase windows.
Quit
GeneBase
Closes the programme.
File
Options...
It is only possible to
set the "Spelling" options.
Change
Password...
There is no default
password set.
Page
Setup...
Standard page set up
command.
Print...
Standard print command.
The appearance will
match the layout
currently displayed on the screen.
Import
Records
This is the general
"Import" function of FileMaker Pro.
Export
Records...
Export command for the found records set in a
given table.
Records are exported in
their current sorting mode.
User can select fields
to be exported, their relative order,
and the separation
character.
Save/Send
Records As...
Saves records to a
specified Excel worksheet.
Send
Sends an intranet or
internet email message (with or without a file attachment)
to one or more recipients.
Email can be sent
through an email application or via SMTP (Simple Mail
Transfer Protocol, a set of criteria for sending and
receiving email).
Save a
Copy as...
Save a copy of the database, compacted,
compressed, as
a clone (database structure with no record present) or as a
self-contained copy.
5.3
'Edit' Menu
(Back to Index)
Undo
Standard "Undo" command.
Redo
Standard "Redo" command.
Cut
Standard "Cut" text
command.
Copy
Standard "Copy" text
command.
Paste
Standard "Paste" command.
Paste
Text Only
Standard "Paste" text
command.
Clear
Deletes the contents of
the specified field in the current record.
Select
all
Selection of all text
present within a selected field
(to select a fieold, click
n the field).
Find/Replace
Utility for
searching/replacing text strings within fields.
Note: Use 'Find' mode
(from 'View' Menu)for
full search and selection of a record set.
Spelling
Function for checking
spelling of text strings within fields.
Export
Field Contents...
Function for exporting the
contents of the selected field to a file.
5.4
'View' Menu
(Back to Index)
Browse
Mode
Switch to the 'Browse
Mode' (see "General Definitions" above).
Find
Mode
Switch to the 'Find
Mode' (see "General Definitions" above).
Preview
Mode
Switch to the 'Preview
Mode' (see "General Definitions" above).
Go to
layout
A possible way to switch
between different layouts.
View
as Form
A possible way to individually display the current record
of a found set
of records.
View
as List
A possible way to
display all the records of a found set in the form of a
list.
View
as Table
A possible way to
display all the records of a found set in the form of a spreadsheet-like table.
Status
Toolbar
To switch on/off the
"Status Toolbar": the toolbar located at the top of the
programme window.
Customize
Status Toolbar
To customize the "Status
Toolbar" buttons.
Formatting
bar
To switch on/off the
"Formatting Toolbar".
Ruler
To switch on/off the
text ruler of the application.
Zoom
in
Used to increase layout
dimensions.
Zoom
out
Used to decrease layout
dimensions.
5.5
'Records' Menu
(Back to Index)
New
Record
Creates a new empty
record in the database.
The new Record will be
the latest in the current record set.
Duplicate
Record
Duplicates the current
record in the database.
The new Record will be
the latest in the current record set.
Delete
All Records
Deletes all the records
in the database.
Delete
Found Records...
Deletes all currently
found records in the database.
Go to
Record
Moves to the selected
record by number, previous or next.
Refresh
Window
Updates the entire
contents of the database window, including any related
records.
Show All
Records
Shows all the records in
the database.
Show
Omitted Only
Shows all the records in
the database not included in the current set found.
Omit
Record
Removes the selected
record out of the current found set, without deleting it.
Omit
Multiple...
Removes more than one
record, selected by numbers, out of the current found set without deleting
them.
Modify
Last Find
Returns to the last
performed search in order to edit it.
Saved
Finds
Saves a set of search
criteria.
Sort
Records...
Sorts the current record
set according to desired criteria.
Unsort
Displays the current
record set according to the order of creation of each
record.
Replace
Field Contents...
Replaces the value of a
field in all sets of records found with the value specified in the
current record, or by calculation.
Relookup
Field Contents
This
command updates the value of a field by reading the matched
value in a related table (the relationship has been
established during database development using a 'key'
field).
Revert
Record...
Restores the value of a
field, discarding any change, before clicking out of that
field.
5.6 'Scripts'
Menu
(Back to Index)
GeneBase_Guide
The
page with the user Guide of the GeneBase software.
Import_Summary_Table_Ontology
Please see section 1.3.
Export_Selected_Records
Please see section 2.1.
Import_Exon_Intron_Sequences
Please see section 2.3.
5.7
'Help' Menu
(Back to Index)
Search
Search a 'Help' system for general commands.
TROUBLESHOOTING
(Back to Index)
Sometimes,
power failure, hardware problems, or other factors can
damage a
FileMaker Pro database file.
When the runtime
application discovers a damaged file, a dialog box appears, prompting the user to
contact the creator.
Even if the dialogue box
does not appear, files can exhibit erratic behaviour.
If you have FileMaker
Pro or FileMaker
Pro Advanced
installed you can recover them using the 'Recover'
command.
Otherwise, to recover a damaged file:
- On Mac OS X machines,
press Command + Option (cmd-alt) while double-clicking the runtime application icon. Hold the keys down until
you see the 'Open Damaged File' dialogue box.
- On Windows machines, press
Ctrl+Shift while double-clicking the runtime application
icon. Hold the keys down until you see the Open Damaged File
dialogue box.
During the recovery
process, the runtime application:
1. Creates a new file;
2. Renames any damaged files
by adding “Old” to the end of the
file name;
3.
Gives the repaired file the original name.