Python and IDLE quick guide

“parse_entrezgene.py” and “calculate_sequences.py” are two scripts that parse NCBI Gene files and NCBI Nucleotide sequences to create the GeneBase database.

Requirements: any operating system (Linux, Mac OS X, Windows, ...) with Python 2.7 and IDLE.
In the “Applications” folder, check whether Python 2.7 is already present.

Applications.png

If not, download Python 2.7 from http://www.python.org/downloads/ (IDLE is included in this installer), then install it as usual.

For example, for Mac users:

Double-click the downloaded “.dmg” file to open it. The window shown in the following figure opens automatically; double-click the “Python.mpkg” file to install Python.

Applications.png

When Python 2.7 appears in the “Applications” folder, the system is properly configured.
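
As an optional extra check, the interpreter version can also be confirmed from the “Python Shell” itself. The following is a minimal sketch; the helper name is_python27 is hypothetical and is not part of the GeneBase scripts.

```python
import sys


def is_python27(version_info=sys.version_info):
    """Return True when the given version tuple is a Python 2.7 release."""
    # Only the major and minor numbers matter for this guide.
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    # Report the version of the interpreter running this snippet.
    print("Running Python %d.%d" % tuple(sys.version_info[:2]))
    if not is_python27():
        print("Warning: this guide assumes Python 2.7.")
```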
 
Section A - parse_entrezgene.py


1. Go to the GeneBase folder.

2. If not already present, save a copy of the script file of interest in this folder.

GeneBase_Folder.png

Please note that the script runs correctly only if the following files are in the same folder:
- parse_entrezgene.py;
- one or more files downloaded from the NCBI Gene website (usually named automatically gene_result.txt, gene_result(1).txt and so on when there is more than one).
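
Before running the script, the presence of these files can be checked with a few lines of Python. This is only a sketch: the function name check_parse_inputs and the pattern "gene_result*.txt" are assumptions based on the default file names mentioned above, so renamed downloads would not be detected.

```python
import glob
import os


def check_parse_inputs(folder="."):
    """Return (script_found, gene_files) for the given folder.

    script_found is True when parse_entrezgene.py is in the folder;
    gene_files lists the files matching the assumed default NCBI Gene
    download names (gene_result.txt, gene_result(1).txt, ...).
    """
    script_path = os.path.join(folder, "parse_entrezgene.py")
    pattern = os.path.join(folder, "gene_result*.txt")
    gene_files = sorted(glob.glob(pattern))
    return os.path.isfile(script_path), gene_files


if __name__ == "__main__":
    found, files = check_parse_inputs(".")
    print("parse_entrezgene.py present: %s" % found)
    print("NCBI Gene files found: %d" % len(files))
```

Run it from inside the GeneBase folder, or pass the folder path as the argument.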

3. Double-click on the “parse_entrezgene.py” script file and two windows will appear: the “Python Shell” and “IDLE”.

4. To execute the script, keep the “parse_entrezgene.py” window active and select “Run Module” from the “Run” menu.

Run_Module.png

5. The script has finished when the message “XXX gene results processed” appears in the “Python Shell”, where XXX is the number of NCBI Gene entries downloaded in the first step of the GeneBase tutorial.

Results_Processed.png

In the GeneBase folder you will obtain three tab-delimited files: “gene_ontology.txt”, “gene_summary.txt” and “gene_table.txt”.
 
Ontology_Summary_Table.png
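
To take a quick look at one of these tab-delimited files before importing it, something like the following sketch may help. The function name preview_tab_file is hypothetical, and the column layout of the files is not assumed here: each line is simply split on tab characters.

```python
def preview_tab_file(path, max_rows=5):
    """Return up to max_rows rows of a tab-delimited file as lists of fields."""
    rows = []
    with open(path) as handle:
        for line in handle:
            # Strip the line ending, then split the line into its tab-separated fields.
            rows.append(line.rstrip("\r\n").split("\t"))
            if len(rows) >= max_rows:
                break
    return rows
```

For example, preview_tab_file("gene_table.txt") returns the first five rows of that file when called from the GeneBase folder.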

To import these files into the GeneBase software, please go back to step 1.3 of the GeneBase tutorial.

Section B - calculate_sequences.py

1. Go to the GeneBase folder.

2. If not already present, save a copy of the script file of interest in this folder.

GeneBase_FolderB.png

Please note that the script runs correctly only if the following files are in the same folder:
- calculate_sequences.py;
- exon_intron.txt;
- FASTA file(s) with the downloaded chromosome sequences, in this example sequence.fasta and sequence(1).fasta;
- file_list.txt (with a list of FASTA file names).
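
If file_list.txt has to be prepared by hand, a short script can write it instead. The sketch below rests on two assumptions that this guide does not confirm: that the FASTA files end in ".fasta", and that calculate_sequences.py expects one file name per line. Check the GeneBase tutorial for the exact format before relying on it. The function name write_file_list is hypothetical.

```python
import glob
import os


def write_file_list(folder=".", list_name="file_list.txt"):
    """Write the names of all *.fasta files in folder to list_name.

    Assumption: one file name per line. Any existing list file with the
    same name is overwritten. Returns the list of names written.
    """
    paths = glob.glob(os.path.join(folder, "*.fasta"))
    fasta_names = sorted(os.path.basename(p) for p in paths)
    with open(os.path.join(folder, list_name), "w") as handle:
        for name in fasta_names:
            handle.write(name + "\n")
    return fasta_names
```

Calling write_file_list() from inside the GeneBase folder would list sequence.fasta and sequence(1).fasta in the example above.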

3. Double-click on the “calculate_sequences.py” script file and two windows will appear: the “Python Shell” and “IDLE”.

4. To execute the script, keep the “calculate_sequences.py” window active and select “Run Module” from the “Run” menu.

Run_Module_Seq.png

5. The script has finished when the message “Exon and intron sequences calculated” appears in the “Python Shell” (this may take several hours).

Sequences_calculated.png

In the GeneBase folder you will obtain a tab-delimited file named “exon_intron_seq.tab”.
 
Sequences_tab.png

To import this file into the GeneBase software, please go back to step 2.3 of the GeneBase tutorial.