“parse_entrezgene.py” and “calculate_sequences.py” are two useful script
files for parsing NCBI Gene files and NCBI Nucleotide sequences to create
the GeneBase database.
          
Requirements: any operating system (Linux, Mac OS X, Windows, ...) with
Python 2.7 and IDLE. In the “Applications” folder, check whether
Python 2.7 is already present.
           
         
        
        
If not, download Python 2.7 from http://www.python.org/downloads/
(IDLE is included in this installer). Install the downloaded
version of Python as usual.
          
          For example, for Mac users:
          
Double-click the downloaded “.dmg” file to open it. The window
represented in the following figure will open automatically; in it,
double-click the “Python.mpkg” file to install Python.
         
        
        
        
When Python 2.7 appears in the “Applications” folder, the
system is properly configured.
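
The installed version can also be confirmed from within Python itself, for example by running a few lines in IDLE. The following is a minimal sketch and not part of the GeneBase scripts; the function name is_python_27 is hypothetical.

```python
import sys


def is_python_27(version_info=None):
    """Return True when the (major, minor, ...) tuple denotes Python 2.7."""
    if version_info is None:
        # Default to the interpreter that is running this code.
        version_info = sys.version_info
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    print("Python %d.%d detected" % tuple(sys.version_info[:2]))
    if is_python_27():
        print("Version matches the tutorial requirement")
    else:
        print("This tutorial assumes Python 2.7")
```

If a version other than 2.7 is reported, IDLE was probably started from a different Python installation.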
          
            Section A - parse_entrezgene.py
        
        1. Go to the GeneBase folder.
        
        2. If not already present, save a copy of the script file of
        interest in this folder.
        
        
        
         Please note that, in order to correctly
          execute the script, you need to have the following files in
          the same folder:
          - parse_entrezgene.py;
          - one or more files downloaded from the NCBI Gene website
          (usually automatically named gene_result.txt,
          gene_result(1).txt and so forth if more than one).
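
The file requirements above can be checked before the script is run. The sketch below is only an illustration: the function name check_parse_inputs is hypothetical, and the pattern gene_result*.txt is an assumption based on the default download names mentioned above. How parse_entrezgene.py itself locates its input files is not described here.

```python
import glob
import os


def check_parse_inputs(folder):
    """Return (script_present, gene_file_names) for the given folder."""
    script_path = os.path.join(folder, "parse_entrezgene.py")
    script_present = os.path.isfile(script_path)
    # Assumed pattern: matches gene_result.txt, gene_result(1).txt, ...
    pattern = os.path.join(folder, "gene_result*.txt")
    gene_files = sorted(os.path.basename(p) for p in glob.glob(pattern))
    return script_present, gene_files


if __name__ == "__main__":
    present, names = check_parse_inputs(".")
    print("parse_entrezgene.py present: %s" % present)
    print("NCBI Gene files found: %d" % len(names))
```

Run from inside the GeneBase folder, this reports whether the script is present and how many NCBI Gene files were found.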
          
          3. Double-click on the “parse_entrezgene.py” script file and
          two windows will appear: the “Python Shell” and “IDLE”.
          
4. To execute the script, keep the “parse_entrezgene.py” window
active and select “Run Module” from the “Run” menu.
        
        
        
        
5. The programme is finished when the message “XXX gene results
processed” appears in the “Python Shell”, where XXX is the
number of NCBI Gene entries downloaded in the first step of the
GeneBase tutorial.
        
        

In the GeneBase folder you will obtain three tab-delimited
files: “gene_ontology.txt”, “gene_summary.txt” and “gene_table.txt”.
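
Before importing, it can be useful to look at the first few rows of a tab-delimited file. The following is a minimal sketch: the function name preview_tab_file is hypothetical, and UTF-8 encoding is an assumption. The column layout of the output files is not documented here, so the code only splits each line on tab characters.

```python
import io


def preview_tab_file(path, max_rows=5):
    """Return up to max_rows lines of a tab-delimited file as field lists."""
    rows = []
    # Assumption: the file is UTF-8 encoded text.
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            if len(rows) >= max_rows:
                break
            # Drop the line ending, then split on the tab delimiter.
            rows.append(line.rstrip("\r\n").split("\t"))
    return rows
```

For example, preview_tab_file("gene_table.txt", max_rows=3) returns the first three lines, each as a list of fields. Counting the fields per row is a quick check that the delimiter is consistent.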
         
        
        
To import these files into the GeneBase software, please go back to
GeneBase tutorial step 1.3.
        
            Section B - calculate_sequences.py
        
        1. Go to the GeneBase folder.
        
        2. If not already present, save a copy of the script file of
        interest in this folder.
        
        
        
         Please note that, in order to correctly
          execute the script, you need to have the following files in
          the same folder:
          - calculate_sequences.py;
          - exon_intron.txt;
          - FASTA file(s) with the downloaded chromosome sequences, in
          this example sequence.fasta and
            sequence(1).fasta;
          - file_list.txt (with a list of FASTA file names).
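
The requirements above can be checked with a short script before starting a run that may last hours. The sketch below is only an illustration: the function name missing_sequence_inputs is hypothetical, and it assumes that file_list.txt contains one FASTA file name per line. The exact format that calculate_sequences.py expects is not stated here.

```python
import os

# Fixed file names taken from the requirement list above.
REQUIRED = ("calculate_sequences.py", "exon_intron.txt", "file_list.txt")


def missing_sequence_inputs(folder):
    """Return the names of required files that are absent from folder."""
    missing = [name for name in REQUIRED
               if not os.path.isfile(os.path.join(folder, name))]
    list_path = os.path.join(folder, "file_list.txt")
    if os.path.isfile(list_path):
        with open(list_path) as handle:
            # Assumption: one FASTA file name per line; blank lines ignored.
            fasta_names = [line.strip() for line in handle if line.strip()]
        missing += [name for name in fasta_names
                    if not os.path.isfile(os.path.join(folder, name))]
    return missing


if __name__ == "__main__":
    absent = missing_sequence_inputs(".")
    if absent:
        print("Missing files: %s" % ", ".join(absent))
    else:
        print("All required files found")
```

An empty result means that every fixed file and every FASTA file named in file_list.txt is present in the folder.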
          
          3. Double-click on the “calculate_sequences.py” script file
          and two windows will appear: the “Python Shell” and “IDLE”.
          
4. To execute the script, keep the “calculate_sequences.py” window
active and select “Run Module” from the “Run” menu.
        
        
        
        
         5. The programme is finished when the message
          “Exon and intron sequences calculated” appears in the “Python
          Shell” (this could take some hours).
        
        
        
        
In the GeneBase folder you will obtain a tab-delimited file
named “exon_intron_seq.tab”.
         
        
        
To import this file into the GeneBase software, please go back to
GeneBase tutorial step 2.3.