Exercise 4. Accessing GenBank


Accessing GenBank and interpreting its output (adapted from the Geneious tutorials, since only the PRO version allows access to the tutorials)


Given a nucleotide or protein sequence, it is possible to search for similar sequences using BLAST. BLAST stands for Basic Local Alignment Search Tool; it finds regions of local similarity between a query sequence and a set of stored sequences. BLAST also calculates the statistical significance of each match, based on both the similarity of the match and the size of the database searched.
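To make the idea of statistical significance concrete, the sketch below computes an expect value (E-value) from a bit score using the standard Karlin-Altschul relation E = m x n x 2^(-S'), where m is the query length, n is the total length of the database and S' is the bit score. The function name and the example numbers are assumptions made for illustration. BLAST itself applies length corrections to m and n, so the E-values it reports will differ somewhat from this simplified calculation.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate number of hits with at least this bit score expected by chance."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers for illustration only (not taken from a real search).
small_db = expect_value(bit_score=40.0, query_length=49, database_length=1_000_000)
large_db = expect_value(bit_score=40.0, query_length=49, database_length=1_000_000_000)

# The same score is less significant (larger E-value) in a larger database.
print(small_db, large_db)
```

The smaller the E-value, the less likely it is that the match arose by chance.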


A. Go to the NCBI BLAST page and choose nucleotide blast (Search a nucleotide database using a nucleotide query). Copy the unknown DNA sequence (below) and paste it into the query sequence box. For the search set, choose the database "nucleotide collection (nr/nt)", then click the BLAST button at the bottom. As with any shared database, results can be slow to appear when many people are searching at once, so be patient. An estimate of the search time will appear below the menu buttons.


CAT CCG TTG CCC ACA CAT GTC GTG ATG TAC AGT ACG GCT GAT TAA TCC G
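The sequence above is printed in groups of three bases for readability. If a tool you use expects an unbroken sequence, a minimal sketch like the following removes the spaces and checks that only the four DNA letters remain. The variable names are illustrative.

```python
raw_query = "CAT CCG TTG CCC ACA CAT GTC GTG ATG TAC AGT ACG GCT GAT TAA TCC G"

# Remove all whitespace and normalise to upper case.
query = "".join(raw_query.split()).upper()

# Confirm that the sequence contains only A, C, G and T.
only_dna_letters = set(query) <= set("ACGT")

print(query)
print(len(query), only_dna_letters)  # 49 True
```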


Click on the first GenBank "hit" to display the result in the sequence viewer below. The viewer shows both the query and the matching region. Note that in this case there is an exact match. Now click on the second hit in the list in the document table; note that most of the bases are the same, but there are a number of differences.
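The difference between an exact match and a near match can be expressed as a count of mismatched positions. The sketch below compares two aligned sequences of equal length, with no gaps, and reports the 1-based positions at which they differ. The function name and the short sequences are made up for illustration; they are not the real hits from this search.

```python
def mismatch_positions(query, subject):
    """Return the 1-based positions at which two gap-free aligned sequences differ."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    return [i + 1 for i, (q, s) in enumerate(zip(query, subject)) if q != s]


# Made-up example sequences (not real BLAST hits).
query = "CATCCGTTGC"
exact_hit = "CATCCGTTGC"
near_hit = "CATCTGTTGA"

print(mismatch_positions(query, exact_hit))  # []
print(mismatch_positions(query, near_hit))   # [5, 10]

# Percent identity of the near hit: 8 of 10 positions agree.
identity = 100.0 * (len(query) - len(mismatch_positions(query, near_hit))) / len(query)
print(identity)  # 80.0
```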


Now go back to the top match and click on its GenBank accession number. This will open the complete GenBank record, including the sequence and all the information associated with it. Read the LOCUS and DEFINITION lines; from this information you should be able to answer the following questions:


For the closest GenBank match to the query sequence, what is the:

A.1. Genus name

A.2. Species name

A.3. GenBank accession number (a unique identifier assigned to every sequence record in GenBank)

A.4. Name of gene

A.5. Number of base pairs in the sequence

A.6. Is the origin of the DNA nuclear or mitochondrial?
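Several of the answers above can be read directly from the LOCUS and DEFINITION lines of a GenBank record. As a rough sketch, the code below pulls the locus name, the sequence length and the definition text out of a flat-file header. The record shown is an invented placeholder (accession "XX000000", organism "Genus species") that only mimics the general layout, and the function name is hypothetical; paste the header of the real record in its place.

```python
def parse_locus_and_definition(record_text):
    """Extract locus name, length in bp and definition from a GenBank header."""
    info = {}
    in_definition = False
    for line in record_text.splitlines():
        if line.startswith("LOCUS"):
            tokens = line.split()
            info["locus_name"] = tokens[1]
            info["length_bp"] = int(tokens[2])
            in_definition = False
        elif line.startswith("DEFINITION"):
            info["definition"] = line[len("DEFINITION"):].strip()
            in_definition = True
        elif line.startswith(" ") and in_definition:
            # Indented lines continue the DEFINITION text.
            info["definition"] += " " + line.strip()
        else:
            in_definition = False
    return info


# Invented placeholder record, for illustration only.
example_record = """\
LOCUS       XX000000                 120 bp    DNA     linear   VRT 01-JAN-2000
DEFINITION  Genus species placeholder gene, partial sequence;
            mitochondrial.
ACCESSION   XX000000
"""

header = parse_locus_and_definition(example_record)
print(header["locus_name"], header["length_bp"])
print(header["definition"])
```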


B. Now click on the ORGANISM link, which leads to the NCBI/Taxonomy database, and search for the genus and species names you recorded above. The result gives the entire taxonomic lineage of the species (phylum, order, family, etc.). If you click on the Lineage link, it will take you to the NCBI website; there, click on the species name corresponding to the "Animal Diversity Web". That link will take you to further information about the species in question.


B.1. What is the common name for this species?

B.2. Where was this species found?

B.3. What is the estimated body mass of this species?


C. Now click on the NCBI/Nucleotide database and search for the genus and species names you recorded above.

C.1. How many GenBank entries are there for this species?

C.2. Which other gene is present in GenBank for this species?
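The same search can also be expressed as a query to NCBI's E-utilities ESearch service. The sketch below only builds the query URL and does not contact the server. The function name is hypothetical, "Genus" and "species" are placeholders for the names you recorded, and "nuccore" is the Entrez name for the Nucleotide database. Opening the resulting URL in a browser should return a short XML document whose Count element gives the number of matching records.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def nucleotide_search_url(genus, species):
    """Build an ESearch URL that restricts the search to one organism."""
    term = f"{genus} {species}[Organism]"
    return ESEARCH_URL + "?" + urlencode({"db": "nuccore", "term": term})


# Placeholder names; substitute the genus and species you recorded above.
url = nucleotide_search_url("Genus", "species")
print(url)
```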


D. Search Google Scholar for the original paper that describes this DNA sequence. Some of you may already be familiar with Google Scholar: it is a Google search engine that focuses on scholarly articles.

In short, Google Scholar returns less general web content than the normal Google search engine, which makes it a valuable tool when searching for academic studies. The Google Scholar search should return a couple of results. Click on the top link (it should refer you to http://www.sciencemag.org/cgi/content/abstract/295/5560/1683), which is the original paper that describes this DNA sequence. The authors of the paper deposited the sequence in GenBank. There is a way of getting the full paper, but not on the Science page; try to work it out. In any case, read the first paragraph of the paper: it will give you a little perspective on why the researchers conducted this study.


D.1. What is the presumed closest relative of Raphus cucullatus?

D.2. What is the name of the first author of this paper?


E. Now click on the NCBI/PubMed database link and search for the name of the author (first and last name) that you recorded above. PubMed is one of many online databases that index articles published in scientific journals. It is possible to download articles from this list into citation software packages (such as EndNote) so that you do not have to enter all the references by hand. Read through the list of titles for this author and list two other extinct species (common or scientific names) on which this author has published papers.


E.1. Species 1?

E.2. Species 2?