Bioinformatics at University of Limpopo

Tuesday, October 4, 2011

Finding the Gene Coding Regions in a DNA sequence

You have just isolated DNA from your sample and want to find out which genes are encoded in the sequence.
ACTCCGTCTCAATATGTCTCAAGATGGCGGCCAATGTGGGATCGATGTTTCAATATTGGAAGCGCTTTGATTTACAGCAGCTGCAGGATTTGCGCAAGCAGGTAGCGCCGCTGCTGAAGAGTTTCCAAGGAGAGATTGATGCACTGAGTAAAAGAAGCAAGGAAGCTGAAGCAGCTTTCTTGAATGTCTACAAAAGATTGATTGACGTCCCAGATCCCGTACCAGCTTTGGATCTCGGACAGCAACTCCAGCTCAAAGTGCAGCGCCTGCACGATATTGAAACAGAGAACCAGAAACTTAGGGAAACTCTGGAAGAATACAACAAGGAATTTGCTGAAGTGAAAAATCAAGAGGTTACGATAAAAGCACTTAAAGAGAAAATCCGAGAATATGAACAGACACTGAAGAACCAAGCCGAAACCATAGCTCTTGAGAAGGAACAGAAGTTACAGAATGACTTTGCAGAAAAGGAGAGAAAGCTGCAGGAGACACAGATGTCCACCACCTCAAAGCTGGAGGAAGCTGAGCATAAGGTTCAGAGCCTACAAACAGCCCTGGAAAAAACTCGAACAGAATTATTTGACCTGAAAACCAAATACGATGAAGAAACTACTGCAAAGGCCGACGAGATTGAAATGATCATGACGGACCTTGAAAGGGCAAACCAGAGGGCAGAGGTGGCTCAGAGAGAGGCGGAGACCTTAAGGGAACAGCTCTCATCGGCCAATCACTCCCTCCAGCTGGCCTCACAGATCCAGAAGGCACCAGACGTGGCCATAGAGGTGCTGACCCGCTCCAGCCTAGAAGTTGAGTTGGCCGCCAAGGAGCGGGAGATCGCACAGCTGGTGGAGGACGTGCAGAGACTCCAGGCCAGCCTCACCAAGCTGCGGGAGAATTCGGCCAGCCAGATCTCACAGCTTGAGCAGCAGCTGAGCGCCAAAAACAGCACACTCAAACAACTGGAAGAAAAACTCAAAGGCCAGGCTGACTATGAAGAGGTGAAGAAAGAGCTGAACATTCTGAAGTCCATGGAGTTTGCACCGTCCGAGGGCGCTGGGACACAGGATGCGGCCAAGCCCCTGGAGGTGCTGTTGCTGGAGAAGAACCGCTCGCTGCAGTCCGAGAACGCCGCGCTGCGCATCTCCAACAGCGACCTGAGCGGACGCTGTGCAGAGCTGCAAGTCCGTATCACTGAGGCTGTGGCCACAGCCACTGAGCAGAGAGAGCTGATCGCCCGCCTGGAGCAGGACCTGAGCATCATTCAGTCCATCCAGCGGCCCGATGCCGAGGGTGCCGCTGAGCACCGCCTGGAGAAGATCCCAGAGCCCATCAAAGAGGCCACTGCCCTATTCTACGGACCTGCAGCACCAGCCAGCGGTGCCCTCCCAGAGGGCCAGGTGGATTCACTGCTTTCCATCATCTCCAGCCAGAGGGAGCGCTTCCGTGCCCGGAACCAGGAGCTTGAGGCCGAGAACCGCCTGGCCCAGCACACCCTCCAGGCCCTGCAGAGTGAGCTGGACAGCCTGCGCGCCGACAACATCAAGCTCTTTGAGAAGATCAAGTTCCTGCAGAGCTACCCTGGCCGGGGCAGCGGCAGTGATGACACGGAGCTGCGGTACTCGTCCCAGTACGAGGAGCGCCTGGACCCCTTCTCCTCCTTCAGCAAGCGGGAGCGGCAGAGGAAGTACCTGAGCTTGAGTCCCTGGGACAAGGCCACCCTCAGCATGGGGCGTCTGGTTCTCTCCAACAAGATGGCGCGCACCATCGGCTTCTTCTACACACTGTTCCTGCACTGCCTGGTCTTCCTGGTGCTCTACAAGCTGGCATGGAGCGAGAGCATGGAGAGGGACTGTGCCACCTTCTGCGCCAAGAAGTTCGCTGACCACCTGCACAAGTTCCACGAGAATGACAACGGGGCTGCGGCTGGTGACTTGTGGCAGTGATACCCCGGGGCCTCCCCCGTGACAGTGACGGCTGCGCCTCCACCCCGACTGCTCAGT
GCATCTAATCACTTAGACTCCCCTGAAGAATCCCCCATGGAAACTGCCCTTATCCGCTGTCCAGCAGCTGCCAGAGGCCCCAGGTCACCTCGGGTCCCCTTGAAAGAATGTCTCGGTCACATCAGGCCCGCTAGGTCCAGAGAGCGAGCCCCCAATGCCCGGCCAGGCTAAGCCGCAGAGACCCTCTCAGCCCCCACCTCAGGTTAGGGCTCTGCCCGCAGCCTGACCTCTAGCCCTGGTGGCAGAGGTCCCTCAGCTGCGAGGCTAATTGGGTGACCACCGATTCCAGCTGCGGTTAATCCAGCTTGGGCCTGTCTGCACTGCGATCCTCTTGGGCTCTCCTAGGATCCCCCCATGCCCCGTAAGAGGTGGAAGACGCTTCCTTCCAGGACAGCAGGCTTTGAGTCCAGCACCCCCAGCCTGCCTTTGCCACCAGCCCCACCCTGCAGAGTATATGAGGCTTGACAGAGTCTGCCCCCTCCCCCACTGCACCCCAAGAGAGAGAGCCCCAGCCAGCGGAACAGTTTCTATTACCCCCTCCCTGCCCCCAGACCCATGTGATTTCTGCTTTCTTCTTTAGCAAGATATTCTGGTTTCTAGATAAGGAAGAGTCTCTAATGAGCCCCCGAGCCCCAGTCTCTTCAGACTCATGGATTGGTCTGAGGGGTCTGAACGTCTCCTAGCCAATCAGAACTGGCTGTGGACCACCCTAGCACGGCCACCTCTCAGGGCCACTGGCAGGCCTTCCTGAGTTAGATTTGTAGTTGCATATTTAGCTTTGCACATTTGAAATAAACCACGGTTGCAGCCAAAAAAAA

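In bioinformatics we look for gene-coding sequences, which we call open reading frames (ORFs). An ORF is a stretch of sequence that starts with a start codon (usually ATG) and runs, in one reading frame, to the next in-frame stop codon (TAA, TAG or TGA). NCBI, the home of Entrez, has a tool called ORF Finder (now you know why I like Entrez :-).
ORF Finder: http://www.ncbi.nlm.nih.gov/projects/gorf/
Let's use this tool to find out which regions of this sequence code for a gene.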
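To see roughly what ORF Finder is doing, here is a minimal sketch in Python of a six-frame ORF scan. It assumes ATG is the only start codon, TAA/TAG/TGA are the stops, and uses a hypothetical minimum length parameter (`min_codons`); the real ORF Finder has its own options for start codons, genetic codes and minimum length, so treat this as an illustration rather than a reimplementation.

```python
# Sketch of a six-frame ORF scan (ATG ... first in-frame stop).
# Assumptions: ATG-only starts, standard stop codons, hypothetical min_codons cut-off.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna: str, min_codons: int = 100):
    """Return (strand, frame, start, end) for every ORF of at least min_codons codons.

    start/end are 0-based, end-exclusive positions on the scanned strand
    (for '-' that is the reverse complement); end includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # keep scanning for the next ATG
    return orfs
```

For example, `find_orfs(dna, min_codons=100)` on the sequence above lists the candidate ORFs; when a sequence carries a single gene, the longest ORF is often a good first guess.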

For your Classwork No. 6
ACTTTGCAGGCAGCGGCGGCCGGGGCGGAGCGGGATCGAGCCCTCGCCGCGGCCTGCCAGTCATGGGCCCGCGCCGCCGCCGCCGCCTGCCTCCCGGGCCACGCGGGCCGTGAGCGCCATGGCCGTAGCCCCCGCGGGCGGCCAGCACGCGCCAGCGCTGGAGGCCCTGCTCGGGGCGGGCGCGTTGCGGCTGCTCGACTCCTCGCAGATCGTCATCATCTCCACCGCGCCCGATGTCGGCGCCCCGCAGCTCCCCGCCGCGCCGCCCACTGGCCCTCGCGATTCTGACGTGCTGCTCTTCGCCACGCCGCAGGCGCCCCGACCCGCGCCTAGTGCACCGCGCCCGGCTCTCGGCCGCCCGCCGGTGAAACGGAGGCTGGATCTGGAGACTGACCATCAGTACCTCGCTGGTAGCAGTGGGCCATTCCGGGGCAGAGGCCGCCACCCAGGGAAAGGTGTGAAATCTCCGGGGGAGAAGTCACGCTATGAAACCTCACTAAATCTGACCACCAAACGCTTCTTGGAGCTGCTGAGCCGCTCAGCTGACGGTGTCGTTGACCTGAACTGGGCAGCTGAGGTGCTGAAGGTGCAGAAACGGCGCATCTATGACATCACCAATGTCCTGGAGGGCATCCAGCTCATTGCCAAGAAGTCCAAGAATCATATCCAGTGGCTAGGCAGCCACACCATGGTGGGGATTGGTAAGCGGCTTGAAGGCCTGACCCAGGACCTGCAGCAACTGCAGGAGAGTGAGCAGCAGCTGGATCACCTGATGCACATCTGTACCACACAGCTGCAACTGCTTTCGGAGGACTCCGACACCCAGCGCCTGGCCTATGTGACCTGCCAGGACCTTCGCAGCATTGCAGACCCTGCAGAACAGATGGTCATAGTGATCAAGGCCCCTCCTGAGACCCAACTACAAGCTGTGGATTCTTCAGAGACATTTCAGATCTCCCTTAAGAGCAAACAAGGCCCCATTGATGTTTTCCTGTGCCCGGAGGAGAGTGCAGACGGGATTAGCCCTGGGAAGACCTCATGCCAGGAGACATCCTCTGGGGAGGACCGGACTGCAGACTCTGGCCCAGCAGGGCCTCCACCATCACCTCCCTCCACATCCCCAGCCTTGGATCCCAGTCAATCCCTGTTGGGCCTGGAGCAAGAAGCAGTATTGCCACGGATGGGCCACCTGAGGGTCCCTATGGAAGAGGACCAACTGTCACCACTGGTGGCTGCTGACTCACTCCTGGAGCATGTTAAAGAAGACTTCTCTGGGCTCCTCCCTGGGGAGTTCATCAGCCTCTCCCCACCCCACGAGGCCCTTGACTATCACTTTGGTCTCGAGGAGGGTGAGGGCATTAGAGATCTCTTTGACTGTGACTTTGGGGACCTGACCCCTCTGGATTTCTGACAGAAGCCTAGGGATTCAGGGTGTCTGGAGATGCCCACCTGTCTGCAGCTTTGGAGCCTCCTGCCCTGGGCCATCCTTCCTGCCTCATTGGAATAGCACGATCCATACCCTCTGTCCCAATAGCTTCTAGCTCTGGGGTTTGGTTGCTGCCACATTGAGCAGACCAAAATGGGAAGGATGTTGTACAGTGTGTGTGCATGCACCCCACACTGCGCACTGTGTGCCTGGGGTGTGTGTCTGAGTGTGTGTGTGTGTGTGTGTGTGTGAGTGTGTGTGTGTGTGTGTGTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTATGTATGTGTATGTGCACGTGTGCCCGGGAATGAAGGTGAACACATCTGTATGTGTGCTGCAGACACATCCTGGTGTGTCCACATGTGTGCATGGATCCATGTGTGCGCATTGGGGTGGGGGTGGGCTCTAACTGCACTTTTGGTGTCCTTGCTGCAGGGGCCCTGTGAGGCCCAGGGTGGCTGCCTGCTTTCAGAATCCTGTGTGTCAGCCAGGCCGGGTGGTACAGCTTGCCTGGCTGGGTTTGCAGGGCAG
CAAGAGCACTGCTTAAAAGTTTTCCGATCGAAGCTTTAATGGAGCGTTTATTTATTTATCGAGGCCTCTGGCAAGCCTGGGGGGATAAGCAAAGGGTGGGGGGCATGGGTGATACCTTAAGTCCCTGTTCTCTGAAGCAAGGGCAGGATCCCTACCCAAGAGTTGCTGAGGCCCAAGCAGTTTATTTATTGGGAAAGGGAGAGGGAGACAGACTGACAGCCATGGATGGGCTGGAGAAACAGTCCCTTTGTACCAGTACTCCAGCCGCATGTATCCAGGGGATCTGAGATGGGGAGGGTACGTGAGGGCCTTGGCTGACTGCGGCCAGGAGGGGTGGGTATGCGTCCTTCCTATGGCTGGAGTGCTCCTCTGCTGTCCTCCCCACCCTCCAGTCTGCACTTTGATTTGTTTCCTAACAGTTCTGTTCCCTCCTGCTTTGATTTTAATAAATGTTTTGATG

1. Find the ORF regions
2. Which region is most probably the gene coding region if this sequence contains only a single gene?
3. What is the length of this most probable gene?
4. The gene will encode a protein molecule. How long will this protein molecule be?
5. How many methionines are encoded in the gene-containing region?
6. In which position is the stop codon found?
7. What is the name of this most probable gene?
8. How did you determine the name of the best-matching gene? That is, what were your E-value, total score, number of gaps and % identity?
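Questions 3 to 6 come down to translating the chosen ORF and counting. A minimal sketch, assuming you have already cut out the ORF as a string that starts at its ATG and is in frame; the function names and the dictionary keys are hypothetical.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(orf: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop, 'X' an unknown codon."""
    orf = orf.upper()
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))


def orf_summary(orf: str) -> dict:
    """Gene length, protein length, methionine count and stop-codon position for one ORF."""
    protein = translate(orf)
    stop_index = protein.find("*")
    if stop_index == -1:
        raise ValueError("no in-frame stop codon")
    peptide = protein[:stop_index]  # residues before the first stop
    return {
        "length_nt": len(orf),
        "protein_length_aa": len(peptide),
        "methionines": peptide.count("M"),
        # 1-based position of the stop codon's first base, counted within the ORF;
        # add the ORF's start offset to get the position in the full sequence.
        "stop_codon_nt_position": 3 * stop_index + 1,
        "stop_codon": orf.upper()[3 * stop_index:3 * stop_index + 3],
    }
```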

Saturday, October 1, 2011

Functional Analysis of proteins

Today, let's open ExPASy and use PROSITE for the functional characterization of this protein sequence:
MVQRWLYSTNAKDIAVLYFMLAIFSGMAGTAMSLIIRLELAAPGSQYLHGNSQLFNVLVVGHAVLMIFCAPFRLIYHCIEVLIDKHISVYSINENFTVSFWFWLLVVTYMVFRYVNHMAYPVGANSTGTMACHKSAGVKQPAQGKNCPMARLTNSCKECLGFSLTPSHLGIVIHAYVLEEEVHELTKNESLALSKSWHLEGCTSSNGKLRNTGLSERGNPGDNGVFMVPKFNLNKVRYFSTLSKLNARKEDSLAYLTKINTTDFSELNKLMENNHNKTETINTRILKLMSDIRMLLIAYNKIKSKKGNMSKGSNNITLDGINISYLNKLSKDINTNMFKFSPVRRVEIPKTSGGFRPLSVGNPREKIVQESMRMMLEIIYNNSFSYYSHGFRPNLSCLTAIIQCKNYMQYCNWFIKVDLNKCFDTIPHNMLINVLNERIKDKGFMDLLYKLLRAGYVDKNNNYHNTTLGIPQGSVVSPILCNIFLDKLDKYLENKFENEFNTGNMSNRGRNPIYNSLSSKIYRCKLLSEKLKLIRLRDHYQRNMGSDKSFKRAYFVRYADDIIIGVMGSHNDCKNILNDINNFLKENLGMSINMDKSVIKHSKEGVSFLGYDVKVTPWEKRPYRMIKKGDNFIRVRHHTSLVVNAPIRSIVMKLNKHGYCSHGILGKPRGVGRLIHEEMKTILMHYLAVGRGIMNYYRLATNFTTLRGRITYILFYSCCLTLARKFKLNTVKKVILKFGKVLVDPHSKVSFSIDDFKIRHKMNMTDSNYTPDEILDRYKYMLPRSLSLFSGICQICGSKHDLEVHHVRTLNNAANKIKDDYLLGRMIKMNRKQITICKTCHFKVHQGKYNGPGL

Click on: http://www.expasy.ch/
and open PROSITE
Look at the following:
0. Domain structure of the protein
1. Clustal format (first 3 sequences): retrieve the sequence logo from the alignment (for 15 amino acids)

We'll do the other protein databanks tomorrow.
2. Taxonomic tree view of all Swiss-Prot/TrEMBL entries matching our protein
3. Retrieve a list of all Swiss-Prot/TrEMBL entries matching our protein
4. Scan Swiss-Prot/TrEMBL entries against our protein
5. View ligand-binding statistics for our protein
6. Click on sequence ID and retrieve sequence Logo from alignment
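PROSITE stores many of its signatures as patterns such as `N-{P}-[ST]-{P}` (the N-glycosylation site, PS00001). A minimal sketch of how such a pattern can be turned into a regular expression and scanned against a protein. It covers only the basic pattern syntax (`x`, `[...]`, `{...}`, `(n)` or `(n,m)` repeats, and the `<`/`>` terminal anchors); it is not ScanProsite, which also uses profiles and extra rules, and the function names are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') to a Python regex.

    Not handled: '>' inside square brackets, such as '[G>]'.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # anchored at the N-terminus
        suffix = "$" if element.endswith(">") else ""    # anchored at the C-terminus
        element = element.strip("<>")
        repeat = ""
        match = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if match:  # x(2) -> .{2}, x(2,4) -> .{2,4}
            element, repeat = match.group(1), "{" + match.group(2) + "}"
        if element == "x":
            core = "."  # any amino acid
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any amino acid except those listed
        else:
            core = element  # single residue or [..] class, already valid regex
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(protein: str, pattern: str) -> list:
    """Return 1-based start positions of (possibly overlapping) pattern matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(protein)]
```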

For your classwork, here is your sequence:
MLDQQTINIIKATVPVLKEHGVTITTTFYKNLFAKHPEVRPLFDMGRQESLEQPKALAMT
VLAAAQNIENLPAILPAVKKIAVKHCQAGVAAAHYPIVGQELLGAIKEVLGDAATDDILD
AWGKAYGVIADVFIQVEADLYAQAVE

Thursday, September 29, 2011

Looking at Bioinformatics Tools Used to Analyse Information Contained in Biological Databases

Bioinformatics tools are software programs designed to extract meaningful information from the mass of data held in molecular biology and other biological databases, and to carry out sequence or structural analysis.

In this post we look at:
1. Overview of bioinformatics tools,
2. Types of bioinformatics tools,
3. Some examples of bioinformatics tools, and
4. Application of programmes in bioinformatics

1. OVERVIEW
Factors that must be taken into consideration when designing bioinformatics tools, software and programmes are:

The end user (the biologist) may not be a frequent user of computer technology
These software tools must be made available over the internet, given the global distribution of the scientific research community

Major categories of bioinformatics tools:
There are both standard and customized products to meet the requirements of particular projects. There is data-mining software that retrieves data from genomic sequence databases, and there are visualization tools to analyze and retrieve information from proteomic databases. These tools can be classified as homology and similarity tools, protein function analysis tools, structural analysis tools, sequence analysis tools and miscellaneous tools.

Here is a brief description of a few of these. Everyday bioinformatics is done with sequence search programs such as BLAST, sequence analysis programs such as the EMBOSS and Staden packages, structure prediction programs such as THREADER or PHD, and molecular imaging/modelling programs such as RasMol and WHATIF.

2. TYPES OF BIOINFORMATICS TOOLS
(a) Homology and Similarity Tools:

Homologous sequences are sequences that are related by divergence from a common ancestor. Similarity and homology are therefore different kinds of statement: the degree of similarity between two sequences can be measured (for example, as percent identity), whereas homology is either true or false. This set of tools can be used to identify similarities between novel query sequences of unknown structure and function and database sequences whose structure and function have been elucidated.
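The difference between a measurable similarity and a yes/no homology call is easy to see in code. A minimal sketch of percent identity over two sequences that have already been aligned; the denominator used here (alignment length, gap columns included) is one assumed convention, and tools such as BLAST may count differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two equal-length aligned sequences ('-' marks a gap).

    Identical non-gap columns are divided by the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)
```

A high value suggests, but does not prove, that two sequences are homologous.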

(b) Protein Function Analysis:

This group of programs allows you to compare your protein sequence to the secondary (or derived) protein databases that contain information on motifs, signatures and protein domains. Highly significant hits against these pattern databases allow you to approximate the biochemical function of your query protein.

(c) Structural Analysis:

This set of tools allows you to compare structures against databases of known structures. The function of a protein is more directly a consequence of its structure than of its sequence, and structural homologs tend to share functions. Determining a protein's 2D/3D structure is therefore crucial to the study of its function.

(d) Sequence Analysis:

This set of tools allows you to carry out further, more detailed analysis of your query sequence, including evolutionary analysis and the identification of mutations, hydropathy regions, CpG islands and compositional biases. These and other biological properties are all clues that aid the search to elucidate the specific function of your sequence.
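As an example of compositional analysis, here is a minimal sketch of a sliding-window CpG-island scan. The default thresholds (200 bp window, GC fraction of at least 0.5, observed/expected CpG ratio of at least 0.6) are the classic criteria usually attributed to Gardiner-Garden and Frommer; the function names and the window step are assumptions for illustration.

```python
def gc_and_cpg(seq: str) -> tuple:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string.

    Observed/expected CpG = (number of CG dinucleotides * length) / (C count * G count).
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count finds every one
    gc_fraction = (c + g) / len(seq)
    obs_exp = cpg * len(seq) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_island_windows(seq: str, window: int = 200, step: int = 1,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6):
    """Yield 0-based start positions of windows that pass both thresholds."""
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = gc_and_cpg(seq[start:start + window])
        if gc >= min_gc and ratio >= min_obs_exp:
            yield start
```

Overlapping windows that pass the test can then be merged into candidate islands.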

3. SOME EXAMPLES OF BIOINFORMATICS TOOLS:

BLAST:
BLAST (Basic Local Alignment Search Tool) falls into the category of homology and similarity tools. It is a set of search programs, available for several platforms including Windows, used to perform fast similarity searches whether the query is a protein or a DNA sequence. A nucleotide query can be compared against a nucleotide database, and a protein database can be searched for matches to a query protein sequence. NCBI has also introduced a queuing system for BLAST (QBLAST) that allows users to retrieve results at their convenience and to format them multiple times with different formatting options.

Depending on the type of sequences to compare, there are different programs:

blastp compares an amino acid query sequence against a protein sequence database
blastn compares a nucleotide query sequence against a nucleotide sequence database
blastx compares a nucleotide query sequence translated in all reading frames against a protein sequence database
tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
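blastx, tblastn and tblastx all rely on translating nucleotide sequences in six reading frames (three on each strand). A minimal sketch of that translation with the standard genetic code; BLAST does this internally, and the frame labels used here (+1 to +3, -1 to -3) follow a common convention but are an assumption of this sketch.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translations(dna: str) -> dict:
    """Translate a DNA string in all six frames; '*' = stop, 'X' = unknown codon."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse-complement strand
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
            frames[f"{strand}{offset + 1}"] = "".join(CODON_TABLE.get(c, "X") for c in codons)
    return frames
```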

FASTA:
FASTA (FAST homology search, All sequences) is an alignment program for protein and DNA sequences created by Pearson and Lipman in 1988. The program is one of the many heuristic algorithms proposed to speed up sequence comparison. The basic idea is to add a fast prescreen step that locates highly matching segments between two sequences, and then to extend these segments into local alignments using more rigorous algorithms such as Smith-Waterman.
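The fast prescreen can be sketched by indexing every k-tuple of the query and counting, for each diagonal (subject position minus query position), how many identical k-tuples fall on it; a dense diagonal marks a highly matching segment. This is a simplified sketch of FASTA's first step only: the real program then rescores the best diagonal regions, joins them, and finally runs a banded Smith-Waterman alignment. `k` plays the role of FASTA's ktup parameter; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def ktuple_diagonals(query: str, subject: str, k: int = 2) -> Counter:
    """Count identical k-tuple hits on each diagonal (subject_pos - query_pos)."""
    index = defaultdict(list)  # k-tuple -> list of positions in the query
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            diagonals[j - i] += 1  # same offset = same diagonal of the dot matrix
    return diagonals
```

`ktuple_diagonals(q, s).most_common(3)` picks out the diagonals worth extending.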

EMBOSS:
EMBOSS (European Molecular Biology Open Software Suite) is an open-source software package for sequence analysis. It can work with data in a range of formats and can also retrieve sequence data transparently from the Web. Extensive libraries are provided with the package, allowing other scientists to develop and release their own software as open source. It provides a set of sequence-analysis programs and supports all UNIX platforms.

ClustalW:
ClustalW is a fully automated multiple sequence alignment tool for DNA and protein sequences. It returns the best alignment over the total length of the input sequences (a global alignment), whether they are proteins or nucleic acids.
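ClustalW builds its multiple alignment progressively: it first aligns every pair of sequences, uses the pairwise results to build a guide tree, and then aligns the sequences in the order given by the tree. A minimal sketch of the first stage only, using Needleman-Wunsch global alignment with an assumed toy scoring scheme (match +1, mismatch -1, linear gap -2) instead of ClustalW's substitution matrices and affine gap penalties; the function names are hypothetical.

```python
from itertools import combinations


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def pairwise_scores(seqs: list) -> dict:
    """Score every pair of sequences, keyed by index pairs (i, j) with i < j."""
    return {(i, j): nw_score(seqs[i], seqs[j]) for i, j in combinations(range(len(seqs)), 2)}
```

In a full progressive aligner these scores would be turned into distances and passed to a tree-building method such as neighbour joining.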

RasMol:
It is a powerful research tool to display the structure of DNA, proteins and smaller molecules. Protein Explorer, a derivative of RasMol, is an easier-to-use program.

PROSPECT:
PROSPECT (PROtein Structure Prediction and Evaluation Computer ToolKit) is a protein-structure prediction system that employs a computational technique called protein threading to construct a protein's 3-D model.

PatternHunter :
PatternHunter, written in Java, can identify all approximate repeats in a complete genome in a short time, using little memory on a desktop computer. Its distinguishing features are its advanced patented algorithm and data structures (built around "spaced seeds") and the Java language used to create it. The Java version of PatternHunter is just 40 KB, only 1% of the size of BLAST, while offering a large portion of its functionality.
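In a spaced seed only the positions marked `1` must match and `0` positions are ignored, which makes hits between similar but not identical regions more likely than with a contiguous word. A minimal sketch, assuming as default the weight-11, length-18 seed `111010010100110111` reported for PatternHunter; the function name is hypothetical and the real program is far more optimized.

```python
def spaced_seed_hits(query: str, subject: str, seed: str = "111010010100110111") -> list:
    """Return (query_pos, subject_pos) pairs where every '1' position of the seed matches.

    '0' positions are "don't care"; a contiguous seed such as '1' * 11
    gives BLAST-style word hits instead.
    """
    care = [i for i, s in enumerate(seed) if s == "1"]
    span = len(seed)
    index = {}  # key built from the 'care' positions -> query start positions
    for i in range(len(query) - span + 1):
        key = "".join(query[i + p] for p in care)
        index.setdefault(key, []).append(i)
    hits = []
    for j in range(len(subject) - span + 1):
        key = "".join(subject[j + p] for p in care)
        for i in index.get(key, []):
            hits.append((i, j))
    return hits
```

For instance, the spaced seed `101` finds a hit between `ACGTA` and `TCGAA` where the contiguous seed `111` finds none.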

COPIA :
COPIA (COnsensus Pattern Identification and Analysis) is a protein structure analysis tool for discovering motifs (conserved regions) in a family of protein sequences. Such motifs can then be used to determine whether new protein sequences belong to the family, to predict the secondary and tertiary structure and function of proteins, and to study the evolutionary history of the sequences.
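COPIA's own motif-discovery algorithm is not shown here. As a minimal illustration of what a consensus pattern is and how it can test family membership, this sketch takes a block of already-aligned motif sequences, keeps each column's most common residue when it reaches an assumed threshold, and writes `x` (any residue) otherwise. The function names and the threshold are hypothetical.

```python
from collections import Counter


def consensus(block: list, threshold: float = 0.6) -> str:
    """Column-wise consensus of equal-length aligned sequences ('x' = any residue)."""
    length = len(block[0])
    if any(len(s) != length for s in block):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*block):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue if count / len(column) >= threshold else "x")
    return "".join(out)


def matches(motif: str, segment: str) -> bool:
    """True if a segment fits the consensus ('x' positions accept any residue)."""
    return len(motif) == len(segment) and all(m in ("x", s) for m, s in zip(motif, segment))
```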

4. Application of Programmes in Bioinformatics:

JAVA in Bioinformatics:
Because research centers are scattered around the globe, in both private and academic settings, and use a range of hardware and operating systems, Java, whose compiled programs run unchanged on any machine with a Java virtual machine, is emerging as a key player in bioinformatics. Physiome Sciences' computer-based biological simulation technologies and Bioinformatics Solutions' PatternHunter are two examples of the growing adoption of Java in bioinformatics.

Perl in Bioinformatics:
String manipulation, regular-expression matching, file parsing and data-format interconversion are common text-processing tasks in bioinformatics. Perl excels at such tasks and is used by many developers. Perl's standard distribution includes no modules designed specifically for bioinformatics; however, developers have written several of their own modules for the purpose, which have become quite popular and are coordinated by the BioPerl project.
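The kind of text processing described above, shown in Python rather than Perl: a minimal sketch that parses FASTA text, converts it to a tab-separated format and runs a regular-expression search. The function names are hypothetical; they are not BioPerl or Biopython calls.

```python
import re


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first word after '>'."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line)  # sequence lines may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def to_tabular(records: dict) -> str:
    """Format interconversion: one 'identifier<TAB>sequence' line per record."""
    return "\n".join(f"{name}\t{seq}" for name, seq in records.items())


def find_motif(records: dict, motif_regex: str) -> dict:
    """Return 0-based start positions of non-overlapping regex matches in each sequence."""
    pattern = re.compile(motif_regex)
    return {name: [m.start() for m in pattern.finditer(seq)] for name, seq in records.items()}
```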

Bioinformatics Projects:

BioJava:
The BioJava Project is dedicated to providing Java tools for processing biological data, including objects for manipulating sequences, dynamic programming, file parsers and simple statistical routines.

BioPerl:
The BioPerl project is an international association of developers of Perl tools for bioinformatics and provides an online resource for modules, scripts and web links for developers of Perl-based software.

BioXML:
Part of the BioPerl project, BioXML is a resource that gathers XML documentation, DTDs and XML-aware tools for biology in one location.

Biocorba:
Interface objects have facilitated interoperability between bioperl and other Perl packages such as Ensembl and the Annotation Workbench. However, interoperability between bioperl and packages written in other languages requires additional support software. CORBA is one such framework for inter-language support, and the biocorba project is currently implementing a CORBA interface for bioperl. With biocorba, objects written within bioperl will be able to communicate with objects written in biopython and biojava (see "Biopython and biojava" below). For more information, see the biocorba project website at http://biocorba.org/ . The Bioperl BioCORBA server and client bindings are available in the bioperl-corba-server and bioperl-corba-client bioperl CVS repositories, respectively (see http://cvs.bioperl.org/ for more information).

Ensembl :
Ensembl is an ambitious automated genome-annotation project at EBI. Much of Ensembl's code is based on bioperl, and Ensembl developers, in turn, have contributed significant pieces of code to bioperl. In particular, the bioperl code for automated sequence annotation has been largely contributed by Ensembl developers. Describing Ensembl and its capabilities is far beyond the scope of this post. The interested reader is referred to the Ensembl website at http://www.ensembl.org/.

bioperl-db:
Bioperl-db is a relatively new project intended to bring into the bioperl code base some of Ensembl's capability of integrating bioperl syntax with a standalone MySQL database (http://www.mysql.com). More details on bioperl-db can be found in the bioperl-db CVS directory at http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-db/?cvsroot=bioperl . It is worth mentioning that most bioperl objects map directly to tables in the bioperl-db schema. Therefore, object data such as sequences, their features and their annotations can easily be loaded into the database, for example with $loader->store($newid,$seqobj). Similarly, one can query the database in a variety of ways and retrieve arrays of Seq objects. See biodatabases.pod, Bio::DB::SQL::SeqAdaptor, Bio::DB::SQL::QueryConstraint and Bio::DB::SQL::BioQuery for examples.
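The same idea, sequences stored as rows of a relational database and queried back, can be sketched with Python's built-in sqlite3 module. The single-table schema and the function names below are hypothetical simplifications for illustration; this is not the bioperl-db schema, and it uses SQLite instead of MySQL.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with one hypothetical 'sequence' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence ("
        "accession TEXT PRIMARY KEY, description TEXT, residues TEXT NOT NULL)"
    )
    return conn


def store(conn: sqlite3.Connection, accession: str, description: str, residues: str) -> None:
    """Insert or replace one sequence record; 'with conn' commits on success."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO sequence (accession, description, residues) VALUES (?, ?, ?)",
            (accession, description, residues),
        )


def fetch_by_min_length(conn: sqlite3.Connection, min_length: int) -> list:
    """Return (accession, residues) rows whose sequence is at least min_length long."""
    cursor = conn.execute(
        "SELECT accession, residues FROM sequence WHERE length(residues) >= ? ORDER BY accession",
        (min_length,),
    )
    return cursor.fetchall()
```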

Biopython and biojava:
Biopython and biojava are open-source projects with very similar goals to bioperl. However, their code is implemented in Python and Java, respectively. With the development of interface objects and biocorba, it is possible to write Java or Python objects that can be accessed by a bioperl script, or to call bioperl objects from Java or Python code. Since biopython and biojava are more recent projects than bioperl, most effort to date has gone into porting bioperl functionality to biopython and biojava rather than the other way around. However, in the future some bioinformatics tasks may prove to be more effectively implemented in Java or Python, in which case being able to call them from within bioperl will become more important. For more information, go to the biojava (http://biojava.org/) and biopython (http://biopython.org/) websites.

Wednesday, September 29, 2010

For your Classwork 4

GACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAGCTGAGACAACATCTGTTGAGGTGGGGACTTACCACACCAGACAAAAAACATCAGAAAGAACCTCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAAAGACAGCTGGACTGTCAATGACATACAGA

1. What is the name of the protein encoded by this DNA sequence?
2. From which organism was this sequence derived?
3. When was the sequence loaded into the database?
4. What is the accession number of the sequence?
5. What is the percentage identity of the matching sequence to the query sequence?
6. What is the expectation (E) value of the results?

Thursday, September 16, 2010

Analyse the following DNA material using Bioinformatics Tools

ATGGATTTATCTGCTCTTCGCGTTGAAGAAGTACAAAATGTCATTAATGCTATGCAGAAAATCTTAGAGTGTCCCATCTGTCTGGAGTTGATCAAGGAACCTGTCTCCACAAAGTGTGACCACATATTTTGCAAATTTTGCATGCTGAAACTTCTCAACCAGAAGAAAGGGCCTTCACAGTGTCCTTTATGTAAGAATGATATAACCAAAAGGAGCCTACAAGAAAGTACGAGATTTAGTCAACTTGTTGAAGAGCTATTGAAAATCATTTGTGCTTTTCAGCTTGACACAGGTTTGGAGTATGCAAACAGCTATAATTTTGCAAAAAAGGAAAATAACTCTCCTGAACATCTAAAAGATGAAGTTTCTATCATCCAAAGTATGGGCTACAGAAACCGTGCCAAAAGACTTCTACAGAGTGAACCCGAAAATCCTTCCTTGCAGGAAACCAGTCTCAGTGTCCAACTCTCTAACCTTGGAACTGTGAGAACTCTGAGGACAAAGCAGCGGATACAACCTCAAAAGACGTCTGTCTACATTGAATTGGGATCTGATTCTTCTGAAGATACCGTTAATAAGGCAACTTATTGCAGTGTGGGAGATCAAGAATTGTTACAAATCACCCCTCAAGGAACCAGGGATGAAATCAGTTTGGATTCTGCAAAAAAGGCTGCTTGTGAATTTTCTGAGACGGATGTAACAAATACTGAACATCATCAACCCAGTAATAATGATTTGAACACCACTGAGAAGCGTGCAGCTGAGAGGCATCCAGAAAAGTATCAGGGTAGTTCTGTTTCAAACTTGCATGTGGAGCCATGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACTAAAGACAGAATGAATGTAGAAAAGGCTGAAT

Drawing a dotplot from sequence alignment

Look at the classwork I gave you containing sequences 1, 2 and 3, in which sequences 2 and 3 were aligned to sequence 1, and answer the questions that follow.
1) Draw a dotplot when sequence 2 is aligned to sequence 1 using a window size of 3 and a stringency of 2.
2) If you increase the window size to 5 and keep the stringency at 2, show what the plot would look like.
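To check your hand-drawn plots, here is a minimal sketch of a window/stringency dot plot: a dot is placed at row i, column j when at least `stringency` of the `window` residues starting at those positions are identical. It puts the dot at the start of the window (some programs use the window centre) and therefore has `window - 1` fewer rows and columns than the sequences; both choices, and the function name, are assumptions of this sketch.

```python
def dotplot(seq1: str, seq2: str, window: int = 3, stringency: int = 2) -> list:
    """Return a text dot plot as a list of rows: seq1 down the side, seq2 across the top."""
    rows = []
    for i in range(len(seq1) - window + 1):
        row = []
        for j in range(len(seq2) - window + 1):
            # number of identical residues inside the two windows
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append("*" if matches >= stringency else ".")
        rows.append("".join(row))
    return rows
```

For question 1, run `print("\n".join(dotplot(seq1, seq2, window=3, stringency=2)))`; for question 2, change `window` to 5 and compare how much of the background noise disappears.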

How to use Bioinformatics tools

Your friend has isolated DNA from human cells and believes that this piece of DNA encodes a functional domain that is responsible for early onset of cancer by binding insulin receptors. Describe how you would use bioinformatics tools to process this DNA and attempt to characterize it. In your answer, give examples of tools to use and explain how these tools will answer your intended question.