You have just isolated DNA from your sample and want to find out which genes are encoded in the sequence.
ACTCCGTCTCAATATGTCTCAAGATGGCGGCCAATGTGGGATCGATGTTTCAATATTGGAAGCGCTTTGATTTACAGCAGCTGCAGGATTTGCGCAAGCAGGTAGCGCCGCTGCTGAAGAGTTTCCAAGGAGAGATTGATGCACTGAGTAAAAGAAGCAAGGAAGCTGAAGCAGCTTTCTTGAATGTCTACAAAAGATTGATTGACGTCCCAGATCCCGTACCAGCTTTGGATCTCGGACAGCAACTCCAGCTCAAAGTGCAGCGCCTGCACGATATTGAAACAGAGAACCAGAAACTTAGGGAAACTCTGGAAGAATACAACAAGGAATTTGCTGAAGTGAAAAATCAAGAGGTTACGATAAAAGCACTTAAAGAGAAAATCCGAGAATATGAACAGACACTGAAGAACCAAGCCGAAACCATAGCTCTTGAGAAGGAACAGAAGTTACAGAATGACTTTGCAGAAAAGGAGAGAAAGCTGCAGGAGACACAGATGTCCACCACCTCAAAGCTGGAGGAAGCTGAGCATAAGGTTCAGAGCCTACAAACAGCCCTGGAAAAAACTCGAACAGAATTATTTGACCTGAAAACCAAATACGATGAAGAAACTACTGCAAAGGCCGACGAGATTGAAATGATCATGACGGACCTTGAAAGGGCAAACCAGAGGGCAGAGGTGGCTCAGAGAGAGGCGGAGACCTTAAGGGAACAGCTCTCATCGGCCAATCACTCCCTCCAGCTGGCCTCACAGATCCAGAAGGCACCAGACGTGGCCATAGAGGTGCTGACCCGCTCCAGCCTAGAAGTTGAGTTGGCCGCCAAGGAGCGGGAGATCGCACAGCTGGTGGAGGACGTGCAGAGACTCCAGGCCAGCCTCACCAAGCTGCGGGAGAATTCGGCCAGCCAGATCTCACAGCTTGAGCAGCAGCTGAGCGCCAAAAACAGCACACTCAAACAACTGGAAGAAAAACTCAAAGGCCAGGCTGACTATGAAGAGGTGAAGAAAGAGCTGAACATTCTGAAGTCCATGGAGTTTGCACCGTCCGAGGGCGCTGGGACACAGGATGCGGCCAAGCCCCTGGAGGTGCTGTTGCTGGAGAAGAACCGCTCGCTGCAGTCCGAGAACGCCGCGCTGCGCATCTCCAACAGCGACCTGAGCGGACGCTGTGCAGAGCTGCAAGTCCGTATCACTGAGGCTGTGGCCACAGCCACTGAGCAGAGAGAGCTGATCGCCCGCCTGGAGCAGGACCTGAGCATCATTCAGTCCATCCAGCGGCCCGATGCCGAGGGTGCCGCTGAGCACCGCCTGGAGAAGATCCCAGAGCCCATCAAAGAGGCCACTGCCCTATTCTACGGACCTGCAGCACCAGCCAGCGGTGCCCTCCCAGAGGGCCAGGTGGATTCACTGCTTTCCATCATCTCCAGCCAGAGGGAGCGCTTCCGTGCCCGGAACCAGGAGCTTGAGGCCGAGAACCGCCTGGCCCAGCACACCCTCCAGGCCCTGCAGAGTGAGCTGGACAGCCTGCGCGCCGACAACATCAAGCTCTTTGAGAAGATCAAGTTCCTGCAGAGCTACCCTGGCCGGGGCAGCGGCAGTGATGACACGGAGCTGCGGTACTCGTCCCAGTACGAGGAGCGCCTGGACCCCTTCTCCTCCTTCAGCAAGCGGGAGCGGCAGAGGAAGTACCTGAGCTTGAGTCCCTGGGACAAGGCCACCCTCAGCATGGGGCGTCTGGTTCTCTCCAACAAGATGGCGCGCACCATCGGCTTCTTCTACACACTGTTCCTGCACTGCCTGGTCTTCCTGGTGCTCTACAAGCTGGCATGGAGCGAGAGCATGGAGAGGGACTGTGCCACCTTCTGCGCCAAGAAGTTCGCTGACCACCTGCACAAGTTCCACGAGAATGACAACGGGGCTGCGGCTGGTGACTTGTGGCAGTGATACCCCGGGGCCTCCCCCGTGACAGTGACGGCTGCGCCTCCACCCCGACTGCTCAGT
GCATCTAATCACTTAGACTCCCCTGAAGAATCCCCCATGGAAACTGCCCTTATCCGCTGTCCAGCAGCTGCCAGAGGCCCCAGGTCACCTCGGGTCCCCTTGAAAGAATGTCTCGGTCACATCAGGCCCGCTAGGTCCAGAGAGCGAGCCCCCAATGCCCGGCCAGGCTAAGCCGCAGAGACCCTCTCAGCCCCCACCTCAGGTTAGGGCTCTGCCCGCAGCCTGACCTCTAGCCCTGGTGGCAGAGGTCCCTCAGCTGCGAGGCTAATTGGGTGACCACCGATTCCAGCTGCGGTTAATCCAGCTTGGGCCTGTCTGCACTGCGATCCTCTTGGGCTCTCCTAGGATCCCCCCATGCCCCGTAAGAGGTGGAAGACGCTTCCTTCCAGGACAGCAGGCTTTGAGTCCAGCACCCCCAGCCTGCCTTTGCCACCAGCCCCACCCTGCAGAGTATATGAGGCTTGACAGAGTCTGCCCCCTCCCCCACTGCACCCCAAGAGAGAGAGCCCCAGCCAGCGGAACAGTTTCTATTACCCCCTCCCTGCCCCCAGACCCATGTGATTTCTGCTTTCTTCTTTAGCAAGATATTCTGGTTTCTAGATAAGGAAGAGTCTCTAATGAGCCCCCGAGCCCCAGTCTCTTCAGACTCATGGATTGGTCTGAGGGGTCTGAACGTCTCCTAGCCAATCAGAACTGGCTGTGGACCACCCTAGCACGGCCACCTCTCAGGGCCACTGGCAGGCCTTCCTGAGTTAGATTTGTAGTTGCATATTTAGCTTTGCACATTTGAAATAAACCACGGTTGCAGCCAAAAAAAA
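In bioinformatics we look for gene-coding sequences, or what we call open reading frames (ORFs). An ORF is a stretch of sequence that begins with a start codon (usually ATG) and runs, in the same reading frame, to a stop codon (TAA, TAG, or TGA). Entrez has a tool called ORF Finder (now you know why I like Entrez :-).
http://www.ncbi.nlm.nih.gov/projects/gorf/ Let's use this tool to find out which regions of this sequence could code for a gene.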
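To see what an ORF search does under the hood, here is a minimal Python sketch. It is not the ORF Finder algorithm itself: it assumes the standard genetic code, accepts only ATG as a start codon, and the names `find_orfs`, `reverse_complement`, and `min_len` are made up for this example. It scans all six reading frames (three on each strand) and lists every ATG-to-stop stretch, longest first.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_len=90):
    """List ATG...stop ORFs in all six frames, longest first.

    Each hit is (strand, frame, start, end, length_nt). Positions are
    1-based and, for the "-" strand, counted along the reverse complement.
    The length includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length >= min_len:
                        orfs.append((strand, frame + 1, start + 1, i + 3, length))
                    start = None
    return sorted(orfs, key=lambda orf: orf[-1], reverse=True)

# Toy example: one ORF (ATG AAA TTT TAG) in frame 3 of the forward strand.
toy = "CCATGAAATTTTAGGG"
print(find_orfs(toy, min_len=12))  # → [('+', 3, 3, 14, 12)]
```

Paste your own sequence in place of `toy` and raise `min_len` to filter out short ORFs that arise by chance. The web tool may report frames and coordinates differently, so treat this only as a way to check your understanding.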
For your Classwork No. 6
ACTTTGCAGGCAGCGGCGGCCGGGGCGGAGCGGGATCGAGCCCTCGCCGCGGCCTGCCAGTCATGGGCCCGCGCCGCCGCCGCCGCCTGCCTCCCGGGCCACGCGGGCCGTGAGCGCCATGGCCGTAGCCCCCGCGGGCGGCCAGCACGCGCCAGCGCTGGAGGCCCTGCTCGGGGCGGGCGCGTTGCGGCTGCTCGACTCCTCGCAGATCGTCATCATCTCCACCGCGCCCGATGTCGGCGCCCCGCAGCTCCCCGCCGCGCCGCCCACTGGCCCTCGCGATTCTGACGTGCTGCTCTTCGCCACGCCGCAGGCGCCCCGACCCGCGCCTAGTGCACCGCGCCCGGCTCTCGGCCGCCCGCCGGTGAAACGGAGGCTGGATCTGGAGACTGACCATCAGTACCTCGCTGGTAGCAGTGGGCCATTCCGGGGCAGAGGCCGCCACCCAGGGAAAGGTGTGAAATCTCCGGGGGAGAAGTCACGCTATGAAACCTCACTAAATCTGACCACCAAACGCTTCTTGGAGCTGCTGAGCCGCTCAGCTGACGGTGTCGTTGACCTGAACTGGGCAGCTGAGGTGCTGAAGGTGCAGAAACGGCGCATCTATGACATCACCAATGTCCTGGAGGGCATCCAGCTCATTGCCAAGAAGTCCAAGAATCATATCCAGTGGCTAGGCAGCCACACCATGGTGGGGATTGGTAAGCGGCTTGAAGGCCTGACCCAGGACCTGCAGCAACTGCAGGAGAGTGAGCAGCAGCTGGATCACCTGATGCACATCTGTACCACACAGCTGCAACTGCTTTCGGAGGACTCCGACACCCAGCGCCTGGCCTATGTGACCTGCCAGGACCTTCGCAGCATTGCAGACCCTGCAGAACAGATGGTCATAGTGATCAAGGCCCCTCCTGAGACCCAACTACAAGCTGTGGATTCTTCAGAGACATTTCAGATCTCCCTTAAGAGCAAACAAGGCCCCATTGATGTTTTCCTGTGCCCGGAGGAGAGTGCAGACGGGATTAGCCCTGGGAAGACCTCATGCCAGGAGACATCCTCTGGGGAGGACCGGACTGCAGACTCTGGCCCAGCAGGGCCTCCACCATCACCTCCCTCCACATCCCCAGCCTTGGATCCCAGTCAATCCCTGTTGGGCCTGGAGCAAGAAGCAGTATTGCCACGGATGGGCCACCTGAGGGTCCCTATGGAAGAGGACCAACTGTCACCACTGGTGGCTGCTGACTCACTCCTGGAGCATGTTAAAGAAGACTTCTCTGGGCTCCTCCCTGGGGAGTTCATCAGCCTCTCCCCACCCCACGAGGCCCTTGACTATCACTTTGGTCTCGAGGAGGGTGAGGGCATTAGAGATCTCTTTGACTGTGACTTTGGGGACCTGACCCCTCTGGATTTCTGACAGAAGCCTAGGGATTCAGGGTGTCTGGAGATGCCCACCTGTCTGCAGCTTTGGAGCCTCCTGCCCTGGGCCATCCTTCCTGCCTCATTGGAATAGCACGATCCATACCCTCTGTCCCAATAGCTTCTAGCTCTGGGGTTTGGTTGCTGCCACATTGAGCAGACCAAAATGGGAAGGATGTTGTACAGTGTGTGTGCATGCACCCCACACTGCGCACTGTGTGCCTGGGGTGTGTGTCTGAGTGTGTGTGTGTGTGTGTGTGTGTGAGTGTGTGTGTGTGTGTGTGTGAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTATGTATGTGTATGTGCACGTGTGCCCGGGAATGAAGGTGAACACATCTGTATGTGTGCTGCAGACACATCCTGGTGTGTCCACATGTGTGCATGGATCCATGTGTGCGCATTGGGGTGGGGGTGGGCTCTAACTGCACTTTTGGTGTCCTTGCTGCAGGGGCCCTGTGAGGCCCAGGGTGGCTGCCTGCTTTCAGAATCCTGTGTGTCAGCCAGGCCGGGTGGTACAGCTTGCCTGGCTGGGTTTGCAGGGCAG
CAAGAGCACTGCTTAAAAGTTTTCCGATCGAAGCTTTAATGGAGCGTTTATTTATTTATCGAGGCCTCTGGCAAGCCTGGGGGGATAAGCAAAGGGTGGGGGGCATGGGTGATACCTTAAGTCCCTGTTCTCTGAAGCAAGGGCAGGATCCCTACCCAAGAGTTGCTGAGGCCCAAGCAGTTTATTTATTGGGAAAGGGAGAGGGAGACAGACTGACAGCCATGGATGGGCTGGAGAAACAGTCCCTTTGTACCAGTACTCCAGCCGCATGTATCCAGGGGATCTGAGATGGGGAGGGTACGTGAGGGCCTTGGCTGACTGCGGCCAGGAGGGGTGGGTATGCGTCCTTCCTATGGCTGGAGTGCTCCTCTGCTGTCCTCCCCACCCTCCAGTCTGCACTTTGATTTGTTTCCTAACAGTTCTGTTCCCTCCTGCTTTGATTTTAATAAATGTTTTGATG
1. Find the ORF regions
2. Which region is most probably the gene coding region if this sequence contains only a single gene?
3. What is the length of this most probable gene?
4. The gene will encode a protein molecule. How long will this protein molecule be?
5. How many Methionines are encoded in the gene-containing region?
6. In which position is the stop codon found?
7. What is the name of this most probable gene?
8. How did you determine the name of the best-matched gene? Report your E-value, total score, number of gaps, and % identity.
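Questions 3 to 6 are mostly arithmetic on the ORF you picked, so you can double-check the numbers from the web tool with a few lines of Python. This is only a sketch under stated assumptions: the standard genetic code, an ORF string that starts at ATG and ends with its stop codon, and a hypothetical helper named `describe_orf`. Tools differ on whether the reported gene length includes the stop codon, so compare carefully.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def describe_orf(orf_dna):
    """Summarise an ORF given as DNA from the start codon through the stop codon."""
    orf_dna = orf_dna.upper()
    codons = [orf_dna[i:i + 3] for i in range(0, len(orf_dna) - 2, 3)]
    # Unknown codons (for example ones containing N) are shown as X.
    protein = "".join(CODON_TABLE.get(c, "X") for c in codons).rstrip("*")
    return {
        "gene_length_nt": len(orf_dna),        # includes the stop codon
        "protein_length_aa": len(protein),     # stop codon is not an amino acid
        "methionines": protein.count("M"),
        "stop_codon_start": len(orf_dna) - 2,  # 1-based, relative to the ORF start
        "protein": protein,
    }

# Toy ORF: ATG AAA TTT TAG
print(describe_orf("ATGAAATTTTAG"))
```

For the toy ORF the gene is 12 nt long, the protein is 3 amino acids (MKF), there is one methionine, and the stop codon starts at position 10 of the ORF. To get the stop position in the full sequence, add the ORF's start offset.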
Tuesday, October 4, 2011
Saturday, October 1, 2011
Functional Analysis of proteins
Today, let's open ExPASy and use PROSITE to look at the functional characterization of this protein sequence:
MVQRWLYSTNAKDIAVLYFMLAIFSGMAGTAMSLIIRLELAAPGSQYLHGNSQLFNVLVVGHAVLMIFCAPFRLIYHCIEVLIDKHISVYSINENFTVSFWFWLLVVTYMVFRYVNHMAYPVGANSTGTMACHKSAGVKQPAQGKNCPMARLTNSCKECLGFSLTPSHLGIVIHAYVLEEEVHELTKNESLALSKSWHLEGCTSSNGKLRNTGLSERGNPGDNGVFMVPKFNLNKVRYFSTLSKLNARKEDSLAYLTKINTTDFSELNKLMENNHNKTETINTRILKLMSDIRMLLIAYNKIKSKKGNMSKGSNNITLDGINISYLNKLSKDINTNMFKFSPVRRVEIPKTSGGFRPLSVGNPREKIVQESMRMMLEIIYNNSFSYYSHGFRPNLSCLTAIIQCKNYMQYCNWFIKVDLNKCFDTIPHNMLINVLNERIKDKGFMDLLYKLLRAGYVDKNNNYHNTTLGIPQGSVVSPILCNIFLDKLDKYLENKFENEFNTGNMSNRGRNPIYNSLSSKIYRCKLLSEKLKLIRLRDHYQRNMGSDKSFKRAYFVRYADDIIIGVMGSHNDCKNILNDINNFLKENLGMSINMDKSVIKHSKEGVSFLGYDVKVTPWEKRPYRMIKKGDNFIRVRHHTSLVVNAPIRSIVMKLNKHGYCSHGILGKPRGVGRLIHEEMKTILMHYLAVGRGIMNYYRLATNFTTLRGRITYILFYSCCLTLARKFKLNTVKKVILKFGKVLVDPHSKVSFSIDDFKIRHKMNMTDSNYTPDEILDRYKYMLPRSLSLFSGICQICGSKHDLEVHHVRTLNNAANKIKDDYLLGRMIKMNRKQITICKTCHFKVHQGKYNGPGL
Click on: http://www.expasy.ch/
and open PROSITE
Look at the following:
0. Domain structure of the protein
1. Clustal format (first 3 sequences)
   • Retrieve the sequence logo from the alignment (for 15 amino acids)
   (We'll do the other protein databanks tomorrow.)
2. Taxonomic tree view of all Swiss-Prot/TrEMBL entries matching our protein
3. Retrieve a list of all Swiss-Prot/TrEMBL entries matching our protein
4. Scan Swiss-Prot/TrEMBL entries against our protein
5. View ligand binding statistics on our protein
6. Click on sequence ID and retrieve sequence logo from alignment
For your classwork, here is your sequence:
MLDQQTINIIKATVPVLKEHGVTITTTFYKNLFAKHPEVRPLFDMGRQESLEQPKALAMT
VLAAAQNIENLPAILPAVKKIAVKHCQAGVAAAHYPIVGQELLGAIKEVLGDAATDDILD
AWGKAYGVIADVFIQVEADLYAQAVE
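If you are curious how a PROSITE pattern match works, the sketch below converts a pattern written in PROSITE syntax into a regular expression and scans a protein with it. It uses the N-glycosylation site pattern N-{P}-[ST]-{P} (PS00001) as the example. The function names `prosite_to_regex` and `scan` are made up for this illustration, only the common syntax elements are handled, and the real PROSITE scan also uses profiles and rules that a plain regex cannot reproduce.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern to a Python regex string.

    Handled: '-' separators, x (any residue), [ABC] (allowed), {ABC}
    (forbidden), (n) and (n,m) repeats, '<' and '>' anchors at the ends.
    Not handled: anchors inside brackets such as [G>].
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("<", "^").replace(">", "$")
        element = element.replace("x", ".")
        element = element.replace("{", "[^").replace("}", "]")
        element = element.replace("(", "{").replace(")", "}")
        parts.append(element)
    return "".join(parts)

def scan(sequence, pattern):
    """Return (1-based position, matched residues) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]

print(prosite_to_regex("N-{P}-[ST]-{P}"))    # → N[^P][ST][^P]
print(scan("MNGTANPSNNST", "N-{P}-[ST]-{P}"))  # → [(2, 'NGTA'), (9, 'NNST')]
```

The lookahead `(?=(...))` is there so that overlapping hits are all reported. Try it on the classwork sequence and compare the positions with what PROSITE shows you.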