C’est la séance 2B0 de la formation 2013-2014.

Ce TP met en œuvre la recherche de sous-chaînes et l’utilisation d’heuristiques dans un contexte d’application à la génomique.

Il est inspiré par le document élaboré par Pierrick Bouttier au sein du groupe Algorithmique et Logique de l’IREM, lui même basé sur l’article À la recherche de régions codantes de François Rechenmann sur http://interstices.info.

Introduction

On reprend ici les deux premiers paragraphes de l’article de François Rechenmann, ce qui nous évitera de mal les reformuler.

Données

Pour référence, voici la liste des nucléotides 285000 à 290999 du génome de Bacillus subtilis subsp. subtilis str. 168, tirée de la base de données European Nucleotide Archive (ENA).

TTTAACGACTATCGCGGATATGTCCGCTGTACAGTGACGCCTCACCAAGTGGAAAGCCGA
TTATCGGGTGATGCCATTTGTGACCGAGCCGGGCGCAGCCATTTCCACGCGGGCTTCATT
CGTTTACCAGAAAGACCAAACCGGGTTGAGAAAGGTATCATCCACAACAATCCAAGGCGG
GGTGAAGCAATCCGATGAGGTCGAAGAGGATCGTTTCTTTTCGCACAACAAAGCCCACGA
AAAACAAATGATTAAGAAGCGTGCAAAAATCACGAATTAAGGAGTGGAAATTATGTTTTC
AAACATTGGAATACCGGGCTTGATTCTCATCTTCGTCATCGCCCTCATTATTTTTGGCCC
TTCCAAGCTGCCGGAAATCGGGCGTGCCGCCGGACGGACACTGCTGGAATTTAAAAGCGC
CACAAAATCACTTGTGTCTGGTGATGAAAAAGAAGAGAAATCAGCTGAGCTGACAGCGGT
AAAGCAGGACAAAAACGCGGGCTGAATGCTGATGAGGCAGACACCGGGTCTGCCTCTTTT
TTTATGAAAGGGAGGGCTTTTTTGAATGGATAAAAAAGAAACCCATCTGATCGGGCATTT
AGAAGAGCTTCGCCGCCGGATTATCGTCACCCTTGCGGCATTTTTTCTATTTCTCATCAC
GGCTTTTTTGTTCGTACAGGACATTTATGACTGGCTGATCAGGGATTTGGATGGAAAGCT
GGCTGTGCTAGGACCGAGTGAAATCCTCTGGGTGTATATGATGCTTTCCGGCATTTGTGC
CATTGCGGCTTCTATCCCTGTTGCCGCGTACCAGCTGTGGCGTTTCGTTGCACCGGCGCT
GACTAAAACGGAGCGCAAGGTGACGCTCATGTACATCATGTACATACCAGGTTTATTTGC
GTTGTTTTTGGCGGGCATCTCCTTCGGATACTTTGTCTTGTTTCCGATCGTGCTCAGCTT
TTTGACTCATTTATCCTCCGGCCACTTTGAAACGATGTTTACGGCTGACCGCTACTTTAG
GTTTATGGTGAATTTGAGCCTGCCGTTCGGCTTCTTGTTTGAGATGCCCTTGGTGGTGAT
GTTTTTAACAAGGCTGGGCATCTTAAATCCTTACAGACTGGCCAAAGCGAGAAAGCTTTC
CTATTTTCTGCTGATTGTCGTGTCCATATTGATTACACCGCCTGATTTTATTTCTGATTT
TCTCGTGATGATCCCGCTTCTTGTCCTGTTTGAAGTGAGTGTCACCCTATCGGCGTTTGT
CTACAAAAAGAGGATGAGGGAAGAAACAGCGGCGGCCGCTTAGTGCAGCGTACCACCCGG
TGACTTCACATCCTCATCATATTGTGCGGCCGTAACAGCGGCGATTCTCAATGCCCGGAC
AATCGTGTCCAGGCTGAGGCTCGGCGCTGTTTTGTCGATTGTTTGCTGCGGAATGTAAGG
AATATGAATAAAACCGCCGCGAATGTGTGGGGATGTCCGGCTAATGTGATCCATTAACCC
GTAGAACAAATAGTTGCATACAAAGGTCCCCGCTGTGTAGGAAACCGCAGCTGGAATGCC
GTGTTCCTTCATCTTAGCAGTCATTCGTTTCACGGGAAGCCTTGTCCAGTAAGCGGCGGG
CCCATCTGGAGAAATCTCTTCATCAATCGGCTGATGTCCTTCGTTATCGGGGATTCGCGC
ATCTGCAAGGTTGATTGCCACTCGTTCCGGTGTAATCTGCATCCGTCCTCCTGCTTGGCC
GACACAAATTACGATATCTGGCTGATGTTTTTGAATGGCTTGGCGCAGAGTGTCCAGAGC
GGATCTAAAGACGGTTGGAATTTGTTCCGCTGTAATAATGGCTTCTTCTGTCTCGAAGCC
ATTAAGCCGTTTCGCCGCTTCCCATGATGGATTGACGGTTTCTTTGTCAAAAGGGTCAAA
GCCTGTGATCAGCACTTTTTTTCTCATTCTCCCATCTCCTTTTTCTTTTATTCTATTGTT
TATTTATGGGTTTTTCATCAAAATAATGTAAAGGAGTGAATCATAATGGAGCATTTGCCG
GAGCAGTATCGCCAGTTATTCCCAACCTTGCAGACGCATACGATGCTTGCCAGCTGTTCT
CAGAGCGCATTGGCAGAGCCTGTATCAAGGGCGATCCAGGATTATTATGATAGCCTGCTG
TATAAAGGGACGAACTGGAAAGAAGCGATTGAAAAAACAGAGTTTGCGAGAAACGAGTTT
GCAAAGCTGATCGGGGCTGAACCGGATGAAGTGGCGATTGTGCCGTCAGTTTCTGATGCA
CTGGTTTCTGTAGCATCGTCCTTAACTGCATTTGGAAAGAAGCACGTTGTATATACAGAT
ATGGATTTTCCGGCGGTGCCTCATGTTTGGCAGGCACACTCCGATTATACCGTATCCGTC
ATTCCATCAATAGACGGCGTGCTGCCGCTTGAACAATATGAAACGCATATTTCGGATGAA
ACAGTACTGACGTGTGTTCCTCACGTTCATTATCGTGACGGCTATGTTCAGGATATAAAA
GCGATTGCCGAGATTTCTCAGAGAAAGGGCTCTTTATTGTTTGTAGATGCTTATCAATCA
GCCGGGCATATTCCCATTGATGTGAAGGAATGGGGCGTAGATATGCTGGCAGCAGGCACC
CGGAAGTATTTGCTCGGCATACCGGGTGTGGCGTTTCTTTATGTGAGAAAGGAGCTGGCT
GACGCACTGAAGCCGAAAGCATCAGCTTGGTTCGGAAGAGAGAGCGGATTTGATGGGGCT
TATGCAAAAGTCGCGCGCCGTTTTCAAACGGGCACCCCAGCTTTTATCAGCGTATACGCA
GCTGCAGCGGCTTTATCGCTGCTGAATCATATTGGGGTTTCTCATATCAGGGATCATGTG
AAAACGATCTGTGCCGATGCAGTTCAATATGCCGCTGAAAAAGGCCTGCAGCTGGCGGCG
GCACAAGGTGGGATTCAGCCGGGCATGGTTGCGATCCGGGATGAGCGGGCATCGGAAACG
GCGGGGTTGCTGAAGAAGAAAAAAGTGATTTGCGCGCCGCGGGAAAATGTTATCCGTCTC
GCTCCCCATTTTTATAATACGAAGGAGGAAATGCGGCACGCGATTGATGAAATCGCGGCG
AAAACGATCCACAAGTAAACATGAAAAAGCCCCTGAACACTAGTCAGGGGCTTTTCATAT
TAATGATCTACTTTAACGCGTTTCATAAAGAAAGCGCCAATTAAACCGATAATGGCAACA
ATCATTGCAAACACAAATGCGTGCTGTACGCCTGCTGTCAAAGCTTGCGGGATGACTGCC
GGATCGGCAGGGTTTTTAACTGTACTCATATAATCATGCTGGCCTGCAGCCATAATGCTG
ACCGCAACCGCTGTTCCGATAGCGCCGGCCATTTGCTGCAGCGTGTTCATAATGGCGGTG
CCGTCTGGATAAAATTCACGCGGCAGTTGGTTTAAACCGTTTGTCTGTGCAGGCATCATG
ATCATAGAAATCCCGATCATCAAGCAGGTGTGCAGGATGATAATCAGCACAGCTGTTGAA
GTGGTCGTGACATTTGAGAAGAACCATAGTACAACGGTGACAATCACAAATCCCGGAATG
ACAAGCCATTTCGGCCCGTATTTATCGAACAAGCGGCCTGTAACAGGGGACATAAATCCA
TTTAAAATACCGCCCGGCAAGAGAACAAGACCAGATGCAAATGCAGTGAGGACTAAGCCG
CCTTGCAGATACATCGGCAGAAGCAGCATAGATGACAGAATGACCATCATACAAATGAAC
ACCATGATCACACCCAAAATAAACATCGGGTATTTGAACGCACGGAGGTTCATCATAGGC
TGCTTCATTGTCAGCTGGCGGATTGAAAATAAGATAAGGCCGACAACGCCGACAATCAGC
GACACGATAACAGTCGGGCTGGACCATCCCCCGGAGCCTTCACCCGCGTTGCTGAATCCG
AATACAATGCCGCCGAAGCCAATCGTCGACAGGATGATAGACAATACATCGATTTTCGGC
TTTGTCGTTTCAGATACATTTTGCATATATGCGATACCGAAAACAAGCGCCAGCACAAGG
AATGGAAGAGAGATCCAGAAAATCCAGTGCCAGTTGAGATGCTCCAGAACCAATCCTGAG
AAAGTTGGGCCGATGGCGGGCGCGAACATAATGACAAGCCCGATCGTTCCCATTGCGGCA
CCCCGTTTATGAGGCGGGAAAATCACCAAGATTGTGTTAAACATCAGCGGCAGTAAAAGA
CCGGTTCCAAGTGCCTGAACGATCCTTGCCGCTAATAAAAACGAGAAGCTCGGCGCAAGC
GCCGCAATGAATGTACCTAAAATTGAAAAGATAAGTGACACGGTAAAAAGCTGTCTTGTT
GTGAACCACTGCAACAGCAGTCCTGAAACAGGAACAAGGATACCGAGTACAAGCAGGTAG
CCCGTCGTTAACCATTGGACGGTTGCCGCTGTAATGTTCAATTCCTTCATAAGGTCGGTT
AACGCAATATTCAGCGCTGTTTCACTGAACATGCCGATAAAACCGGCCAACAGCAAGGAA
ATCATAATCGGCATCACTTTGTATTGCTGAGATGCTTTAGCTGTTGTTTCCAAAATCATT
TCCCCTCTCTATCAACTGCATGTAGTATGTCGTTTTTTTTATCTCTTCAGCAGGTCAGGA
ATGCAGCTGGAGATATGAAGGAGCGGCGTACTGTTTTTTGCCGTCAAAGATAAAAGGATG
CCGCCTTCAATCATCGCGTTAACCACAGTGCTGGCTTCTTTTGCACGGCTCTCGCTGCAG
CCAGTCTGCCGCAGTTTTTCCTCATACACAGAGGCCCATTCTTTGTAGGCTTCATGACAG
GCTTCGCGCAACGGTTCGCTTTTCAATGACGTCTCAGCCGCTAGCAAGCCCACAGGCAAG
CCTTCAATGTCTTCCGTACATGAAAACTGGCAGGAGAGCTCCTTCAAAAAGGCTTGAATG
CCTTCCGCTGGATCGGTGCAGGCTTCCATGCAGTCCGCGATTTTCTGACGGATATACTCC
TTCATCTCATTCACGGCTTCGATCGCAAGCTGTTCTTTACCCCCGGGAAAGTGGTAGTAA
AGAGAGCCTTTAGGCGCGCCGCTTTCCTTTATAATCTGGTTCAGCCCCGTGCCGTAATAC
CCTTGCAGCTGAAAAAGCCGGGTAGCTGCCGAAAGGATTTTCTCACGGGAATCTCCATAA
CTCATAACATTCCCACCTTACTGAATTGCAATCAAAAATATAGTGACTGGTCTATTATCT
TGATTCAATCATCAATTGTCAAGAAAAATTCATTGTATGAAAAGACAAAAAAAGAAGGAT
ATGACAACAAAAAATACTGAGAGAAAAGCTGACTGATCTTTTGACTGAATAGATAAAATG
TACAATGATTAATCATCATATGGATGTAAGGAGAGAAATAGATGAAAAAACAACGAATGC
TCGTACTTTTTACCGCACTATTGTTTGTTTTTACCGGATGTTCACATTCTCCTGAAACAA
AAGAATCCCCGAAAGAAAAAGCTCAGACACAAAAAGTCTCTTCGGCTTCTGCCTCTGAAA
AAAAGGATCTGCCAAACATTAGAATTTTAGCGACAGGAGGCACGATAGCTGGTGCCGATC
AATCGAAAACCTCAACAACTGAATATAAAGCAGGTGTTGTCGGCGTTGAATCACTGATCG
AGGCAGTTCCAGAAATGAAGGACATTGCAAACGTCAGCGGCGAGCAGATTGTTAACGTCG
GCAGCACAAATATTGATAATAAAATATTGCTGAAGCTGGCGAAACGCATCAACCACTTGC
TCGCTTCAGATGATGTAGACGGAATCGTCGTGACTCATGGAACAGATACATTGGAGGAAA
CCGCTTATTTTTTGAATCTTACCGTGAAAAGTGATAAACCGGTTGTTATTGTCGGTTCGA
TGAGACCTTCCACAGCCATCAGCGCTGATGGGCCTTCTAACCTGTACAATGCAGTGAAAG