Bioinformatic sequence analysis
8 January 2008

General principles:
- Every program is run by typing its command name.
- Sequence files must be in PHYLIP format (interleaved), obtained through READSEQ (or as CLUSTALW output).
- The programs ALWAYS read a file named infile.
- The programs generate the following files:
    outfile    results
    treefile   tree topologies (parenthesized notation)
    plotfile   graphics file
- Because programs take the output of other programs as their input, it is ESSENTIAL to rename outfile (to infile) at each step.

Parsimony

dnapars (nucleic acids) or protpars (proteins).
These programs require PHYLIP files of aligned sequences: use the PHYLIP output option of CLUSTALW.

    Your choice: 9

     ********* Format of Alignment Output *********

         1. Toggle CLUSTAL format output   =  ON
         2. Toggle NBRF/PIR format output  =  OFF
         3. Toggle GCG/MSF format output   =  OFF
         4. Toggle PHYLIP format output    =  ON
         5. Toggle GDE format output       =  OFF
         6. Toggle GDE output case         =  LOWER
         7. Toggle output order            =  INPUT FILE
         8. Create alignment output file(s) now?
         9. Toggle parameter output        =  OFF
         H. HELP

The .aln file (CLUSTAL format):

    CLUSTAL W(1.6) multiple sequence alignment

    CHKHBA_J00      ----------------------ACACAGAGGTGCAACCATGGTGCTGTCCGCTGCTGACA
    DUKHBADWP       CGCAACCCCGTCAGTTGCCAGCCTGCCACACCGCTGCCGCCATGCTGACCGCCGAGGACA
    SMRHBAA_M1      -------------------------AACCACCGCAAACATGAAGCTGACTGCCGAAGATA
    XELHBA_J00      -----------------TGCACAACACAAACAGGAACCATGCTTCTTTCAGCCGATGACA
    DAVAGL_M14      -----------------------------------------GTGCTCTCGGATGCTGACA
                                                                **  * *  *  ** *

    CHKHBA_J00      AGAACAACGTCAAGGGCATCTTCACCAAAATCGCCGGCCATGCTGAGGAGTATGGCGCCG
    DUKHBADWP       AGAAGCTCATCACGCAGTTGTGGGAGAAGGTGGCTGGCCACCAGGAGGAATTCGGAAGTG
    SMRHBAA_M1      AACATAATGTGAAGGCCATCTGGGATCATGTCAAAGGACATGAAGAGGCGATTGGTGCAG
    XELHBA_J00      AGAAACACATCAAGGCAATTATGCCTCCTATCGCTGCCCATGGCGACAAATTTGGGGGAG
    DAVAGL_M14      AGACTCACGTGAAAGCCATCTGGGGTAAGGTGGGAGGCCACGCCGGTGCCTACGCAGCTG
                    *        * *      *           *    *  **    *        *     *

The .phy file (PHYLIP format, beginning of the file):

    lovelace$ more tofasta.phy
         5    589
    CHKHBA_J00 ---------- ---------- --ACACAGAG GTGCAACCAT GGTGCTGTCC
    DUKHBADWP  CGCAACCCCG TCAGTTGCCA GCCTGCCACA CCGCTGCCGC CATGCTGACC
    SMRHBAA_M1 ---------- ---------- -----AACCA CCGCAAACAT GAAGCTGACT
    XELHBA_J00 ---------- -------TGC ACAACACAAA CAGGAACCAT GCTTCTTTCA
    DAVAGL_M14 ---------- ---------- ---------- ---------- -GTGCTCTCG

               GCTGCTGACA AGAACAACGT CAAGGGCATC TTCACCAAAA TCGCCGGCCA
               GCCGAGGACA AGAAGCTCAT CACGCAGTTG TGGGAGAAGG TGGCTGGCCA
               GCCGAAGATA AACATAATGT GAAGGCCATC TGGGATCATG TCAAAGGACA
               GCCGATGACA AGAAACACAT CAAGGCAATT ATGCCTCCTA TCGCTGCCCA
               GATGCTGACA AGACTCACGT GAAAGCCATC TGGGGTAAGG TGGGAGGCCA

               TGCTGAGGAG TATGGCGCCG AGACCTTGGA AAGGATGTTC ACCACCTACC
               CCAGGAGGAA TTCGGAAGTG AAGCTCTGCA GAGGATGTTC CTCGCCTACC
               TGAAGAGGCG ATTGGTGCAG AAGCTCTTTA CAGGATGTTC TGTTGTATGC
               TGGCGACAAA TTTGGGGGAG AAGCTTTGTA CAGGATGTTC ATAGTCAACC
               CGCCGGTGCC TACGCAGCTG AAGCTCTTGC CAGAACCTTC CTCTCCTTCC

    lovelace$ protpars
    protpars: can't read infile
    Please enter a new filename>fmts.phy

    Protein parsimony algorithm, version 3.55c

    Setting for this run:
      U  Search for best tree?  Yes
      J  Randomize input order of sequences?  No. Use input order
      O  Outgroup root?  No, use as outgroup species 1
      T  Use Threshold parsimony?  No, use ordinary parsimony
      M  Analyze multiple data sets?  No
      I  Input sequences interleaved?  Yes
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes
      3  Print out tree  Yes
      4  [illegible in the source]
      5  Print sequences at all nodes of tree  No
      6  Write out trees onto tree file?  Yes

    Are these settings correct? (type Y or the letter for one to change)
    Y

    Adding species:
        CHKHBA_J00
        DUKHBADWP
        SMRHBAA_M1
        XELHBA_J00
        DAVAGL_M14

    Doing global rearrangements
      !---------!
       .........

    Output written to output file

    Trees also written onto file

    lovelace$ more outfile

    Protein parsimony algorithm, version 3.55c

    One most parsimonious tree found:

            +-----XELHBA_J00
         +--3
         !  !  +--DAVAGL_M14
      +--2  +--4
      !  !     +--SMRHBAA_M1
    --1  !
      !  +--------DUKHBADWP
      !
      +-----------CHKHBA_J00

      remember: this is an unrooted tree!

    requires a total of   1400.000

    lovelace$ more treefile
    (((XELHBA_J00,(DAVAGL_M14,SMRHBAA_M1)),DUKHBADWP),CHKHBA_J00);
    lovelace$

This is a tree without distances: the parsimony treefile carries no branch lengths.

Distances

    lovelace$ dnadist
    dnadist: can't read infile
    Please enter a new filename>tofasta.phy

    Nucleic acid sequence Distance Matrix program, version 3.55c

    Settings for this run:
      D  Distance (Kimura, Jin/Nei, ML, J-C)?  Kimura 2-parameter
      T  Transition/transversion ratio?  2.0
      C  One category of substitution rates?  Yes
      L  Form of distance matrix?  Square
      M  Analyze multiple data sets?  No
      I  Input sequences interleaved?  Yes
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes

    Are these settings correct? (type Y or letter for one to change)
    Y

    Distances calculated for species
        CHKHBA_J00   ....
        DUKHBADWP    ...
        SMRHBAA_M1   ..
        XELHBA_J00   .
        DAVAGL_M14

    Distances written to file

    lovelace$ more outfile
        5
    CHKHBA_J00    0.0000  0.5962  0.9649  0.7203  0.6094
    DUKHBADWP     0.5962  0.0000  1.0130  0.7741  0.5435
    SMRHBAA_M1    0.9649  1.0130  0.0000  0.9289  0.9209
    XELHBA_J00    0.7203  0.7741  0.9289  0.0000  0.8969
    DAVAGL_M14    0.6094  0.5435  0.9209  0.8969  0.0000

    lovelace$ mv outfile infile
    lovelace$ fitch

    Fitch-Margoliash method version 3.55c

    Settings for this run:
      U  Search for best tree?  Yes
      P  Power?  2.00000
      -  Negative branch lengths allowed?  No
      O  Outgroup root?  No, use as outgroup species 1
      L  Lower-triangular data matrix?  No
      R  Upper-triangular data matrix?  No
      S  Subreplicates?  No
      G  Global rearrangements?  No
      J  Randomize input order of species?  No. Use input order
      M  Analyze multiple data sets?  No
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes
      3  Print out tree  Yes
      4  Write out trees onto tree file?  Yes

    Are these settings correct? (type Y or the letter for one to change)
    y

    Adding species:
        CHKHBA_J00
        DUKHBADWP
        SMRHBAA_M1
        XELHBA_J00
        DAVAGL_M14

    Output written to output file

    Tree also written onto file

    lovelace$ more outfile

       5 Populations

    Fitch-Margoliash method version 3.55c

                     __ __             2
                     \  \   (Obs - Exp)
    Sum of squares = /_ /_  ------------
                                    2
                      i  j      Obs

    Negative branch lengths not allowed

          +----------------DAVAGL_M14
      +---3
      !   +--------------DUKHBADWP
      !
      !     +----------------------XELHBA_J00
    --1-----2
      !     +--------------------------------SMRHBAA_M1
      !
      +---------------CHKHBA_J00

    remember: this is an unrooted tree!

    Sum of squares =     0.03950

    Average percent standard deviation =     4.68447

    examined   15 trees

    Between        And            Length
    -------        ---            ------
       1             3            0.06233
       3          DAVAGL_M14      0.28139
       3          DUKHBADWP       0.26211
       1             2            0.09924
       2          XELHBA_J00      0.37775
       2          SMRHBAA_M1      0.55115
       1          CHKHBA_J00      0.26879

    lovelace$ more treefile
    ((DAVAGL_M14:0.28139,DUKHBADWP:0.26211):0.06233,(XELHBA_J00:0.37775,
    SMRHBAA_M1:0.55115):0.09924,CHKHBA_J00:0.26879);
    lovelace$
    lovelace$ neighbor

    Neighbor-Joining/UPGMA method version 3.5

    Settings for this run:
      N  Neighbor-joining or UPGMA tree?  Neighbor-joining
      O  Outgroup root?  No, use as outgroup species 1
      L  Lower-triangular data matrix?  No
      R  Upper-triangular data matrix?  No
      S  Subreplicates?  No
      J  Randomize input order of species?  No. Use input order
      M  Analyze multiple data sets?  No
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes
      3  Print out tree  Yes
      4  Write out trees onto tree file?  Yes

    Are these settings correct? (type Y or the letter for one to change)
    y

    CYCLE   2: OTU   3 (   0.54903) JOINS OTU   4 (   0.37987)
    CYCLE   1: OTU   1 (   0.27209) JOINS NODE  3 (   0.10606)
    LAST CYCLE:
     NODE   1 (   0.05896) JOINS OTU   2 (   0.26461) JOINS OTU   5 (   0.27889)

    Output written on output file

    Tree written on tree file

    lovelace$ more outfile

       5 Populations

    Neighbor-Joining/UPGMA method version 3.55c

     Neighbor-joining method

     Negative branch lengths allowed

      +---------------DUKHBADWP
      !
    --3----------------DAVAGL_M14
      !
      !   +---------------CHKHBA_J00
      +---2
          !     +--------------------------------SMRHBAA_M1
          +-----1
                +----------------------XELHBA_J00

    remember: this is an unrooted tree!

    Between        And            Length
    -------        ---            ------
       3          DUKHBADWP       0.26461
       3          DAVAGL_M14      0.27889
       3             2            0.05896
       2          CHKHBA_J00      0.27209
       2             1            0.10606
       1          SMRHBAA_M1      0.54903
       1          XELHBA_J00      0.37987

    lovelace$ more treefile
    (DUKHBADWP:0.26461,DAVAGL_M14:0.27889,(CHKHBA_J00:0.27209,
    (SMRHBAA_M1:0.54903,XELHBA_J00:0.37987):0.10606):0.05896);
    lovelace$
    lovelace$ kitsch

    Fitch-Margoliash method with contemporary tips, version 3.55c

    Settings for this run:
      U  Search for best tree?  Yes
      P  Power?  2.00000
      -  Negative branch lengths allowed?  No
      L  Lower-triangular data matrix?  No
      R  Upper-triangular data matrix?  No
      S  Subreplicates?  No
      J  Randomize input order of species?  No. Use input order
      M  Analyze multiple data sets?  No
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes
      3  Print out tree  Yes
      4  Write out trees onto tree file?  Yes

    Are these settings correct? (type Y or the letter for one to change)
    y

    Adding species:
        CHKHBA_J00
        DUKHBADWP
        SMRHBAA_M1
        XELHBA_J00
        DAVAGL_M14

    Doing global rearrangements
      !---------!
       .........

    Output written to output file

    Tree also written onto file

    lovelace$ more outfile

       5 Populations

    Fitch-Margoliash method with contemporary tips, version 3.55c

                     __ __             2
                     \  \   (Obs - Exp)
    Sum of squares = /_ /_  ------------
                                    2
                      i  j      Obs

    negative branch lengths not allowed

                    +---------------DAVAGL_M14
                 +--4
           +-----1  +---------------DUKHBADWP
           !     !
      +----3     +-----------------CHKHBA_J00
      !    !
    --2    +-----------------------XELHBA_J00
      !
      +----------------------------SMRHBAA_M1

    Sum of squares =      0.059

    Average percent standard deviation =     5.73593

    examined   72 trees

    From     To            Length          Time
    ----     --            ------          ----
       4     DAVAGL_M14    0.27175        0.47712
       1        4          0.02958        0.20537
       4     DUKHBADWP     0.27175        0.47712
       3        1          0.09078        0.17580
       1     CHKHBA_J00    0.30133        0.47712
       2        3          0.08501        0.08501
       3     XELHBA_J00    0.39211        0.47712
       2     SMRHBAA_M1    0.47712        0.47712

    lovelace$ more treefile
    ((((DAVAGL_M14:0.27175,DUKHBADWP:0.27175):0.02958,CHKHBA_J00:0.30133):0.09078,
    XELHBA_J00:0.39211):0.08501,SMRHBAA_M1:0.47712);

    lovelace$ dnaml

    Nucleic acid sequence Maximum Likelihood method, version 3.55c

    Settings for this run:
      U  Search for best tree?  Yes
      T  Transition/transversion ratio:  2.0000
      F  Use empirical base frequencies?  Yes
      C  One category of substitution rates?  Yes
      G  Global rearrangements?  No
      J  Randomize input order of sequences?  No. Use input order
      O  Outgroup root?  No, use as outgroup species 1
      M  Analyze multiple data sets?  No
      I  Input sequences interleaved?  Yes
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes
      3  Print out tree  Yes
      4  Write out trees onto tree file?  Yes

    Are these settings correct? (type Y or the letter for one to change)
    Y

    Adding species:
        CHKHBA
        DUKHBADWP
        SMRHBAA
        XELHBA
        DAVAGL

    Output written to output file

    Tree also written onto file

    lovelace$ more outfile

    Nucleic acid sequence Maximum Likelihood method, version 3.55c

    Empirical Base Frequencies:

       A       0.25368
       C       0.29449
       G       0.23346
      T(U)     0.21838

    Transition/transversion ratio =   2.000000

    (Transition/transversion parameter =   1.523022)

            +--------------DAVAGL
      +-----3
      !     +-----------------DUKHBADWP
      !
      !         +--------------------XELHBA
    --1---------2
      !         +-------------------------------SMRHBAA
      !
      +--------------CHKHBA

    remember: this is an unrooted tree!

    Ln Likelihood = -3145.55232

    Examined   15 trees

     Between        And            Length      Approx. Confidence Limits
     -------        ---            ------      ------- ---------- ------
       1             3            0.09292     (  0.03404,  0.15218) **
       3          DAVAGL          0.26355     (  0.19312,  0.33542) **
       3          DUKHBADWP       0.30752     (  0.23199,  0.38496) **
       1             2            0.16329     (  0.09148,  0.23605) **
       2          XELHBA          0.34539     (  0.25789,  0.43510) **
       2          SMRHBAA         0.53168     (  0.42197,  0.64816) **
       1          CHKHBA          0.25619     (  0.18690,  0.32797) **

         *  = significantly positive, P < 0.05
         ** = significantly positive, P < 0.01

    lovelace$ more treefile
    ((DAVAGL:0.26355,DUKHBADWP:0.30752):0.09292,(XELHBA:0.34539,
    SMRHBAA:0.53168):0.16329,CHKHBA:0.25619);

    lovelace$ more fmt.phy
         5    340
    ECFMT_2    MSESLRIIFA GTPDFAARHL DALLS-SGHN VVGVFTQPDR PAGRGKKLMP
    HI32745_2  -MKSLNIIFA GTPDFAAQHL QAILN-SQHN VIAVYTQPDK PAGRGKKLQA
    TTDEFFMT_3 ----MRVAFF GTPLWAVPVL DALR--KRHQ VVLVVSQPDK PQGRGLRPAP
    MG39721_2  ---MFKIVFF GTSTLSKKCL EQLFYDNDFE ICAVVTQPDK INHRNNKIVP
    SSCPNC     ---MMKTVFF GTPDFAVPTL EALLGHPDID VLAVVSQPDR RRGRGSKLIP

               SPVKVLAEEK GLPVFQP-VS LRPQENQQLV AELQADVMVV VAYGLILPKA
               SPVKQLAEQN NIPVYQP-KS LRKEEAQSEL KALNADVMVV VAYGLILPKA
               SPVARYAEAE GLPLLRP-AR LREEAFLEAL RQAAPEVAVV AAYGKLIPKE
               SDVKSFCLEK NITFFQP--K QS-ISIKADL EKLKADIGIC VSFGQYLHQD
               SPVKEVAVQA GIPVWQPERV KRCQETLAKL KNCQADFFVV VAYGQLLSPE

    lovelace$ seqboot
    lovelace$ cp fmt.phy infile
    lovelace$ seqboot
    Random number seed (must be odd)?
    11

    Bootstrapped sequences algorithm, version 3.55c

    Settings for this run:
      D  Sequence, Morph, Rest., Gene Freqs?  Molecular sequences
      J  Bootstrap, Jackknife, or Permute?  Bootstrap
      R  How many replicates?  100
      I  Input sequences interleaved?  Yes
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes

    Are these settings correct? (type Y or the letter for one to change)
    R
    Number of replicates?
    10
    ..
    completed replicate number   1
    completed replicate number   2
    completed replicate number   3
    completed replicate number   4
    completed replicate number   5

    Output written to output file

    lovelace$ mv outfile infile
    lovelace$ protdist

    Protein distance algorithm, version 3.55c

    Settings for this run:
      P  Use PAM, Kimura or categories model?  Dayhoff PAM matrix
      M  Analyze multiple data sets?  No
      I  Input sequences interleaved?  Yes
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes

    Are these settings correct? (type Y or the letter for one to change)
    M
    How many data sets?
    10
    Y

    Computing distances:
      ECFMT_2
      HI32745_2    .
      TTDEFFMT_3   ..
      MG39721_2    ...
      SSCPNC       ....

    Output written to output file

    Data set # 2:

    Computing distances:
      ECFMT_2
      HI32745_2    .
      TTDEFFMT_3   ..
      MG39721_2    ...
      SSCPNC       ....

    Output written to output file

    Data set # 3:

    Computing distances:
      ECFMT_2
    ...

    Data set # 5:

    Computing distances:
      ECFMT_2
      HI32745_2    .
      TTDEFFMT_3   ..
      MG39721_2    ...
      SSCPNC       ....

    Output written to output file

    lovelace$ mv outfile infile
    lovelace$ neighbor

    Neighbor-Joining/UPGMA method version 3.5

    Settings for this run:
      N  Neighbor-joining or UPGMA tree?  Neighbor-joining
      O  Outgroup root?  No, use as outgroup species 1
      L  Lower-triangular data matrix?  No
      R  Upper-triangular data matrix?  No
      S  Subreplicates?  No
      J  Randomize input order of species?  No. Use input order
      M  Analyze multiple data sets?  No
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the data at start of run  No
      2  Print indications of progress of run  Yes
      3  Print out tree  Yes
      4  Write out trees onto tree file?  Yes

    Are these settings correct? (type Y or the letter for one to change)
    M
    How many data sets?
    10
    ...
    Output written on output file

    Tree written on tree file

    Data set # 10:

    CYCLE   2: OTU   1 (   0.15957) JOINS OTU   2 (   0.31701)
    CYCLE   1: NODE  1 (   0.29776) JOINS OTU   3 (   0.57794)
    LAST CYCLE:
     NODE   1 (   0.11937) JOINS OTU   4 (   1.38576) JOINS OTU   5 (   0.68429)

    Output written on output file

    Tree written on tree file

    lovelace$ mv treefile infile
    lovelace$ consense

    Majority-rule and strict consensus tree program, version 3.55c

    Settings for this run:
      O  Outgroup root?  No, use as outgroup species 1
      R  Trees to be treated as Rooted?  No
      0  Terminal type (IBM PC, VT52, ANSI)?  ANSI
      1  Print out the sets of species  Yes
      2  Print indications of progress of run  Yes
      3  Print out tree  Yes
      4  Write out trees onto tree file?  Yes

    Are these settings correct? (type Y or the letter for one to change)
    Y

    Output written to output file

    Tree also written onto file

    lovelace$ more outfile

    Majority-rule and strict consensus tree program, version 3.55c

    Species in order:

      HI32745 2
      TTDEFFMT 3
      MG39721 2
      SSCPNC
      ECFMT 2

    Sets included in the consensus tree

    Set (species in order)     How many times out of   10.00

    .***.                10.00
    ..**.                 8.00

    Sets NOT included in consensus tree:

    Set (species in order)     How many times out of   10.00

    .**..                 2.00

    CONSENSUS TREE:
    the numbers at the forks indicate the number
    of times the group consisting of the species
    which are to the right of that fork occurred
    among the trees, out of  10.00 trees

                +---------TTDEFFMT 3
           +-10.0
           !    !    +----SSCPNC
      +--9.0    +--8.0
      !    !         +----MG39721 2
      !    !
      !    +--------------ECFMT 2
      !
      +-------------------HI32745 2

      remember: this is an unrooted tree!

    lovelace$ more treefile
    (((TTDEFFMT_3:10.0,(SSCPNC:10.0,MG39721_2:10.0):8.0):10.0,ECFMT_2:10.0):9.0,
    HI32745_2:10.0);
    lovelace$
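
In the consense treefile, the numbers after the colons are not branch lengths: they repeat the counts printed at the forks of the consensus tree (how many of the 10 bootstrap trees contain each group). To reuse these counts outside PHYLIP, the parenthesized notation can be read with a few lines of code. The following is a minimal Python sketch, not part of PHYLIP: the function names parse_newick and clade_support are hypothetical, and the parser assumes the simple notation shown in the treefiles here (unquoted names, an optional ":number" after each node, no comments).

```python
def parse_newick(text):
    """Parse a parenthesized tree into nested (children, name, value) tuples.

    Assumption: simple notation only (unquoted names, optional ':number').
    """
    text = "".join(text.split()).rstrip(";")  # drop line breaks and the final ';'
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # skip '('
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # skip ','
                children.append(node())
            pos += 1                      # skip ')'
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]            # empty for internal nodes
        value = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            value = float(text[start:pos])
        return children, name, value

    return node()


def clade_support(tree):
    """Map each internal group (frozenset of leaf names) to the number after its ':'."""
    support = {}

    def leaves(n):
        children, name, value = n
        if not children:
            return frozenset([name])
        members = frozenset().union(*(leaves(child) for child in children))
        if value is not None:             # the root carries no number
            support[members] = value
        return members

    leaves(tree)
    return support


# The consense treefile shown above, including its line break.
treefile = """(((TTDEFFMT_3:10.0,(SSCPNC:10.0,MG39721_2:10.0):8.0):10.0,ECFMT_2:10.0):9.0,
HI32745_2:10.0);"""

for group, count in sorted(clade_support(parse_newick(treefile)).items(),
                           key=lambda item: len(item[0])):
    print(sorted(group), count)
```

With the treefile above, the group made of SSCPNC and MG39721_2 maps to 8.0, which agrees with the line "..**. 8.00" in the consense outfile. The same parser reads the fitch, neighbor, kitsch and dnaml treefiles, where the numbers are branch lengths instead of counts.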