Rhomboid proteases

Load the dataset

The loaded MSA has 2767 sequences and 135 positions.
After filtering, we have 135 remaining positions.
After filtering, we have 2767 remaining sequences.
computing weight of seq 1/2767
computing weight of seq 101/2767
computing weight of seq 201/2767
computing weight of seq 301/2767
computing weight of seq 401/2767
computing weight of seq 501/2767
computing weight of seq 601/2767
computing weight of seq 701/2767
computing weight of seq 801/2767
computing weight of seq 901/2767
computing weight of seq 1001/2767
computing weight of seq 1101/2767
computing weight of seq 1201/2767
computing weight of seq 1301/2767
computing weight of seq 1401/2767
computing weight of seq 1501/2767
computing weight of seq 1601/2767
computing weight of seq 1701/2767
computing weight of seq 1801/2767
computing weight of seq 1901/2767
computing weight of seq 2001/2767
computing weight of seq 2101/2767
computing weight of seq 2201/2767
computing weight of seq 2301/2767
computing weight of seq 2401/2767
computing weight of seq 2501/2767
computing weight of seq 2601/2767
computing weight of seq 2701/2767
Number of effective sequences 1792

import numpy as np
from cocoatree.datasets import load_rhomboid_proteases
import cocoatree.msa as c_msa
import cocoatree.statistics.position as c_pos


# Load the rhomboid protease dataset bundled with cocoatree.
dataset = load_rhomboid_proteases()

loaded_seqs = dataset["alignment"]
loaded_seqs_id = dataset["sequence_ids"]
n_loaded_pos, n_loaded_seqs = len(loaded_seqs[0]), len(loaded_seqs)

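# Split the message across two f-strings so that no indentation whitespace
# leaks into the printed text.
print(f"The loaded MSA has {n_loaded_seqs} sequences and "
      f"{n_loaded_pos} positions.")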

# Filter out overly gapped positions and sequences.
sequences, sequences_id, positions = c_msa.filter_sequences(
    loaded_seqs, loaded_seqs_id, gap_threshold=0.4, seq_threshold=0.2)
n_pos = len(positions)
print(f"After filtering, we have {n_pos} remaining positions.")
print(f"After filtering, we have {len(sequences)} remaining sequences.")

# Weight each sequence and estimate the effective number of sequences.
seq_weights, m_eff = c_pos.compute_seq_weights(sequences)
print(f"Number of effective sequences {int(np.round(m_eff))}")
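
The effective number of sequences reported above (1792 of 2767) is lower than the raw count because similar sequences share their weight. The sketch below illustrates one common similarity-based weighting scheme and is not cocoatree's actual implementation: the function name `sketch_seq_weights`, the identity `threshold`, and the toy alignment are all assumptions made for illustration, and `compute_seq_weights` may define similarity differently.

```python
import numpy as np


def sketch_seq_weights(sequences, threshold=0.8):
    """Illustrative similarity-based weighting (hypothetical helper)."""
    # Encode the alignment as an (n_seqs, n_pos) array of single characters.
    msa = np.array([list(seq) for seq in sequences])
    n_seqs = msa.shape[0]
    weights = np.empty(n_seqs)
    for i in range(n_seqs):
        # Fraction of identical positions between sequence i and each sequence.
        identity = (msa == msa[i]).mean(axis=1)
        # Sequence i shares its weight with every sequence that is at least
        # `threshold` identical to it (itself included).
        weights[i] = 1.0 / np.sum(identity >= threshold)
    # The effective number of sequences is the sum of the weights.
    return weights, weights.sum()


# Toy alignment (assumed data): three near-identical sequences and one outlier.
toy_msa = ["ACDE", "ACDE", "ACDF", "WYWY"]
weights, m_eff = sketch_seq_weights(toy_msa, threshold=0.75)
print(weights, m_eff)
```

In this toy case the three near-identical sequences each receive a weight of 1/3 and the outlier keeps a weight of 1, so four sequences count as roughly two effective ones.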

Total running time of the script: (0 minutes 1.020 seconds)
