cocoatree.msa.compute_seq_similarity

cocoatree.msa.compute_seq_similarity(sequences, subst_matrix='BLOSUM62', gap_penalty=-4, n_jobs=1, verbose_parallel=0)

Computes a similarity matrix using a precalculated substitution matrix.

The similarity score for a pair of sequences is obtained as the sum of the substitution scores at each position of the sequence pair.
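The position-wise sum described above can be illustrated with a minimal pure-Python sketch. This is not the cocoatree implementation: the helper names `pair_score` and `similarity_matrix`, the toy score table, and the gap handling (any position where either sequence has a gap receives `gap_penalty`) are assumptions made for illustration only.

```python
# Toy substitution scores for a two-letter alphabet.
# Hypothetical values chosen for illustration; this is NOT BLOSUM62.
TOY_SCORES = {
    ("A", "A"): 4, ("A", "G"): 0,
    ("G", "A"): 0, ("G", "G"): 6,
}


def pair_score(seq_a, seq_b, scores, gap_penalty=-4, gap_char="-"):
    """Sum the per-position substitution scores of two aligned sequences.

    Assumption: a position where either sequence has a gap contributes
    `gap_penalty`; the library may treat gaps differently.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap_char or b == gap_char:
            total += gap_penalty
        else:
            total += scores[(a, b)]
    return total


def similarity_matrix(sequences, scores, gap_penalty=-4):
    """Build the symmetric (Nseq, Nseq) matrix of pairwise scores."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = pair_score(sequences[i], sequences[j], scores, gap_penalty)
            matrix[i][j] = score
            matrix[j][i] = score  # the score does not depend on pair order
    return matrix


toy_msa = ["AG-", "AGA", "GGA"]
toy_matrix = similarity_matrix(toy_msa, TOY_SCORES)
# toy_matrix == [[6, 6, 2], [6, 14, 10], [2, 10, 16]]
```

In this sketch, the score of "AGA" against "GGA" is 0 + 6 + 4 = 10, and making `gap_penalty` more negative lowers every entry that involves the gapped sequence "AG-".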

Parameters

sequences : list of str

List of the Nseq sequences of the MSA.

subst_matrix : str, default='BLOSUM62'

Name of the substitution matrix. Call Bio.Align.substitution_matrices.load() with no arguments to obtain the list of available substitution matrices.

gap_penalty : int, default=-4

Penalty score for gaps. Adjust this parameter to reflect biological assumptions (e.g., -1 for a mild penalty, -10 for a harsh one).

n_jobs : int, default=1 (no parallelization)

Maximum number of concurrently running jobs (-1 uses all available cores).

verbose_parallel : int, default=0

Verbosity level for parallelization (see the joblib documentation).

Returns

similarity_matrix : np.ndarray

An (Nseq, Nseq) array of similarity scores.

Examples using cocoatree.msa.compute_seq_similarity

Plot a similarity heatmap of a XCoR along the phylogenetic tree
