cocoatree.msa.compute_normalized_seq_similarity

cocoatree.msa.compute_normalized_seq_similarity(sequences, subst_matrix='BLOSUM62', gap_penalty=-4, n_jobs=1, verbose_parallel=0)

Computes a normalized similarity matrix using a precalculated substitution matrix.

Each pairwise similarity score is normalized by the maximum possible score for the pair of sequences (i.e., the score we would obtain by comparing the sequence to itself).
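The normalization described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation. The helper names (subst_score, raw_score, normalized_similarity) are hypothetical, the toy scores (+4 match, -1 mismatch) stand in for a real substitution matrix, and two points are assumptions: the denominator is taken as the larger of the two self-comparison scores, and columns gapped in both sequences are skipped.

```python
def subst_score(a, b):
    """Toy substitution score (hypothetical stand-in for a real matrix)."""
    return 4 if a == b else -1


def raw_score(seq_a, seq_b, gap_penalty=-4):
    """Sum per-column scores of two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # assumption: columns gapped in both sequences are skipped
        if a == "-" or b == "-":
            total += gap_penalty  # gap facing a residue
        else:
            total += subst_score(a, b)
    return total


def normalized_similarity(seq_a, seq_b, gap_penalty=-4):
    """Raw score divided by the larger of the two self-comparison scores."""
    # Assumption: the "maximum possible score" is the larger self-score.
    max_score = max(raw_score(seq_a, seq_a, gap_penalty),
                    raw_score(seq_b, seq_b, gap_penalty))
    if max_score == 0:
        return 0.0
    return raw_score(seq_a, seq_b, gap_penalty) / max_score


print(normalized_similarity("ACDE", "ACDE"))  # 1.0
print(normalized_similarity("ACDE", "ACDF"))  # 0.6875
print(normalized_similarity("AC-E", "ACDE"))  # 0.5
```

With this toy scoring, an identical pair scores exactly 1.0, and each mismatch or gap lowers the ratio. Very dissimilar pairs can produce a negative raw score in this sketch.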

Parameters:

sequences : list of str

List of Nseq MSA sequences.

subst_matrix : str, default='BLOSUM62'

Name of the substitution matrix. Call Bio.Align.substitution_matrices.load() with no arguments to obtain a list of the available substitution matrices.
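As a short sketch of how the available matrix names can be inspected, the snippet below uses Biopython directly (it assumes Biopython is installed). The variable names are illustrative only.

```python
from Bio.Align import substitution_matrices

# Called without a name, load() returns the names of the bundled matrices.
available = substitution_matrices.load()
print("BLOSUM62" in available)

# Called with a name, load() returns the matrix, indexable by residue pair.
blosum62 = substitution_matrices.load("BLOSUM62")
print(blosum62["A", "A"])  # score for aligning alanine with alanine
print(blosum62["W", "W"])  # score for aligning tryptophan with tryptophan
```

Any name from that list is a candidate value for subst_matrix.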

gap_penalty : int, default=-4

Penalty score for gaps. You can adjust this parameter to reflect biological assumptions (e.g., -1 for a mild penalty, -10 for a harsh one).

n_jobs : int, default=1

Maximum number of concurrently running jobs. The default of 1 disables parallelization; -1 uses all available cores.

verbose_parallel : int, default=0

Verbosity level for parallelization (see the joblib documentation).

Returns:

similarity_matrix : np.ndarray

An (Nseq, Nseq) array of normalized similarity scores (0.0 to 1.0).

Examples using cocoatree.msa.compute_normalized_seq_similarity

Plot a similarity heatmap of a XCoR along the phylogenetic tree
