cocoatree.msa.compute_seq_weights

cocoatree.msa.compute_seq_weights(sequences, threshold=0.8, verbose_every=0, n_jobs=1, verbose_parallel=5)

Compute sequence weights

Each sequence s is given a weight w_s = 1/N_s, where N_s is the number of sequences whose identity to s is above a specified threshold.

Parameters

sequences : list of sequences

threshold : float, optional, default: 0.8

fractional sequence identity (between 0 and 1) above which two sequences are considered identical

verbose_every : int, default: 0

if > 0, print progress every verbose_every sequences

n_jobs : int, default: 1 (no parallelization)

the maximum number of concurrently running jobs (see joblib doc)

verbose_parallel : int, default: 5

verbosity level for parallelization (see joblib doc)

Returns

weights : np.array of shape (nseq,)

weight of each sequence

m_eff : float

effective number of sequences
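The weighting rule described above can be sketched in plain NumPy. This is an illustrative re-implementation under stated assumptions, not cocoatree's own code: the helper name `sketch_seq_weights` is hypothetical, identity is taken as the fraction of matching alignment positions, a sequence counts itself among its neighbours, "above" is read as `>=`, and `m_eff` is assumed to be the sum of the weights.

```python
import numpy as np


def sketch_seq_weights(sequences, threshold=0.8):
    """Hypothetical sketch of the 1/N_s weighting; not cocoatree's code."""
    # Encode the aligned sequences as a (nseq, npos) array of characters.
    msa = np.array([list(seq) for seq in sequences])
    npos = msa.shape[1]
    # Pairwise identity: fraction of positions where two sequences match.
    identity = (msa[:, None, :] == msa[None, :, :]).sum(axis=2) / npos
    # N_s: number of sequences similar to s, including s itself
    # (assumption: identity >= threshold counts as "above").
    n_similar = (identity >= threshold).sum(axis=1)
    weights = 1.0 / n_similar
    # Assumption: the effective number of sequences is the sum of the weights.
    m_eff = weights.sum()
    return weights, m_eff


# Toy alignment: three near-identical sequences and one unrelated sequence.
toy_msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKY", "WWWWWWWWWW"]
weights, m_eff = sketch_seq_weights(toy_msa)
print(weights, m_eff)
```

In this toy alignment the first three sequences share at least 90% identity, so each gets a weight of about 1/3, while the unrelated sequence keeps a weight of 1. The effective number of sequences is therefore close to 2 rather than 4. The dense pairwise comparison uses memory proportional to nseq × nseq × npos, so it is only suitable for small alignments.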