cocoatree.msa.filter_ref_seq

cocoatree.msa.filter_ref_seq(sequences, sequences_id, delta=0.2, refseq_id=None, verbose=False)

Filter the alignment based on identity with a reference sequence

Remove sequences r with S_r < delta, where S_r is the fractional identity between sequence r and a specified reference sequence.
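The filtering rule can be illustrated with a minimal pure-Python sketch. This is not cocoatree's implementation: the helper names `fractional_identity` and `filter_by_reference` are hypothetical, and the sketch assumes that fractional identity is the share of aligned positions with identical characters (gap handling in the library may differ).

```python
def fractional_identity(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences match.

    Assumption: every column counts, including gap characters.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def filter_by_reference(sequences, sequences_id, ref_seq, delta=0.2):
    """Keep sequences whose identity to ref_seq is at least delta.

    Hypothetical helper mirroring the documented rule: a sequence r is
    removed when its identity to the reference is below delta.
    """
    filt_seqs, filt_seqs_id = [], []
    for seq, seq_id in zip(sequences, sequences_id):
        if fractional_identity(seq, ref_seq) >= delta:
            filt_seqs.append(seq)
            filt_seqs_id.append(seq_id)
    return filt_seqs, filt_seqs_id


# Toy alignment: the third sequence shares no position with the reference.
sequences = ["ACDEF", "ACDEY", "WWWWW"]
sequences_id = ["s1", "s2", "s3"]
kept, kept_id = filter_by_reference(sequences, sequences_id, ref_seq="ACDEF", delta=0.2)
print(kept_id)
```

In this toy case the identities to the reference are 1.0, 0.8 and 0.0, so only the last sequence falls below the threshold and is dropped.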

Parameters

sequences : list of sequences in the MSA

sequences_id : list of sequence identifiers in the MSA

delta : identity threshold (default=0.2)

refseq_id : identifier of the reference sequence. If None, the reference is chosen as the sequence whose mean pairwise sequence identity is closest to that of the entire sequence alignment (default=None)

Returns

filt_seqs : filtered list of sequences

filt_seqs_id : corresponding list of sequence identifiers
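When refseq_id is None, a reference sequence has to be picked automatically. The sketch below shows one possible reading of that rule, not the library's actual code: the helper `choose_reference` is hypothetical, identity is assumed to be the share of matching columns, and the "entire alignment" value is assumed to be the mean over all pairs of distinct sequences.

```python
def fractional_identity(seq_a, seq_b):
    """Share of aligned positions with identical characters (assumed definition)."""
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def choose_reference(sequences):
    """Return the index of the sequence whose mean identity to all other
    sequences is closest to the alignment-wide mean pairwise identity.

    Hypothetical helper; requires at least two sequences.
    """
    n = len(sequences)
    per_seq_mean = []
    for i in range(n):
        # Identity of sequence i to every other sequence in the alignment.
        others = [
            fractional_identity(sequences[i], sequences[j])
            for j in range(n)
            if j != i
        ]
        per_seq_mean.append(sum(others) / len(others))
    # Averaging the per-sequence means gives the mean over all distinct pairs.
    overall_mean = sum(per_seq_mean) / n
    return min(range(n), key=lambda i: abs(per_seq_mean[i] - overall_mean))


sequences = ["AAAA", "AAAC", "AACC", "CCCC"]
sequences_id = ["s1", "s2", "s3", "s4"]
ref_index = choose_reference(sequences)
print(sequences_id[ref_index])
```

The identifier selected this way could then be passed explicitly as refseq_id, which makes the choice of reference reproducible across runs.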