cocoatree.statistics.compute_all_frequencies

cocoatree.statistics.compute_all_frequencies(sequences, seq_weights=None, freq_regul=0.03)

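Compute amino acid frequencies from a set of sequences: per-position frequencies, background frequencies, and pairwise joint frequencies.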

Parameters

sequences : list of sequences

seq_weights : {None, np.ndarray (n_seq)}

Sequence weights. If None, the sequence weights are recomputed.

freq_regul : regularization parameter (default=0.03)

Returns

aa_freqs : np.ndarray (nseq, 21)

A (nseq, 21) ndarray containing the amino acid frequencies at each position.

bkgd_freqs : np.ndarray (21, )

A (21,) np.array containing the background amino acid frequencies. Each entry is the mean frequency of amino acid a across all proteins in the NCBI non-redundant database (see Rivoire et al., https://dx.plos.org/10.1371/journal.pcbi.1004817).

aa_joint_freqs : np.ndarray (nseq, nseq, 21, 21)

An ndarray containing the pairwise joint frequencies of amino acids for each pair of positions in the list of provided sequences.
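
To make the returned quantities concrete, the following is a minimal NumPy sketch of weighted single-site and pairwise joint frequencies over a 21-state alphabet (20 amino acids plus gap). It is not cocoatree's implementation: the helper `sketch_frequencies`, the alphabet ordering, the uniform default weights, and the way `freq_regul` blends the raw frequencies with a uniform distribution are all assumptions made for illustration. In this sketch the leading axes index alignment positions.

```python
import numpy as np

# 20 amino acids plus the gap symbol: 21 states, matching the trailing
# dimension of the arrays described above. The ordering is an assumption.
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def sketch_frequencies(sequences, seq_weights=None, freq_regul=0.03):
    """Hypothetical helper for illustration; NOT cocoatree's implementation."""
    n_seq, n_pos, n_aa = len(sequences), len(sequences[0]), len(ALPHABET)

    # Assumption: fall back to uniform weights when none are given
    # (the documented function recomputes sequence weights instead).
    if seq_weights is None:
        seq_weights = np.ones(n_seq)
    weights = np.asarray(seq_weights, dtype=float)
    m_eff = weights.sum()  # effective number of sequences

    # One-hot encode the alignment: shape (n_seq, n_pos, 21).
    one_hot = np.zeros((n_seq, n_pos, n_aa))
    for s, seq in enumerate(sequences):
        for p, aa in enumerate(seq):
            one_hot[s, p, AA_INDEX[aa]] = 1.0

    # Weighted single-site frequencies, shape (n_pos, 21).
    raw_freqs = np.einsum("s,spa->pa", weights, one_hot) / m_eff
    # Assumed regularization: blend with a uniform distribution so that
    # unobserved amino acids never get a frequency of exactly zero.
    aa_freqs = (1 - freq_regul) * raw_freqs + freq_regul / n_aa

    # Weighted pairwise joint frequencies, shape (n_pos, n_pos, 21, 21).
    raw_joint = np.einsum("s,spa,sqb->pqab", weights, one_hot, one_hot) / m_eff
    aa_joint_freqs = (1 - freq_regul) * raw_joint + freq_regul / n_aa**2

    return aa_freqs, aa_joint_freqs


toy_alignment = ["ACD-", "ACDE", "GCDE"]
aa_freqs, aa_joint_freqs = sketch_frequencies(toy_alignment)
print(aa_freqs.shape)        # (4, 21)
print(aa_joint_freqs.shape)  # (4, 4, 21, 21)
```

With this blending scheme, each row of `aa_freqs` still sums to 1, and summing `aa_joint_freqs` over its last axis recovers the regularized single-site frequencies of the first position in the pair.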