Given a library of L sequences, where each sequence is chosen at
random from a set of V equiprobable variants, we wish to calculate
the expected number of distinct (i.e. unique) sequences represented in
the library. Alternatively, given a set of V equiprobable variants,
we wish to calculate the library size L necessary to obtain a given
percentage completeness, or to have a given probability of being 100% complete.
(Typically assuming V >> 1, e.g. V > 10.)
Example ( show)
For the default values on the web server, there are a total of one million
possible variants. This is roughly equivalent to an oligonucleotide directed
randomization experiment involving four NNS codons (which gives V =
324 = 1048576).
In the first panel, the experimenter has constructed a library of three
million transformants and wishes to estimate how many of the one million
possible variants are represented in the library. The answer is ~95.02%
or 950200 variants.
In the second panel, s/he knows that there are one million possible variants,
and wants to know how big her/his library should be in order to ensure
that ~95% of them (i.e. 950000 variants) are represented. The answer is
that a library of ~2.996 million transformants is required.
In the ideal situation her/his library would contain all one million variants.
In the third panel s/he calculates the library size required in order to
be 95% sure of complete representation. The answer is ~16.79 million transformants.
Programme to find the expected amino acid completeness of a given library
(not counting any variants with introduced stop codons) where the sequences
in the library are chosen at random from a set of XYZ codon variants (where
X,Y,Z are standard
nucleotide codes chosen by the user, e.g. XYZ
= NNS [N = A/C/G/T; S = C/G; 32 possible equiprobable codon variants; 20
+ 1 possible non-equiprobable amino acid/stop codon variants]). Up to six
codons may be independently varied.
To calculate the library size required in order to obtain a given completeness,
or to have a given probability of being 100% complete, just try entering
different library sizes and check the resulting library statistics until
you home in on your required values.
Links to academic resources and academic social media presence of the authors