Shotgun sequencing: How many clones?


How many clones are needed to ensure that the whole human genome is represented by at least one clone?

Consider a specific 200 kb fragment.

Consider the probability that this fragment is not represented in any clone:

Let $f = K/G$, where $K$ is the size of the cloned fragment and $G$ is the size of the genome.

If we select a clone at random from the clone library, the probability that it does not contain the fragment is $1-f$.

If we select $N$ clones, the probability of the fragment not being in any of the $N$ clones is $(1-f)^N$.

Let P be the probability that the fragment is represented at least once in N clones. The probability that the fragment is not represented in N clones is then

$1 - P = (1-f)^N$

$N = \log(1-P) / \log(1-f)$
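The expression for N follows from taking logarithms of both sides. The steps can be sketched as below; the small-f approximation in the last line is an addition for illustration and is not part of the original notes. The exact ratio holds for any logarithm base, while the approximation uses the natural logarithm.

```latex
\begin{aligned}
1 - P &= (1-f)^N \\
\log(1-P) &= N \log(1-f) \\
N &= \frac{\log(1-P)}{\log(1-f)} \approx \frac{-\ln(1-P)}{f} \qquad (f \ll 1)
\end{aligned}
```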

Suppose you want to ensure that a specific fragment will be represented at least once in a human clone library with a probability of 0.99.

Substituting $P = 0.99$ and $f = 6.67 \times 10^{-5}$ (a 200 kb fragment in a genome of $3 \times 10^9$ bp), we find that $N \approx 69{,}075$ clones.
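The substitution can be checked numerically. This is a minimal sketch: the function name `clones_needed` is hypothetical, and the genome size of 3 × 10^9 bp is an assumption inferred from f = 6.67 × 10^-5 for a 200 kb fragment.

```python
import math

def clones_needed(p, insert_bp, genome_bp):
    """Clones N such that a given fragment is present at least once with probability p."""
    f = insert_bp / genome_bp              # fraction of the genome carried by one clone
    return math.log(1 - p) / math.log(1 - f)

# Assumed genome size: 3e9 bp (inferred from f = 6.67e-5 with 200 kb inserts).
n = clones_needed(0.99, 200_000, 3_000_000_000)
print(round(n))  # about 69,075
```

In practice N would be rounded up to a whole number of clones.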

But the BAC fragments are still too big for DNA sequencing, so mechanical shearing is applied to the whole genome to produce smaller fragments, say of 10 kb.

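Applying the same formula with $f = 10\ \text{kb} / (3 \times 10^9\ \text{bp}) \approx 3.33 \times 10^{-6}$, the number of clones needed to include a specific fragment with 99% probability is about 1.4 million.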

Several runs with fragments of different sizes, down to 2 kb, were used.
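To see how the required library size scales with fragment size, the same formula can be evaluated for 200 kb, 10 kb and 2 kb fragments. This is a rough sketch: the helper names are hypothetical, and `GENOME_BP` is an assumed genome size of 3 × 10^9 bp. It also reports the total cloned DNA relative to the genome size (N × K / G).

```python
import math

GENOME_BP = 3_000_000_000   # assumed genome size in bp (not stated explicitly in the notes)
P = 0.99                    # target probability of including a specific fragment

def clones_needed(p, insert_bp, genome_bp=GENOME_BP):
    """Clones N such that a given fragment is present at least once with probability p."""
    f = insert_bp / genome_bp
    return math.log(1 - p) / math.log(1 - f)

def fold_coverage(n_clones, insert_bp, genome_bp=GENOME_BP):
    """Total cloned DNA expressed as a multiple of the genome size."""
    return n_clones * insert_bp / genome_bp

for insert_bp in (200_000, 10_000, 2_000):
    n = clones_needed(P, insert_bp)
    cov = fold_coverage(n, insert_bp)
    print(f"{insert_bp:>7} bp: {n:>12,.0f} clones, {cov:.2f}x genome equivalents")
```

The clone count grows in inverse proportion to the fragment size, while the total amount of cloned DNA stays at roughly 4.6 genome equivalents, since N × f is approximately -ln(1 - P) when f is small.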

Assembling the sequenced fragments into the human genome took 20,000 CPU hours on a supercomputer.


The Entangled Bank