Bracing against the wind  

Friday, October 14, 2011

Non "allelic" variation - thinking out loud

(Later Note: I wrote this before I knew what it's called. The term most people use is "Somatic Mosaicism". Apparently this is a pretty well-researched topic... so I can go back to all the biologists who looked at me like I was crazy and tell them .... hmm.)

Here's a link to a good article on the topic:

Original rant below...

Much of genetics is concerned with "alleles" and "variations". That is, an organism is assumed to be composed of "one kind of DNA". That DNA has 1) alleles inherited from its parents, and 2) de-novo alleles (also called somatic mutations). The 1000 Genomes Project estimates the de-novo count at about 30 or so per person. This is probably a very conservative number considering the nature of the 1000 Genomes Project, i.e., all of these mutations are whole-organism, detectable, validated mutations. E. coli error-rate estimates would put the range at 30-300, and this might be a better estimate because of how the study was done. Human blastoma cells have had error rates accurately measured at 10 times that rate. Individual organs may be less sensitive to immune response correction.

But let's assume 30 is our number. It's nice and small. And it's good to have a lower-bound.

That is only the set of variants that went into the "first cell" (fertilized egg) of an organism.

When that egg divides, half the organism has another, different, set of mutations. So 100% of the organism has 30 de-novo mutations and 50% of the organism will have *another* 30 de-novo mutations (30 new ones in that 50%, plus 30 original).

But wait, there's more. When those 2 cells divide in half again, you now get 60 new mutations, 30 from each cell. These 60 will be detectable at the "25%" level, i.e., 25% of the final organism will have them.

High-throughput sequencing can readily detect variation at the "1%" admixture level. That is, it commonly detects variation when as little as 1% of the cells carry that variation.

So how much variation can we expect, based on a low de-novo mutation rate, detectable at the 1% level?

100% -> 30, 50% -> 30, 25% -> 60, 12.5% -> 120, 6.25% -> 240, 3.125% -> 480, 1.5625% -> 960
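Here is a rough sketch of that doubling arithmetic in Python, mostly so the tiers can be re-run with a different rate. It assumes the simple model used in this post: every dividing cell contributes a fixed number of new variants, and each division halves the fraction of the final organism that carries them. The names `tiers`, `rate_per_division` and `min_fraction` are made up for this illustration.

```python
def tiers(rate_per_division, min_fraction=0.01):
    """Return (cell_fraction, new_variants) pairs for every tier
    whose cell fraction is at or above min_fraction."""
    # Tier 0: variants already present in the fertilized egg (100% of cells).
    result = [(1.0, rate_per_division)]
    fraction = 0.5        # fraction of the organism carrying the next tier
    dividing_cells = 1    # number of cells dividing at this step
    while fraction >= min_fraction:
        # Assumption from the post: each dividing cell adds
        # rate_per_division new variants.
        result.append((fraction, dividing_cells * rate_per_division))
        fraction /= 2
        dividing_cells *= 2
    return result


table = tiers(30)
for fraction, count in table:
    print(f"{fraction * 100:g}% -> {count}")

lowest_tier = table[-1][1]                     # variants in the last tier above 1%
cumulative = sum(count for _, count in table)  # every tier at or above 1%
```

With a rate of 30, the last tier above 1% holds 960 variants (32 times the rate), and adding up every tier gives 1920.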

So at the lowest detectable tier alone we can expect about 1000 de-novo variants in a healthy individual (960, or 32 times the mutation rate); adding up every tier in the list above gives 1920. But what if the somatic mutation rate is higher, say, 3000 variants per replication? This may be the case in some organ development.

Then, at the 1% level, that would be 96000 non-inherited detectable variants (3000 x 32). I would call that my "upper bound". In real pileup data... I see around 30% "non-allelic" variation. So if, say, you've got 15000 SNPs (a reasonable number), we would expect roughly 5000 "background" SNPs... putting the mutation rate at 156.25 (5000/32). That's smack in the middle of the E. coli based estimate.
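The same back-of-the-envelope numbers, written out as a small Python sketch. The multiplier of 32 is the one used above (960 / 30, the lowest tier still above 1%); the function names are invented for this example, and the 5000 "background" SNP count is the rough figure from the paragraph above, not a measured value.

```python
# Multiplier for the lowest tier above the 1% detection level (960 / 30).
TIER_MULTIPLIER = 32


def variants_at_lowest_tier(rate_per_division):
    """Variants expected in the lowest detectable tier for a given rate."""
    return rate_per_division * TIER_MULTIPLIER


def implied_rate(background_variants):
    """Work backwards from an observed background count to a per-division rate."""
    return background_variants / TIER_MULTIPLIER


lower_bound = variants_at_lowest_tier(30)     # conservative rate
upper_bound = variants_at_lowest_tier(3000)   # hypothetical high somatic rate
estimated_rate = implied_rate(5000)           # rough background count from pileup data

print(lower_bound, upper_bound, estimated_rate)
```

This only inverts the toy model; it says nothing about how many of those background calls are sequencing errors rather than real variation.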

Lots of variant callers filter these out... but I'm interested in them... I think they may be a lot more important than people think.


(C) 2002 Erik Aronesty/DocumentRoot.Com. Right to copy, without attribution, is given freely to anyone for any reason.
