Speeding Up the Hunt for Causal Trait Mutations in Cattle and Other Agrigenomic Species

Presenter: Dr. Matthew McClure, USDA-ARS

Date: December 7, 2011

Presentation

Josh Forsythe: Well good morning everyone.  Thank you all so much for joining us today.  We've had over 180 people sign up for this webcast so it should be great today. 

My name is Josh Forsythe; I'm the vice president of sales and marketing here at Golden Helix. For the past 12 years Golden Helix has been accelerating life science research by providing analytic software and services for a range of genetic applications, including genome-wide association studies, copy number analysis, next-gen sequencing, predictive modeling for diagnostics and more.

I'm grateful today to be joined by one of our customers, Dr. Matthew McClure of the U.S. Department of Agriculture's Agricultural Research Service. Matt will be presenting his latest research today on identifying causal mutations in agrigenomic species, more specifically the mutation behind Weaver syndrome in cattle.

A little more background on Matt and ARS before we start. ARS is the principal in-house research agency of the USDA, with a workforce of over 6,500 employees, 2,200 of whom are scientists working on nearly 850 research projects at over 100 locations across the U.S. and in four overseas laboratories.

Dr. McClure is a member of the Bovine Functional Genomics Laboratory of the USDA ARS, whose mission is to improve the genetic and productive efficiency of cattle through research on gene expression and marker-assisted genetic selection. A little bit more on Matt: Matt received his PhD in genetics from the University of Missouri-Columbia. Most of his research has a producer application side to it, with a focus on identifying ways to apply cutting-edge genomics in a manner that benefits agricultural producers and ultimately consumers. His research has ranged from showing that DNA from FTA cards is of high enough quality for SNP chips, to identifying new haplotypes that predict meat tenderness, to today's talk on trying to identify the causative mutation for Weaver syndrome so producers have an improved genetic test they can use to select breeding stock.

Before I hand the presentation over to Matt I'd like to point out the question and answer pane on your right. If you have any questions during the presentation, please feel free to enter them there. At the end of the presentation we will answer as many questions as we can in the time allotted. If you're experiencing technical difficulty, please submit your questions via the Q&A pane as well.

One question that always comes up is whether we will be sending out the slides and a link to the recording. The answer is yes; barring any technical problems with the recording, we usually get that out within a day or two. With that I'll hand the presentation over to Matt.

Matthew McClure: All right, thank you very much, Josh. So as he mentioned, I am a research geneticist, specifically a post-doc, with the United States Department of Agriculture Agricultural Research Service. And today I will talk to you about how, with all the advancements in genetics, we've been able to speed up the hunt for causal mutations in cattle, and how that is being applied to the species.

So ARS is the chief scientific research agency of the Department of Agriculture. We have over 800 research projects and over 8,000 employees, and we work all across the United States at 100 different research locations. We also have four overseas locations. At the ARS pretty much all our work is farm-to-table research, so all of it has applications both to producers and ultimately to consumers.

I am located in Beltsville, Maryland, at the Henry A. Wallace Beltsville Agricultural Research Center. BARC, as it's called, is the largest and most diversified agricultural research center in the world. We have over 6,000 acres and just under 2,000 employees here. Of our employees, over 300 are permanent PhD scientists, and we have over 100 post-docs and visiting scientists from around the world.

I am part of the Bovine Functional Genomics Lab here in Building 200; we have over 12 PhD scientists, and our research covers everything from bioinformatics in cattle to mastitis, genomic selection and parasite resistance. Pretty much any way that we can use genomics to help dairy and beef producers, that's what we're interested in.

Just to give people who are not familiar with the USDA some background, here are some highlights from our research. About 150 years ago the USDA was created, and in the 1900s the first live vaccine for hog cholera was developed here. Starting in 1910 the ARS started to develop studies to improve dairy cattle. We also domesticated the wild blueberry, so if anyone has blueberries on their cereal in the morning, we were part of that.

In the 1920s ARS scientists helped found population genetics in animal breeding. In the 1930s the Line 1 Hereford program was started; L1 Dominette, the animal that was used for sequencing the cattle genome, has a pedigree that dates back to this Line 1 Hereford program. In the Sixties our scientists helped to discover the molecular structure of tRNA. In the 1990s we helped develop genetic maps of blueberry, cattle, swine and chickens. And in the 2000s we helped to develop dairy cattle genomic selection, and we've also been sequencing lots of genomes: cattle, chicken, swine, soybeans, turkey; the list keeps growing.

So today I'm going to talk to you about the technology advancements that have been made in genomics, what they are, and how they can be applied to cattle. Some of the advancements in genomics are in computing, genetic maps, sequencing, advanced genotyping and, finally, analysis.

So in computing, almost 100 years ago the way we did computing in genomic research was by hand. Here at the USDA we actually had hundreds of people employed to calculate by hand how good a dairy bull was. Luckily today we have moved on to using blades and servers for that research.

For genetic mapping, especially in cattle, we have been able to move from the linkage maps and marker maps of the Nineties to radiation hybrid maps, and eventually to the sequence of the animal.

Genotyping and sequencing: there's still research done today where we use RFLPs for genotyping. We also have research where we're using microsatellites. This takes a lot of time; for my PhD study we ran hundreds of microsatellites on thousands of animals, and it took quite a few years. Today we can use the high-density SNP chips, which are used by scientists around the world and can genotype thousands of animals for thousands of SNPs in a month.



We are also lucky to have advancements in sequencing. It used to be that you could base your entire PhD project on sequencing one gene. With the advancement of Sanger sequencing, and now with next-generation technology, you can sequence an individual's entire genome in a week.

And finally, analysis. With all these other advancements in computing, genotyping and sequencing, we are really lucky that there have also been advancements in analysis, from the algorithms to the models to the software. And this is where Golden Helix comes into play for us. What all this analysis is doing, of course, is taking all this data and filtering it into usable results.

So a classical example is identifying a QTL for a trait, putting the QTL on the map, eventually sequencing the region, and then finding the causal mutation that's either causing a disease or influencing a trait, in anything from cattle to corn to humans.

So how has all this been used in cattle? Where are some of the advancements in cattle genetics? All these advancements have helped quantitative genetics, phylogenetics, genomic selection and the identification of causal mutations in cattle.

If we look at the number of cattle QTLs that have been published over the years we can see a large spike in the past couple years.  If we overlay some of this technology we can see that a lot of this increase has come about because of the advancements in our genotyping and sequence maps. 

We've also been able to use the advancements in genotyping for phylogenetics. Jared Decker at the University of Missouri used the Bovine 50K panel and was able to improve the phylogenetics of the Pecora, and not only was he able to improve the phylogenetics across species, he was also able to improve the phylogenetics across breeds. So he was actually able to show how the different breeds are related genetically. What was nice about the phylogenetics is that you can take an animal and genotype it, and if you believe it is an admixed animal, you can actually pull out which cattle breeds were in its ancestry.

Finally, genomic selection. For genomic selection in cattle the idea is how we can find the best cattle: either the dairy bull whose daughters will have the best milk qualities or the beef animal that will produce the best steaks for us. Traditionally, for the past 30 or 40 years, this has been done with progeny testing. So for the dairy industry you want to know how well a bull's daughters will produce milk.

So what this means is that when a bull calf was born you had to wait five years. You had to feed him this entire time and wait until he was sexually mature so that you could breed him to a random group of females, and then you had to wait for those females to give birth. You had to select the females from that, wait two years until they're sexually mature, breed them, and then finally start collecting milk phenotypes. So this took five years and about $50,000 per animal.

Because of genomic selection, with advances that have been made by our lab and some other labs in the country, you can now genotype an animal and in about a month and a half have estimates of how well his progeny will perform. And it will cost you less than $300.

The advantage of this genetic testing is that the moment an animal is born you can take an ear punch, send it out for testing, and automatically get information on how well his offspring will perform. The best way to explain this is that the information you get back is like gaining information on his future daughters. So at birth, for Net Merit, the amount of information you gain is as if he already had 12 daughters producing milk.

So this genomic selection using SNP chips was started about three years ago in the dairy industry. Pretty much any top dairy bull today is genotyped and has a genomic prediction, and the benefit is that instead of paying $50,000 you can use the 50K chip, the high-density SNP chip or the new 6K SNP chip and spend anywhere from $50 up to $250.

One of the benefits is that not only are you getting that information in daughter equivalents, but with this genetic test you're also increasing the reliability of the prediction. So if an animal is born and you take the average of his parents for a trait, the reliability is 42%. If you take that parent average and add a 50K SNP chip, the reliability for any trait goes up to 72%. So you are not only gaining daughter equivalents; you're also increasing the reliability for all of the phenotypes.
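The "daughter equivalents" idea rests on the textbook relationship between a sire's reliability and the number of daughter records behind his proof, REL = n / (n + k) with k = (4 - h2) / h2 for a trait with heritability h2. The sketch below is an illustration of that standard progeny-test formula only; the heritability of 0.30 is a hypothetical value, and the function names are chosen here for illustration.

```python
def k_value(h2: float) -> float:
    """Variance ratio k = (4 - h2) / h2 for a trait with heritability h2."""
    if not 0.0 < h2 < 1.0:
        raise ValueError("heritability must be strictly between 0 and 1")
    return (4.0 - h2) / h2


def reliability_from_daughters(n_daughters: float, h2: float) -> float:
    """Reliability of a sire's progeny-test proof from n daughter records: n / (n + k)."""
    return n_daughters / (n_daughters + k_value(h2))


def daughter_equivalents(reliability: float, h2: float) -> float:
    """Invert REL = n / (n + k): the number of daughters that would give this reliability."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return k_value(h2) * reliability / (1.0 - reliability)


if __name__ == "__main__":
    h2 = 0.30  # hypothetical heritability, chosen only for illustration
    for n in (0, 12, 50, 100):
        rel = reliability_from_daughters(n, h2)
        print(f"{n:>3} daughters -> reliability {rel:.2f}")
```

The daughter-equivalent value attached to a genomic prediction also depends on how the parent-average information is counted, so this simple formula is not meant to reproduce the exact figures quoted in the talk.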

And finally, all these genetic advancements have also helped us find the causal mutations for diseases. Looking back, in 1985 the first causal mutation for a genetic disease in cattle, inherited goiter, was found. It was five years later before the next one, and pretty much another six years before it really took off. Once we had the genome assembly and then started having all the SNP chips, the rate of identification of these causal mutations has really taken off. And this has really helped in my research today on Weaver cattle.

So with that I'll get into my current research on looking for the causal mutation of Weaver syndrome in Brown Swiss cattle. To start, I'll give everybody a little bit of background on the disease, then talk about the study design and what our current results are.

So Weaver syndrome, or progressive degenerative myeloencephalopathy, is an inherited recessive disorder of the Brown Swiss breed. There is destruction of the spinal cord and changes in the brain. This was first identified in the U.S. Brown Swiss herd in the 1920s. Clinical signs in Brown Swiss cattle basically appear at six to 18 months of age, when they start having weakness in their back legs. It's been described as though they have trouble finding their legs.

As it progresses over the next year or two they have more and more trouble walking, and eventually they can't get up. So eventually, at about two to three years of age, they die because they can't eat enough food.

One of the unique things is that the rest of their motor skills and sensory reflexes are fine. The symptoms come from a degeneration of the nerve pathways in the spinal cord and the brain, and from the best that we can tell this degeneration prevents nerve impulses from getting from the brain to the back leg muscles.

The nerve degeneration is actually comparable to Lou Gehrig's disease.  So there's been hope that eventually the advancements in the Lou Gehrig's disease research would help Weaver syndrome.



This disease has actually been studied for quite a while. Michel Georges mapped it back in 1993 to chromosome 4 in cattle, and he was able to develop a diagnostic test. He found that the microsatellite marker TGLA116 was in linkage disequilibrium with the causal mutation, and he estimated at that time, in '93, that the microsatellite marker was three centimorgans away from the causal mutation.

About five years later the locus was refined, and it's now down to a 10 megabase region between microsatellite markers BMS2646 and MAF50. With the genetic test that was developed, animals are called at the 90 percent confidence level as either a carrier or a non-carrier. We also still identify animals as carriers if they have affected offspring.

Now the issue with this test is that it's not perfect, but it works well. So this is an example of how you don't have to have the gene, you don't have to have the causal mutation, to have an impact. This microsatellite marker test really helped clean up the incidence of Weaver syndrome in the Brown Swiss breed. So even though it wasn't perfect, it was extremely helpful and used in the industry.

But there have been some historical problems with Weaver. First off, it's an imperfect commercial test. Because it was a microsatellite marker test, and as best as they could tell the marker was a little bit away from the causal mutation, you will have recombinations. And because of those recombination events you will have false positives and false negatives.
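Why a marker a few centimorgans from the mutation slowly loses its predictive value can be sketched with two standard results: Haldane's map function converts map distance into a per-meiosis recombination fraction, and the chance that the original marker-mutation combination passes down a line of descent unbroken falls as (1 - r)^t over t meioses. The 3 cM distance is the 1993 estimate quoted above; the meiosis counts are hypothetical, and the model deliberately ignores selection and drift.

```python
import math


def haldane_r(distance_cM: float) -> float:
    """Haldane map function: recombination fraction r = 0.5 * (1 - exp(-2d)), with d in Morgans."""
    d = distance_cM / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def prob_no_recombination(distance_cM: float, meioses: int) -> float:
    """Probability that no crossover separates marker and mutation over `meioses` meioses
    along one line of descent (ignores selection, drift and new mutations)."""
    return (1.0 - haldane_r(distance_cM)) ** meioses


if __name__ == "__main__":
    distance = 3.0  # cM, the 1993 estimate of the marker-to-mutation distance
    print(f"r per meiosis: {haldane_r(distance):.4f}")
    for t in (1, 10, 20, 40):  # hypothetical numbers of meioses
        print(f"{t:>2} meioses: P(no recombination) = {prob_no_recombination(distance, t):.2f}")
```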

One of the other problems is historically few if any females were tested.  The males were tested, especially the males that would be used for artificial insemination.  But because very few females were tested that meant that the causal mutation could still hide out in the herds. 

Now the other problem is that affected calves had the potential to go unidentified. As I said previously, the first symptoms of this disease aren't really shown until, you know, eight to fourteen months of age. Well, if you look at the cattle industry, if a calf is born in the dairy industry and he's male, unless he's from a really good line there's a good chance he's going to be sold either as a veal calf or for beef. If he's sold as a veal calf he's going to be harvested before eight months of age. If he's sold as beef, he's sold through a sale barn and eventually goes to a feedlot.

Currently in the United States cattle are harvested at about 14 to 15 months of age. So there's a potential that an affected calf could be slaughtered before he even shows the symptoms. And if he's in a feedlot and he begins to show the symptoms, he doesn't walk as well and he's having trouble getting around, there's a good chance that the feedlot operator, looking across his thousands of animals, is not going to say, "Oh, this calf has Weaver syndrome." He's going to think the calf got hurt, he's got a bruise, something's wrong, and either he'll try to give him some medication or he'll just decide, "Let's go ahead and harvest this animal early."

Even if he thinks it is a Weaver calf, because of the size and the scope of the beef industry it's going to be really hard for that feedlot operator to tell the dairy farm, "Hey, I've got a calf from you and it has Weaver syndrome." So besides all those problems, where we have a test that doesn't work well and the allele is still hiding out in populations, we have some different problems currently.

I talked to Dave Kendall from the U.S. Brown Swiss Association, and he said they don't use this microsatellite test anymore. Because it was developed so long ago and there's been so much recombination since, their viewpoint is that it's not effective: you have no idea whether you have a false positive or a false negative, so it's pretty much not useful. Also, there's been very little research on this disease for the past 20 years.

The reason for this is twofold: one, because of that microsatellite test the problem was viewed as being under control. So why do research on a disease where we already have a test? The other is that other diseases came out. There's SMA in Brown Swiss, which was the new hot disease, so that's where the research money went. So historically, even though we knew that Weaver syndrome wasn't solved, there's been very little research on it.

And currently the Brown Swiss Association views Weaver syndrome as a potential ticking time bomb. There are over ten million head of Brown Swiss in the world, and there's a viewpoint that the allele frequency for Weaver syndrome is increasing. So we're not really seeing an increase in affected animals currently, but there's a fear that in a couple of years we're going to start seeing a whole lot of affected animals and suddenly we'll be back to where we were in the 1960s and '70s.

Another problem is that Brown Swiss have been used to develop and upgrade other breeds. So in any breed that has used Brown Swiss genetics, there's a chance that the Weaver allele is now present. There's actually a paper from 2008 describing a Gir calf in Brazil that had a Weaver-like syndrome. Dave Kendall let me know that in the 1980s and '90s a lot of Brown Swiss semen was shipped to Brazil. Some Gir farms used the Brown Swiss, hoping to improve milk traits. And so there's a chance that the Weaver allele is in the Gir breed populations, just lying below the horizon.

So because of all these problems the Brown Swiss Association asked us if we could take another look at Weaver syndrome and try to find the causal mutation. We said we would be more than happy to. The way we approached it was that we initially did some high-density SNP genotyping. We used the Illumina BovineHD SNP chip, which has just under 800,000 SNPs, and we got HD genotypes on 20 Weaver carrier animals and 51 "normal" Brown Swiss animals. We put normal in quotes because some of them weren't genetically tested, so we think they're normal but we're not sure.

We're actually lucky, some of these genotypes came from animals that were already tested for the genomic selection and some of them we went out and purchased the semen ourselves and did the genotyping.



Once we had the genotypes we did some quality control on them. We wanted to make sure all the SNPs had a minor allele frequency above 0.05, and we also filtered on call rate. When we initially filtered on a call rate of 95 percent we lost only one animal, but later, when we went back and looked at the GWAS results, we decided to add that animal back.
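The two quality-control filters described here, a minor allele frequency above 0.05 and a 95% call rate, can be sketched as follows. The genotype encoding (0, 1 or 2 copies of one allele, None for a missing call) and the function names are assumptions for illustration; the study itself did this step in Golden Helix software.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # copies of the counted allele (0, 1, 2); None = no call


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF estimated from the non-missing calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))
    return min(p, 1.0 - p)


def snps_passing_qc(genotypes: Dict[str, List[Genotype]],
                    maf_threshold: float = 0.05,
                    min_call_rate: float = 0.95) -> List[str]:
    """IDs of SNPs with MAF above the threshold and call rate at or above the minimum."""
    return [snp for snp, calls in genotypes.items()
            if call_rate(calls) >= min_call_rate
            and minor_allele_frequency(calls) > maf_threshold]


def animals_passing_qc(genotypes: Dict[str, List[Genotype]],
                       min_call_rate: float = 0.95) -> List[int]:
    """Indices of animals whose call rate across all SNPs meets the minimum."""
    per_snp = list(genotypes.values())
    n_animals = len(per_snp[0])
    return [i for i in range(n_animals)
            if call_rate([calls[i] for calls in per_snp]) >= min_call_rate]
```

Restricting `genotypes` to chromosome 4 SNPs before calling `animals_passing_qc` mirrors the per-chromosome call-rate check mentioned next.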

So when we looked at the GWAS, the reason we used all 71 animals was that we got the highest peak on chromosome 4, which is to be expected; that's where it was mapped before. And when we looked at the call rate on chromosome 4, every animal had above a 95 percent call rate. Because we had carriers and normals, we did a case-control GWAS, with the carriers as cases and the normals as controls.
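A case-control scan compares allele counts between cases (here the Weaver carriers) and controls (the normal animals) at each SNP. The talk does not say which test statistic was used; the sketch below assumes a basic allelic chi-square test on the 2 x 2 table of allele counts, reported on the -log10(p) scale used in GWAS plots. The 1-degree-of-freedom p-value uses the identity P(chi2 > x) = erfc(sqrt(x / 2)), and the data in the usage block are toy values.

```python
import math
from typing import List, Optional, Tuple

Genotype = Optional[int]  # copies of the tested allele (0, 1, 2); None = no call


def allele_counts(calls: List[Genotype]) -> Tuple[int, int]:
    """(tested-allele count, other-allele count) over non-missing calls."""
    observed = [g for g in calls if g is not None]
    tested = sum(observed)
    return tested, 2 * len(observed) - tested


def allelic_chi_square(case_calls: List[Genotype], control_calls: List[Genotype]) -> float:
    """Pearson chi-square (1 df, no continuity correction) on the 2 x 2 allele-count table."""
    a, b = allele_counts(case_calls)
    c, d = allele_counts(control_calls)
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column: no information at this SNP
    return n * (a * d - b * c) ** 2 / denominator


def neg_log10_p(chi2: float) -> float:
    """-log10 of the upper-tail p-value of a 1-df chi-square: p = erfc(sqrt(chi2 / 2))."""
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return -math.log10(p) if p > 0.0 else math.inf


if __name__ == "__main__":
    carriers = [1] * 20  # toy data: every carrier heterozygous
    normals = [0] * 51   # toy data: every normal homozygous for the other allele
    chi2 = allelic_chi_square(carriers, normals)
    print(f"chi2 = {chi2:.1f}, -log10 p = {neg_log10_p(chi2):.1f}")
```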

As I mentioned we got a really nice, strong peak; we've actually been able to refine the locus from what it was historically.  So historically it was a 10 megabase region with the microsatellite markers; we've now been able to narrow that down to about a 5 megabase region.  And as expected this 5 megabase region lies within the 10 megabase area.

So then we asked ourselves: pretty much every dairy bull in the country has been genotyped on the Illumina 50K chip for genomic selection, so could we use animals that had 50K data, do a GWAS, and get even better resolution?

So to answer that I actually went in and I just stripped out the high density SNPs and only left the SNPs that are on the 50k panel.  Luckily all the 50k SNPs are represented on the HD chips, so I went back and I looked at just those 71 animals again and found out if you only use 50k SNPs you actually get the same resolution as with the microsatellite markers.  So you're back to about a 10 megabase region.

The other problem is that the peak actually moves to the side. With the high-density SNPs we have a peak that's on the left side of the 10 megabase region, but if you use only the 50K SNPs it looks like the peak is more to the right.
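The HD-versus-50K comparison amounts to keeping only the HD SNPs that are also on the 50K panel and measuring where the peak falls and how wide the associated interval is. For a single-marker test each SNP's p-value does not depend on which other SNPs are included, so subsetting the HD results is equivalent to re-running the scan on the 50K SNPs. The "within a fixed drop of the peak score" interval definition, the threshold, and the toy data below are illustrative assumptions, not how the intervals in the talk were defined.

```python
from typing import Dict, Iterable, Tuple

Result = Tuple[int, float]  # (position in bp, -log10 p) for one SNP


def subset_to_panel(results: Dict[str, Result], panel_ids: Iterable[str]) -> Dict[str, Result]:
    """Keep only SNPs whose IDs also appear on the smaller panel (e.g. a 50K marker list)."""
    keep = set(panel_ids)
    return {snp: r for snp, r in results.items() if snp in keep}


def peak_interval(results: Dict[str, Result], drop: float = 2.0) -> Tuple[int, int, int]:
    """(peak position, start, end) spanning SNPs within `drop` -log10 units of the top score."""
    peak_pos, peak_score = max(results.values(), key=lambda r: r[1])
    near = [pos for pos, score in results.values() if score >= peak_score - drop]
    return peak_pos, min(near), max(near)


if __name__ == "__main__":
    hd = {"a": (1_000_000, 2.0), "b": (2_000_000, 9.0),
          "c": (3_000_000, 8.0), "d": (4_000_000, 3.0)}  # toy HD results
    panel_50k = ["a", "c", "d"]  # toy list of SNP IDs shared with the 50K panel
    print("HD :", peak_interval(hd))
    print("50K:", peak_interval(subset_to_panel(hd, panel_50k)))
```

In the toy data, dropping the top HD SNP moves the apparent peak to a neighbouring position, which is the kind of shift described above.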

We also wondered, after looking at the SNPs in the GWAS, whether this region of chromosome 4 is misassembled. I pulled out LD on about 20 breeds and then said, "Okay, I want to look at just this region on chromosome 4." If there were no misassembly, our thought is that we would see a band of strong LD, bright reds, all the way down the diagonal on all the breeds. If there was a misassembly, if there was a region that's flipped, the LD should show two regions that are far apart but in high LD while the regions in between are not in LD. We really did not see that in any of the breeds, so we have reasonable confidence that this region of chromosome 4 has been assembled correctly.
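The misassembly reasoning can be expressed as a scan over an LD matrix. In a correctly ordered region, r2 generally falls off with distance, so a pair of distant SNPs in strong LD while the SNPs between them are not is a warning sign of an inverted or misplaced segment. This sketch approximates r2 as the squared Pearson correlation of unphased genotype dosages and flags such pairs; the thresholds and function names are hypothetical, and the pairwise loop is only meant for a small region.

```python
from itertools import combinations
from statistics import mean
from typing import List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two SNPs' genotype dosages across the same animals."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def flag_suspect_pairs(dosages: List[Sequence[float]], high: float = 0.8,
                       low: float = 0.2, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Flag SNP pairs (i, j), in assembly order and at least `min_gap` SNPs apart, that are
    in strong LD with each other while every SNP between them is in weak LD with both."""
    flagged = []
    for i, j in combinations(range(len(dosages)), 2):
        if j - i < min_gap or r_squared(dosages[i], dosages[j]) < high:
            continue
        if all(r_squared(dosages[k], dosages[i]) <= low and
               r_squared(dosages[k], dosages[j]) <= low for k in range(i + 1, j)):
            flagged.append((i, j))
    return flagged
```

Running the scan separately for each breed and finding no flagged pairs in any of them corresponds to the "no misassembly" conclusion above.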

So now that we're confident that everything is assembled correctly and we know the area to look at, we decided to move into trying to identify the causal variation. We want to know what mutation causes a normal Brown Swiss calf to be a Weaver calf, and for that we have to do sequencing. So the question is how we're going to do the sequencing. We're lucky today to have a lot of options: there's whole-genome sequencing, there's exome sequencing, there's targeted sequencing.

So we decided to use whole-genome sequencing. At the time of this study we had just gotten an Illumina HiSeq, but the indexing had not been commercially released yet, so we decided to use a pooling technique. We took ten normal animals and ten Weaver carriers. All the carrier animals were identified as carriers from progeny testing, so we didn't have to worry about whether these were false positive animals.

We made 300 base pair insert libraries, and we did 200 x 300 paired-end sequencing on the Illumina HiSeq 2000. Then we had a couple of lanes open up on our Illumina GAIIx, so the Weaver carrier animals also had reads from that.

Steve Schroeder, one of our bioinformaticists, took all that sequence data and mapped it to the University of Maryland's UMD 3.1 assembly. He used BWA to map it and then used GATK for local realignment. GATK is a program from the Broad Institute, and what it helps with is getting proper local alignments around indels, that is, insertions and deletions. After mapping we had about 25 to 30X coverage of the entire genome, and then we looked within that 5 megabase region. Within that 5 megabase region we had over 53,000 sequence variations.

Most of these were single nucleotide variations; there were some insertions and some deletions. So now we have 53,000 variations, and we need to find the one mutation that actually causes Weaver. For that we have to filter. Since we had pools, the way I filtered these was I said, "Okay, in the normal pool the minor allele needs to be at or above 9%, and the alternative allele I can see once." The reason I didn't say that the minor allele frequency had to be zero, that I couldn't see any of the alternative allele, is that I wanted to allow for any sequencing errors or potential misassembly on one or two reads. I didn't want to be too conservative and potentially toss out the causal mutation.

For the Weaver pool, since this was a pool, we said the minor allele frequency had to be at or below 70%, just in case, when we did the sequencing, instead of capturing the causal mutation we were capturing the historic Brown Swiss allele most of the time. So we allowed that minor allele frequency to be a little bit higher than most people would have.
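The pooled-sequencing filters can be sketched as a per-variant test on read counts. Two rules come from the talk: the alternative allele may be seen at most once in the normal pool, and its frequency in the Weaver-carrier pool may be as high as 70%. The requirement of at least two supporting reads in the carrier pool is a hypothetical addition so that a variant must actually be observed there; the record layout and names are also illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PooledCounts:
    """Read counts for one variant in the two pools."""
    variant_id: str
    normal_ref: int   # reads with the reference allele, normal pool
    normal_alt: int   # reads with the alternative allele, normal pool
    carrier_ref: int  # reads with the reference allele, Weaver-carrier pool
    carrier_alt: int  # reads with the alternative allele, Weaver-carrier pool


def passes_pool_filter(v: PooledCounts,
                       max_normal_alt_reads: int = 1,            # "I can see once" (from the talk)
                       max_carrier_alt_freq: float = 0.70,       # 70% ceiling (from the talk)
                       min_carrier_alt_reads: int = 2) -> bool:  # hypothetical floor
    """True if the variant is (almost) absent from normals and plausibly present in carriers."""
    carrier_depth = v.carrier_ref + v.carrier_alt
    if carrier_depth == 0:
        return False
    carrier_alt_freq = v.carrier_alt / carrier_depth
    return (v.normal_alt <= max_normal_alt_reads
            and v.carrier_alt >= min_carrier_alt_reads
            and carrier_alt_freq <= max_carrier_alt_freq)


def filter_pools(variants: Iterable[PooledCounts]) -> List[PooledCounts]:
    """Keep the variants that pass both pool rules."""
    return [v for v in variants if passes_pool_filter(v)]
```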

And then we said, "Well, Weaver syndrome is only in Brown Swiss cattle, so if we see an allele that's present in other cattle breeds, that can't be the causal mutation." Part of the logic behind this is that the different cattle breeds diverged a long time ago. We assume the Weaver mutation in the U.S. came about in the early 1900s, while the cattle breeds of today have been diverged for 200 to 1,000 years. So we took animals that had been sequenced for some discovery projects that we've done, some dairy breeds, some beef breeds and some Bos indicus breeds, and we said if an allele from our sequencing is found in the sequence from these other breeds, toss it out.

And then finally we filtered against repeat regions.  From all of the studies the causal mutation is only located on chromosome 4; if there's a repeat region that's spread across the genome that is not going to be it, plus if it's a repeat region sad to say it's going to be really hard to develop a diagnostic test for it.
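The breed and repeat filters are essentially set and interval operations: drop any candidate whose allele was also seen in the other sequenced breeds, and drop any candidate inside an annotated repeat. A minimal sketch, assuming variants keyed by (chromosome, position, ref, alt) and repeats given as half-open (chromosome, start, end) intervals; neither format is specified in the talk, and the example coordinates are made up.

```python
from typing import Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, reference allele, alternative allele)
Repeat = Tuple[str, int, int]        # (chromosome, start, end), half-open interval [start, end)


def in_repeat(chrom: str, pos: int, repeats: Iterable[Repeat]) -> bool:
    """True if the position falls inside any annotated repeat interval."""
    return any(c == chrom and start <= pos < end for c, start, end in repeats)


def breed_and_repeat_filter(candidates: Iterable[Variant],
                            other_breed_variants: Set[Variant],
                            repeats: List[Repeat]) -> List[Variant]:
    """Keep candidates not seen in other breeds and not inside a repeat."""
    return [v for v in candidates
            if v not in other_breed_variants and not in_repeat(v[0], v[1], repeats)]


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    candidates = [("4", 100, "A", "G"), ("4", 200, "C", "T"), ("4", 300, "G", "A")]
    seen_elsewhere = {("4", 200, "C", "T")}
    repeats = [("4", 250, 350)]
    print(breed_and_repeat_filter(candidates, seen_elsewhere, repeats))
```

A linear scan over repeats is fine for a 5 Mb region; a genome-wide version would want an interval index instead.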

After all of this filtering we identified 116 sequence variations unique to the Weaver carriers. So we went from 53,000 variations down to 116. After that we passed them through a program called SnpEff to estimate the effect of those variations on genes.

Of the 116 sequence variations, SnpEff estimated, based upon the University of Maryland 3.1 annotation available at that time, that 71 were intergenic, 16 were flanking, so potentially in the untranslated region of a gene, and 3 were microsatellites.

Within genes, 22 were intronic, 1 was synonymous, and 3 were predicted to cause a non-synonymous amino acid change. So from this we decided to hedge our bets: given the history of Weaver syndrome, the mutation most likely to cause it is going to be one that is within or close to a gene.
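The bookkeeping for this step can be checked against the counts reported in the talk. The sketch tallies the 116 variants by predicted class and splits them into a first genotyping round and a deferred round. Whether the 16 flanking variants went into the first round is not entirely clear from the talk; the sketch assumes they did, consistent with the later plan to assay only the intergenic and microsatellite variants.

```python
from collections import Counter
from typing import Set, Tuple

# Predicted effect classes for the 116 candidate variants, using the counts reported in the talk.
effect_counts = Counter({
    "intergenic": 71,
    "flanking": 16,
    "microsatellite": 3,
    "intronic": 22,
    "synonymous": 1,
    "non-synonymous": 3,
})

# Assumption: "in or close to a gene" = the flanking class plus every within-gene class.
IN_OR_NEAR_GENE: Set[str] = {"flanking", "intronic", "synonymous", "non-synonymous"}


def split_for_assay(counts: Counter, first_round: Set[str] = IN_OR_NEAR_GENE) -> Tuple[int, int]:
    """Return (variants in the first genotyping round, variants deferred to a later round)."""
    first = sum(n for effect, n in counts.items() if effect in first_round)
    deferred = sum(n for effect, n in counts.items() if effect not in first_round)
    return first, deferred


if __name__ == "__main__":
    first, deferred = split_for_assay(effect_counts)
    print(f"total {sum(effect_counts.values())}, first round {first}, deferred {deferred}")
```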



So we took all those within a gene and, through a collaboration with GeneSeek, developed a multiplex SNP genotyping assay. Sadly, the reaction failed for one of the intronic SNPs. But we developed this multiplex assay and decided to test Brown Swiss animals.

Over the past couple of months we have been collecting Brown Swiss samples from across the U.S., and from Europe through some of our collaborations, so we had over 800 Brown Swiss animals to test. We were lucky to identify two affected animals, a whole bunch more carriers and a whole bunch of normals.

We also wanted to make sure that this mutation is not in other breeds, so we identified some non-Brown Swiss animals and tested them as well. After we did all the genotyping, we re-imported the genotypes into Golden Helix, updated our marker map and did a combined GWAS. Here is the combined GWAS graph, and we got some great results: we're up at a -log10(p) above 70. That's just nice; it looks like we're looking at the correct region. When we initially saw this it was like, "Yes, we potentially have the causal mutation."

But then we went back and looked at the actual genotypes. I know this is a little busy, but what we pulled out is that there are three SNPs that look promising. We're looking at the two animals that are Weaver affected (WW), animals that are Weaver carriers from progeny testing, animals that are identified as Weaver carriers from genetic testing, non-carriers from genetic testing, and then animals called non-carriers just because they hadn't been tested.

Everything looked great: these three SNPs are homozygous for one allele in the affecteds, pretty much heterozygous in all the carriers, and pretty much homozygous for the other allele in the non-carriers. But there's a slight problem: one of our carrier animals and three of our normals are homozygous for the same allele as the affecteds.
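This check is a comparison of each animal's genotype against the genotypes its status allows if the SNP were the causal recessive mutation. A sketch, assuming genotypes coded as copies of the putative Weaver allele and the status labels "affected", "carrier" and "normal"; heterozygous normals are tolerated because untested or marker-tested normals may be undetected carriers. The animal records are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Animal = Tuple[str, str, int]  # (animal ID, status, copies of the putative Weaver allele)

# Genotypes compatible with each status if the SNP were the fully penetrant recessive mutation.
# "normal" animals may include undetected carriers, so heterozygotes are tolerated there.
ALLOWED: Dict[str, Set[int]] = {
    "affected": {2},
    "carrier": {1},
    "normal": {0, 1},
}


def discordant_animals(animals: List[Animal]) -> List[Animal]:
    """Animals whose genotype is incompatible with their recorded status."""
    return [a for a in animals if a[2] not in ALLOWED[a[1]]]


def concordance(animals: List[Animal]) -> float:
    """Fraction of animals whose genotype is compatible with their status."""
    return 1.0 - len(discordant_animals(animals)) / len(animals)


if __name__ == "__main__":
    animals = [("A1", "affected", 2), ("C1", "carrier", 1), ("C2", "carrier", 2),
               ("N1", "normal", 0), ("N2", "normal", 2), ("N3", "normal", 1)]
    for animal_id, status, copies in discordant_animals(animals):
        print(f"{animal_id}: status {status}, {copies} copies")
```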

So why is this? Why do we have four animals that are not supposed to be affected but are homozygous? Well, we thought there could be a couple of explanations. One of them is that there's just a genotyping error. So we went back, re-genotyped those animals, and looked at the signal strength of those SNPs.

When we got the re-genotypes back, they genotyped the same way: they're homozygous for that A allele. And looking at the signal strength, everything looked good, so we believe these are true genotypes.

So another possibility was that these animals were misidentified. Either the wrong DNA got put into the wrong well, or these are animals that actually were affected, but something happened and they were harvested or died before the symptoms showed themselves. Or it could be that they had the disease but it wasn't reported. This is a possibility and we're looking into it.

The other possibility is that, you know, we had 116 variations and we only tested those that were predicted to be in or close to genes. There is a good possibility that one of the intergenic variants could be the causal mutation. It could be that there's actually a microRNA there, or another exon of a gene that's farther downstream, or a small open reading frame.

There's a paper that just came out in fruit flies where they improved the algorithm for predicting genes and identifying small open reading frames, and in fruit flies they identified 400 to 1,000 new small open reading frames. These are genes that encode proteins of less than 100 amino acids. So there's a really good chance that one of these predicted intergenic variants is actually a mutation within a small gene that just hasn't been annotated yet. So we are currently developing multiplex assays for those intergenic SNPs and the microsatellite SNPs, and we're going to retest all those animals; hopefully one of those will be the causal mutation. If it's 100% concordant with the phenotypes, that's great; if it's not, then we will see if we can develop a diagnostic haplotype. And that haplotype can then be used as a genetic test for the farmers.
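If no single variant turns out to be fully concordant, a diagnostic haplotype test calls an animal's status from the alleles it carries at several linked SNPs. A minimal sketch, assuming phased haplotypes are available as allele strings over the same ordered SNPs and that a single risk haplotype has been identified; the haplotype shown is hypothetical.

```python
from typing import Tuple

# Hypothetical alleles at four linked SNPs on the haplotype carrying the Weaver allele.
RISK_HAPLOTYPE = "AGTC"

STATUS_BY_COPIES = {0: "non-carrier", 1: "carrier", 2: "predicted affected"}


def copies_of_risk_haplotype(phased: Tuple[str, str], risk: str = RISK_HAPLOTYPE) -> int:
    """Count how many of an animal's two phased haplotypes match the risk haplotype exactly."""
    if any(len(h) != len(risk) for h in phased):
        raise ValueError("haplotypes must cover the same SNPs as the risk haplotype")
    return sum(h == risk for h in phased)


def call_status(phased: Tuple[str, str], risk: str = RISK_HAPLOTYPE) -> str:
    """Translate the number of risk-haplotype copies into a carrier call."""
    return STATUS_BY_COPIES[copies_of_risk_haplotype(phased, risk)]


if __name__ == "__main__":
    print(call_status(("AGTC", "GGTA")))
```

Exact matching is the simplest rule; a practical haplotype test would also have to decide how to handle recombinant haplotypes that match only part of the risk haplotype.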

So the benefit of identifying either the causal mutation or a diagnostic haplotype is that we will be able to give producers a diagnostic test they can use.  If we identify the causal mutation, the test can be used in all future generations and in other breeds.  And because of this, a potential ticking time bomb will be removed.

The other benefit is that this will allow producers to manage their herds better.  They won't have to throw the baby out with the bath water.  If they have a bull that is a Weaver carrier, they can test all their females, identify the females within their herd that are normal non-carriers, and use that bull on them.  Yes, about 50% of the offspring will still be carriers, but if that bull is a really good producer for milk, for milk fat, or for reproduction, you don't have to throw out all those good genetics just because of one bad gene.  Because of this you'll be able to manage it properly.
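The 50% figure follows from simple Mendelian inheritance, assuming Weaver behaves as a single-locus recessive.  A minimal Python sketch of the Punnett square; the allele labels `N` (normal) and `w` (Weaver) are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(sire, dam):
    """Expected offspring genotype frequencies at one biallelic locus.

    Each parent is a two-character genotype; every pairing of one sire allele
    with one dam allele (a Punnett square) is equally likely.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(sire, dam))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Hypothetical allele labels: "N" = normal, "w" = Weaver allele.
print(offspring_distribution("Nw", "NN"))  # carrier bull x tested non-carrier cow
print(offspring_distribution("Nw", "Nw"))  # carrier bull x carrier cow
```

A carrier bull mated to tested non-carrier cows gives half `NN` and half `Nw` calves and no affected (`ww`) calves.  The same bull mated to a carrier cow is expected to give one affected calf in four, which is the outcome testing the females avoids.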

With that, that's the end of my talk.  I'd like to thank everyone who has helped with this project, both within the USDA and at the Brown Swiss Associations, as well as our European counterparts who helped us get some of the Italian DNA samples.  We'd like to thank GeneSeek for their help with the Sequenom testing.  I'd also like to thank Frank Nicholas from the University of Sydney for his advice, and we'd really like to thank Golden Helix.  What I didn't really emphasize during this presentation was how quickly we were able to do the analysis.  I was able to do a GWAS in five minutes once I had all the data loaded.  Once I had the additional targeted genotypes, I was able to quickly put them all into the program and rerun the analysis.  It's been really helpful having this kind of advanced software to analyze the data quickly.  And with that I'll take any questions.



Josh Forsythe: All right.  Well, thanks again, Matt, for the presentation.  As I said at the beginning, there is a question and answer pane in the GoToMeeting window; it should be on the right side of your screen unless you moved it.  Go ahead and enter your questions there.  We have been getting quite a few questions throughout the presentation, so I'll start with those, but it looks like we've got another 15 minutes, so we have time to answer plenty of questions.

Matt: Okay.

Josh: First question for you, Matt.  The person asks:  "Over the last 50 years breeders have found that selection based on population genetics theory is very effective.  Do you think that finding several causal mutations will increase genetic profit or the rate of profit in breeding programs?"

Matt: Do you mean causal mutations for diseases or causal mutations for phenotypes?  If it's causal mutations for disease, it'll help their profit somewhat, but that's mainly because they won't have these horrible diseases.  Case in point:  in Angus there's curly calf disease.  Jon Beever identified the causal mutation within a year, and the Angus Association is now able to properly manage it.  The benefit to producers is that they don't have calves that are born dead, so that increases their profits.

As far as phenotypes such as milk yield or the size of your steak, to quote my boss Curt Van Tassell:  "We don't need no stinking genes."  We are actually able to do genomic selection without using the causal mutations.  We're able to use the SNPs to build prediction models that tell us how well an animal will perform, either for production phenotypes or, in the future, even for disease resistance.  In that case genomic selection will improve a producer's paycheck, either because the animal will have better milk yields or because the animal will be selected for better feed efficiency.
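The idea of predicting performance from SNPs without knowing the causal genes can be sketched in a few lines.  The example below is a minimal ridge-regression ("SNP-BLUP-style") model on simulated data, assuming purely additive SNP effects.  It is a generic illustration, not the model USDA uses for its genomic evaluations, and all sizes, frequencies, and effects are made up.

```python
import numpy as np


def fit_snp_effects(X, y, lam=1.0):
    """Ridge-regression estimate of additive SNP effects.

    X: animals x SNPs matrix of allele counts (0, 1, 2); y: phenotypes.
    Returns the training SNP means (for centering) and the effect estimates.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    yc = y - y.mean()
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return mu, beta


def predict_gebv(X, mu, beta):
    """Genomic estimated breeding value: centered allele counts times SNP effects."""
    return (X - mu) @ beta


# Simulated (hypothetical) data: 1,000 training animals, 200 selection candidates
rng = np.random.default_rng(0)
n_train, n_test, n_snp = 1000, 200, 50
freqs = rng.uniform(0.1, 0.5, n_snp)
X = rng.binomial(2, freqs, size=(n_train + n_test, n_snp)).astype(float)
true_beta = rng.normal(0.0, 1.0, n_snp)
y = X @ true_beta + rng.normal(0.0, 1.0, n_train + n_test)

mu, beta = fit_snp_effects(X[:n_train], y[:n_train])
gebv = predict_gebv(X[n_train:], mu, beta)
accuracy = np.corrcoef(gebv, y[n_train:])[0, 1]
print(f"correlation of predictions with phenotypes: {accuracy:.2f}")
```

Nothing in the model identifies which SNPs are causal.  It only needs markers whose allele counts track the trait, which is the point Matt makes about not needing the genes themselves.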

But the way we look at it, we're providing tools and data for producers:  we tell them how an animal will perform for certain phenotypes, and then they can use that data to select what is best for their region and their management.

Josh: Okay.  A follow-on question to that is, "Most SNP effects on quantitative traits, for example milk yield, are extremely small.  Is there any reliable statistical method proving that a certain SNP is significant in influencing the genetic variation of the trait?"

Matt: I'm not sure how to answer that.  I would say yes, it's been proven.  SNP effects have been used in the dairy industry for the past three years.  If you look at any quantitative trait, from height to milk yield, my viewpoint is that there are hundreds of variants that affect it.  If you look at growth, pretty much any gene in your body is going to affect it.  If you can't eat as well or digest food as well, you won't grow as well.

So what we have is a SNP that has an association with a trait, so we can show that yes, it's there.  I know there are studies currently being done in beef cattle, and I believe in the dairy industry, comparing a herd that has not had genomic selection with a group that has.  These multiyear studies are going on right now, and hopefully at the end they will show that genomic selection really does work.  My viewpoint is that it's good that we'll have these studies, but I think the effects are going to be real.  These SNPs are going to be associated with some causal mutation, or a causal region.  We don't actually need the causal mutation for a phenotype like milk yield; we can use SNPs that are associated with it because they are in high LD with it.  So yes, they work, and there will be data coming out that show it works.  But for quantitative traits I see no reason not to trust it.
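"High LD" between a marker SNP and a causal variant is usually measured with r².  A minimal Python sketch, assuming phased haplotypes with alleles coded 0/1; the haplotypes are hypothetical.

```python
def ld_r_squared(haplotypes):
    """Squared correlation (r^2) between two biallelic loci.

    haplotypes: phased haplotypes as (allele at locus 1, allele at locus 2)
    pairs, each allele coded 0 or 1 (1 = the allele of interest).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # allele-1 frequency, locus 1
    p_b = sum(h[1] for h in haplotypes) / n          # allele-1 frequency, locus 2
    p_ab = sum(h[0] * h[1] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                             # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


# Hypothetical haplotypes: (marker SNP, causal variant)
perfect = [(1, 1), (1, 1), (0, 0), (0, 0)]
partial = [(1, 1), (1, 0), (0, 0), (0, 0)]
print(ld_r_squared(perfect), ld_r_squared(partial))
```

With r² = 1 the marker is a perfect proxy for the causal variant.  With r² = 1/3, as in the second set, the marker still carries part of the association signal, but less of it.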

Josh: Okay.  Here's a rather simple one:  "When you show the diagrams of SNP effects what values are presented on the Y scale?"  I believe that's with the GWAS plots.

Matt: Oh, that is the negative log10 of the p-value.  So if you're looking at the GWAS plots, the Y axis is -log10(p).
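As a quick illustration of that Y-axis scale, here is a small Python sketch.  A smaller p-value plots higher, so p = 1e-8 sits at 8.  The Bonferroni line is a common convention for marking genome-wide significance and is shown only as an example; the SNP count is hypothetical, and the talk does not say which threshold its plots used.

```python
import math


def neg_log10_p(p_value):
    """Y-axis value of a GWAS (Manhattan) plot for one SNP."""
    return -math.log10(p_value)


def bonferroni_threshold(n_tests, alpha=0.05):
    """-log10 of the Bonferroni-corrected significance level alpha / n_tests."""
    return -math.log10(alpha / n_tests)


for p in (0.05, 1e-4, 1e-8):
    print(p, round(neg_log10_p(p), 2))

# Hypothetical: about 50,000 SNPs tested, as on a 50k-density chip
print(round(bonferroni_threshold(50_000), 2))
```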

Josh: Okay.  Question here is, "What about derived and ancestral alleles?"  That's the extent of the question.

Matt: Can you identify them?  Yes.  For the phylogenetic studies, Jared Decker was able to identify the ancestral allele; that's why he was able to use phylogenetics.  I know there are people using these SNP chips to identify admixture and Bos taurus introgression, and for that they identify the ancestral allele.  So if you take two species, say water buffalo and Bos taurus, you can identify which allele is the common one, which is the ancestral allele, and from that you can identify which allele is the derived, non-ancestral allele.

The benefit of that is, if you have a population you think has admixture, let's say a low level of Bos taurus introgression, and you want to remove it, you can identify and select the animals that have the lowest level of introgression.
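The two steps Matt describes are polarizing alleles with an outgroup and then ranking animals by how much donor ancestry they carry.  They can be sketched as below.  This is a deliberately simplified Python illustration.  It assumes the outgroup allele is always ancestral and that a set of diagnostic SNPs with a known donor-specific allele already exists.  The SNP IDs, animals, and genotypes are hypothetical.

```python
def polarize(alleles, outgroup_allele):
    """Split a biallelic SNP into (ancestral, derived) using one outgroup species.

    Simplifying assumption: the allele carried by the outgroup (for example
    water buffalo) is ancestral. Returns None if the outgroup carries neither
    allele, in which case the SNP is left unpolarized.
    """
    first, second = alleles
    if outgroup_allele == first:
        return first, second
    if outgroup_allele == second:
        return second, first
    return None


def introgression_fraction(genotypes, donor_alleles):
    """Share of allele copies at diagnostic SNPs that match the donor-specific allele."""
    hits = total = 0
    for snp, donor_allele in donor_alleles.items():
        genotype = genotypes.get(snp)
        if genotype is None:  # skip missing calls
            continue
        hits += genotype.count(donor_allele)
        total += len(genotype)
    return hits / total if total else 0.0


# Hypothetical diagnostic SNPs and the allele specific to the donor (e.g. Bos taurus)
donor_alleles = {"snp_1": "G", "snp_2": "T"}
herd = {
    "bull_1": {"snp_1": "AA", "snp_2": "CC"},
    "bull_2": {"snp_1": "AG", "snp_2": "CC"},
    "bull_3": {"snp_1": "GG", "snp_2": "CT"},
}

print(polarize(("A", "G"), "A"))
ranked = sorted(herd, key=lambda name: introgression_fraction(herd[name], donor_alleles))
print(ranked)
```

The ranking puts `bull_1` (no donor alleles) first and `bull_3` (three of four allele copies from the donor) last.  In practice such scores come from thousands of markers and more careful ancestry models.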

Josh: Moving on:  for the GWAS you did, how related to each other were your samples, and did you do any correction for possible population stratification?  Is this capability built into the software?

Matt: For the GWAS?  At that time we had 71 animals with genotypes.  So yes, they're related; any dairy animal is going to be related.  When I did the sequencing I selected animals that were as unrelated as possible, to increase the amount of genetic variation.  But for the GWAS we had 71 animals and I used them all.

For the stratification, Josh, I think you can answer this better, but if I recall correctly, Golden Helix can take stratification into account.  Is that correct?

Josh: Yes.  There are a number of quality assurance and other statistical analyses in the software beyond just association testing.  Population stratification correction is one of those, and we have a number of researchers, working not just in human genetics but in plant and animal genetics, who use the software's population stratification correction.
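The exchange doesn't say how the software corrects for stratification.  One widely used approach is the EIGENSTRAT-style principal-component correction.  It computes the top principal components of the genotype matrix, regresses them out of both the genotypes and the phenotype, and tests the residuals.  The Python sketch below illustrates that technique on simulated data with two subpopulations and no true SNP effects.  It is not a description of Golden Helix's implementation, and the choice of k = 2 components is an assumption.

```python
import math

import numpy as np


def top_pcs(G, k):
    """Top-k principal components (animals x k) of a standardized genotype matrix."""
    mu = G.mean(axis=0)
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero at monomorphic SNPs
    Z = (G - mu) / sd
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]


def residualize(v, covariates):
    """Remove the part of v explained by the covariates plus an intercept."""
    C = np.column_stack([np.ones(len(v)), covariates])
    coef, *_ = np.linalg.lstsq(C, v, rcond=None)
    return v - C @ coef


def pc_adjusted_pvalues(G, y, k=2):
    """EIGENSTRAT-style test: (n - k - 1) * r^2 on PC-adjusted genotype and phenotype."""
    n = G.shape[0]
    pcs = top_pcs(G, k)
    y_adj = residualize(y, pcs)
    pvalues = []
    for j in range(G.shape[1]):
        g_adj = residualize(G[:, j], pcs)
        denom = math.sqrt(float(g_adj @ g_adj) * float(y_adj @ y_adj))
        r = float(g_adj @ y_adj) / denom if denom > 0 else 0.0
        chi2 = (n - k - 1) * r * r
        # Upper tail of a chi-square distribution with 1 degree of freedom
        pvalues.append(math.erfc(math.sqrt(chi2 / 2.0)))
    return np.array(pvalues)


# Hypothetical data: two subpopulations that differ in allele frequency and in
# mean phenotype, but no SNP has any real effect on the phenotype.
rng = np.random.default_rng(1)
subpop = np.repeat([0, 1], 50)
freqs = np.where(subpop[:, None] == 0, 0.2, 0.6)
G = rng.binomial(2, freqs, size=(100, 30)).astype(float)
y = subpop + rng.normal(0.0, 1.0, 100)

pvals = pc_adjusted_pvalues(G, y, k=2)
print(np.round(pvals[:5], 3))
```

Without the correction, every SNP here would look associated simply because allele frequency and phenotype both differ between the subpopulations.  After the PCs are regressed out, the p-values behave roughly like null p-values.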

Another question here:  "Would you stick with the HD chip if you were to do the same project over again or would you do more animals on the 50k chip to potentially capture more historical recombination events?"

Matt: For causative disease mutations:  you had a talk with Jon Beever a couple of months ago that addressed this.  He's had a lot of success using the 50k chip.  From my talks with him, I believe that if you have a newer disease, where the causal mutation appeared relatively recently, you can get some good results with the 50k chip.  If you're studying an older mutation, one that is at least 100 years old if not older, I think the high density chip gives you better resolution.

The 50k chip, as I showed in the slides, will work.  It does show you the right region and gets you close.  But the problem is that you're then looking at a much bigger region.  Here, with the 50k chip results, I'd be looking at a 10 megabase region; I could potentially even be saying, "Well, I think it's probably over to the side."  With the HD chip I can at least narrow that down.  So yes, you can use the 50k; the HD is just going to give better resolution.  So it comes down to your time, your money, and your study.
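The age argument can be put into rough numbers.  Recombination erodes the founder haplotype around a mutation each generation, so older mutations sit on shorter shared segments, and denser chips are needed to see them.  The Python sketch below assumes the following: the Haldane model with no crossover interference, about 1 cM per Mb, a 5-year generation interval, a 2,700 Mb genome, and roughly 54,000 and 777,000 SNPs on the 50k and HD chips.  All of these are approximations chosen for illustration, not values from the talk.

```python
GENOME_MB = 2_700                            # assumed bovine genome size (Mb)
CM_PER_MB = 1.0                              # assumed average recombination rate
YEARS_PER_GENERATION = 5                     # assumed cattle generation interval
CHIP_SNPS = {"50k": 54_000, "HD": 777_000}   # approximate SNP counts per chip


def expected_segment_mb(mutation_age_years):
    """Expected length of the founder segment still attached to the mutation.

    After g meioses each flank is roughly exponential with mean 1/g Morgans
    (Haldane model, no interference), so both flanks together average 2/g
    Morgans, i.e. 200/g centimorgans.
    """
    generations = mutation_age_years / YEARS_PER_GENERATION
    segment_cm = 200.0 / generations
    return segment_cm / CM_PER_MB


def mean_spacing_kb(n_snps):
    """Average distance between adjacent markers if spread evenly over the genome."""
    return GENOME_MB * 1_000 / n_snps


def markers_in_segment(segment_mb, n_snps):
    """Expected number of evenly spaced chip markers falling inside a segment."""
    return segment_mb * n_snps / GENOME_MB


for age in (25, 100, 500):
    seg = expected_segment_mb(age)
    counts = {chip: round(markers_in_segment(seg, n)) for chip, n in CHIP_SNPS.items()}
    print(f"{age:>4} years: ~{seg:.0f} Mb segment, markers inside: {counts}")

for chip, n in CHIP_SNPS.items():
    print(chip, f"mean spacing ~{mean_spacing_kb(n):.1f} kb")
```

Under these assumptions, a 100-year-old mutation sits on a segment of about 10 Mb, while a much older one may sit on only about 2 Mb.  At that size the 50k chip places only about 40 markers, spaced roughly 50 kb apart, whereas the HD chip places several hundred, spaced about 3.5 kb apart.  The segment shared by many carriers is shorter still, which only strengthens the case for density.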



Josh: I just have a couple questions here on copy number analysis.  One person asks:  "Have you looked for copy number or other structural variants from both HD and sequencing data?"  Another person asks, "Human studies are examining the role of copy number variation in ALS or Lou Gehrig's disease.  What are your comments on looking for copy number variations?"

Matt: We did look at that a couple of months ago.  One of the scientists in our group, George Liu, leads one of the top groups working on copy number variation in cattle.  We asked them, "Hey, do you see any copy number variations in this region?"  They took a good look at the data and said, "Nope."  So from what they saw, there was no CNV in this region in Brown Swiss.  They were looking at both the 50k data and the sequencing data, and they really didn't see anything.  That was a couple of months ago, so there may be more now, but from what we've seen so far it's not a CNV.

If none of the next SNPs we test are 100 percent concordant, yes, that's something we'll come back and look at.  Unfortunately, we did pooled sequencing:  we were eager to do the sequencing and didn't wait the couple of months until indexing came out.  Because of that we can't really look at CNVs as much, because our sequences are not from individual animals; they're from a pool.  But yes, if we don't get anything, CNV analysis will be something to look at later on down the road.
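A little arithmetic shows why pooled sequencing makes CNV detection hard.  Read-depth methods look for a drop or rise in coverage, but in a pool one animal's extra or missing copy is diluted by everyone else's.  The Python sketch below assumes equal DNA contribution per animal and no sequencing noise.  The pool size of 20 is hypothetical, because the talk doesn't give the actual pool size.

```python
def pooled_depth_ratio(pool_size, carriers, copies_in_carrier=1):
    """Expected read-depth ratio at a CNV locus relative to a normal locus.

    Assumes pool_size diploid animals contributing equal amounts of DNA, so a
    normal locus is represented by 2 * pool_size copies. Each carrier has
    copies_in_carrier copies instead of 2 (1 = heterozygous deletion,
    3 = heterozygous duplication).
    """
    normal_copies = 2 * pool_size
    pooled_copies = normal_copies - carriers * (2 - copies_in_carrier)
    return pooled_copies / normal_copies


print(pooled_depth_ratio(1, 1))                        # one animal sequenced alone
print(pooled_depth_ratio(20, 1))                       # one carrier in a 20-animal pool
print(pooled_depth_ratio(20, 1, copies_in_carrier=3))  # duplication in the same pool
```

Sequenced individually, a heterozygous deletion halves the depth.  In a 20-animal pool, the same deletion in one animal shifts depth by only 2.5%, which is easily lost in ordinary coverage variation.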

Josh: Another question, and I guess I can answer this one, is, "Can the Golden Helix software be used for species other than humans and cattle?"  The answer is yes.  We can support almost any genome; our software is really designed for diploid organisms, and we support a number of plant and animal species as well as human, all within the software.  If you're not studying cattle, humans, or one of the other genomes we provide, and there is a particular genome we don't support automatically, there are ways to import the genome and curate the annotations yourself.  I believe there are a number of people at the USDA studying species other than cattle.  Matt, has anybody in your group used it for other species at this point?

Matt: We actually have a visiting Brazilian scientist who is using it for sheep.  She started two months ago, and at that time Golden Helix did not have the sheep genome up on their website.  We sent them an email and, I think within two days, they had the Illumina sheep SNP chip map up, along with the annotation.  We also have collaborators looking at goats, once the goat resources come out.  Pretty much my experience has been that if your annotation is available but it's not up, just ask Golden Helix.  If you've bought the software and are using it, within a couple of days they will have it up and running.

Josh: Okay, we've got a couple of minutes left in the presentation.  I've got many, many more questions, and we're not going to be able to get to them all; maybe one more.  But we do have these questions on file, so we'll make sure we either send them to Matt to answer or answer some of them ourselves.  I guess the last question for you, Matt, is, "Have you published these results yet?"

Matt: No, we have not, but we hope to.  As some of you may have noticed, there were no gene IDs up there.  We know there are other labs in the world that are still working on this, so we intentionally did not show some of the gene names.  Once we get these next genotype results back, we'll decide from there; if we think we have a good enough story, we'll publish as soon as possible.

Josh: A follow-on question to that is, "How would you leverage the commercial value of the work you are doing?  Would you ordinarily patent your SNPs?  Would you wait to identify the causal mutations, or would you use other methods not involving patenting?"

Matt: As for what other people have done, I know there are groups that have found causal mutations and kept them as company secrets; some offer genetic tests, and some patent them.  Our viewpoint at the USDA is that we want the research to have the largest impact as quickly as possible.  So right now, if I recall correctly, we're not looking at patenting it; we just want to get the test out.  We've had a lot of help from the Brown Swiss Association and we'd like to pay that back to them.

As for patenting causal mutations commercially, there is a need for it, and it does help the research.  But in my view, if there's a way to really help producers, whether it's patented or not, get that data out there and let them use it.  So that's my answer.

Josh: We've run up to the end of the hour, and I want to thank you again, Matt, for the presentation today.  Again, there are a number of questions we couldn't get to, and we'll try to answer those by email.  Of course, if you have any questions you would like answered, I believe people can contact you, Matt, if they wish, and we're certainly willing here at Golden Helix to answer any questions about the software or about what other people are doing research-wise with our software.

So again, thanks, Matt.  Thanks everybody for coming and hopefully it was helpful for you.  With that I'll go ahead and sign off.

Matt: All right.  Thank you much.



© 2012 Golden Helix, Inc.