Genome reference quality: an essential resource in an age of genome editing

With the recent exploration of how genome editing technology may improve livestock production and help meet growing demand for animal protein products, we argue that exemplary genome references will be required to ensure that proposed edits are specific and carefully evaluated for any potentially harmful side effects. In this short review we explore the status of existing genome references for the major food-producing animals (cattle, chicken, pigs, goat and sheep) and summarise best practice for creating future, higher-quality genome references. Each will serve as a central conduit in the study of genetic manipulation outcomes. We also provide a computational workflow for evaluating whether the edited genome carries unexpected base changes elsewhere in the genome.

CONFERENCE PAPER

Domestic animal studies have a long history of significant contributions to experimental model systems (Megens and Groenen 2012). Many reproductive success stories in human fertility were first pioneered in cattle; transgenic cows are used to produce proteins in their milk for human therapeutics; and a long history of collecting animal tissues from abattoirs for the purification of various biologicals continues today.

The finding that injections of tumour filtrate into healthy chickens reproduced the observed tumours initiated the field of viral oncology (Rubin 2011). These are but a few examples that highlight the many contributions that food-producing animals have made to advances in biomedical science. However, their most significant contribution to society is as a food source: without a safe and efficient supply of food-producing animals, a significant percentage of the world population would be severely malnourished and possibly starve to death. The ability to feed the world is even more urgent today, with the world population predicted to reach 9.7 billion by 2050 (UN DESA Report; https://esa.un.org/unpd/wpp/). With the recent exploration of how genome editing technology may improve livestock production and help meet growing demand for animal protein products, we argue that exemplary genome references will be required to ensure that proposed edits are specific and carefully evaluated for any potentially harmful side effects. In this short review we explore the status of existing genome references for the major food-producing animals (cattle, chicken, pigs, goat and sheep) and summarise best practice for creating future, higher-quality genome references.
Each will serve as a central conduit in the study of genetic manipulation outcomes. We also provide a computational workflow for evaluating whether the edited genome carries unexpected base changes elsewhere in the genome.
Today we are fortunate to have access to sequencing technology that can advance our ability to obtain near-complete DNA sequences of each food-producing genome. At present, moderate-quality genome references are available for all food-producing species, including cattle, chicken, sheep and pig, and these can serve as a computational starting point to ensure that the traits we wish to protect, enhance or suppress are studied with a relatively small loss of information (Table 1). We label these references as 'moderate' quality because the most realistic measure of completeness is whether the number of contigs, or "gap-free sequences", equals the expected total chromosome count.
In each case, however, total contig numbers are far higher in these animals than in the human genome. Advances in sequencing technology and physical mapping of chromosomes, specifically those producing longer reads, have raised the expectation that we will elevate each of these references to near-human quality: ideally, a single scaffold per chromosome with a small number of contigs per scaffold.
The recently assembled goat genome validates this expectation, with 31 assembled scaffolds, equal to the expected number of chromosomes (Bickhart 2016). Moreover, we are aware of recent assemblies of the chicken, pig and bovine genomes that follow the same long-read path and promise to offer the community high-quality genome references for future computational and genomic studies.
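The completeness measure described above, contig count relative to chromosome count, can be illustrated with a minimal sketch. It assumes that scaffolds are available as plain sequence strings in which gaps are runs of N. The function names and the toy sequences are hypothetical, and the contig N50 statistic is a commonly used addition that the text itself does not discuss.

```python
import re

def contig_lengths(scaffold_seq, min_gap=1):
    """Split one scaffold on runs of N (gaps) and return the contig lengths."""
    pieces = re.split(r"[Nn]{%d,}" % min_gap, scaffold_seq)
    return [len(p) for p in pieces if p]

def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0

def summarise(scaffolds, expected_chromosomes):
    """Report scaffold and contig counts against the expected chromosome count."""
    contigs = [n for seq in scaffolds.values() for n in contig_lengths(seq)]
    return {
        "scaffolds": len(scaffolds),
        "contigs": len(contigs),
        "contig_n50": n50(contigs),
        "contigs_per_chromosome": len(contigs) / expected_chromosomes,
    }

# Toy assembly (hypothetical): chr1 has one gap, chr2 is gap-free.
toy = {"chr1": "ACGT" * 5 + "N" * 10 + "ACGT" * 2, "chr2": "ACGTACGTAC"}
summary = summarise(toy, expected_chromosomes=2)
```

A ratio of contigs per chromosome close to 1 would correspond to the near-complete reference that the text sets as the goal.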
A variety of approaches can be used to address the sequence connectivity deficits observed in these genome references. Using one such assembly algorithm (Berlin et al. 2015), the goat (Capra hircus) genome reached unprecedented levels of sequence continuity (Table 1), demonstrating the clear advantages of recent technological advances in genome assembly.
Starting with the most contiguous assembly possible, the next step is to apply high-resolution mapping/phasing technology, such as chromatin sequence maps that produce a proximity-guided assembly, thus creating chromosome-scale scaffolds that in theory should match the total chromosome count. Fortunately, recent methodological advances have mostly overcome prior assembly connectivity bottlenecks, either by adapting a chromosome conformation capture technique (Selvaraj et al. 2013) or by utilizing restriction enzyme cuts of long DNA strands that are separated on nanochannel arrays (Hastie et al. 2013). By using a combination of these scaffolding methods, the 3,074 assembled goat contigs were connected into a final count of 31 scaffolds, the known number of chromosomes for goat (Bickhart 2016). In the chicken, genome-wide study designs continue to be incomplete due to missing autosomes, in particular the high-GC-content microchromosomes (cite G3 paper). Utilization of long-read assemblies and high-resolution maps will resolve most of these deficiencies.
In the final phase of genome assembly curation, accuracy is typically judged by two kinds of evidence, both derived from sequences of the same source DNA: differences between the reference and called single-base variants or small (<6 bp) insertions or deletions, and, if available, long mate-pair sequences whose alignments are discordant in the order or orientation of scaffolds, or of contigs within scaffolds. For the latter, a conundrum is that few automated tools can make genome-wide decisions on assembly order or orientation without manual review, as these often involve repeats, segmental duplications or tandemly arranged gene families.
Over a decade ago Georges and Andersson highlighted the excitement and promise of dissecting quantitative trait loci (QTL) of economic value (Andersson and Georges 2004).
cattle, sheep, pigs, goats and chickens are now being used to generate large volumes of genotype data that link natural nucleotide variation to phenotypic variation within the context of a production environment. A consequence of access to higher-resolution SNP panels and whole genome sequencing (WGS) methods is that QTL are now often resolved to the limits of linkage disequilibrium, even with a keen focus on the more interpretable coding variation. Today some of these loci have been subjected to selection that further advances trait averages, with monetary benefits. However, this process is still slow, and beneficial variation can be inadvertently removed or, perhaps worse, deleterious variants propagated by linkage. It remains a major challenge to unravel the genes and the regulatory elements that control specific traits before we even consider specific target sequences for genome editing. Should high-value targets be identified, targeted genome editing offers a method to alleviate the disadvantages of selective breeding, mainly the time required to reach a selection target. Despite its advances, genome editing is not the sole answer to advancing traits of value, but when combined with genomic selection and assisted reproductive technologies, it could transform current livestock improvement strategies.
An application of significance to animal welfare and human safety when handling cattle is the generation of hornless cattle (Carlson et al. 2016). Using knowledge of naturally polled genetic variation (Medugorac et al. 2012), the locus responsible was edited to produce hornless cattle, thus improving the welfare of cattle by avoiding painful dehorning procedures.
These are just a few examples that demonstrate how genome editing can introduce highly valuable natural variants, even those that would be outside of the available breeding population, onto the best genetic backgrounds in one generation without compromising the years of selection of such elite genetic stocks.
The simplicity, scope, and accuracy of genome editing technologies are truly astounding.
In fact, our knowledge of the sequences/regions to edit in food-producing animals, with thousands of QTLs (see http://www.animalgenome.org/cgi-bin/QTLdb/index) already identified for simple monogenic and complex polygenic traits, presents a conundrum as to which targets we should apply this editing capability to. An added caveat is that few of these QTLs have definitive causative alleles identified. However, given the economic impact of many of these traits, the incentive to remove or replace associated alleles will eventually lead to genome targets. Of course, this is a simple picture, and extensive experimentation will be required to pinpoint the genes or regulatory regions that will alter the phenotypic outcome. Advances in the characterisation of genes, transcripts and their regulatory regions (a core goal within the FAANG consortium; http://www.faang.org) are likely to underpin the prediction of genes and genetic variants causally linked to simple and complex traits. Genome editing is likely to be an essential tool in our armory to test these predictions in cells, tissues, organoids or even whole animals. Ultimately, specific genome targets will come into focus and editing experiments will follow. It is expected that equal rigor will be devoted to safety assessments to ensure animal well-being and long-term germplasm diversity, since substantial financial investment will create fewer founders to pass the trait to future generations and, perhaps most importantly, to determine whether the edit meets phenotypic expectations.

It is generally underappreciated that genome editing simply breaks the chromosomal DNA and then relies on the cell's natural ability to repair the break precisely and thus incorporate the intended sequence (Segal and Meckler 2013). The basic process is to identify a target sequence to be edited; computationally design a single guide RNA (sgRNA) to introduce the base change(s); inject the sgRNA and associated reagents into the stem cell; transfer the embryo to the host; if the pregnancy is successful, validate the expected edit; and, perhaps most importantly, start monitoring animal health and performance. The design of sgRNAs has been simplified in the past few years with several bioinformatic pipelines offered (Wong et al. 2015; Doench et al. 2016), but if reference errors occur, sgRNA design will be flawed and lead to missed targets. Also, to cope with genetic variants and polymorphisms in target genomes, it is necessary to re-sequence many animals in the population and compare them to the reference, again to avoid unwanted off-target sgRNA design errors. Protein-coding gene annotation of food-producing genomes is mostly sufficient for sgRNA design to target coding regions.
However, paralogs, copy number variants, and non-coding RNAs require further attention in each assembly. Newly available transcript sequencing technology such as Iso-Seq (http://www.pacb.com/applications/rna-sequencing/) will rectify many gene annotation deficiencies (Kuo et al., submitted), especially in the characterisation of all alternate transcripts and in long non-coding RNA annotation, the area most in need of improvement.
Also, the functional annotation of animal genomes (FAANG Consortium) will aid annotation of regulatory regions that may be targeted for change once experimental validation catches up.
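To make concrete why sgRNA design depends on reference accuracy, the following minimal sketch scans a reference sequence for candidate target sites and lists sites that resemble a given guide. It assumes a 20-nt protospacer followed by an NGG PAM, as used by the common Streptococcus pyogenes Cas9 system; the text does not specify a nuclease, and all function names and toy sequences are hypothetical. It is a naive scan, not a substitute for the published design pipelines cited above.

```python
import re

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def find_protospacers(seq, guide_len=20):
    """List (start, strand, protospacer) for every site followed by an NGG PAM.

    start is the leftmost forward-strand base of the site: the first
    protospacer base on '+', the first PAM base on '-'.
    """
    seq = seq.upper()
    site_len = guide_len + 3
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # A lookahead is used so that overlapping sites are all reported.
        for m in re.finditer(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len, s):
            start = m.start() if strand == "+" else len(seq) - m.start() - site_len
            hits.append((start, strand, m.group(1)))
    return hits

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def candidate_sites(guide, reference, max_mismatches=3):
    """PAM-adjacent sites in the reference within max_mismatches of the guide."""
    sites = []
    for start, strand, proto in find_protospacers(reference, len(guide)):
        mismatches = hamming(guide.upper(), proto)
        if mismatches <= max_mismatches:
            sites.append((start, strand, mismatches))
    return sites

# Toy reference (hypothetical): one exact target and one site with 2 mismatches.
guide = "GATTACAGATTACAGATTAC"
near = "GATTTCAGATTACAGATAAC"
reference = "TT" + guide + "TGG" + "TTTT" + near + "AGG" + "TT"
sites = candidate_sites(guide, reference)
```

In this toy case the exact site and the two-mismatch site are both returned. A base error in the reference at either location would change the mismatch counts, which is the failure mode the text warns about.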
Most evidence indicates that genome editing, specifically CRISPR methodology, is precise. To provide a starting template for evaluating genome-edited food-producing animals, we briefly outline the computational steps, using the chicken genome as an example (Figure 1). Our process overview is largely based on previously established cancer genome analysis pipelines that compile genetic differences between the normal and cancer genomes of a patient. Once the genome-edited animal is confirmed to contain the targeted base change(s), typically by a PCR strategy (Carlson et al. 2016), an iterative series of steps is proposed. DNA is extracted from the pre- and post-edited genomes; PCR-free libraries of short fragment size (~450 bp) are constructed; the genome is sequenced to a minimum of 30x coverage on an X10 Illumina instrument (recommended for cost efficiency); and all sequences (150 bp in length) are filtered for quality using the CollectWgsMetrics module of the Picard software package, then mapped with the BWA-MEM aligner to the appropriate animal genome reference for several computational measures.

First, any sequences associated with the targeting sgRNA can be identified with fast alignment tools such as BLAT. This step also serves to validate the prior PCR results for base modification. From these sequence alignments, all single nucleotide polymorphisms (SNPs) and small insertions and deletions (<10 bp) are called with two independent callers, such as VarScan2 (Koboldt et al. 2013) and Strelka (Saunders et al. 2012). Current best practice is to take the convergence of independent SNP or indel calls to reduce false positives. The converged SNP and indel variant files can be imported into various software tools to evaluate many properties of the pre- and post-edited genomes. For example, we recommend the Ensembl VEP tool (McLaren et al. 2010) to catalogue putative loss-of-function variants within protein-coding genes that may impact animal health, although these events could be unrelated to the editing process.
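The convergence of two call sets and the pre- versus post-edit comparison can be sketched as simple set operations. The example below assumes that each caller's output has been reduced to tab-separated VCF-style records (CHROM, POS, ID, REF, ALT). The function names and toy records are hypothetical, and a production workflow would also need to normalise indel representation before comparing calls.

```python
def read_variants(vcf_lines):
    """Return a set of (chrom, pos, ref, alt) keys from VCF-style text lines."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one key per alternate allele
            variants.add((chrom, int(pos), ref, allele))
    return variants

def consensus(calls_a, calls_b):
    """Keep only variants reported by both independent callers."""
    return calls_a & calls_b

def unexpected_changes(pre, post, intended):
    """Post-edit consensus variants absent before editing and not the intended edit."""
    return (post - pre) - intended

# Toy call sets (hypothetical records).
caller1_post = ["#CHROM\tPOS\tID\tREF\tALT",
                "chr1\t100\t.\tA\tG", "chr1\t200\t.\tC\tT",
                "chr2\t50\t.\tG\tA", "chr2\t75\t.\tT\tC"]
caller2_post = ["chr1\t100\t.\tA\tG", "chr1\t200\t.\tC\tT", "chr2\t50\t.\tG\tA"]
pre_consensus = read_variants(["chr1\t200\t.\tC\tT"])
intended_edit = {("chr1", 100, "A", "G")}

post_consensus = consensus(read_variants(caller1_post), read_variants(caller2_post))
unexpected = unexpected_changes(pre_consensus, post_consensus, intended_edit)
```

Here the call made by only one caller is discarded, the pre-existing variant and the intended edit are subtracted, and a single candidate remains for follow-up, for example with the variant effect annotation described above.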
Although it is clear that structural or copy number variants are a major source of variation among humans, their accurate ascertainment is still challenging. The use of physical mapping methods based on whole-genome restriction maps is likely to make this easier.
We suggest that a standard copy number variant analysis, such as CopyCat (Sehn et al. 2014), be executed to reveal any significant genome aberrations, i.e. expansions or contractions, that in some cases can merit further investigation. The tools for this analysis are ever changing, but we offer some choices based on ease of use, accuracy, and sensitivity (Figure 1). Taking advantage of fully developed computational pipelines that generate concise reports of mutation burden in cancer patients will allow these same best practices to be implemented for examining the food-producing genome before and after editing. Of course, some modifications will be needed. Also, genome editing reports can be modified to account for regulatory standards, which are not yet clear for food-producing animals.
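The idea behind a read-depth comparison of pre- and post-edit genomes can be illustrated with a minimal sketch. It is not the CopyCat algorithm. It assumes that mean read depth has already been computed for the same fixed windows in both samples, and the threshold, pseudocount, function name and toy values are all hypothetical.

```python
import math
import statistics

def flag_depth_changes(pre_depth, post_depth, threshold=0.5, pseudocount=1.0):
    """Return (window_index, log2_ratio) for windows whose depth ratio shifts.

    Depths are scaled by each sample's median window depth, so overall
    differences in sequencing coverage do not produce false signals.
    """
    pre_median = statistics.median(pre_depth)
    post_median = statistics.median(post_depth)
    flagged = []
    for i, (pre, post) in enumerate(zip(pre_depth, post_depth)):
        pre_norm = (pre + pseudocount) / (pre_median + pseudocount)
        post_norm = (post + pseudocount) / (post_median + pseudocount)
        ratio = math.log2(post_norm / pre_norm)
        if abs(ratio) >= threshold:  # candidate expansion (+) or contraction (-)
            flagged.append((i, ratio))
    return flagged

# Toy windows (hypothetical): window 3 doubles and window 5 halves after editing.
pre = [30, 31, 29, 30, 30, 30]
post = [30, 30, 30, 60, 30, 15]
flagged = flag_depth_changes(pre, post)
```

Windows flagged in this way would only be candidates for the further investigation mentioned above, since depth can also shift for technical reasons such as GC bias.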
It is exciting to see that reference genome assembly completeness and accuracy for many organisms are now nearly reaching the quality standards found in the human genome. This development is largely the result of long reads that span repeats and complete physical maps of chromosomes, which together allow de novo assembly of a quality not achieved before. Not surprisingly, we conclude that accurate genome assembly and annotation (not covered here, but an equally important task that defines all coding and non-coding transcripts and their regulatory regions) is required for the success of genome editing experiments. Assuming the genomes of food-producing animals will continue to be edited, we expect that standardised methods will be developed and validated to compare genomes before and after genetic manipulation. Measured perturbations to genome integrity, or the possibility of finding foreign DNA sequences in animal genomes destined for food consumption, compelled us to provide an overview of computational methods and to start discussions of best practices to assure the public that attempts are being made to alleviate concerns about animal welfare or food safety.

National Institutes of Bioscience Journal 2016, Vol. 1 http://www.nibjournal.ed.ac.uk/ http://dx.doi.org/10.2218/natlinstbiosci.1.2016.1745

In his 2005 review of domestic animal genomics, Womack said: "RNA interference may soon find its way into animal improvement, likely in conjunction with cloning from modified somatic cells." Since this time, genome editing has come of age and been applied to a limited number of food animals (Whitworth et al. 2016; Choi et al. 2016; Carlson et al. 2016; Dimitrov et al. 2016). One of the most exciting applications of genome editing is the control of infectious disease, which is a critical need facing livestock producers throughout the world (Smith et al. 2016). The host-pathogen relationship has become essential to the spread of new viral strains with major international impact, such as the impact of new avian influenza strains on poultry production. To protect future pig populations from devastating viral outbreaks, Prather et al. edited an entry receptor for porcine reproductive and respiratory syndrome virus infection (Whitworth et al. 2016).

and not off-target (O'Geen et al. 2015). However, concerns remain that the edited genome can contain foreign DNA not detected with standard PCR and Southern blot techniques (Kim and Kim 2016). Given the high value of these edited founder animals and the need to ensure a thorough investigation of unexpected off-site effects, we suggest that some measures of post-editing genome integrity be implemented.