Preface
Why a reference book?
With the information gained over the last two decades, it is time to re-evaluate how we infer and analyze genes and alleles in the expressed repertoire. IGHV alleles hold valuable information about the function of the adaptive immune system. This book uses IGH repertoires from VDJbase.org (Omer et al. 2020) and describes a new way of exploring the expressed alleles and inferring genotypes. The book:
Shows the expressed repertoire data of four projects from VDJbase.org.
Includes interactive and dynamic visualizations.
Integrates with applications that allow you to “play” with the data to gain new insights.
Express yourself! Each page includes a comment section where you can add insights and thoughts about the expressed alleles and visualizations (under development).
Genotyping summarizes the genes and alleles that an individual carries (Yaari and Kleinstein 2015). Though capturing this information from genomic data is technically feasible, it is a very challenging task due to the highly repetitive nature of these genomic loci (Watson et al. 2013). In recent years, several methods have been developed to deduce this information from expressed B cell receptor repertoires (Slabodkin et al. 2021). These methods rely on a pre-set maximum number of alleles per gene that can appear in the genotype and on the overall expression of the alleles.
Although the current methods have their merits, they have several shortcomings that can bias the inferred genotypes (Zhang et al. 2015). One major pitfall is the quality of the alignment. Each aligned sequence is assigned one or more matching allele annotations, and several factors can cause errors in the final annotations for a given sequence. The major ones are somatic hypermutations, the length of the sequenced V(D)J segments, and the quality of the sequencing. The existing genotype inference methods attempt to overcome these factors but are not very successful.
In this book, we raise the idea of abandoning the concept of genes when inferring a genotype and instead determining the alleles directly. To better understand the influence of the alleles on one another, the alleles were divided into allele similarity clusters (ASCs). The ASCs were formed based on the nucleotide-level similarity between the alleles’ germline sequences (for more information, see Chapter 1). This concept of allele clustering is not new and was previously performed by IMGT (Giudicelli and Lefranc 1999). Yet, with the information gained over the last two decades on the expressed alleles, we sought to re-group the alleles. This re-grouping of the alleles into similarity clusters is consistent with the multiple annotations observed for sequences within the expressed repertoires.
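To make the idea of grouping alleles by germline similarity concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in this book: the allele names and sequences are made-up toy data, the distance is a plain mismatch fraction between equal-length sequences, the single-linkage grouping rule is an assumed choice, and the `max_distance` cutoff is hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of positions that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_alleles(germlines: Dict[str, str], max_distance: float) -> List[List[str]]:
    """Single-linkage grouping (an assumed rule): any two alleles whose
    mismatch fraction is at most max_distance end up in the same cluster."""
    parent = {name: name for name in germlines}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(germlines, 2):
        if mismatch_fraction(germlines[a], germlines[b]) <= max_distance:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in germlines:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy germline sequences with hypothetical allele names.
toy_germlines = {
    "alleleA*01": "ACGTACGTACGTACGTACGT",
    "alleleA*02": "ACGTACGTACGTACGTACGA",  # one mismatch relative to alleleA*01
    "alleleB*01": "TTGTACCTAGGTACGAACGT",  # five mismatches relative to alleleA*01
}

clusters = cluster_alleles(toy_germlines, max_distance=0.1)
print(clusters)
```

With this toy input, the two `alleleA` sequences fall into one cluster and `alleleB*01` forms its own. Because the distance is computed over the sequence as given, truncating the germlines to a shorter length can change the grouping.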
How to read this book?
Each chapter of this book showcases a different IGHV ASC. The chapters include information on the expression of the alleles assigned to these ASCs in several projects. Our ASC-based genotype method uses preset allele thresholds, and each chapter offers options to interactively “play” with the data to gain insights into the ASC’s expressed alleles and genotype combinations.
We included the allele groups for full-length and partial-length (BIOMED-2; Van Dongen et al. 2003) V segments. The ASC numbers differ between the two V lengths because the clustering is based on the germline sequences at the given length.
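As a rough illustration of what an allele threshold can mean in practice, the Python sketch below keeps an allele in the genotype when its share of the sequences assigned to one ASC reaches a cutoff. Everything here is an assumption made for illustration: the function name, the per-allele threshold table, the default cutoff, and the toy counts are hypothetical and do not reproduce the thresholds or the exact rule used in the chapters.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def infer_asc_genotype(
    allele_calls: Iterable[str],
    thresholds: Optional[Dict[str, float]] = None,
    default_threshold: float = 0.05,
) -> Dict[str, float]:
    """Return the alleles whose fraction within the ASC meets their threshold.

    allele_calls: one allele annotation per sequence assigned to the ASC.
    thresholds: optional allele-specific cutoffs (hypothetical values).
    """
    thresholds = thresholds or {}
    counts = Counter(allele_calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    genotype = {}
    for allele, n in counts.items():
        fraction = n / total
        if fraction >= thresholds.get(allele, default_threshold):
            genotype[allele] = fraction
    return genotype


# Toy annotations for a single ASC: 90, 9, and 1 sequences per allele.
toy_calls = ["a*01"] * 90 + ["a*02"] * 9 + ["a*03"] * 1

genotype = infer_asc_genotype(toy_calls)
print(sorted(genotype))

# Raising the cutoff for one allele (a hypothetical value) removes it.
strict = infer_asc_genotype(toy_calls, thresholds={"a*02": 0.2})
print(sorted(strict))
```

Changing the cutoffs in a sketch like this shows how the inferred set of alleles depends on the chosen threshold.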
Who are we?
We are part of the AIRR community and members of the IARC committee. Our goal is to make immune repertoire exploration more accessible and fun! We want to create a space where we can discuss and explore the IGH reference and gain insights from the community.