Episodes

  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.13.249870v1?rss=1

    Authors: Halstead, M. M., Kern, C., Saelao, P., Wang, Y., Chanthavixay, G., Medrano, J. F., Van Eenennaam, A. L., Korf, I., Tuggle, C. K., Ernst, C. W., Zhou, H., Ross, P. J.

    Abstract:
    Background: Although considerable progress has been made towards annotating the noncoding portion of the human and mouse genomes, regulatory elements in other species, such as livestock, remain poorly characterized. This lack of functional annotation poses a substantial roadblock to agricultural research and diminishes the value of these species as model organisms. As active regulatory elements are typically characterized by chromatin accessibility, we implemented the Assay for Transposase Accessible Chromatin (ATAC-seq) to annotate and characterize regulatory elements in pigs and cattle, given a set of eight adult tissues. Results: Overall, 306,304 and 273,594 active regulatory elements were identified in pig and cattle, respectively. 71,478 porcine and 47,454 bovine regulatory elements were highly tissue-specific and were correspondingly enriched for binding motifs of known tissue-specific transcription factors. However, in every tissue the most prevalent accessible motif corresponded to the insulator CTCF, suggesting pervasive involvement in 3-D chromatin organization. Taking advantage of a similar dataset in mouse, open chromatin in pig, cattle, and mouse was compared, revealing that the conservation of regulatory elements, in terms of sequence identity and accessibility, was consistent with evolutionary distance: whereas pig and cattle shared about 20% of accessible sites, mice and ungulates had only about 10% of accessible sites in common. Furthermore, conservation of accessibility was more prevalent at promoters than at intergenic regions. Conclusions: The lack of conserved accessibility at distal elements is consistent with rapid evolution of enhancers, and further emphasizes the need to annotate regulatory elements in individual species, rather than inferring elements based on homology. This atlas of chromatin accessibility in cattle and pig constitutes a substantial step towards annotating livestock genomes and dissecting the regulatory link between genome and phenome.

    Copyright belongs to the original authors. Visit the link for more information.

  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.13.249698v1?rss=1

    Authors: Gal, C., Carelli, F. N., Appert, A., Cerrato, C., Huang, N., Dong, Y., Murphy, J., Ahringer, J.

    Abstract:
    The DREAM (DP, Retinoblastoma [Rb]-like, E2F, and MuvB) complex controls cellular quiescence by repressing cell cycle and other genes, but its mechanism of action is unclear. Here we demonstrate that two C. elegans THAP domain proteins, LIN-15B and LIN-36, co-localize with DREAM and function by different mechanisms for repression of distinct sets of targets. LIN-36 represses classical cell cycle targets by promoting DREAM binding and gene body enrichment of H2A.Z, and we find that DREAM subunit EFL-1/E2F is specific for LIN-36 targets. In contrast, LIN-15B represses germline specific targets in the soma by facilitating H3K9me2 promoter marking. We further find that LIN-36 and LIN-15B differently regulate DREAM binding. In humans, THAP proteins have been implicated in cell cycle regulation by poorly understood mechanisms. We propose that THAP domain proteins are key mediators of Rb/DREAM function.



  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.13.249656v1?rss=1

    Authors: Feng, S., Lu, S., Grueber, W. B., Mann, R. S.

    Abstract:
    We describe a simple and efficient technique that allows scarless engineering of Drosophila genomic sequences near any landing site containing an inverted attP cassette, such as a MiMIC insertion. This 2-step method combines phiC31 integrase mediated site-specific integration and homing nuclease mediated resolution of local duplications, efficiently converting the original landing site allele to modified alleles that only have the desired change(s). Dominant markers incorporated into this method allow correct individual flies to be efficiently identified at each step. In principle, single attP sites and FRT sites are also valid landing sites. Given the large and increasing number of landing site lines available in the fly community, this method provides an easy and fast way to efficiently edit the majority of the Drosophila genome in a scarless manner. This technique should also be applicable to other species.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.13.250092v1?rss=1

    Authors: Katsevich, E., Roeder, K.

    Abstract:
    Mapping gene-enhancer regulatory relationships is key to unraveling molecular disease mechanisms based on GWAS associations in non-coding regions. Recently developed CRISPR regulatory screens (CRSs) based on single cell RNA-seq (scRNA-seq) are a promising high-throughput experimental approach to this problem. However, the analysis of these screens presents significant statistical challenges, including modeling cell-level gene expression and correcting for sequencing depth. Using a recent large-scale CRS and its original analysis as a case study, we demonstrate weaknesses in existing analysis methodology, which lead to false positives as well as false negatives. To address these challenges, we propose SCEPTRE: analysis of single cell perturbation screens via conditional resampling. This novel method infers gene-enhancer associations by modeling the stochastic assortment of CRISPR gRNAs among cells instead of the gene expression, remaining valid despite arbitrary misspecification of the gene expression model. Applying SCEPTRE to the large-scale CRS, we demonstrate improvements in both sensitivity and specificity. We also discover 217 regulatory relationships not found in the original study, many of which are supported by existing functional data.
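
    The idea of resampling gRNA assignments rather than modeling expression can be illustrated with a minimal sketch. This is not the SCEPTRE implementation: the function name, its arguments, and the simple difference-in-means statistic are assumptions chosen for illustration, and the per-cell gRNA probabilities are assumed to have been fitted elsewhere (for example, by a regression of gRNA presence on technical covariates).

```python
import numpy as np

def conditional_resampling_pvalue(expression, grna, grna_prob,
                                  n_resamples=500, seed=0):
    """Left-tailed conditional randomization test (illustrative only).

    expression : per-cell expression of one gene
    grna       : 0/1 indicator, 1 if the cell carries the gRNA
    grna_prob  : per-cell probability of carrying the gRNA given the
                 cell's covariates (hypothetical input, fitted elsewhere)
    """
    rng = np.random.default_rng(seed)
    expression = np.asarray(expression, dtype=float)
    grna = np.asarray(grna, dtype=bool)
    grna_prob = np.asarray(grna_prob, dtype=float)

    def statistic(indicator):
        # Mean expression in perturbed cells minus unperturbed cells.
        if indicator.all() or not indicator.any():
            return 0.0
        return expression[indicator].mean() - expression[~indicator].mean()

    observed = statistic(grna)

    # Redraw gRNA assignments from their conditional distribution;
    # the expression values themselves are never modeled or altered.
    resampled = rng.random((n_resamples, grna_prob.size)) < grna_prob
    null_stats = np.array([statistic(row) for row in resampled])

    # Left-tailed: a strongly negative statistic suggests that the
    # perturbation reduces expression of the gene.
    return (1 + np.sum(null_stats <= observed)) / (1 + n_resamples)
```

    Because the null distribution comes from the gRNA assignment model, the p-value does not rely on the expression model being correctly specified, which is the property the abstract emphasizes.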


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.13.249078v1?rss=1

    Authors: Keenan, C. R., Coughlan, H. D., Iannarella, N., Johanson, T. M., Chan, W. F., Garnham, A. L., Smyth, G. K., Allan, R. S.

    Abstract:
    H3K9me3-dependent heterochromatin is critical for the silencing of repeat-rich pericentromeric regions and also has key roles in repressing lineage-inappropriate protein-coding genes in differentiation and development. Here, we investigate the molecular consequences of heterochromatin loss in cells deficient in both Suv39h1 and Suv39h2 (Suv39DKO), the major mammalian histone methyltransferase enzymes that catalyse heterochromatic H3K9me3 deposition. Unexpectedly, we reveal a predominant repression of protein-coding genes in Suv39DKO cells, with these differentially expressed genes principally in euchromatic (DNaseI-accessible, H3K27ac-marked) rather than heterochromatic (H3K9me3-marked) regions. Examination of the 3D nucleome reveals that transcriptomic dysregulation occurs in euchromatic regions close to the nuclear periphery in 3-dimensional space. Moreover, this transcriptomic dysregulation is highly correlated with altered 3-dimensional genome organization in Suv39DKO cells. Together, our results suggest that the nuclear lamina-tethering of Suv39-dependent H3K9me3 domains provides an essential scaffold to support euchromatic genome organisation and the maintenance of gene transcription for healthy cellular function.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.12.247692v1?rss=1

    Authors: Malukiewicz, J., Cartwright, R. A., Curi, N. H., Dergam, J. A., Igayara, C. S., Moreira, S. B., Molina, C. V., Nicola, P. A., Noll, A., Passamani, M., Pereira, L. C., Pissinatti, A., Ruiz-Miranda, C. R., Silva, D. L., Stone, A. C., Zinner, D., Roos, C.

    Abstract:
    Background: Callithrix marmosets are a relatively young non-human primate radiation, whose phylogeny is not yet fully resolved. These primates are naturally para- and allopatric, but three species with highly invasive potential have been introduced into the southeastern Brazilian Atlantic Forest by the pet trade. There, these species hybridize with each other and with endangered, native congeners. In this study, we aimed to reconstruct a robust Callithrix phylogeny and divergence time estimates, as well as to identify autochthonous and allochthonous Callithrix mitogenome lineages across Brazil. We sequenced 49 mitogenomes from four species (C. aurita, C. geoffroyi, C. jacchus, C. penicillata) and anthropogenic hybrids (C. aurita x Callithrix sp., C. penicillata x C. jacchus, Callithrix sp. x Callithrix sp., C. penicillata x C. geoffroyi) via Sanger and whole genome sequencing. We combined these data with previously published Callithrix mtDNA genomes to analyze five Callithrix species in total. Results: We report the complete sequence and organization of the C. aurita mtDNA genome. Phylogenetic analyses showed that C. aurita was the first to diverge within Callithrix 3.54 million years ago (MYA), while C. jacchus and C. penicillata lineages diverged most recently 0.5 MYA as sister clades. MtDNA clades of C. aurita, C. geoffroyi, and C. penicillata show intraspecific geographic structure, but C. penicillata clades appear polyphyletic. Hybrids, which were identified by phenotype, possessed mainly C. penicillata or C. jacchus mtDNA haplotypes. The geographic origins of mtDNA haplotypes from hybrid and allochthonous Callithrix were broadly distributed across natural Callithrix ranges. Our phylogenetic results also provide evidence of introgression of C. jacchus mtDNA into C. aurita. Conclusion: Our robust Callithrix mitogenome phylogeny shows C. aurita lineages as basal and C. jacchus lineages among the most recent within Callithrix. We provide the first evidence that parental mtDNA lineages of anthropogenic hybrid and allochthonous marmosets are broadly distributed inside and outside of the Atlantic Forest. We also show evidence of cryptic hybridization between allochthonous Callithrix and autochthonous C. aurita. Our results encouragingly show that further development of genomic resources will make it possible to elucidate Callithrix evolutionary relationships more clearly and understand the dynamics.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.12.248526v1?rss=1

    Authors: Lowther, C., Valkanas, E., Giordano, J. L., Wang, H. Z., Currall, B. B., O'Keefe, K., Collins, R. L., Zhao, X., Austin-Tse, C. A., Evangelista, E., Aggarwal, V., Lucente, D., Gauthier, L. D., Tolonen, C., Sahakian, N., An, J.-Y., Dong, S., Norton, M. E., MacKenzie, T., Devlin, B., Gilmore, K., Powell, B., Brandt, A., Vetrini, F., DiVito, M., Goldstein, D. B., Sanders, S. J., MacArthur, D. G., Hodge, J. C., O'Donnell-Luria, A., Rehm, H., Vora, N., Levy, B., Brand, H., Wapner, R., Talkowski, M. E.

    Abstract:
    Current prenatal and pediatric genetic evaluation requires three tests to capture balanced chromosomal abnormalities (karyotype), copy number variants (microarray), and coding variants (whole exome sequencing [WES] or targeted gene panels). Here, we explored the sensitivity, specificity, and added value of whole genome sequencing (WGS) to displace all three conventional approaches. We analyzed single nucleotide variants, small insertions and deletions, and structural variants from WGS in 1,612 autism spectrum disorder (ASD) quartet families (n=6,448 individuals) to benchmark the diagnostic performance of WGS against microarray and WES. We then applied these WGS variant discovery and interpretation pipelines to 175 trios (n=525 individuals) with a fetal structural anomaly (FSA) detected on ultrasound and pre-screened by karyotype and microarray. Analyses of WGS in ASD quartets identified a diagnostic variant in 7.5% of ASD probands compared to 1.1% of unaffected siblings (odds ratio=7.5; 95% confidence interval=4.5-13.6; P=2.8x10^-21). We found that WGS captured all diagnostic variants detected by microarray and WES as well as five additional diagnoses, reflecting a 0.3% added yield over WES and microarray when combined. The WGS diagnostic yield was also inversely correlated with ASD proband IQ. Implementation in FSA trios identified a diagnostic variant not captured by karyotype or microarray in 12.0% of fetuses. Based on these data and prior studies, we estimate that WGS could provide an overall diagnostic yield of 47.6% in unscreened FSA referrals. We observed that WGS was sensitive to the detection of all classes of pathogenic variation captured by three conventional tests. Moreover, diagnostic yields from WGS were superior to any individual genetic test, warranting further evaluation as a first-tier diagnostic approach.
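
    The cohort sizes and the odds ratio quoted above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the helper name is hypothetical, and because the abstract gives rounded percentages rather than the underlying counts, the result is an approximation of the reported odds ratio and not a reproduction of the authors' calculation.

```python
def odds_ratio(p_case, p_control):
    """Odds ratio computed from two proportions (illustrative helper)."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))

# A quartet has four members and a trio has three.
quartet_individuals = 1612 * 4
trio_individuals = 175 * 3

# Rounded yields from the abstract: 7.5% of probands, 1.1% of siblings.
approx_or = odds_ratio(0.075, 0.011)

print(quartet_individuals, trio_individuals, round(approx_or, 2))
```

    The family counts match the stated totals of 6,448 and 525 individuals, and the rounded percentages give an odds ratio of roughly 7.3, close to the reported 7.5.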


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.12.247528v1?rss=1

    Authors: Wang, S., Tao, Z., Wu, T., Liu, X.-S.

    Abstract:
    Summary: Mutational signatures are recurring DNA alteration patterns caused by distinct mutational events during the evolution of cancer. In recent years, several bioinformatics tools have become available for mutational signature analysis. However, most of them focus on a specific type of mutation or have a limited scope of application. A pipeline tool for comprehensive mutational signature analysis is still lacking. Here we present the Sigflow pipeline, which provides a one-stop solution for de novo signature extraction, reference signature fitting, signature stability analysis, and sample clustering based on signature exposure in different types of genomic DNA alterations, including single base substitution (SBS), doublet base substitution (DBS), small insertion and deletion (INDEL), and copy number alteration. A Docker image is provided to avoid complex and time-consuming installation issues, and this enables reproducible research by version control of all dependent tools along with their environments. The Sigflow pipeline can be applied to both human and mouse genomes. Availability and implementation: Sigflow is open source software under the Academic Free License (AFL) v3.0 and is freely available at https://github.com/ShixiangWang/sigminer.workflow or https://hub.docker.com/r/shixiangwang/sigflow.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.12.248286v1?rss=1

    Authors: Kelley, E. R., Sleith, R. S., Matz, M. V., Wright, R. M.

    Abstract:
    Rampant coral disease, exacerbated by climate change and other anthropogenic stressors, threatens reefs worldwide, especially in the Caribbean. Physically isolated yet genetically connected reefs such as Flower Garden Banks National Marine Sanctuary (FGBNMS) may serve as potential refugia for degraded Caribbean reefs. However, little is known about the mechanisms and trade-offs of pathogen resistance in reef-building corals. Here we measure pathogen resistance in Montastraea cavernosa from FGBNMS. We identified individual colonies that demonstrated resistance or susceptibility to Vibrio spp. in a controlled laboratory environment. Long-term growth patterns suggest no trade-off between disease resistance and calcification. Predictive (pre-exposure) gene expression highlights subtle differences between resistant and susceptible genets, encouraging future coral disease studies to investigate associations between resistance and replicative age and immune cell populations. Predictive gene expression associated with long-term growth underscores the role of cation transporters and extracellular matrix remodelers, contributing to the growing body of knowledge surrounding genes that influence calcification in reef-building corals. Together these results demonstrate that coral genets from isolated sanctuaries such as FGBNMS can withstand pathogen challenges and potentially aid restoration efforts in degraded reefs. Furthermore, gene expression signatures associated with resistance and long-term growth help inform strategic assessment of coral health parameters.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.10.245308v1?rss=1

    Authors: Du, X., Li, L., Liang, F., Liu, S., Zhang, W., Sun, S., Sun, Y., Fan, F., Wang, L., Liang, X., Qiu, W., Fan, G., Wang, O., Yang, W., Zhang, J., Xiao, Y., Wang, Y., Wang, D., Qu, S., Chen, F., Huang, J.

    Abstract:
    The importance of structural variants (SVs) for phenotypes and human diseases is now recognized. Although a variety of SV detection platforms and strategies that vary in sensitivity and specificity have been developed, few benchmarking procedures are available to confidently assess their performance in biological and clinical research. To facilitate the validation and application of those approaches, our work established an Asian reference material comprising identified benchmark regions and high-confidence SV calls. We established a high-confidence SV callset with 8,938 SVs in an EBV immortalized B lymphocyte line, by integrating four alignment-based SV callers [from 109x PacBio continuous long read (CLR), 22x PacBio circular consensus sequencing (CCS) reads, 104x Oxford Nanopore long reads, and 114x optical mapping platform (Bionano)] and one de novo assembly-based SV caller using CCS reads. A total of 544 randomly selected SVs were validated by PCR and Sanger sequencing, demonstrating the robustness of our SV calls. Combining trio-binning based haplotype assemblies, we established an SV benchmark for identification of false negatives and false positives by constructing the continuous high-confidence regions (CHCRs), which cover 1.46Gb and 6,882 SVs supported by at least one diploid haplotype assembly. Establishing high-confidence SV calls for a benchmark sample that has been characterized by multiple technologies provides a valuable resource for investigating SVs in human biology, disease, and clinical diagnosis.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.10.245258v1?rss=1

    Authors: Hao, Y., Mabry, M. E., Edger, P., Freeling, M., Zheng, C., Jin, L., VanBuren, R., Colle, M., An, H., Abrahams, R. S., Qi, X., Barry, K., Daum, C., Shu, S., Schmutz, J., Sankoff, D., Barker, M. S., Lyons, E., Pires, J. C., Conant, G. C.

    Abstract:
    The members of the tribe Brassiceae share an ancient whole genome triplication (WGT), and plants in this tribe display extraordinarily high within-species morphological diversity. One proposed model for the formation of these hexaploid Brassiceae is that they result from a "two-step" pair of hybridizations. However, direct evidence supporting this model of formation has been lacking; meanwhile, the evolutionary and functional constraints that drove evolution after the hexaploidy are even less understood. Here we report a new genome sequence of Crambe hispanica, a species sister to most sequenced Brassiceae. After adding this new genome to three others that are also descended from the ancient hexaploidy, we traced the history of gene loss after the WGT using a phylogenomic pipeline called POInT (the Polyploidy Orthology Inference Tool). This approach allowed us to confirm the two-step model of hexaploidy formation and to assign statistical confidence to our parental "subgenome" assignments for >90,000 individual genes. We show that each subgenome has a statistically distinguishable rate of homeolog losses. Moreover, our modeling allowed us to infer that there was a significant temporal gap between the two allopolyploidizations, with about one third of the total shared gene losses between the four analyzed Brassiceae species in the first two subgenomes prior to the arrival of the third subgenome. There is little indication of functional distinction between the three subgenomes: the individual subgenomes show no patterns of functional enrichment, no excess of shared protein-protein or metabolic interactions between their members, and no biases in their likelihood of having experienced a recent selective sweep. We propose a "mix and match" model of allopolyploidy, where subgenome origin drives homeolog loss propensities but where genes from different subgenomes function together without difficulty.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.12.248740v1?rss=1

    Authors: Rahman, S. R., Cnaani, J., Kinch, L. N., Grishin, N. V., Hines, H. M.

    Abstract:
    Background: In the model bumble bee species B. terrestris, both males and females exhibit black coloration on the third thoracic and first metasomal segments. We discovered a fortuitous lab-generated mutant in which this typical black coloration is replaced by yellow. As this same color variant is found in several sister lineages to B. terrestris within the Bombus s.s. subgenus, this could be a result of ancestral allele sorting. Results: Utilizing a combination of RAD-Seq and whole-genome re-sequencing approaches, we localized the color-generating variant to a single SNP in the protein-coding sequence of a homeobox transcription factor, cut. Sanger sequencing confirmed fixation of this SNP between wildtype and yellow mutants. Protein domain analysis revealed this SNP to generate an amino acid change (Ala38Pro) that modifies the conformation of coiled-coil structural elements which lie outside the characteristic DNA binding domains. We found all Hymenopterans including B. terrestris sister lineages possess the non-mutant allele, indicating different mechanism(s) are involved in the same black to yellow transition in nature. Conclusions: Cut is a highly pleiotropic gene important for multiple facets of development, yet this mutation generated no noticeable external phenotypic effects outside of setal characteristics. Reproductive capacity was observed to be reduced, however, with queens being less likely to mate and produce female offspring, in a manner similar to workers. Our research implicates a novel developmental player in pigmentation, and potentially caste as well, thus contributing to a better understanding of the evolution of diversity in both of these processes.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.11.247122v1?rss=1

    Authors: Heldrich, J., Markowitz, T. E., Vale-Silva, L. A., Hochwagen, A.

    Abstract:
    Meiotic chromosomes organize around a cohesin-dependent axial element, which promotes meiotic recombination and fertility. In the absence of cohesin, axial-element proteins instead accumulate in poorly understood genomic regions. Here, we show in S. cerevisiae that these regions are particularly enriched for axis proteins even on wild-type chromosomes and thus reflect a cohesin-independent recruitment mechanism. By contrast, other organizers of chromosome structure, including cohesin, condensin, and topoisomerases, are depleted from the same regions. This spatial patterning is observable before meiotic entry and therefore independent of meiotic recombination. Indeed, the regional density of gene-coding sequences is sufficient to predict a large fraction of cohesin-independent axis protein binding, suggesting that the gene-associated chromatin landscape plays a role in modulating axis protein deposition. The increased accumulation of axis proteins in these regions corresponds to a greater potential for initiation of recombination and progression to crossovers.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.10.235309v1?rss=1

    Authors: Simbaqueba, J., Rodriguez, E. A., Burbano, D. M., Gonzalez, C., Caro, A.

    Abstract:
    The vascular wilt disease caused by the fungus Fusarium oxysporum f. sp. physali (Foph) is one of the most limiting factors for the production and export of cape gooseberry (Physalis peruviana) in Colombia. A previous study of the transcriptomic profile of a highly virulent strain of F. oxysporum in cape gooseberry plants, from a collection of 136 fungal isolates obtained from wilted cape gooseberry plants, revealed the presence of secreted in the xylem (SIX) effector genes, known to be involved in the pathogenicity of other F. oxysporum formae speciales (ff. spp.). This pathogenic strain was named Foph, due to its specificity for cape gooseberry hosts. Here, we sequenced the genome of Foph, using the Illumina MiSeq platform. We analyzed the assembled genome, focusing on the confirmation of the presence of homologues of SIX effectors and the identification of novel candidates of effector genes unique to Foph. By comparative and phylogenomic analyses based on single-copy orthologues, we identified that Foph is closely related to F. oxysporum ff. spp. associated with solanaceous hosts. We confirmed the presence of highly identical homologous genomic regions between Foph and Fol that contain effector genes, and identified seven new effector gene candidates specific to Foph strains. We also conducted a molecular characterization of a panel of 29 additional F. oxysporum strains associated with cape gooseberry crops, isolated from different regions of Colombia. These results suggest the polyphyletic origin of Foph and the putative independent acquisition of new candidate effectors in different clades of related strains. The novel effector candidates identified by sequencing and analyzing the genome of Foph represent new sources involved in the interaction between Foph and cape gooseberry. These resources could be implemented to develop appropriate management strategies for the wilt disease caused by Foph in the cape gooseberry crop.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.10.244541v1?rss=1

    Authors: Schneider, C., Woehle, C., Greve, C., D'Haese, C. A., Wolf, M., Janke, A., Balint, M., Hüttel, B.

    Abstract:
    Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in evolutionary sciences, ecology, systematics and in biodiversity-related applied fields such as environmental management and natural product research. Advances in DNA sequencing technologies make genome sequencing feasible for many non-genetic model species. However, genome sequencing today relies on large quantities of high quality, high molecular weight (HMW) DNA, which is mostly obtained from fresh tissues. This is problematic for biodiversity genomics of Metazoa, as most species are small and yield minute amounts of DNA. Furthermore, bringing living specimens to the lab bench is not realistic for the majority of species. Here we overcome those difficulties by sequencing two species of springtails (Collembola) from single specimens preserved in ethanol. We used a newly developed, genome-wide amplification-based protocol to generate PacBio libraries for HiFi long-read sequencing. The assembled genomes were highly continuous. They can be considered complete, as we recovered over 95% of BUSCOs. Genome-wide amplification does not seem to bias genome recovery. The presence of almost complete copies of the mitochondrial genome in the nuclear genome was a pitfall for automatic assemblers. The genomes fit well into an existing phylogeny of springtails. A neotype is designated for one of the species, blending genome sequencing and creation of taxonomic references. Our study shows that it is possible to obtain high quality genomes from small, field-preserved sub-millimeter metazoans, thus making their vast diversity accessible to the fields of genomics.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.10.244624v1?rss=1

    Authors: Canakoglu, A., Pinoli, P., Bernasconi, A., Alfonsi, T., Melidis, D. P., Ceri, S.

    Abstract:
    ViruSurf, available at http://gmql.eu/virusurf/, is a large public database of viral sequences and integrated and curated metadata from heterogeneous sources (GenBank, COG-UK and NMDC); it also exposes computed nucleotide and amino acid variants, called from original sequences. A GISAID-specific ViruSurf database, available at http://gmql.eu/virusurf_gisaid/, offers a subset of these functionalities. Given the current pandemic outbreak, SARS-CoV-2 data are collected from the four sources; but ViruSurf contains other virus species harmful to humans, including SARS-CoV, MERS-CoV, Ebola, and Dengue. The database is centered on sequences, described from their biological, technological, and organizational dimensions. In addition, the analytical dimension characterizes the sequence in terms of its annotations and variants. The web interface enables expressing complex search queries in a simple way; arbitrary search queries can freely combine conditions on attributes from the four dimensions, extracting the resulting sequences. Several example queries on the database confirm and possibly improve results from recent research papers; results can be recomputed over time and upon selected populations. Effective search over large and curated sequence data may enable faster responses to future threats that could arise from new viruses.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.11.243840v1?rss=1

    Authors: Sengupta, D., Choudhury, A., Fortes-Lima, C., Aron, S., Whitelaw, G., Bostoen, K., Gunnink, H., Chousou-Polydouri, N., Delius, P., Tollman, S., Casas, F. G.-O., Norris, S., Mashinya, F., Alberts, M., Hazelhurst, S., Schlebusch, C. M., Ramsay, M.

    Abstract:
    South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.


  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.10.244657v1?rss=1

    Authors: Kellaway, S. G., Keane, P., Edginton-White, B., Kakkad, R., Kennett, E., Bonifer, C.

    Abstract:
    Mutations of the hematopoietic master regulator RUNX1 cause acute myeloid leukaemia, familial platelet disorder and other haematological malignancies whose phenotypes and prognoses depend upon the class of RUNX1 mutation. The biochemical behaviour of these oncoproteins and their ability to cause unique diseases has been well studied, but the genomic basis of their differential action is unknown. To address this question we compared integrated phenotypic, transcriptomic and genomic data from cells expressing four types of RUNX1 oncoproteins in an inducible fashion during blood development from embryonic stem cells. We show that each class of mutated RUNX1 deregulates endogenous RUNX1 function by a different mechanism, leading to specific alterations in developmentally controlled transcription factor binding and chromatin programming. The result is distinct perturbations in the trajectories of gene regulatory network changes underlying blood cell development that are consistent with the nature of the final disease phenotype. The development of novel treatments for RUNX1-driven diseases will therefore require individual consideration.

  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.10.243543v1?rss=1

    Authors: Tian, L., Jabbari, J. S., Thijssen, R., Gouil, Q., Amarasinghe, S. L., Kariyawasam, H., Su, S., Dong, X., Law, C. W., Lucattini, A., Chung, J. D., Naim, T., Chan, A., Ly, C. H., Lynch, G. S., Ryall, J. G., Anttila, C. J. A., Peng, H., Anderson, M. A., Roberts, A. W., Huang, D. C. S., Clark, M. B., Ritchie, M. E.

    Abstract:
    Alternative splicing shapes the phenotype of cells in development and disease. Long-read RNA-sequencing recovers full-length transcripts but has limited throughput at the single-cell level. Here we developed single-cell full-length transcript sequencing by sampling (FLT-seq), together with the computational pipeline FLAMES to overcome these issues and perform isoform discovery and quantification, splicing analysis and mutation detection in single cells. With FLT-seq and FLAMES, we performed the first comprehensive characterization of the full-length isoform landscape in single cells of different types and species and identified thousands of unannotated isoforms. We found conserved functional modules that were enriched for alternative transcript usage in different cell populations, including ribosome biogenesis and mRNA splicing. Analysis at the transcript-level allowed data integration with scATAC-seq on individual promoters, improved correlation with protein expression data and linked mutations known to confer drug resistance to transcriptome heterogeneity. Our methods reveal previously unseen isoform complexity and provide a better framework for multi-omics data integration.

  • Link to bioRxiv paper:
    http://biorxiv.org/cgi/content/short/2020.08.09.243402v1?rss=1

    Authors: Patel, S., Howard, D., French, L.

    Abstract:
    Porphyromonas gingivalis, a keystone species in the development of periodontal disease, is a suspected cause of Alzheimer's disease. This bacterium is reliant on gingipain proteases, which cleave host proteins after arginine and lysine residues. To characterize gingipain susceptibility, we performed enrichment analyses of arginine and lysine proportion proteome-wide. Proteins in the SRP-dependent cotranslational protein targeting to membrane pathway were enriched for these residues and previously associated with periodontal and Alzheimer's disease. These ribosomal genes are up-regulated in prefrontal cortex samples with detected P. gingivalis sequences. Other differentially expressed genes have been previously associated with dementia (ITM2B, MAPT, ZNF267, and DHX37). For an anatomical perspective, we characterized the expression of the P. gingivalis associated genes in the mouse and human brain. This analysis highlighted the hypothalamus, cholinergic neurons, and the basal forebrain. Our results suggest markers of neural P. gingivalis infection and link the gingipain and cholinergic hypotheses of Alzheimer's disease.
