Bioinformatics

The Bioinformatics Unit develops and establishes computer-aided methods for the identification and verification of new biomarkers for personalized diagnosis and prognosis of diseases as well as for the detection of novel therapeutic targets.

It has only been known for a few years that a multitude of RNA molecules are not translated into proteins. Recent findings show that these non-coding RNAs (ncRNAs) fine-tune gene regulation and are therefore suitable as markers for individual disease stages as well as for disease progression. To detect disease-relevant ncRNAs, the unit develops strategies for the efficient processing and statistical analysis of molecular biological data obtained from large clinical cohorts by next-generation sequencing, microarrays, and DNA, RNA, and epigenetic analytics. The gene regulatory mechanisms of ncRNAs are modeled using methods from systems biology and computational RNA biology.

Our objective is to analyze the potential of these innovative RNA molecules as biomarkers or therapeutic targets and to establish them as appropriate clinical markers or targets.

  • Analysis of next-generation sequencing data by means of a specially developed pipeline for structured and documented processing and evaluation
  • Computational RNA biology
  • Design of custom expression microarrays
  • Analysis of expression microarray data
  • Statistical learning methods for detecting biomarkers
  • Systems biology to uncover the gene regulatory mechanisms of long non-protein-coding RNAs

RNA biomarker discovery

The Bioinformatics Unit is a member of RIBOLUTION, an integrated platform for the identification and validation of innovative RNA-based biological markers for personalized medicine. RIBOLUTION is a research association supported by the Fraunhofer-Zukunftsstiftung (Fraunhofer Future Foundation). We detect and establish RNA-based biological markers that are suitable as reliable indicators of a disease or its course. In this context, we are responsible for the storage, computer-aided processing and statistical analysis of the molecular biological high-throughput data obtained by state-of-the-art measurement methods. The processes we implement cover the entire data life cycle in biological marker discovery, from data creation through primary and secondary analysis to medical knowledge generation. All software solutions have been implemented in accordance with quality management standards. Access to a high-performance computing cluster ensures that the computationally intensive analyses required by the quantity and variety of the data can be carried out efficiently.

Computational RNA biology

It has been known for a number of years that RNA molecules do not merely convey the hereditary information of the DNA into amino acid sequences, but also perform extensive regulatory functions themselves. Non-protein-coding RNAs are broadly divided into two groups: short ncRNAs, with a sequence length of less than 200 nt, and the novel long ncRNAs, with a sequence length of more than 200 nt. The gene regulatory mechanisms of short ncRNAs, such as miRNAs and snoRNAs, are usually well understood, whereas functions have been described only for individual examples of long ncRNAs. Studies on individual long ncRNAs have shown that they control central cellular processes such as transcription and translation. They are also involved in subcellular localization, in the organization of cellular spatial structures and in the control of epigenetic modifications. We and others were able to show that long ncRNAs are specifically regulated in various tissues and in signal pathways associated with disease. Novel therapies based on long ncRNAs could therefore act more specifically and produce fewer side effects than traditional approaches. Using methods from computational RNA biology and systems biology, such as the prediction, modeling and classification of RNA secondary structure motifs, as well as evolutionary and transcription studies, we investigate by which gene regulatory mechanisms long ncRNAs that have been identified as biomarkers control cellular processes, and to what extent these long ncRNAs are suitable as therapeutic targets.
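
As a minimal illustration of the length-based grouping described above, the following Python sketch sorts transcripts into the two groups. The function name and the example sequences are hypothetical, and treating a transcript of exactly 200 nt as long is an assumption, since the text leaves that boundary case open.

```python
# Length cutoff taken from the text; everything else is illustrative.
LENGTH_CUTOFF_NT = 200


def classify_ncrna(sequence: str) -> str:
    """Return 'short ncRNA' or 'long ncRNA' based on sequence length only."""
    # Assumption: a length of exactly 200 nt is counted as long.
    if len(sequence) < LENGTH_CUTOFF_NT:
        return "short ncRNA"
    return "long ncRNA"


# Made-up example sequences: a 22 nt miRNA-sized transcript and a 1500 nt one.
mirna_like = "UGAGGUAGUAGGUUGUAUAGUU"
long_transcript = "ACGU" * 375

labels = [classify_ncrna(mirna_like), classify_ncrna(long_transcript)]
```

Length alone says nothing about function; it only reproduces the conventional grouping.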

Optimization of the processing and analysis of sequencing data for routine clinical applications

Next-generation sequencing technologies produce genome- or transcriptome-wide data within days. This data is usually processed and analyzed by invoking a variety of bioinformatics tools in sequential order. While improved sequencing methods continuously reduce the time required for data generation, comparable gains have barely been achieved for data analysis. This is a disadvantage in routine clinical use, because it unnecessarily prolongs the wait for therapy decisions. Our objective is to optimize the analysis of high-throughput sequencing data so that it can be applied in routine clinical settings. Our in-house analysis pipeline meets high quality standards by ensuring the availability, integrity, confidentiality and authenticity of the data at all times.
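
One way to picture sequential processing with an integrity guarantee is the Python sketch below. It is not the in-house pipeline: the step functions are toy placeholders for real tools, all names are hypothetical, and only integrity is addressed (availability, confidentiality and authenticity need other mechanisms). Each step's output is fingerprinted with SHA-256 so that later changes to the data can be detected.

```python
import hashlib
import hmac
from typing import Callable, List, Tuple

# A step is a (name, function) pair; the function maps bytes to bytes.
Step = Tuple[str, Callable[[bytes], bytes]]


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def run_pipeline(raw: bytes, steps: List[Step]) -> Tuple[bytes, List[Tuple[str, str]]]:
    """Apply the steps in order and log a digest of the input and of each output."""
    audit_log = [("input", sha256_hex(raw))]
    data = raw
    for name, step in steps:
        data = step(data)
        audit_log.append((name, sha256_hex(data)))
    return data, audit_log


def is_unchanged(data: bytes, expected_digest: str) -> bool:
    """Check the data against a previously logged digest."""
    return hmac.compare_digest(sha256_hex(data), expected_digest)


# Toy placeholder steps (assumptions, standing in for real analysis tools).
def drop_empty_lines(data: bytes) -> bytes:
    return b"\n".join(line for line in data.splitlines() if line.strip())


def to_upper(data: bytes) -> bytes:
    return data.upper()


reads = b"acgt\n\nttga\n"
result, audit_log = run_pipeline(
    reads, [("drop_empty_lines", drop_empty_lines), ("to_upper", to_upper)]
)
```

Calling `is_unchanged(result, audit_log[-1][1])` later confirms that the final output still matches what the pipeline produced.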

Selected completed projects

Development of custom expression microarrays for an efficient and cost-effective analysis of the tumor-associated expression pattern of long non-coding RNAs. Using these custom expression microarrays, we showed that a multitude of long non-coding RNAs are significantly regulated in mammary carcinoma and glioblastoma and are therefore suitable as biomarkers.

  • Hackermüller J, Reiche K, Otto C, Hösler N, Blumert C, Brocke-Heidrich K, Böhlig L, Nitsche A, Kasack K, Ahnert P, Krupp W, Engeland K, Stadler PF, Horn F. Cell cycle, oncogenic and tumor suppressor pathways regulate numerous long and macro non-protein-coding RNAs. Genome Biol. 2014 Mar 4;15(3):R48.
  • Reiche K, Kasack K, Schreiber S, Lüders T, Due EU, Naume B, Riis M, Kristensen VN, Horn F, Børresen-Dale AL, Hackermüller J, Baumbusch LO. Long non-coding RNAs differentially expressed between normal versus primary breast tumor tissues disclose converse changes to breast cancer-related protein-coding genes. PLoS One. 2014 Sep 29;9(9):e106076.
  • Arnold C, Externbrink F, Hackermüller J, Reiche K. CEMDesigner: Design of custom expression microarrays in the post-ENCODE Era. Journal of Biotechnology. 2014 Nov 10;189:154-6. DOI dx.doi.org/10.1016/j.jbiotec.2014.09.012.

 

The analysis of transcriptome-wide expression studies showed that non-coding RNAs are not only specifically expressed, but are also specifically regulated by disease-relevant signal pathways to a larger extent than protein-coding genes.

  • Hackermüller J, Reiche K, Otto C, Hösler N, Blumert C, Brocke-Heidrich K, Böhlig L, Nitsche A, Kasack K, Ahnert P, Krupp W, Engeland K, Stadler PF, Horn F. Cell cycle, oncogenic and tumor suppressor pathways regulate numerous long and macro non-protein-coding RNAs. Genome Biol. 2014 Mar 4;15(3):R48.

 

We developed an algorithm (TileShuffle) for the efficient analysis of transcriptome-wide expression data measured by means of tiling arrays. Using a permutation approach, we were able to estimate the background signals more precisely with regard to probe-specific artefacts than other methods. We thus achieve greater sensitivity at the same specificity.

  • Otto C, Reiche K, Hackermüller J. Detection of differentially expressed segments in tiling array data. Bioinformatics. 2012 Jun 1;28(11):1471-9.
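
The general idea of a permutation-based background estimate can be sketched in Python as follows. This is a generic permutation test, not the TileShuffle implementation: it draws background windows from all probes at random and does not model the probe-specific artefacts that the method described above accounts for. The function name, parameters and toy intensities are assumptions for illustration.

```python
import random
from statistics import mean
from typing import List


def window_p_value(
    intensities: List[float],
    start: int,
    width: int,
    n_permutations: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical p-value for the mean intensity of one window of probes.

    The background distribution is built by drawing `width` probes at random
    (without replacement) from the whole array, `n_permutations` times.
    """
    rng = random.Random(seed)
    observed = mean(intensities[start:start + width])
    at_least_as_high = 0
    for _ in range(n_permutations):
        background = rng.sample(intensities, width)
        if mean(background) >= observed:
            at_least_as_high += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_high + 1) / (n_permutations + 1)


# Toy data: 100 probes, ten of which (positions 45-54) carry a strong signal.
toy_intensities = [1.0] * 45 + [10.0] * 10 + [1.0] * 45

p_signal = window_p_value(toy_intensities, start=45, width=10)
p_background = window_p_value(toy_intensities, start=0, width=10)
```

On this toy input the window covering the signal receives a small p-value, while the background-only window does not.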

  • High-performance computing cluster

  • Helmholtz Centre for Environmental Research – UFZ, Leipzig
  • Helmholtz Centre for Environmental Research – UFZ, Department of Bioanalytical Ecotoxicology, Leipzig
  • Helmholtz Centre for Environmental Research – UFZ, Department of Molecular Systems Biology, Proteomics, Leipzig
  • Oslo University Hospital, Institute for Cancer Research, Oslo, Norway
  • University of Leipzig, Chair for Bioinformatics, Leipzig
  • University of Leipzig / Technical University Dresden, Competence Center for Scalable Data Services and Solutions ScaDS, Leipzig
  • University Hospital Carl Gustav Carus Dresden, Clinic and Polyclinic for Urology, Dresden
  • University of Oslo, Faculty of Medicine, Institute of Basic Medical Sciences, Oslo, Norway
  • Fraunhofer Institute for Interfacial Engineering and Biotechnology, Department of Molecular Biotechnology, Stuttgart

  • Kirsten H, Al-Hasani H, Holdt L, Gross A, Beutner F, Krohn K, Horn K, Ahnert P, Burkhardt R, Reiche K, Hackermüller J, Löffler M, Teupser D, Thiery J, Scholz M. Dissecting the genetics of the human transcriptome identifies novel trait-related trans-eQTLs and corroborates the regulatory relevance of non-protein coding loci. Hum Mol Genet. 2015 May 27.
  • Nitsche A, Rose D, Fasold M, Reiche K, Stadler PF. Comparison of splice sites reveals that long noncoding RNAs are evolutionarily well conserved. RNA. 2015 May;21(5):801-12.
  • Arnold C, Externbrink F, Hackermüller J, Reiche K. CEMDesigner: Design of custom expression microarrays in the post-ENCODE Era. Journal of Biotechnology. 2014 Nov 10;189:154-6. DOI http://dx.doi.org/10.1016/j.jbiotec.2014.09.012
  • Hackermüller J, Reiche K, Otto C, Hösler N, Blumert C, Brocke-Heidrich K, Böhlig L, Nitsche A, Kasack K, Ahnert P, Krupp W, Engeland K, Stadler PF, Horn F. Cell cycle, oncogenic and tumor suppressor pathways regulate numerous long and macro non-protein-coding RNAs. Genome Biol. 2014 Mar 4;15(3):R48.
  • Reiche K, Kasack K, Schreiber S, Lüders T, Due EU, Naume B, Riis M, Kristensen VN, Horn F, Børresen-Dale AL, Hackermüller J, Baumbusch LO. Long non-coding RNAs differentially expressed between normal versus primary breast tumor tissues disclose converse changes to breast cancer-related protein-coding genes. PLoS One. 2014 Sep 29;9(9):e106076.
  • Boll K, Reiche K, Kasack K, Mörbt N, Kretzschmar AK, Tomm JM, Verhaegh G, Schalken J, von Bergen M, Horn F, Hackermüller J. MiR-130a, miR-203 and miR-205 jointly repress key oncogenic pathways and are downregulated in prostate carcinoma. Oncogene. 2013 Jan 17;32(3):277-85.
  • Otto C, Reiche K, Hackermüller J. Detection of differentially expressed segments in tiling array data. Bioinformatics. 2012 Jun 1;28(11):1471-9.
  • Tramontano A, Donath A, Bernhart SH, Reiche K, Böhmdorfer G, Stadler PF, Bachmair A. Deletion analysis of the 3' long terminal repeat sequence of plant retrotransposon Tto1 identifies 125 base pairs redundancy as sufficient for first strand transfer. Virology. 2011 Mar 30;412(1):75-82.
  • Sharma CM, Hoffmann S, Darfeuille F, Reignier J, Findeiss S, Sittka A, Chabas S, Reiche K, Hackermüller J, Reinhardt R, Stadler PF, Vogel J. The primary transcriptome of the major human pathogen Helicobacter pylori. Nature. 2010 Mar 11;464(7286):250-5.
  • Kaczkowski B, Torarinsson E, Reiche K, Havgaard JH, Stadler PF, Gorodkin J. Structural profiles of human miRNA families from pairwise clustering. Bioinformatics. 2009 Feb 1;25(3):291-4.
  • Rose D, Hertel J, Reiche K, Stadler PF, Hackermüller J. NcDNAlign: plausible multiple alignments of non-protein-coding genomic sequences. Genomics. 2008 Jul;92(1):65-74.
  • Rose D, Jöris J, Hackermüller J, Reiche K, Li Q, Stadler PF. Duplicated RNA genes in teleost fish genomes. J Bioinform Comput Biol. 2008 Dec;6(6):1157-75.
  • Athanasius F Bompfünewerer Consortium, Backofen R, Bernhart SH, Flamm C, Fried C, Fritzsch G, Hackermüller J, Hertel J, Hofacker IL, Missal K, Mosig A, Prohaska SJ, Rose D, Stadler PF, Tanzer A, Washietl S, Will S. RNAs everywhere: genome-wide annotation of structured RNAs. J Exp Zool B Mol Dev Evol. 2007 Jan 15;308(1):1-25.
  • ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.
  • Reiche K, Stadler PF. RNAstrand: reading direction of structured RNAs in multiple sequence alignments. Algorithms Mol Biol. 2007 May 31;2:6.
  • Rose D, Hackermüller J, Washietl S, Reiche K, Hertel J, Findeiss S, Stadler PF, Prohaska SJ. Computational RNomics of drosophilids. BMC Genomics. 2007 Nov 8;8:406.
  • Washietl S, Pedersen JS, Korbel JO, Stocsits C, Gruber AR, Hackermüller J, Hertel J, Lindemeyer M, Reiche K, Tanzer A, Ucla C, Wyss C, Antonarakis SE, Denoeud F, Lagarde J, Drenkow J, Kapranov P, Gingeras TR, Guigó R, Snyder M, Gerstein MB, Reymond A, Hofacker IL, Stadler PF. Structured RNAs in the ENCODE selected regions of the human genome. Genome Res. 2007 Jun;17(6):852-64.
  • Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R. Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Comput Biol. 2007 Apr 13;3(4):e65.
  • Hertel J, Lindemeyer M, Missal K, Fried C, Tanzer A, Flamm C, Hofacker IL, Stadler PF; Students of Bioinformatics Computer Labs 2004 and 2005. The expansion of the metazoan microRNA repertoire. BMC Genomics. 2006 Feb 15;7:25.
  • Missal K, Cross MA, Drasdo D. Gene network inference from incomplete expression data: transcriptional control of hematopoietic commitment. Bioinformatics. 2006 Mar 15;22(6):731-8.
  • Missal K, Zhu X, Rose D, Deng W, Skogerbø G, Chen R, Stadler PF. Prediction of structured non-coding RNAs in the genomes of the nematodes Caenorhabditis elegans and Caenorhabditis briggsae. J Exp Zool B Mol Dev Evol. 2006 Jul 15;306(4):379-92.
  • Missal K, Rose D, Stadler PF. Non-coding RNAs in Ciona intestinalis. Bioinformatics. 2005 Sep 1;21.
  • Bompfünewerer AF, Flamm C, Fried C, Fritzsch G, Hofacker IL, Lehmann J, Missal K, Mosig A, Müller B, Prohaska SJ, Stadler BM, Stadler PF, Tanzer A, Washietl S, Witwer C. Evolutionary patterns of non-coding RNAs. Theory Biosci. 2005 Apr;123(4):301-69.