Principal software engineer with a Doctor of Sciences ETH Zürich in computational biology, specialized in statistical modelling and analysis of long-read sequencing technologies. I combine academic curiosity and industry standards to develop bleeding-edge, yet robust and carefully tested, software for everyday use. Expert in end-to-end genomic data analysis, from biological sample to variant call. Proven ability to design and maintain large and complex software, including productization of deep learning. Also highly experienced in scientific computing and real-time processing of large data sets with High-Performance Computing clusters and cloud infrastructure. Eight years of full-time remote working experience. Managing teams for the last five years.
Algorithm Design --- C++ Development --- High-Performance Computing --- Statistical Modelling --- Viral Genomics --- Population Structure Reconstruction --- NGS / Long-Read Sequence Analysis --- Bare-metal Code Optimization --- Data Visualization
October 2011 - August 2014
Swiss Federal Institute of Technology Zürich (ETH)
Dr. sc. ETH Zürich
Thesis topic: Studies in viral quasispecies reconstruction
October 2009 - July 2011
University of Bielefeld
M.Sc. in Bioinformatics and Genome Research
Thesis topic: Prediction of Group I Introns under structure variation
October 2006 - March 2009
University of Bielefeld
B.Sc. in Bioinformatics and Genome Research
Thesis topic: Ideas for and implementation of an automated statistical data analysis
February 2023 - now
Director, Platform Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Lead an expanded team responsible for developing all platform bioinformatics analysis software.
- Develop all bioinformatics production software solutions, build them 100% reproducible, and deploy on- and off-instrument.
- Develop novel algorithms on GPUs using CUDA.
- Integrate, productise, and optimise deep learning solutions.
- Close collaborations with external partners to shape company objectives.
- Scale up existing software to process hundreds of millions to billions of HiFi reads at once.
July 2021 - January 2023
Principal Engineer & Associate Director, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Lead a team responsible for developing on-instrument bioinformatics analysis software. Continue as an individual contributor. Our focus is the generation of long and accurate HiFi reads in near real-time.
- Develop bioinformatics production software solutions, build them 100% reproducible, and deploy on- and off-instrument.
- Port existing and develop new algorithms on GPUs using CUDA.
- Integrate and productise deep learning solutions.
- Close collaborations with external partners to shape company objectives.
- Drive architectural decisions and select hardware for on-instrument compute solutions.
September 2020 - June 2021
Principal Engineer, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Shape company objectives by developing next-gen products.
- Evaluate and develop on next-gen hardware architectures, incl. ARM, RISC-V, and GPGPU.
- Design and implement on- and off-instrument software.
- Continue enhancing customer-facing tools.
- Lead and mentor individual members of the team.
August 2018 - August 2020
Senior Staff Engineer, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Technical leader, mentor individual members of the team. Work independently in cross-functional teams to lead development of new products or execution of time-critical analyses. Shape product roadmap from a technical perspective by delivering software architectures, design specifications, and implementations. Consultant to senior management in long-range company planning. Plan Agile scrum sprints and epics for my team and products. Identify and fix performance bottlenecks in existing products to handle the ever-increasing throughput of sequencing platforms. Enable savvy bioinformaticians to use command-line tools in the cloud or locally by maintaining bioconda packages on pbbioconda. Carefully maintain extensive, yet visually appealing, customer-facing documentation.
Additionally responsible for the following new products:
- Iso-Seq: Reference-free clustering of full-length transcriptional sequencing data to annotate de novo genome assemblies with a focus on scalability, reducing runtime by a factor of up to 100 while increasing accuracy over existing solutions.
- Structural Variation: Fast, accurate, and reliable discovery of structural variations from low-coverage cohort sequencing data. Best-in-class algorithm with highest recall and precision.
- Consensus: Lead initiatives to massively reduce time to result for our CCS algorithm that generates HiFi reads.
October 2015 - July 2018
Staff Engineer, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Individual contributor, developing bleeding-edge statistical algorithms in C++14.
- Polyploid Consensus: Enhance quality of individual single-molecule reads and genomic regions by polyploid-enabled polishing.
- Minor Variants: Full-stack product development to reconstruct co-occurring minor variants in heterogeneous samples from single-molecule sequencing data, tailored to personalized medicine applications.
- Demultiplexing: Reliable demultiplexing of hi-plex barcoded samples with a focus on UX and quality control. Establish internal end-of-line QC pipeline to automatically test purity of barcode oligos before distribution.
- Mapping and Alignment: A frontend for minimap2, a state-of-the-art mapper and aligner, for PacBio native data formats.
September 2014 - September 2015
Senior Engineer, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Low-level C++11 development on x86_64 and MIC architectures on the Sequel instrument to enhance base-call accuracy, with hotspot parallelization and vectorization to enable real-time base calling. Design and implementation of custom binary formats to store high-throughput real-time data. Post-process raw base-call data to provide customer-friendly BAM files, including SMRTbell adapter removal, demultiplexing, and spike-in control filtering.
July 2011 - August 2014
Graduate research assistant
Computational Biology Group, ETH Zürich, Basel
Development of statistical methods and machine learning approaches for viral quasispecies assembly from next-generation and single-molecule sequencing data. Application to intra-host samples of HBV, HCV, CSFV, and HIV-1 infected individuals. Collaboration with teams across the globe, from Switzerland to Australia.
October 2009 - May 2011
Bielefeld University Bioinformatics Service, University of Bielefeld, Germany
Continued as a Java developer on the BiBiServ2 project. Migrated the architecture from JavaServer Faces (JSF) 1.2 to JSF 2. Introduced PrimeFaces as the main component suite.
September 2009 - May 2011
High performance computing laboratory, Bergen Center for Computational Science, UNIFOB AS, Bergen, Norway
Development of a parsing library for Web Services Description Language and XML Schema files, resolving complete XML Schema structures. The project is funded by the EMBRACE Network of Excellence coordinated by EBI. Further development of a Business Process Execution Language (BPEL) editor to construct complex workflows using the NetBeans Platform.
International Institute (Training, Assessment, Certification), Cognitive Core UG, Germany
Binary auditing trainer for developers at Symantec India. Full practical online workshop on analysis and exploitation of stack-based buffer overflows and reverse code engineering of copy protections.
April 2009 - August 2009
Computational Biology Unit, BCCS, UNIFOB AS, Bergen, Norway
Implementation of a fully functional web-based BPEL editor to construct and execute simple linear workflows. Working as a team member on the eSysbio project, funded by the Research Council of Norway through its e-science program eVita.
October 2008 - May 2009
Bielefeld University Bioinformatics Service, University of Bielefeld, Germany
Development of automatically generated web interfaces with JSF for the new BiBiServ2 project.
August 2008 - September 2008
Freelance JSF developer
Teamkollegen.de, Bielefeld, Germany
Responsible for the web development of an AJAX-based JSF frontend, i.e., a messaging system and user-friendly search interface. A project supported by the Heinz Nixdorf Foundation and the Foundation of the German Economy (sdw).
Long-Read Sequencing Meeting - Long Accurate Reads – Call All Variants with Confidence - Uppsala, Sweden, 2019
SMRT Leiden - Many-to-One-to-Many: Pooling and Demultiplexing - Leiden, Netherlands, 2018
SMRT Leiden - Calling all variants: fast, accurate, population-scale structural variant analysis - Leiden, Netherlands, 2018
SMRT Leiden - Juliet - One Click Minor Variant Calling - Leiden, Netherlands, 2017
4th ICCABS (IEEE) - Viral quasispecies assembly from paired-end reads - Miami, USA, 2014
RECOMB 2014 - Viral quasispecies assembly via maximal clique enumeration - Pittsburgh, USA, 2014
SMIDDY - Global haplotype prediction of HIV-1 - Zürich, Switzerland, 2013
3rd ICCABS (IEEE) - Probing of viral diversity by global haplotype prediction - New Orleans, USA, 2013
2nd CHAIN NGS Meeting - Computational and Statistical Challenges of Ultradeep Sequencing of Viral Quasispecies - Rome, Italy, 2013
Virus goes Bioinformatics - Estimating viral genetic diversity from next-generation sequencing data - Jena, Germany, 2012
RECOMB 2012 - Probabilistic inference of viral quasispecies subject to recombination - Barcelona, Spain, 2012
RUBIES - Amsterdam, Netherlands, 2011
Nil-University - Cairo, Egypt, 2010
EMBRACE - Amsterdam, Netherlands, 2009
CROI 2015 - A Comprehensive Analysis of PrimerIDs to Study Heterogenous HIV-1 Populations - Seattle, USA, 2015
CROI 2014 - Full-length HIV-1 Haplotype Reconstruction from Heterogeneous Virus Populations - Boston, USA, 2014
Statistical Genomics and Data Integration for Personalized Medicine Ascona - Probing of viral diversity by global haplotype prediction - Switzerland, 2013
SIB Days 2013 - Visualization of viral populations - Biel, Switzerland, 2013
ECCB 2012 - QuasiRecomb: prediction of recombinant viral quasispecies - Basel, Switzerland, 2012
SIB Days 2012 - Probabilistic inference of viral quasispecies subject to recombination - Biel, Switzerland, 2012
DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer.
Gunjan Baid, Daniel E. Cook, Kishwar Shafin, Taedong Yun, Felipe Llinares-López, Quentin Berthet, Aaron M. Wenger, William J. Rowell, Maria Nattestad, Howard Yang, Alexey Kolesnikov, Armin Töpfer, Waleed Ammar, Jean-Philippe Vert, Ashish Vaswani, Cory Y. McLean, Pi-Chuan Chang, Andrew Carroll
Highly-accurate long-read sequencing improves variant detection and assembly of a human genome.
Aaron M. Wenger, Paul Peluso, William J. Rowell, Pi-Chuan Chang, Richard J. Hall, Gregory T. Concepcion, Jana Ebler, Arkarachai Fungtammasan, Alexey Kolesnikov, Nathan D. Olson, Armin Töpfer, Chen-Shan Chin, Michael Alonge, Medhat Mahmoud, Yufeng Qian, Adam M. Phillippy, Michael C. Schatz, Gene Myers, Mark A. DePristo, Jue Ruan, Tobias Marschall, Fritz J. Sedlazeck, Justin M. Zook, Heng Li, Sergey Koren, Andrew Carroll, David R. Rank, Michael W. Hunkapiller
Single-molecule sequencing reveals complex genomic variation of hepatitis B virus during 15 years of chronic infection following liver transplantation.
Brigid Betz-Stablein, Armin Töpfer, M Littlejohn, L Yuen, D Colledge, V Sozzi, P Angus, A Thompson, P Revill, Niko Beerenwinkel, N Warner, Fabio Luciani
A method for near full-length amplification and sequencing for six hepatitis C virus genotypes.
Rowena A Bull, Auda A Eltahla, Chaturaka Rodrigo, Sylvie M Koekkoek, Melanie Walker, Mehdi R Pirozyan, Brigid Betz-Stablein, Armin Töpfer, Melissa Laird, Steve Oh, Cheryl Heiner, Lisa Maher, Janke Schinkel, Andrew R Lloyd, Fabio Luciani
Journal of Virology.
A Comprehensive Analysis of Primer IDs to Study Heterogeneous HIV-1 Populations.
David Seifert, Francesca Di Giallonardo, Armin Töpfer, Jochen Singer, Stefan Schmutz, Huldrych F. Günthard, Niko Beerenwinkel, Karin J. Metzner
Journal of Molecular Biology.
Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations.
Francesca Di Giallonardo*, Armin Töpfer*, Melanie Rey, Sandhya Prabhakaran, Yannick Duport, Christine Leemann, Stefan Schmutz, Nottania K. Campbell, Beda Joos, Maria Rita Lecca, Andrea Patrignani, Martin Däumer, Christian Beisel, Peter Rusert, Alexandra Trkola, Huldrych F. Günthard, Volker Roth, Niko Beerenwinkel, and Karin J. Metzner.
Nucleic Acids Research.
Viral Quasispecies Assembly via Maximal Clique Enumeration.
Armin Töpfer, Tobias Marschall, Rowena A. Bull, Fabio Luciani, Alexander Schönhuth, and Niko Beerenwinkel.
PLOS Computational Biology. Abstract appears in R. Sharan, RECOMB 2014 - Research in Computational Molecular Biology, volume 8394 of Lecture Notes in Bioinformatics, pages 309–310. Springer, 2014.
Challenges in RNA Virus Bioinformatics.
Manja Marz, Niko Beerenwinkel, Christian Drosten, Markus Fricke, Dmitrij Frishman, Ivo Hofacker, Dieter Hoffmann, Thomas Rattei, Peter Stadler, and Armin Töpfer.
Sequencing approach to analyze the role of quasispecies for classical swine fever.
Armin Töpfer, Dirk Höper, Sandra Blome, Martin Beer, Niko Beerenwinkel, Nicolas Ruggli, and Immanuel Leifer.
Probabilistic inference of viral quasispecies subject to recombination.
Armin Töpfer, Osvaldo Zagordi, Sandhya Prabhakaran, Volker Roth, Eran Halperin, and Niko Beerenwinkel.
Journal of Computational Biology. Extended abstract appeared in B. Chor, editor, RECOMB 2012 – Research in Computational Molecular Biology, volume 7262 of Lecture Notes in Bioinformatics, pages 342–354. Springer, 2012.
BioXSD: the common data-exchange format for everyday bioinformatics web services.
Kalas M., Puntervoll P., Joseph A., Bartaseviciute E., Töpfer A., Venkataraman P., Pettifer S., Bryne J.C., Ison J., Blanchet C., Rapacki K., and Jonassen I.
pbsv - Fast, accurate, population-scale structural variant analysis from single-molecule data.
IsoSeq3 - Scalable de novo isoform discovery from single-molecule data.
CCS - Generate accurate consensus sequences from single molecules.
Juliet - Reference guided phasing of low-frequency de-novo discovered variants in heterogeneous samples.
Lima - Demultiplex pooled barcoded single-molecule data.
HaploClique - Viral quasispecies assembly from paired-end data.
QuasiRecomb - Reconstruction of recombinant viral quasispecies structures.
InDelFixer - Iterative and very sensitive NGS sequence alignment software.
ConsensusFixer - Consensus sequence caller with ambiguous bases and in-frame insertions.
- Young investigator scholarship, CROI, 2014
- Best poster award, SIB Days, 2013
- Conference fellowship, RECOMB, 2012
- Studentship for foreign internships, ERASMUS, 2009
Master's students advised
- Monica-Andreea Drăgan, Research assistant, 2014
Minimal path cover with paired-end constraints.
- Kee Pang Soh, Lab-rotation, 2014
Error correction of Pacific Biosciences data.
- Veronika Boskova, Lab-rotation, 2013
Visualization of HIV quasispecies data.
- David Seifert, Master thesis, 2012
Computational studies in HIV diversity.
C++20, SIMD, R, Bash, CUDA, ML, Haskell, SQL, Assembly
VSCode, Docker, Photoshop, Illustrator, LaTeX, IDA Pro, UNIX CLI
Git, CI/CD, TDD, Scrum/Kanban
- Web design, photography, and painting
- Learning bleeding-edge technologies
- Cinema and TV series