Personal Information
- Name: Dr. sc. Armin Töpfer
- Date of birth: 24 May 1986
- Nationality: Citizen of Germany
- Address: Gelsenkirchen, Germany
- Languages: German and English
- GitHub: armintoepfer
- Email: armin.toepfer@gmail.com
Professional Profile
Senior Director Instrument Analysis and Senior Principal Engineer with a Doctor of Sciences ETH Zürich in computational biology, specialized in statistical modeling and analysis of long-read sequencing technologies. I combine academic curiosity and industry standards to develop bleeding-edge, yet robust and carefully tested, software for everyday use. Expert in end-to-end genomic data analysis, from biological sample to variant call. Proven ability to design and maintain large and complex software, including productization of deep learning and GPU acceleration. Highly experienced in scientific computing and real-time processing of large data sets on bare metal Linux servers, high-performance computing clusters, and cloud infrastructure. Ten years of full-time remote working experience. Managing teams for seven years with nine direct reports, reporting to the VP of software and advising executive leadership.
Research Interest
Algorithm Design — C++ Development — High-Performance Computing — Statistical Modelling — Population Structure Reconstruction — NGS / Long-Read Sequence Analysis — Bare-metal Code Optimization — Data Visualization
Education
October 2011 - August 2014
Swiss Federal Institute of Technology Zürich (ETH)
Dr. sc. ETH Zürich
Thesis topic: Studies in viral quasispecies reconstruction
October 2009 - July 2011
University of Bielefeld
M.Sc. in Bioinformatics and Genome Research
Thesis topic: Prediction of Group I Introns under structure variation
October 2006 - March 2009
University of Bielefeld
B.Sc. in Bioinformatics and Genome Research
Thesis topic: Ideas for and implementation of an automated statistical data analysis
Work Experience
March 2024 - now
Senior Director, Instrument Analysis
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Lead the long-read analysis team with nine direct reports that develops all on-instrument analysis software.
- Architect data processing from photon counts to highly accurate long reads.
- Successful long-read platform launches, scaling up existing software to process hundreds of millions to billions of HiFi reads at once.
- Added CI/CD to build 100% reproducible and well-tested software.
- Integrated and productized deep learning solutions using ONNX runtime.
- Optimized CPU and GPU code to increase throughput and reduce COGS.
- Spearheaded open-source and community software distribution pbbioconda.
- Plan, orchestrate, and execute cross-team interdisciplinary projects.
- Point of contact for major collaborations with Nvidia and Google.
- Clear communication to executive management.
- Drive architectural decisions and select hardware for on-instrument compute.
February 2023 - February 2024
Director, Platform Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Lead an expanded team responsible for developing all platform bioinformatics analysis software.
- Develop all bioinformatics production software solutions, build them 100% reproducibly, and deploy on- and off-instrument.
- Develop novel algorithms on GPUs using CUDA.
- Integrate, productise, and optimise deep learning solutions.
- Close collaborations with external partners to shape company objectives.
- Scale up existing software to process hundreds of millions to billions of HiFi reads at once.
July 2021 - January 2023
Principal Engineer & Associate Director, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Lead a team responsible for developing on-instrument bioinformatics analysis software. Continue as an individual contributor. Our focus is the generation of long and accurate HiFi reads in near real-time.
- Develop bioinformatics production software solutions, build them 100% reproducibly, and deploy on- and off-instrument.
- Port existing and develop new algorithms on GPUs using CUDA.
- Integrate and productise deep learning solutions.
- Close collaborations with external partners to shape company objectives.
- Drive architectural decisions and select hardware for on-instrument compute solutions.
September 2020 - June 2021
Principal Engineer, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Shape company objectives by developing next-gen products.
- Evaluate and develop for next-gen hardware architectures, including ARM, RISC-V, and GPGPU.
- Design and implement on- and off-instrument software.
- Continue enhancing customer-facing tools.
- Lead and mentor individual members of the team.
August 2018 - August 2020
Senior Staff Engineer, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Technical leader and mentor to individual members of the team. Work independently in cross-functional teams to lead development of new products or execution of time-critical analyses. Shape the product roadmap from a technical perspective by delivering software architectures, design specifications, and implementations. Consultant to senior management in long-range company planning. Plan Agile scrum sprints and epics for my team and products. Identify and fix performance bottlenecks in existing products to handle the ever-increasing throughput of sequencing platforms. Enable savvy bioinformaticians to use command-line tools in the cloud or locally by maintaining bioconda packages on pbbioconda. Carefully maintain extensive yet visually appealing customer-facing documentation.
Additionally responsible for the following new products:
- Iso-Seq: Reference-free clustering of full-length transcriptional sequencing data to annotate de-novo genome assemblies with a focus on scalability, reducing runtime by a factor of up to 100 while increasing accuracy over existing solutions.
- Structural Variation: Fast, accurate, and reliable discovery of structural variations from low-coverage, cohort sequencing data. Best-in-class algorithm with highest recall and precision.
- Consensus: Lead initiatives to massively reduce time to result for our CCS algorithm that generates HiFi reads.
October 2015 - July 2018
Staff Engineer, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Individual contributor developing bleeding-edge statistical algorithms in C++14.
Core projects:
- Polyploid Consensus: Enhance quality of individual single-molecule reads and genomic regions by polyploid-enabled polishing.
- Minor Variants: Full-stack product development to reconstruct co-occurring minor variants in heterogeneous samples from single-molecule sequencing data, tailored to personalized medicine applications.
- Demultiplexing: Reliable demultiplexing of hi-plex barcoded samples with focus on UX and quality control. Establish internal end-of-line QC pipeline to automatically test purity of barcode oligos before distribution.
- Mapping and Alignment: A frontend for minimap2, a state-of-the-art mapper and aligner, for PacBio native data formats.
September 2014 - September 2015
Senior Engineer, Bioinformatics
Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany
Hardware-near C++11 development on x86_64 and MIC architectures for the Sequel instrument to enhance base-call accuracy, plus hotspot parallelization and vectorization to enable real-time base calling. Design and implementation of custom binary formats to store high-throughput real-time data. Post-process raw base-call data to provide customer-friendly BAM files, including SMRTbell adapter removal, demultiplexing, and spike-in control filtering.
July 2011 - August 2014
Graduate research assistant
Computational Biology Group, ETH Zürich, Basel
Development of statistical methods and machine learning approaches for viral quasispecies assembly from next-generation and single-molecule sequencing data. Application to intra-host samples of HBV, HCV, CSFV, and HIV-1 infected individuals. Collaboration with teams across the globe, from Switzerland to Australia.
October 2009 - May 2011
Research assistant
Bielefeld University Bioinformatics Service, University of Bielefeld, Germany
Continuation as Java developer at the BiBiServ2 project. Migrating architecture from JavaServer Faces (JSF) 1.2 to JSF 2. Introduction of PrimeFaces as the main component suite.
September 2009 - May 2011
Programmer
High performance computing laboratory, Bergen Center for Computational Science, UNIFOB AS, Bergen, Norway
Development of a parsing library for Web Services Description Language and XML Schema files, resolving complete XML Schema structures. Project is funded by the EMBRACE Network of Excellence coordinated by EBI. Further development of a Business Process Execution Language (BPEL) editor to construct complex workflows using the NetBeans Platform.
November 2009
Professional Trainer
International Institute (Training, Assessment, Certification), Cognitive Core UG, Germany
Binary auditing trainer for developers at Symantec India. Full practical online workshop on analysis and exploitation of stack-based buffer overflows and reverse code engineering of copy protections.
April 2009 - August 2009
Bioinformatics internship
Computational Biology Unit, BCCS, UNIFOB AS, Bergen, Norway
Implementation of a fully functional web-based BPEL editor to construct and execute simple linear workflows. Working as a team member on the eSysbio project, funded by the Research Council of Norway through its e-science program eVita.
October 2008 - May 2009
Research assistant
Bielefeld University Bioinformatics Service, University of Bielefeld, Germany
Development of automatically generated web interfaces with JSF for the new BiBiServ2 project.
August 2008 - September 2008
Freelance JSF developer
Teamkollegen.de, Bielefeld, Germany
Responsible for the web development of an AJAX-based JSF frontend, i.e., a messaging system and user-friendly search interface. A project supported by the Heinz Nixdorf Foundation and the Foundation of the German Economy (sdw).
Talks
Long-Read Sequencing Meeting - Long Accurate Reads – Call All Variants with Confidence - Uppsala, Sweden, 2019
SMRT Leiden - Many-to-One-to-Many: Pooling and Demultiplexing - Leiden, Netherlands, 2018
SMRT Leiden - Calling all variants: fast, accurate, population-scale structural variant analysis - Leiden, Netherlands, 2018
SMRT Leiden - Juliet - One Click Minor Variant Calling - Leiden, Netherlands, 2017
4th ICCABS (IEEE) - Viral quasispecies assembly from paired-end reads - Miami, USA, 2014
RECOMB 2014 - Viral quasispecies assembly via maximal clique enumeration - Pittsburgh, USA, 2014
SMIDDY - Global haplotype prediction of HIV-1 - Zürich, Switzerland, 2013
3rd ICCABS (IEEE) - Probing of viral diversity by global haplotype prediction - New Orleans, USA, 2013
2nd CHAIN NGS Meeting - Computational and Statistical Challenges of Ultradeep Sequencing of Viral Quasispecies - Rome, Italy, 2013
Virus goes Bioinformatics - Estimating viral genetic diversity from next-generation sequencing data - Jena, Germany, 2012
RECOMB 2012 - Probabilistic inference of viral quasispecies subject to recombination - Barcelona, Spain, 2012
RUBIES - Amsterdam, Netherlands, 2011
Nil-University - Cairo, Egypt, 2010
EMBRACE - Amsterdam, Netherlands, 2009
Posters
CROI 2015 - A Comprehensive Analysis of PrimerIDs to Study Heterogenous HIV-1 Populations - Seattle, USA, 2015
CROI 2014 - Full-length HIV-1 Haplotype Reconstruction from Heterogeneous Virus Populations - Boston, USA, 2014
Statistical Genomics and Data Integration for Personalized Medicine Ascona - Probing of viral diversity by global haplotype prediction - Switzerland, 2013
SIB Days 2013 - Visualization of viral populations - Biel, Switzerland, 2013
ECCB 2012 - QuasiRecomb: prediction of recombinant viral quasispecies - Basel, Switzerland, 2012
SIB Days 2012 - Probabilistic inference of viral quasispecies subject to recombination - Biel, Switzerland, 2012
Publications
2021
DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer
Gunjan Baid, Daniel E. Cook, Kishwar Shafin, Taedong Yun, Felipe Llinares-López, Quentin Berthet, Aaron M. Wenger, William J. Rowell, Maria Nattestad, Howard Yang, Alexey Kolesnikov, Armin Töpfer, Waleed Ammar, Jean-Philippe Vert, Ashish Vaswani, Cory Y. McLean, Pi-Chuan Chang, Andrew Carroll
doi:10.1038/s41587-022-01435-7
Nature Biotechnology
2019
Highly-accurate long-read sequencing improves variant detection and assembly of a human genome.
Aaron M. Wenger, Paul Peluso, William J. Rowell, Pi-Chuan Chang, Richard J. Hall, Gregory T. Concepcion, Jana Ebler, Arkarachai Fungtammasan, Alexey Kolesnikov, Nathan D. Olson, Armin Töpfer, Chen-Shan Chin, Michael Alonge, Medhat Mahmoud, Yufeng Qian, Adam M. Phillippy, Michael C. Schatz, Gene Myers, Mark A. DePristo, Jue Ruan, Tobias Marschall, Fritz J. Sedlazeck, Justin M. Zook, Heng Li, Sergey Koren, Andrew Carroll, David R. Rank, Michael W. Hunkapiller
doi:10.1038/s41587-019-0217-9
Nature Biotechnology.
2016
Single-molecule sequencing reveals complex genomic variation of hepatitis B virus during 15 years of chronic infection following liver transplantation.
Brigid Betz-Stablein, Armin Töpfer, M Littlejohn, L Yuen, D Colledge, V Sozzi, P Angus, A Thompson, P Revill, Niko Beerenwinkel, N Warner, Fabio Luciani
doi:10.1186/s12864-016-2575-8
BMC Genomics.
2016
A method for near full-length amplification and sequencing for six hepatitis C virus genotypes.
Rowena A Bull, Auda A Eltahla, Chaturaka Rodrigo, Sylvie M Koekkoek, Melanie Walker, Mehdi R Pirozyan, Brigid Betz-Stablein, Armin Töpfer, Melissa Laird, Steve Oh, Cheryl Heiner, Lisa Maher, Janke Schinkel, Andrew R Lloyd, Fabio Luciani
doi:10.1128/JVI.00243-16
Journal of Virology.
2015
A Comprehensive Analysis of Primer IDs to Study Heterogeneous HIV-1 Populations.
David Seifert, Francesca Di Giallonardo, Armin Töpfer, Jochen Singer, Stefan Schmutz, Huldrych F. Günthard, Niko Beerenwinkel, Karin J. Metzner
doi:10.1016/j.jmb.2015.12.012
Journal of Molecular Biology.
2014
Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations.
Francesca Di Giallonardo*, Armin Töpfer*, Melanie Rey, Sandhya Prabhakaran, Yannick Duport, Christine Leemann, Stefan Schmutz, Nottania K. Campbell, Beda Joos, Maria Rita Lecca, Andrea Patrignani, Martin Däumer, Christian Beisel, Peter Rusert, Alexandra Trkola, Huldrych F. Günthard, Volker Roth, Niko Beerenwinkel, and Karin J. Metzner.
doi:10.1093/nar/gku537
Nucleic Acids Research.
2014
Viral Quasispecies Assembly via Maximal Clique Enumeration.
Armin Töpfer, Tobias Marschall, Rowena A. Bull, Fabio Luciani, Alexander Schönhuth, and Niko Beerenwinkel.
doi:10.1371/journal.pcbi.1003515
PLOS Computational Biology.
Abstract appears in R. Sharan, RECOMB 2014 - Research in Computational Molecular Biology, volume 8394 of Lecture Notes in Bioinformatics, pages 309–310. Springer, 2014.
doi:10.1007/978-3-319-05269-4_25
2013
Challenges in RNA Virus Bioinformatics.
Manja Marz, Niko Beerenwinkel, Christian Drosten, Markus Fricke, Dmitrij Frishman, Ivo Hofacker, Dieter Hoffmann, Thomas Rattei, Peter Stadler, and Armin Töpfer.
doi:10.1093/bioinformatics/btu105
Bioinformatics.
2013
Sequencing approach to analyze the role of quasispecies for classical swine fever.
Armin Töpfer, Dirk Höper, Sandra Blome, Martin Beer, Niko Beerenwinkel, Nicolas Ruggli, and Immanuel Leifer.
doi:10.1016/j.virol.2012.11.020
Virology.
2013
Probabilistic inference of viral quasispecies subject to recombination.
Armin Töpfer, Osvaldo Zagordi, Sandhya Prabhakaran, Volker Roth, Eran Halperin, and Niko Beerenwinkel.
doi:10.1089/cmb.2012.0232
Journal of Computational Biology.
Extended abstract appeared in B. Chor, editor, RECOMB 2012 – Research in Computational Molecular Biology, volume 7262 of Lecture Notes in Bioinformatics, pages 342–354. Springer, 2012.
doi:10.1007/978-3-642-29627-7_36
2010
BioXSD: the common data-exchange format for everyday bioinformatics web services.
Kalas M., Puntervoll P., Joseph A., Bartaseviciute E., Töpfer A., Venkataraman P., Pettifer S., Bryne J.C., Ison J., Blanchet C., Rapacki K., and Jonassen I.
doi:10.1093/bioinformatics/btq391
Bioinformatics.
Software
pbsv - Fast, accurate, population-scale structural variant analysis from single-molecule data.
Iso-Seq - Scalable de novo isoform discovery from single-molecule data.
CCS - Generate accurate consensus sequences from single molecules.
Juliet - Reference guided phasing of low-frequency de-novo discovered variants in heterogeneous samples.
Lima - Demultiplex pooled barcoded single-molecule data.
HaploClique - Viral quasispecies assembly from paired-end data.
QuasiRecomb - Reconstruction of recombinant viral quasispecies structures.
InDelFixer - Iterative and very sensitive NGS sequence alignment software.
ConsensusFixer - Consensus sequence caller with ambiguous bases and in-frame insertions.
Awards
- Innovation Award – SMRT Masking, PacBio, 2024
- Best method – GPU compute advances, PacBio, 2023
- Young investigator scholarship, CROI, 2014
- Best poster award, SIB Days, 2013
- Conference fellowship, RECOMB, 2012
- Studentship for foreign internships, ERASMUS, 2009
Master students advised
- Monica-Andreea Drăgan, Research assistant, 2014
Minimal path cover with paired-end constraints.
- Kee Pang Soh, Lab-rotation, 2014
Error correction of Pacific Biosciences data.
- Veronika Boskova, Lab-rotation, 2013
Visualization of HIV quasispecies data.
- David Seifert, Master thesis, 2012
Computational studies in HIV diversity.
Technical skills
C++20 • CUDA • Boost • GTest
meson • CMake • ninja
R • Data Visualization
Deep Learning • ONNX Runtime
Bash • SQL • Assembly • Web
Bamboo • Docker • Bootstrapping
C++ toolchains • Multiplatform releases
Agile • Jira • CI • CD
Illustrator • vtune • IDA Pro • LaTeX