Arash Tavakkol
About Me
Arash Tavakkol | Ph.D. in Computer Architecture
With more than two decades of experience in software and systems, I have spent roughly half of my career in the technology industry and the other half in research. This balanced background lets me combine the creative, exploratory mindset of research (innovative thinking, out-of-the-box problem solving, and the drive to explore new domains) with the rigor, quality standards, and timely delivery expected in industry.
I have successfully led software teams in designing, architecting, and delivering cloud-based, distributed systems that are scalable, resilient, and high-performance. My leadership style emphasizes technical excellence, rigorous testing practices, and alignment of architecture with business goals, while fostering the professional growth of team members. I bring extensive hands-on experience with modern cloud platforms, distributed architectures, and orchestration technologies, guiding teams through complex technical challenges from conception to deployment.
A quick learner who thrives on adopting emerging technologies and innovative approaches, I excel at bridging the gap between cutting-edge research and practical, production-ready solutions, delivering systems that are not only technically robust but also strategically impactful.
Education
2016 - 2018
ETH Zurich, Zurich, Switzerland
Postdoc - Systems Group, Department of Computer Science
Research topics: High-Performance and QoS-Aware Memory/Storage Sub-Systems, RDMA-Based Data Replication in Modern Datacenters, Near-Data Processing
Advisor: Onur Mutlu
2010 - 2015
Sharif University of Technology, Tehran, Iran
Ph.D. in Computer Engineering - Computer Architecture Major
Thesis title: A Scalable and High-Performance Design Architecture for Solid-State Drives
Advisor: Hamid Sarbazi-Azad
2005 - 2008
Sharif University of Technology, Tehran, Iran
M.Sc. in Computer Engineering - Computer Architecture Major
Thesis title: Performance of Crossbar-based Interconnection Networks for Multiprocessors
Advisor: Hamid Sarbazi-Azad
2000 - 2005
Sharif University of Technology, Tehran, Iran
B.Sc. in Computer Engineering - Software Engineering Major
Undergraduate final project title: An image watermarking framework using discrete Wavelet Transform to protect image databases against unauthorized modifications
Advisor: Shohreh Kasaei
Honors
2018
Best Paper Award, European Network on High Performance and Embedded Architecture and Compilation (HiPEAC)
2016
Third-place award at the 18th Iranian National Khwarizmi Youth Festival for my innovations in storage systems, Iran. News coverage (in Persian): official web page, IRNA, ISNA, Mehr News, Hamshahri
2004
Ranked 4th among more than 10,000 applicants in the Iranian Nationwide Graduate School Entrance Exam in Computer Engineering, Iran
2004
Ranked among the top 15 students of the 8th National Scientific Olympiads in Computer Engineering, Iran
2000
Ranked 188th among more than 350,000 applicants in the Iranian Nationwide University Entrance Exams, Iran
Work Experience
2025 - present
Principal Research Engineer @Huawei Zurich Research Center, Zürich, Switzerland
Responsibilities
- As a Principal Research Engineer, I lead the development of cutting-edge AI solutions aimed at cost optimization, performance enhancement, and resource efficiency. My work spans the full technology stack, with an emphasis on software and systems, and drives innovations that deliver tangible impact in real-world AI deployments.
- I collaborate closely with cross-functional teams, aligning technical strategies with the evolving requirements of AI customers across diverse industries. This involves partnering with multiple stakeholders to define scalable, high-performance AI infrastructures that balance technical ambition with practical constraints. My role combines deep technical research with strategic alignment to push the boundaries of what's possible in AI systems.
Skills & Technologies
Transformer Architectures, LLM Inference Optimization, Distributed Inference, Low-cost Inference Solutions, PyTorch, DeepSpeed-MII, vLLM, Hugging Face Transformers, Model Parallelism & Pipeline Parallelism, Python, C++, NVIDIA CUDA
2021 - 2025
Principal Software Engineer @ApplyBoard Inc., Kitchener, Canada
Responsibilities
- Led technical strategy and architecture for modern, cloud-based distributed systems, ensuring scalability, high availability, resilience, and alignment with short-, medium-, and long-term business objectives.
- Integrated cutting-edge AI solutions, from large language models to advanced OCR pipelines, into core products, streamlining document processing workflows, reducing processing time from days to minutes, and delivering substantial cost savings.
- Mentored and coached dozens of developers across multiple software development teams, fostering technical growth, enforcing engineering best practices, and ensuring consistently high software quality.
- Collaborated with cross-functional leadership including heads of operations, customer support, and marketing teams, as well as product managers, to deliver solutions that advanced organizational goals and improved the overall customer experience.
- Influenced company-wide technology direction as part of the Technology Strategy and Standards Council, defining technical standards and guiding innovation for over 200 engineers and product team members.
Skills & Technologies
AI-driven Business Solutions, AI-driven Automation, LLM API Integration, Applied Machine Learning, Architectural Decision-Making, Cross-Functional Team Leadership, Mentoring and Code Reviews, Code Coverage & Test Automation, Driving Technology Innovation, Stakeholder Management, Microservices, Domain-Driven Design, Behavior-Driven Development, SOLID Principles, RESTful Services & API Design, Serverless Architectures, High Availability & Fault Tolerance, Distributed Systems Architecture, Separation of Concerns, Loose Coupling & High Cohesion, Identity and Access Management, CAP Theorem, Concurrency Control & Locking Mechanisms, Content Delivery Networks (CDN), Caching Strategies (Redis, Memcached), Ruby on Rails, TypeScript & JavaScript, NestJS, AWS Services (EKS, RDS, S3, SNS, SQS, Lambda, Textract, Bedrock, CloudWatch, EC2), Helm Charts for K8s Application Packaging
2020 - 2022
Software Architect/Lead Data Engineer @Fortum, Zürich
Responsibilities
- Designed and implemented large-scale ETL pipelines to ingest IoT data from more than 100 power plants across Finland and Sweden, ensuring reliability, scalability, and efficient data flow.
- Led in-depth technology evaluations through multiple proofs of concept (PoCs) comparing Azure Databricks, AWS EMR, Azure Synapse, and Azure Data Factory, ultimately recommending and driving adoption of the Databricks Lakehouse Platform.
- Collaborated with engineering and data science teams to build efficient CI/CD pipelines for Kubernetes deployments and to implement MLflow for experiment tracking and model management.
Skills & Technologies
Driving Technology Innovation, Data Architecture, Big Data, ETL (Extract, Transform, Load), Data Pipelines, Big Data Analytics, Serverless Architectures, Java, Scala, Python, Apache Spark, Apache Kafka, AWS (Kinesis, Lambda, S3, EMR, Firehose, CloudWatch, SNS, SQS), Azure Cloud Services (Event Hubs, Data Factory, Stream Analytics, Synapse Analytics), TimescaleDB, Databricks Lakehouse Platform (Delta Lake), Infrastructure as Code (Terraform), GitHub Actions, GitLab CI, Jenkins (CI/CD), Prometheus for K8s Metrics, Grafana Dashboards for K8s Monitoring
2018 - 2020
Senior Software Engineer/Architect @RepRisk AG, Zürich
Responsibilities
- Owned and maintained 10 backend microservices supporting the day-to-day work of the company's operations department, ensuring their reliability, availability, and scalability.
- Acted as the architect for the Data Science team's data-processing platform, designing and implementing systems that fetch and process more than 500K documents per day from news sources and financial institutions.
- Contributed hands-on to backend development while guiding technical decisions and ensuring alignment between engineering solutions and business needs.
- Served as Scrum Master and technical lead for the Research and Technology software development team (5 developers), overseeing delivery and fostering agile best practices.
Skills & Technologies
Agile Methodologies, Architectural Decision-Making, Microservices, Domain-Driven Design, SOLID Principles, RESTful Services & API Design, GraphQL APIs, Message Brokers (Apache Kafka, ActiveMQ, Hazelcast), PostgreSQL, Java, Python, Spring Framework, jOOQ, JavaScript, React, Django, Elasticsearch (indexing, sharding, replication, query optimization)
2016 - 2018
Senior Researcher @Systems Group, ETH Zürich
Responsibilities
- Served as a Senior Researcher in the Systems Group, mentoring PhD and MSc students, supervising research projects, and acting as a teaching assistant to support both academic development and the team's research initiatives.
Skills & Technologies
Hardware Simulation, Memory Subsystem, Storage Subsystem, Storage Virtualization, Performance Optimization, Data Center Infrastructure, Quality-of-Service (QoS), Processing-in-Memory (PIM) Architecture, Data Center Scalability, High-Performance Computing (HPC), Remote Direct Memory Access (RDMA), C#, C++, .NET Framework, C, Cassandra, Redis
2008 - 2015
Researcher/Research Associate/Senior Software Engineer @IPM School of Computer Science
Research Summary
Parallel and Distributed Algorithms
In my work on parallel and distributed computing, I developed reconfigurable Network-on-Chip (NoC) architectures that adapt their topology to improve communication efficiency and scalability in multicore processors. I also explored using high-performance computing (HPC) to solve abstract mathematical problems, applying distributed algorithms to domains beyond traditional systems.
Scalable Memory and Storage System Design
At Huawei, and previously at ETH Zurich, I have focused on building scalable SSD systems by designing enterprise-level heterogeneous memory storage systems optimized for performance and reliability. I also created frameworks such as MQSim, which enables realistic multi-queue SSD simulation and thereby supports informed system-level design decisions.
Big Data Analysis/Acceleration
I explore the role of computational storage devices (CSDs) in accelerating big data analysis by offloading data-intensive tasks such as ETL pipelines, query processing, and filtering for time-series databases directly to the storage layer. By tightly integrating CSD capabilities with the software stack, I aim to significantly reduce I/O bottlenecks and enable faster, more efficient large-scale analytics.
AI Hardware Accelerators
At Huawei, I work on designing and optimizing hardware accelerators such as DPUs and in-storage processing engines that support large-scale machine learning workloads. I focus on improving system throughput and efficiency by aligning hardware capabilities with the needs of modern AI platforms.
LLMs for Multimodal Understanding
I investigate how multimodal LLMs can be accelerated through advanced hardware-software co-design, with a particular focus on leveraging DPUs and storage-centric processing to handle diverse inputs like text, vision, and speech. My goal is to enable scalable and efficient platforms that support the next generation of multimodal AI.
LLM Inference Optimization
I analyze modern LLMs and transformer-based AI platforms in depth to identify bottlenecks in training and inference. I also propose and evaluate improvements at both the system and algorithmic levels to increase inference efficiency and scalability and to ease deployment.
Publications
- Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses
- Quick Generation of SSD Performance Models Using Machine Learning
- PLMC: A Predictable Tail Latency Mode Coordinator for Shared NVMe SSD with Multiple Hosts
- ITAP: Idle-Time-Aware Power Management for GPU Execution Units
- Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration
- FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives
- MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices
- Dataplant: In-DRAM Security Mechanisms for Low-Cost Devices
- Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions
- Performance Evaluation of Dynamic Page Allocation Strategies in SSDs
- TBM: Twin Block Management Policy to Enhance the Utilization of Plane-Level Parallelism in SSDs
- Design for Scalability in Enterprise SSDs
- Unleashing the Potentials of Dynamism for Page Allocation Strategies in SSDs
- Leveraging HPC Power for Solving Abstract Mathematical Problems
- Network-on-SSD: A Scalable and High-Performance Communication Design Paradigm for SSDs
- Application-Aware Topology Reconfiguration for On-Chip Networks
- Energy-Optimized On-Chip Networks Using Reconfigurable Shortcut Paths
- Virtual Point-to-Point Connections for NoCs
- An Efficient Dynamically Reconfigurable On-Chip Network Architecture
- Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections
- Mesh Connected Crossbars: A Novel NoC Topology with Scalable Communication Bandwidth
- Mathematical Analysis of Buffer Sizing for Network-on-Chips Under Multimedia Traffic
- Adaptive Software-Based Deadlock Recovery Technique
- The Effect of Network Topology and Channel Labels on the Performance of Label-Based Routing Algorithms
Important Academic and Research Software Projects
These are projects I have developed alongside my regular employment, either as academic research projects or as a hobby.
MQSim: Multi-Queue SSD Simulation for Next-Generation Storage Design
- High-fidelity, open-source simulator for modern NVMe SSDs with full multi-queue support; models the flash translation layer (FTL), schedulers, garbage collection (GC), wear-leveling, and queue-based interfaces for realistic studies.
- Visit the official GitHub page.
- Published in FAST 2018: MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices (A. Tavakkol, J. Gomez-Luna, M. Sadrosadati, S. Ghose, O. Mutlu).
Xmulator: Object-Oriented Multi-Layered Simulation Framework
- Developed a detailed SSD simulation platform and an Orion-based power-estimation methodology for NoCs, and implemented many NoC/interconnect topologies; the framework has been widely referenced in publications and theses.
- Visit the official project page.
Design & Implementation of Scientific High-Performance Systems
- Built a 2-TFLOPS HPC platform on IBM Cell/BE and a 13-TFLOPS platform on NVIDIA GPUs; implemented benchmark kernels (matrix multiplication, DWT, FFT, N-body, primes, ray tracing).
- Results were published in CSI JCSE in a paper on leveraging chip multiprocessors for combinatorial problems.
Web-based Document Indexing/Provisioning System
- J2EE document management/sharing platform for IPM; indexing by content/title/keywords/authors/category with mixed queries; duplicate detection & alerting; dynamic access control.
JDB: Distributed Data Gathering & Analysis for Medical Research
- Web-based client/server system for SBMU School of Dentistry; built with C#/.NET; supports descriptive/bivariate stats, linear regression, K-means & hierarchical clustering; import/export (xls/csv/xml), DB merge, RESTful JSON/XML, and rich charting.
Academic Services
Lecturer
- International Online Workshop on Recent Advances in SSD Research and Practice, KIISE, Korea, Sep. 2024.
- One Day Workshop on Memory Systems, Sharif University of Technology, Tehran, Iran, Oct. 2014.
- Two Day Workshop on Multicore Programming, IPM, Tehran, Iran, Feb. 2013.
- One Day Workshop on GPU Programming, IPM, Tehran, Iran, May 2010.
- One Day Workshop on IBM Cell/BE Programming, IPM, Tehran, Iran, Apr. 2010.
Invited Talks
- Huawei Storage Media and Applications Innovation Workshop, Zürich, Switzerland, Apr. 2025.
- Scalability, Fairness, and Availability in Modern Storage Systems: Challenges, Solutions, and Outlook, EPFL, Lausanne, 2018.
- Design for Scalability in Enterprise SSDs, University of Victoria, Victoria, BC, Canada, 2014.
- Some Applications of Emerging Multicore Architectures in Combinatorics, School of Mathematics, IPM, Tehran, Iran, 2011.
Program Committee
- Track Chair, 29th Euromicro PDP, Valladolid, Spain, Mar. 2021.
- Track Chair, 28th Euromicro PDP, Västerås, Sweden, Mar. 2020.
Teaching
TA, ETH Zürich, Zürich, Switzerland.
- Design of Digital Circuits — Spring 2018, Spring 2017
- Computer Architecture — Fall 2017
- Hardware Acceleration for Data Processing Seminar — Fall 2016
Instructor, Sharif University of Technology, Tehran, Iran.
- Multicore Computing — Fall 2012, Fall 2013, Fall 2014, Spring 2016
- Advanced Programming — Fall 2012, Spring 2013
- Introduction to Programming — Spring 2008, Fall 2008, Fall 2010, Fall 2011, Spring 2012
- Formal Specification and Verification (Introduction to Equational Logic) — Fall 2004, Spring 2005, Fall 2006
- Digital Systems Design Lab — Spring 2006, Fall 2006
Instructor, Azad University, Tehran East Branch, Tehran, Iran.
- Advanced Programming — Fall 2007, Spring 2008
- Computer Graphics — Fall 2008
- Machine Organization and Assembly Language — Spring 2008
Peer Review Services
- ACM Transactions on Storage (TOS)
- ACM Computing Surveys
- IEEE Transactions on Computers
- Microprocessors and Microsystems (MICPRO)
- Computers & Electrical Engineering
- Journal of Computer and System Sciences
- Simulation Modelling Practice and Theory
- Nano Communication Networks
- Cluster Computing
- Journal of Systems Architecture (JSA)
- CSI Journal on Computer Science and Engineering (JCSE)
- Euromicro PDP 2013, PDP 2014
- CSI CADS 2010, CADS 2013