
Arash Tavakkol

About Me

Arash Tavakkol | Ph.D. in Computer Architecture

With over two decades of experience in the software and systems domain, I have spent roughly half my career in the technology industry and the other half in research. This balanced background lets me combine the creative, exploratory mindset of research (innovative thinking, out-of-the-box problem solving, and the drive to explore new domains) with the rigor, quality standards, and timely-delivery expectations of industry.

I have successfully led software teams in designing, architecting, and delivering cloud-based, distributed systems that are scalable, resilient, and high-performance. My leadership style emphasizes technical excellence, rigorous testing practices, and alignment of architecture with business goals, while fostering the professional growth of team members. I bring extensive hands-on experience with modern cloud platforms, distributed architectures, and orchestration technologies, guiding teams through complex technical challenges from conception to deployment.

A quick learner who thrives on adopting emerging technologies and innovative approaches, I excel at bridging the gap between cutting-edge research and practical, production-ready solutions, delivering systems that are not only technically robust but also strategically impactful.

Education

2016 - 2018

ETH Zurich, Zurich, Switzerland

Postdoc - Systems Group, Department of Computer Science

Research topics: High-Performance and QoS-Aware Memory/Storage Sub-Systems, RDMA-Based Data Replication in Modern Datacenters, Near-Data Processing
Advisor: Onur Mutlu

2010 - 2015

Sharif University of Technology, Tehran, Iran

Ph.D. in Computer Engineering - Computer Architecture Major

Thesis title: A Scalable and High-Performance Design Architecture for Solid-State Drives
Advisor: Hamid Sarbazi-Azad

2005 - 2008

Sharif University of Technology, Tehran, Iran

M.Sc. in Computer Engineering - Computer Architecture Major

Thesis title: Performance of Crossbar-based Interconnection Networks for Multiprocessors
Advisor: Hamid Sarbazi-Azad

2000 - 2005

Sharif University of Technology, Tehran, Iran

B.Sc. in Computer Engineering - Software Engineering Major

Undergraduate final project title: An image watermarking framework using the discrete wavelet transform to protect image databases against unauthorized modifications
Advisor: Shohreh Kasaei

Honors

2018

Best Paper Award, European Network on High Performance and Embedded Architecture and Compilation (HiPEAC)

2016

Third place award in the 18th Iranian National Khwarizmi Youth Festival for my innovations in storage systems, Iran
In the news (in Persian): official web page, IRNA, ISNA, Mehr News, Hamshahri

2004

Ranked 4th among more than 10,000 applicants in the Iranian Nationwide Graduate School Entrance Exam in Computer Engineering, Iran

2004

Ranked among the top 15 students of the 8th National Scientific Olympiad in Computer Engineering, Iran

2000

Ranked 188th among more than 350,000 applicants in the Iranian Nationwide University Entrance Exams, Iran

Work Experience

2025 - present

Principal Research Engineer @Huawei Zurich Research Center, Zürich, Switzerland

Responsibilities

  • As a Principal Research Engineer, I lead the development of cutting-edge AI solutions with a strong focus on cost optimization, performance enhancement, and resource efficiency. My work spans the full technology stack, with an emphasis on software and systems, driving innovations that deliver tangible impact in real-world AI deployments.
  • I collaborate closely with cross-functional teams, aligning technical strategies with the evolving requirements of AI customers across diverse industries. This involves partnering with multiple stakeholders to define scalable, high-performance AI infrastructures that balance technical ambition with practical constraints. My role combines deep technical research with strategic alignment to push the boundaries of what's possible in AI systems.

Skills & Technologies

Transformer Architectures, LLM Inference Optimization, Distributed Inference, Low-cost Inference Solutions, PyTorch, DeepSpeed-MII, vLLM, Hugging Face Transformers, Model Parallelism & Pipeline Parallelism, Python, C++, NVIDIA CUDA

2021 - 2025

Principal Software Engineer @ApplyBoard Inc., Kitchener, Canada

Responsibilities

  • Led technical strategy and architecture for modern, cloud-based distributed systems, ensuring scalability, high availability, resilience, and alignment with short-, medium-, and long-term business objectives.
  • Integrated cutting-edge AI solutions, from large language models to advanced OCR pipelines, into core products, streamlining document processing workflows, reducing processing time from days to minutes, and delivering substantial cost savings.
  • Mentored and coached dozens of developers across multiple software development teams, fostering technical growth, enforcing engineering best practices, and ensuring consistently high software quality.
  • Collaborated with cross-functional leadership including heads of operations, customer support, and marketing teams, as well as product managers, to deliver solutions that advanced organizational goals and improved the overall customer experience.
  • Influenced company-wide technology direction as part of the Technology Strategy and Standards Council, defining technical standards and guiding innovation for over 200 engineers and product team members.

Skills & Technologies

AI-driven Business Solutions, AI-driven Automation, LLM API Integration, Applied Machine Learning, Architectural Decision-Making, Cross-Functional Team Leadership, Mentoring and Code Reviews, Code Coverage & Test Automation, Driving Technology Innovation, Stakeholder Management, Microservices, Domain-Driven Design, Behavior-Driven Development, SOLID Principles, RESTful Services & API Design, Serverless Architectures, High Availability & Fault Tolerance, Distributed Systems Architecture, Separation of Concerns, Loose Coupling & High Cohesion, Identity and Access Management, CAP Theorem, Concurrency Control & Locking Mechanisms, Content Delivery Networks (CDN), Caching Strategies (Redis, Memcached), Ruby on Rails, TypeScript & JavaScript, NestJS, AWS Services (EKS, RDS, S3, SNS, SQS, Lambda, Textract, Bedrock, CloudWatch, EC2), Helm Charts for K8s Application Packaging

2020 - 2022

Software Architect/Lead Data Engineer @Fortum, Zürich

Responsibilities

  • Designed and implemented large-scale ETL pipelines to ingest IoT data from more than 100 power plants across Finland and Sweden, ensuring reliability, scalability, and efficient data flow.
  • Led deep technology evaluations through multiple Proof of Concepts (PoCs) comparing Azure Databricks, AWS EMR, Azure Synapse, and Data Factory, ultimately recommending and driving adoption of Databricks' Lakehouse platform.
  • Collaborated with engineering and data science teams to build efficient CI/CD pipelines for Kubernetes deployments and to implement MLflow for experiment tracking and model management.

Skills & Technologies

Driving Technology Innovation, Data Architecture, Big Data, ETL (Extract, Transform, Load), Data Pipelines, Big Data Analytics, Serverless Architectures, Java, Scala, Python, Apache Spark, Apache Kafka, AWS (Kinesis, Lambda, S3, EMR, Firehose, CloudWatch, SNS, SQS), Azure Cloud Services (Event Hubs, Data Factory, Stream Analytics, Synapse Analytics), TimescaleDB, Databricks Lakehouse Platform (Delta Lake), Infrastructure as Code (Terraform), GitHub Actions, GitLab CI, Jenkins (CI/CD), Prometheus for K8s Metrics, Grafana Dashboards for K8s Monitoring

2018 - 2020

Senior Software Engineer/Architect @RepRisk AG, Zürich

Responsibilities

  • Owned and maintained 10 backend microservices supporting the day-to-day work of the company's operations department, ensuring their reliability, availability, and scalability.
  • Acted as the architect for the Data Science team's data-processing platform, designing and implementing systems that fetch and process more than 500K documents per day from news sources and financial institutions.
  • Contributed hands-on to backend development while guiding technical decisions and ensuring alignment between engineering solutions and business needs.
  • Served as Scrum Master and technical lead for the Research and Technology software development team (5 developers), overseeing delivery and fostering agile best practices.

Skills & Technologies

Agile Methodologies, Architectural Decision-Making, Microservices, Domain-Driven Design, SOLID Principles, RESTful Services & API Design, GraphQL APIs, Message Brokers (Apache Kafka, ActiveMQ, Hazelcast), PostgreSQL, Java, Python, Spring Framework, jOOQ, JavaScript, React, Django, Elasticsearch (indexing, sharding, replication, query optimization)

2016 - 2018

Senior Researcher @Systems Group, ETH Zürich

Responsibilities

  • Served as a Senior Researcher in the Systems Group, mentoring PhD and MSc students, supervising research projects, and acting as a teaching assistant to support both academic development and the team's research initiatives.

Skills & Technologies

Hardware Simulation, Memory Subsystem, Storage Subsystem, Storage Virtualization, Performance Optimization, Data Center Infrastructure, Quality-of-Service (QoS), Processing-in-Memory (PIM) Architecture, Data Center Scalability, High-Performance Computing (HPC), Remote Direct Memory Access (RDMA), C#, C++, .NET Framework, C, Cassandra, Redis

2008 - 2015

Researcher/Research Associate/Senior Software Engineer @IPM School of Computer Science

Research Summary

Parallel and Distributed Algorithms

In my work on parallel and distributed computing, I developed reconfigurable on-chip network architectures (Networks-on-Chip) that adapt their topology to improve communication and scalability across multicore processors. I also explored leveraging high-performance computing (HPC) power to solve abstract mathematical problems, applying distributed algorithms to domains beyond traditional systems.

Scalable Memory and Storage System Design

At Huawei and previously at ETH Zurich, I focused on building scalable SSD systems by designing enterprise-level heterogeneous memory/storage systems optimized for performance and reliability. I also created frameworks such as MQSim, which enables realistic multi-queue SSD simulation and thereby supports informed system-level design decisions.

Big Data Analysis/Acceleration

I explore the role of computational storage devices (CSDs) in accelerating big data analysis by offloading data-intensive tasks such as ETL pipelines, query processing, and filtering for time-series databases directly to the storage layer. By tightly integrating CSD capabilities with the software stack, I aim to significantly reduce I/O bottlenecks and enable faster, more efficient large-scale analytics.

AI Hardware Accelerators

At Huawei, I work on designing and optimizing hardware accelerators such as DPUs and in-storage processing engines that support large-scale machine learning workloads. I focus on improving system throughput and efficiency by aligning hardware capabilities with the needs of modern AI platforms.

LLMs for Multimodal Understanding

I investigate how multimodal LLMs can be accelerated through advanced hardware-software co-design, with a particular focus on leveraging DPUs and storage-centric processing to handle diverse inputs like text, vision, and speech. My goal is to enable scalable and efficient platforms that support the next generation of multimodal AI.

LLM Inference Optimization

I analyze modern LLMs and transformer-based AI platforms in depth to identify bottlenecks in training and inference. I also propose and evaluate improvements at both the system and algorithmic levels to improve inference efficiency, scalability, and ease of deployment.

Skills

Programming Languages
Java, C++/C, Python, C#
Database Management Systems
MySQL, PostgreSQL, TimescaleDB, AWS DynamoDB, AWS Redshift, Redis, Apache Cassandra, Elasticsearch
Cloud Platforms
AWS (EC2, Lambda, ECS, EKS, S3, EBS, RDS, DynamoDB, Redshift, ElastiCache, CloudFront, VPC, ELB, IAM, KMS, EMR, Kinesis, CDK), Azure (AKS, Functions, App Service, Synapse, VNet, DevOps, Cosmos DB, AD, AD B2C, Event Hubs, API Management)
Backend Technologies
Java Spring Framework, .NET Core, .NET Entity Framework, Ruby on Rails, ActiveRecord, Flask, Django, WSGI, Nginx, NestJS
Containers and Orchestration
Docker, Kubernetes (AKS, EKS, OpenShift), Helm, Terraform
Message Brokers
Apache Kafka, RabbitMQ, Amazon SNS/SQS, Amazon KDS, Azure Event Hubs, Hazelcast IMDG
AI Tools
PyTorch, KTransformers, DeepSpeed-MII, Hugging Face Transformers, vLLM
Multicore & Parallel Programming Platforms
NVIDIA CUDA, OpenMP, MPI, Java Multithreading, Apache Hadoop, Pthreads

Publications

  • Venice: Improving Solid-State Drive Parallelism at Low Cost via Conflict-Free Accesses

    R. T. Nadig, M. Sadrosadati, H. Mao, N. Mansouri-Ghiasi, A. Tavakkol, J. Park, H. Sarbazi-Azad, J. Gómez-Luna, and O. Mutlu
    in 50th International Symposium on Computer Architecture (ISCA ’23), pp. 36:1–36:16, 2023.
  • Quick Generation of SSD Performance Models Using Machine Learning

    M. Tarihi, S. Azadvar, A. Tavakkol, H. Asadi, and H. Sarbazi-Azad
    IEEE Transactions on Emerging Topics in Computing (TETC), Vol. 10, No. 4, pp. 1821–1836, 2022.
  • PLMC: A Predictable Tail Latency Mode Coordinator for Shared NVMe SSD with Multiple Hosts

    T. Roy, J. Gupta, K. Kant, A. Pal, D. Minturn, and A. Tavakkol
    IEEE International Conference on Networking, Architecture and Storage (NAS), pp. 1–6, 2021.
  • ITAP: Idle-Time-Aware Power Management for GPU Execution Units

    M. Sadrosadati, S. B. Ehsani, H. Falahati, R. Ausavarungnirun, A. Tavakkol, M. Abaee, L. Orosa, Y. Wang, H. Sarbazi-Azad, and O. Mutlu
    ACM Transactions on Architecture and Code Optimization (TACO), Vol. 16, No. 1, pp. 3:1–3:33, 2018.
  • Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration

    Y. Wang, A. Tavakkol, L. Orosa, S. Ghose, N. M. Ghiasi, M. Patel, J. S. Kim, H. Hassan, M. Sadrosadati, and O. Mutlu
    in 51st International Symposium on Microarchitecture (MICRO ’18), pp. 298–311, 2018.
  • FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives

    A. Tavakkol, M. Sadrosadati, S. Ghose, J. Kim, Y. Luo, Y. Wang, N. M. Ghiasi, L. Orosa, J. Gómez-Luna, and O. Mutlu
    in 45th International Symposium on Computer Architecture (ISCA ’18), pp. 397–410, 2018.
  • MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices

    A. Tavakkol, J. Gómez-Luna, M. Sadrosadati, S. Ghose, and O. Mutlu
    in 16th USENIX Conference on File and Storage Technologies (FAST ’18), pp. 49–66, 2018.
  • Dataplant: In-DRAM Security Mechanisms for Low-Cost Devices

    L. Orosa, Y. Wang, I. Puddu, M. Sadrosadati, K. Razavi, J. Gómez-Luna, H. Hassan, N. Mansouri-Ghiasi, A. Tavakkol, M. Patel, J. Kim, V. Seshadri, U. Kang, S. Ghose, R. Azevedo, and O. Mutlu
    Preliminary version in arXiv, 2018.
  • Enabling Efficient RDMA-based Synchronous Mirroring of Persistent Memory Transactions

    A. Tavakkol, A. Kolli, S. Novakovic, K. Razavi, J. Gómez-Luna, H. Hassan, C. Barthels, Y. Wang, M. Sadrosadati, S. Ghose, A. Singla, P. Subrahmanyam, and O. Mutlu
    Preliminary version in arXiv, 2018.
  • Performance Evaluation of Dynamic Page Allocation Strategies in SSDs

    A. Tavakkol, P. Mehrvarzy, M. Arjomand, and H. Sarbazi-Azad
    ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), Vol. 1, No. 2, pp. 7:1–7:33, 2016.
  • TBM: Twin Block Management Policy to Enhance the Utilization of Plane-Level Parallelism in SSDs

    A. Tavakkol, P. Mehrvarzy, and H. Sarbazi-Azad
    Computer Architecture Letters (CAL), Vol. 15, No. 2, pp. 121–124, 2016.
  • Design for Scalability in Enterprise SSDs

    A. Tavakkol, M. Arjomand, and H. Sarbazi-Azad
    Proceedings of the 23rd International Conference on Parallel Architectures and Compilation (PACT ’14), pp. 417–430, 2014.
  • Unleashing the Potentials of Dynamism for Page Allocation Strategies in SSDs

    A. Tavakkol, M. Arjomand, and H. Sarbazi-Azad
    Proceedings of ACM SIGMETRICS ’14, pp. 551–552, 2014.
  • Leveraging HPC Power for Solving Abstract Mathematical Problems

    E. Totoni, A. Tavakkol, G. B. Khosrovshahi, A. Khonsari, and H. Sarbazi-Azad
    CSI Journal on Computer Science and Engineering (JCSE), Vol. 11, No. 2, pp. 1–14, 2014.
  • Network-on-SSD: A Scalable and High-Performance Communication Design Paradigm for SSDs

    A. Tavakkol, M. Arjomand, and H. Sarbazi-Azad
    Computer Architecture Letters (CAL), Vol. 12, No. 1, pp. 5–8, January 2013.
  • Application-Aware Topology Reconfiguration for On-Chip Networks

    M. Modarressi, A. Tavakkol, and H. Sarbazi-Azad
    IEEE Transactions on Very Large Scale Integration Systems (TVLSI), Vol. 19, No. 11, pp. 2010–2022, 2011.
  • Energy-Optimized On-Chip Networks Using Reconfigurable Shortcut Paths

    N. Teimouri, M. Modarressi, A. Tavakkol, and H. Sarbazi-Azad
    Lecture Notes in Computer Science, ARCS 2011, pp. 231–242, 2011.
  • Virtual Point-to-Point Connections for NoCs

    M. Modarressi, A. Tavakkol, and H. Sarbazi-Azad
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Vol. 29, No. 6, pp. 855–868, June 2010.
  • An Efficient Dynamically Reconfigurable On-Chip Network Architecture

    M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol
    Proceedings of the 47th ACM/IEEE Design Automation Conference (DAC 2010), pp. 166–169, 2010.
  • Performance and Power Efficient On-Chip Communication Using Adaptive Virtual Point-to-Point Connections

    M. Modarressi, H. Sarbazi-Azad, and A. Tavakkol
    Proceedings of NOCS ’09, pp. 203–212, 2009.
  • Mesh Connected Crossbars: A Novel NoC Topology with Scalable Communication Bandwidth

    A. Tavakkol, H. Sarbazi-Azad, and R. Moraveji
    Proceedings of ISPA ’08, pp. 319–326, 2008.
  • Mathematical Analysis of Buffer Sizing for Network-on-Chips Under Multimedia Traffic

    A. Khonsari, M. R. Aghajani, A. Tavakkol, and M. S. Talebi
    Proceedings of ICCD 2008, pp. 150–155, 2008.
  • Adaptive Software-Based Deadlock Recovery Technique

    M. Mirza-Aghatabar, A. Tavakkol, H. Sarbazi-Azad, and A. Nayebi
    Proceedings of AINAW 2008, pp. 514–519, 2008.
  • The Effect of Network Topology and Channel Labels on the Performance of Label-Based Routing Algorithms

    R. Moraveji, H. Sarbazi-Azad, and A. Tavakkol
    Lecture Notes in Computer Science, ICCS 2008, pp. 529–538, 2008.

Important Academic and Research Software Projects

These are projects I have developed alongside my regular employment, either as academic research projects or as a hobby.

MQSim: Multi-Queue SSD Simulation for Next-Generation Storage Design 2018 – present

  • High-fidelity, open-source simulator for modern NVMe SSDs with full multi-queue support; models FTL, schedulers, GC, wear-leveling and queue-based interfaces for realistic studies.
  • Visit the official GitHub page.
  • Published in FAST 2018: MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices (A. Tavakkol, J. Gómez-Luna, M. Sadrosadati, S. Ghose, O. Mutlu).

Xmulator: Object-Oriented Multi-Layered Simulation Framework 2006 – present

  • Developed detailed SSD simulation platform; Orion-based power methodology for NoCs; implemented many NoC/interconnect topologies, widely referenced across publications and theses.
  • Visit the official project page.

Design & Implementation of Scientific High-Performance Systems 2007 – 2008

  • Built a 2 TFLOPS HPC platform on IBM Cell/BE and a 13 TFLOPS platform on NVIDIA GPUs; implemented benchmark kernels (MM, DWT, FFT, N-body, primes, ray tracing).
  • Results published in CSI JCSE on leveraging chip multiprocessors for combinatorial problems.

Web-based Document Indexing/Provisioning System 2009 – 2010

  • J2EE document management/sharing platform for IPM; indexing by content/title/keywords/authors/category with mixed queries; duplicate detection & alerting; dynamic access control.

JDB: Distributed Data Gathering & Analysis for Medical Research 2008 – 2009

  • Web-based client/server system for SBMU School of Dentistry; built with C#/.NET; supports descriptive/bivariate stats, linear regression, K-means & hierarchical clustering; import/export (xls/csv/xml), DB merge, RESTful JSON/XML, and rich charting.

Academic Services

Lecturer

  • International Online Workshop on Recent Advances in SSD Research and Practice, KIISE, Korea, Sep. 2024.
  • One Day Workshop on Memory Systems, Sharif University of Technology, Tehran, Iran, Oct. 2014.
  • Two Day Workshop on Multicore Programming, IPM, Tehran, Iran, Feb. 2013.
  • One Day Workshop on GPU Programming, IPM, Tehran, Iran, May 2010.
  • One Day Workshop on IBM Cell/BE Programming, IPM, Tehran, Iran, Apr. 2010.

Invited Talks

Program Committee

Teaching

TA, ETH Zürich, Zürich, Switzerland.

Instructor, Sharif University of Technology, Tehran, Iran.

Instructor, Azad University, Tehran East Branch, Tehran, Iran.

  • Advanced Programming — Fall 2007, Spring 2008
  • Computer Graphics — Fall 2008
  • Machine Organization and Assembly Language — Spring 2008

Peer Review Services

  • ACM Transactions on Storage (TOS)
  • ACM Computing Surveys
  • IEEE Transactions on Computers
  • Microprocessors and Microsystems (MICPRO)
  • Computers & Electrical Engineering
  • Journal of Computer and System Sciences
  • Simulation Modelling Practice and Theory
  • Nano Communication Networks
  • Cluster Computing
  • Journal of Systems Architecture (JSA)
  • CSI Journal on Computer Science and Engineering (JCSE)
  • Euromicro PDP 2013, PDP 2014
  • CSI CADS 2010, CADS 2013
