Using Queuing Theory to Model Data Center Systems

Presenters

  • David Meisner, Facebook, meisner@fb.com
  • Mor Harchol-Balter, Carnegie Mellon University, harchol@cs.cmu.edu 
  • Thomas Wenisch, University of Michigan, twenisch@umich.edu

Duration: half day

Abstract

Recently, there has been explosive growth in Internet services, greatly increasing the importance of data center systems. Applications served from “the cloud” are driving data center growth and are quickly overtaking traditional workstation applications. Although there are many analytic and simulation tools for evaluating components of desktop and server architectures in detail, scalable modeling tools are noticeably missing.

We believe that stochastic methods and queueing theory together provide an avenue to answer important questions about data center systems. In the first half of this tutorial, we present a crash-course (or perhaps, a refresher for some) on the essential elements of queueing theory with particular applications to modeling data center systems. We also illustrate how queueing theory can be used to solve problems related to the design and analysis of computer systems.

In the second part of the tutorial, we describe BigHouse, a simulation infrastructure that combines queueing theory and stochastic methods to model data center systems. Instead of simulating servers using detailed microarchitectural models, BigHouse raises the level of abstraction using the tools of queueing theory, enabling simulation at 1000-server scale in less than an hour. We include brief background on data center power modeling, a description of the statistical methods used by BigHouse, parallelization techniques, a tour of the simulator code, and a case study of using BigHouse to model data center power capping.

BigHouse is described in the following publication: D. Meisner, J. Wu, and T. F. Wenisch. BigHouse: A simulation infrastructure for data center systems. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2012. Best Paper Award.
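
To make the level of abstraction concrete, the sketch below is a hypothetical, self-contained example (it does not use BigHouse's actual API or class names): it simulates a single FCFS server with exponential interarrival and service times and reports a mean response time with a batch-means 95% confidence interval, the same kind of statistic BigHouse estimates at much larger scale. The arrival and service rates are illustrative values.

import java.util.Random;

// Minimal sketch, NOT BigHouse's API: a discrete-event simulation of one FCFS
// server with exponential interarrival and service times (an M/M/1 queue).
// It estimates mean response time with a 95% confidence interval via batch means.
public class MM1Sketch {
    // Draw an exponential random variate with the given rate.
    static double exp(Random rng, double rate) {
        return -Math.log(1.0 - rng.nextDouble()) / rate;
    }

    public static void main(String[] args) {
        double lambda = 0.8;          // arrival rate (jobs/s) -- illustrative value
        double mu = 1.0;              // service rate (jobs/s) -- illustrative value
        int warmup = 20_000;          // jobs discarded to reduce initialization bias
        int measured = 200_000;       // jobs kept for statistics
        int batches = 20;             // batch means used for the confidence interval
        int perBatch = measured / batches;

        Random rng = new Random(42);
        double arrival = 0.0;         // arrival time of the current job
        double serverFree = 0.0;      // time at which the server next becomes idle
        double[] batchMean = new double[batches];

        for (int n = 0; n < warmup + measured; n++) {
            arrival += exp(rng, lambda);                    // next Poisson arrival
            double start = Math.max(arrival, serverFree);   // FCFS: wait if the server is busy
            serverFree = start + exp(rng, mu);              // service completion time
            double response = serverFree - arrival;         // sojourn (response) time
            if (n >= warmup) {
                batchMean[(n - warmup) / perBatch] += response / perBatch;
            }
        }

        // Batch-means estimate of the mean response time and its 95% confidence interval.
        double mean = 0.0;
        for (double m : batchMean) mean += m / batches;
        double var = 0.0;
        for (double m : batchMean) var += (m - mean) * (m - mean) / (batches - 1);
        double halfWidth = 2.093 * Math.sqrt(var / batches); // t quantile, 19 d.o.f.

        System.out.printf("E[T] ~= %.3f +/- %.3f s (M/M/1 theory: %.3f s)%n",
                mean, halfWidth, 1.0 / (mu - lambda));
    }
}

The one-line service recursion (start = max(arrival, serverFree)) extends naturally to multiple servers and empirical distributions; the tutorial describes how BigHouse applies and parallelizes this style of model at much larger scale.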

Tutorial Objectives

The goals of this tutorial are two-fold. In the first half, we want to give attendees an appreciation of the power of queueing theory and provide a crash-course (or perhaps a refresher for many) on essential elements of queueing theory with particular application to modeling data center systems.

In the second half of the tutorial, we introduce the BigHouse simulation infrastructure, explain its design principles, provide a walk-through of the organization of the simulator code, and finally walk the audience through a case study of modeling a simple data center power-capping system.

Topics

Part 1: What every systems researcher should know about queueing theory. Mor Harchol-Balter (2 hours)

We begin with the fundamentals of queueing theory terminology: the arrival process; the service process; server utilization; system utilization; throughput versus response time; open versus closed queueing networks; prioritization policies; scheduling policies; and so on. Our goal is not just to explain the terminology, but also to briefly survey what kinds of systems can be easily analyzed via basic queueing theory. Examples of systems we will discuss include classed networks, where a job’s “type” determines its routing and service time, and queues where jobs have different priorities.
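
As a concrete anchor for these terms, the following are standard single-queue relationships (textbook results shown only for orientation, not a substitute for the tutorial material):

\[
  \rho = \frac{\lambda}{\mu}, \qquad
  \mathbb{E}[N] = \lambda\,\mathbb{E}[T] \ \ (\text{Little's Law}), \qquad
  \mathbb{E}[T]_{M/M/1} = \frac{1}{\mu - \lambda} \ \ (\rho < 1),
\]
where $\lambda$ is the arrival rate, $\mu$ the service rate, $\rho$ the server utilization, $\mathbb{E}[N]$ the mean number of jobs in the system, and $\mathbb{E}[T]$ the mean response time.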

We then move on to systems design questions whose answers are often counterintuitive but become clear through queueing theory. Examples include:

  • How is response time affected by variability in the arrival process and job sizes? (A standard formula illustrating this is sketched after this list.)
  • What are guidelines for capacity provisioning in data centers to meet QoS goals?
  • How do different scheduling policies compare with respect to response time and fairness?
  • How do different task assignment policies for server farms compare, and what is the impact of heavy-tailed workloads?
  • Why do closed systems respond differently to scheduling as compared to open systems?
  • What is the effect of setup times as the size of a data center grows, and what is the impact on power management?
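
As a taste of the first question above (a textbook result, not the tutorial's full treatment), the Pollaczek-Khinchine formula for the mean queueing delay in an M/G/1 queue makes the role of job-size variability explicit:

\[
  \mathbb{E}[T_Q] \;=\; \frac{\lambda\,\mathbb{E}[S^2]}{2(1-\rho)}
                 \;=\; \frac{\rho\,\mathbb{E}[S]\,(1 + C_S^2)}{2(1-\rho)},
\]
where $S$ is the job size (service requirement), $\rho = \lambda\,\mathbb{E}[S]$ is the utilization, and $C_S^2$ is the squared coefficient of variation of $S$. Even at fixed utilization, mean delay grows linearly in job-size variability, which is one reason heavy-tailed workloads behave so differently from what exponential assumptions predict.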

Part 2: Modeling data center systems in BigHouse. David Meisner & Thomas Wenisch (2 hours)

  • Brief intro to data center systems & data center power modeling
  • BigHouse philosophy & design
  • Sampling & confidence estimates for mean and quantile statistics
  • Sampling challenges: warm up and sample independence
  • Parallelizing BigHouse simulations
  • Case study of using BigHouse to study data center power capping

Target Audience

Systems researchers with interest in one or more of the following:
  • A crash course (or refresher) in queueing theory
  • Applications of queueing theory to data centers
  • Modeling power in data centers
  • Simulating queueing systems
  • Sampling methods applicable to queueing simulation
  • Parallelization techniques
  • Using the BigHouse simulation infrastructure

Prerequisite Knowledge

  • No prior knowledge of queueing theory is needed
  • Some familiarity with data centers is helpful, but not essential
  • Familiarity with Java is necessary to use BigHouse

Biographies

David Meisner is a Software Engineer at Facebook. He received his Ph.D. from the University of Michigan, Ann Arbor, and his Sc.B. in Electrical Engineering from Brown University. His research focuses on efficiency in large-scale data centers, computer architecture, and performance modeling.

Mor Harchol-Balter is an Associate Professor in Computer Science at Carnegie Mellon University. From 2008 to 2011, she served as the Associate Department Head for Computer Science. She received her doctorate from the Computer Science department at the University of California, Berkeley. She is a recipient of the McCandless Chair, the NSF CAREER award, the NSF Postdoctoral Fellowship in the Mathematical Sciences, multiple best paper awards, and several teaching awards, including the Herbert A. Simon Award for Teaching Excellence. She is heavily involved in the ACM SIGMETRICS research community, where she served as Technical Program Chair for SIGMETRICS 2007 and is General Chair for SIGMETRICS 2013. Mor’s work focuses on designing new resource allocation policies (load balancing policies, power management policies, and scheduling policies) for server farms and distributed systems in general. Her work spans both queueing analysis and systems implementation. She is the author of the book Performance Modeling and Design of Computer Systems, to be published this year by Cambridge University Press.

Thomas Wenisch is the Morris Wellman Faculty Development Assistant Professor of Computer Science and Engineering at the University of Michigan, specializing in computer architecture. Tom’s prior research includes memory streaming for commercial server applications, store-wait-free multiprocessor memory systems, memory disaggregation, and rigorous sampling-based performance evaluation methodologies. His ongoing work focuses on computational sprinting, data center architecture, energy-efficient server design, and multi-core / multiprocessor memory systems. Tom received an NSF CAREER award in 2009, has had two papers selected as IEEE Micro Top Picks, and received Best Paper Awards at HPCA 2012 and ISPASS 2012. Prior to his academic career, Tom was a software developer at American Power Conversion, where he worked on data center thermal topology estimation. Tom received his Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University.