Monday, February 29, 2016
9:00 AM in Rice 504
In Kee Kim
Advisor: Marty Humphrey
Attending Faculty: Alfred Weaver, Yanjun Qi, Hongning Wang, Byungkyu Brian Park (Civil Engineering at UVA)
Title: Proactive Resource Management to Ensure Predictable End-to-End Performance for Cloud Applications
Public IaaS clouds have become essential infrastructure for many organizations running their applications, owing to their diverse resource types, cost-efficient pay-as-you-go pricing models, and the scalability and elasticity of their resources. To leverage public IaaS clouds effectively, cloud users tend to employ resource managers that elastically control cloud resources to handle dynamic changes in workload. These resource managers typically have two interrelated goals: maximizing SLA (Service Level Agreement) satisfaction and minimizing execution cost. However, existing cloud resource managers have difficulty meeting these two goals because their workload predictors and cloud performance models have low accuracy and generalize poorly. Designing these two components, and a resource manager that combines them, is challenging because of two uncertainties in public IaaS clouds: 1) uncertainty in future workload patterns and 2) uncertainty in cloud resource performance.
This project creates a new cloud resource management framework that contains a workload predictor, a performance model, and a dynamic resource reconfiguration mechanism. The framework maximizes SLA satisfaction and minimizes cloud cost by ensuring predictable end-to-end performance for cloud applications. The project consists of four parts. First, we develop a workload prediction model that forecasts future job arrivals to cloud applications by dynamically aggregating the best workload predictors for diverse workload patterns. This workload predictor provides accurate predictions that enable the resource manager to determine when to scale cloud resources. Second, we develop a cloud performance model that predicts performance uncertainty. This model is based on actual performance measurements on real cloud infrastructures and provides a statistical guarantee for the end-to-end execution of user jobs. Third, we develop a resource management framework that provides dynamic resource reconfiguration through a near-optimal combination of horizontal and vertical scaling mechanisms. This framework offers online and adaptive approaches that reconfigure available cloud resources to minimize the financial cost of resource use while meeting SLA requirements. Finally, we develop a simulation framework that supports large-scale evaluation of cloud applications and resource management policies. This simulation framework provides trustworthy results under particular workloads and enables users to evaluate various real-world scenarios with minimal effort.
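As a rough illustration of the first part, the idea of choosing among several workload predictors according to how well each has performed recently can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the proposal: the three simple predictors, the class name BestOfPredictors, the error window, and the "lowest recent mean absolute error wins" rule are all hypothetical choices made for the example.

```python
from collections import deque


# Three deliberately simple predictors (hypothetical; chosen for illustration).
def last_value(history):
    """Predict that the next arrival count equals the most recent one."""
    return history[-1]


def moving_average(history, window=3):
    """Predict the mean of the last `window` arrival counts."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def linear_trend(history):
    """Extrapolate the most recent change one step ahead."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


class BestOfPredictors:
    """Pick, at each step, the predictor with the lowest recent mean absolute error."""

    def __init__(self, predictors, error_window=5):
        self.predictors = predictors  # name -> function(history)
        self.errors = {name: deque(maxlen=error_window) for name in predictors}
        self.pending = {}  # forecasts awaiting the next observation

    def predict(self, history):
        # Ask every predictor for a forecast, then return the one made by
        # the predictor with the smallest recent error (ties: first listed).
        self.pending = {name: fn(history) for name, fn in self.predictors.items()}

        def mean_error(name):
            errs = self.errors[name]
            return sum(errs) / len(errs) if errs else 0.0

        best = min(self.predictors, key=mean_error)
        return best, self.pending[best]

    def observe(self, actual):
        # Score every pending forecast against the observed arrival count.
        for name, forecast in self.pending.items():
            self.errors[name].append(abs(forecast - actual))
        self.pending = {}


def run_demo(arrivals, warmup=2):
    """Replay an arrival series and record which predictor was chosen at each step."""
    ensemble = BestOfPredictors({
        "last_value": last_value,
        "moving_average": moving_average,
        "linear_trend": linear_trend,
    })
    choices = []
    for t in range(warmup, len(arrivals)):
        name, forecast = ensemble.predict(arrivals[:t])
        choices.append((name, forecast))
        ensemble.observe(arrivals[t])
    return choices
```

On a steadily growing series such as run_demo([10, 20, 30, 40, 50, 60]), the sketch starts with the first listed predictor and switches to the trend-based one once errors have been observed. A real predictor pool and selection rule would need to be far richer to cover diverse workload patterns.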
This research improves the performance of cloud resource management systems under workload and performance uncertainties in public IaaS clouds. In addition, it addresses the two main cloud scaling problems: when to scale and how to scale.