Thursday, April 6, 2017 at 10:00 am in Rice 536
Committee: Baishakhi Ray (Advisor), Kevin Sullivan (Advisor), Alfred Weaver (Chair), Joanne Bechta Dugan, Mary Lou Soffa, and Jonathan Bell, GMU
Title: Optimize System Performance via Design and Configuration Space Exploration
The runtime performance of complex software systems often depends on the settings of a large number of static system configuration and design parameters. For example, big data systems like Hadoop expose hundreds of configuration parameters to engineers. Many of them influence runtime performance, and some interact in complex ways, which makes the resulting configuration spaces extremely large and complex in structure. It is hard for engineers to understand how all these parameters affect performance, and thus hard to find combinations of parameter values that achieve the best available levels of performance. In practice, engineers often just accept default settings or design decisions made by tools, so such systems significantly underperform relative to their potential. This problem, in turn, affects cost, revenue, customer satisfaction, business reputation, and mission effectiveness.
To improve the overall performance of end-to-end systems, I propose to systematically explore (i) how to design a new system for better performance given a specification, and (ii) how to auto-configure an existing system for optimal performance. To achieve the first objective, we synthesize all the designs and test cases from a given formal specification. We then exhaustively profile the performance of all synthesized designs by applying the test cases, and select Pareto-optimal designs based on the profiling results. However, such a solution may not apply directly to large-scale legacy systems: they often lack exhaustive specifications, changing the core design is not always feasible, and exhaustively exploring the whole design space may not scale to systems of that size. To address these issues, our second objective is to explore alternative configuration options using stochastic algorithms. A large system typically comes with a wide range of configuration options. Exhaustively exploring the entire configuration space is impractical, and not all parameters even contribute to the system’s performance. Thus, in this work, we first plan to reduce the dimensionality and complexity of the configuration space. In particular, we propose to use domain knowledge to select the configuration parameters that may affect system performance and then adopt heuristic-based search algorithms to find optimal values for these parameters. Next, using the selected parameters, we propose to develop machine learning models to optimize end-to-end system performance for given usage patterns.
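As an illustration of the Pareto-optimal selection step for synthesized designs, the following is a minimal sketch rather than the proposed tooling. It assumes each design has already been profiled on two metrics where lower is better (here, a hypothetical query time and storage size); the design names and numbers are made up for the example.

```python
from typing import Dict, List, Tuple

Metrics = Tuple[float, ...]  # e.g. (time_ms, space_kb); lower is better (assumption)


def dominates(a: Metrics, b: Metrics) -> bool:
    """Return True if `a` is no worse than `b` on every metric
    and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(profiles: Dict[str, Metrics]) -> List[str]:
    """Keep the designs that no other design dominates."""
    front = []
    for name, metrics in profiles.items():
        dominated = any(
            dominates(other, metrics)
            for other_name, other in profiles.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical profiling results: (time_ms, space_kb) per synthesized design.
profiles = {
    "design_a": (120.0, 40.0),
    "design_b": (100.0, 55.0),
    "design_c": (130.0, 45.0),  # slower and larger than design_a
    "design_d": (90.0, 70.0),
}

front = pareto_front(profiles)
print(front)
```

The pairwise comparison is quadratic in the number of designs, which is acceptable only when the synthesized design space is small enough to profile exhaustively in the first place.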
We plan to evaluate our design space exploration approach in the domain of object-relational mapping (ORM) schema design. Our preliminary results show that our approach can significantly improve both the time and space performance of the generated database schemas. We also plan to evaluate our configuration space exploration approach in big-data processing ecosystems, tuning the parameters of such systems to achieve better performance when executing users’ jobs.
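The configuration space exploration to be evaluated here can be pictured with a small stochastic search. This is a sketch under stated assumptions, not the proposed algorithm: the parameter names (`mappers`, `buffer_mb`, `compress`), their candidate values, and the `toy_cost` function are all hypothetical stand-ins. In a real evaluation, the cost would come from running and timing an actual job.

```python
import random
from typing import Callable, Dict, Tuple

Config = Dict[str, object]

# Hypothetical, already-reduced configuration space (not real Hadoop parameter names).
SPACE = {
    "mappers": [1, 2, 4, 8, 16],
    "buffer_mb": [64, 128, 256, 512],
    "compress": [False, True],
}


def toy_cost(cfg: Config) -> float:
    """Stand-in for a measured job runtime; lower is better."""
    return (
        abs(cfg["mappers"] - 8) * 2.0
        + abs(cfg["buffer_mb"] - 256) / 64.0
        + (0.0 if cfg["compress"] else 1.5)
    )


def random_neighbor(cfg: Config, rng: random.Random) -> Config:
    """Copy `cfg` and resample one randomly chosen parameter."""
    key = rng.choice(sorted(SPACE))
    neighbor = dict(cfg)
    neighbor[key] = rng.choice(SPACE[key])
    return neighbor


def stochastic_search(
    cost: Callable[[Config], float], rng: random.Random, iterations: int = 500
) -> Tuple[Config, float]:
    """Stochastic hill climbing: keep a candidate only if it lowers the cost."""
    best = {key: rng.choice(values) for key, values in SPACE.items()}
    best_cost = cost(best)
    for _ in range(iterations):
        candidate = random_neighbor(best, rng)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


best, best_cost = stochastic_search(toy_cost, random.Random(0))
print(best, best_cost)
```

Hill climbing of this kind can stall in local optima when parameters interact; the toy cost above has no such interactions, so it only illustrates the search loop, not the difficulty of real configuration spaces.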