PhD Qualifying Exam Presentation
Date: Thursday, December 15, 2016 at 10:00 AM in Rice 242
Committee: Kevin Skadron (Advisor), Worthy N. Martin (Committee Chair), Hongning Wang and Jane Qi.
Frequent Subtree Mining on the Automata Processor: Challenges and Opportunities
Tree structures are used abundantly in a wide variety of domains such as XML databases, bioinformatics, web mining, natural language processing, and chemical compound structure extraction. Frequency counting of complex patterns such as subtrees and subgraphs is more challenging than for simple itemsets and sequences, as the number of possible candidates in a tree are much higher than one-dimensional data structures, with dramatically higher processing times. Considering the ever-increasing amount of data and size of individual datasets, developing a scalable and efficient high-performance computing solution becomes more and more important. In this project, we propose a new and scalable solution to the frequent subtree mining (FTM) problem using a new and highly parallel accelerator architecture, called the Automata Processor (AP). We present a four-stage pruning strategy on the AP to reduce the search space of the FTM candidates. This achieves up to 353X speedup on four real-world and synthetic datasets, when compared with PatternMatcher, a practical CPU solution in lower support thresholds. To provide a fully accurate and still scalable solution, we propose a hybrid method to combine the AP-FTM with a CPU exact matching approach, and achieve up to 262X speedup over PatternMatcher on a challenging database. Finally, we develop a level-wise FTM algorithm on GPU to provide a comprehensive comparison of FTM on different platforms. The AP advantage grows further with larger datasets.