Thesis Defense: Udita Patel

Thursday, April 27, 2016 at 9:00am in Manchester 017

Performance Analysis of Parallel Support Vector Machines on a MapReduce Architecture


The quantity of electronic data has grown exponentially with the rapid development of the World Wide Web, the Internet of Things, and other digital technologies. As a result, data mining and machine learning algorithms face computational complexity issues when applied to real-world datasets. Support Vector Machines (SVMs) are powerful classification and regression tools, but their computational requirements increase rapidly as the number of training examples increases. To address this problem, several parallel MapReduce-based implementations of SVMs have been proposed. These implementations share a common approach: they decompose a large-scale multi-class problem into a number of smaller sub-problems by dividing the data into multiple partitions that can be processed in parallel. They differ, however, in the aggregation and combination strategies used to form the final model. In this project, we implement three parallel SVM algorithms on the Hadoop implementation of MapReduce, using the libsvm library for core SVM computations. We conduct a comprehensive investigation of the three architectures in order to compare their generalization performance, accuracy, and training times. Our experimental dataset contains 42,000 examples of handwritten digits drawn from the MNIST handwritten digit collection.
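The partition, train, and combine pattern described in the abstract can be illustrated with a minimal single-process sketch. It is not the thesis implementation. It assumes a binary toy problem rather than multi-class digits, uses a small hinge-loss SGD trainer as a stand-in for libsvm, runs the "map" and "reduce" steps as ordinary Python functions rather than Hadoop jobs, and uses majority voting as one possible combination strategy. The abstract does not name the three strategies that were compared, and all function names here are hypothetical.

```python
import random
from collections import Counter


def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01, seed=0):
    """Hinge-loss SGD for a binary linear SVM (labels +1/-1).

    Assumption: a simple stand-in for the libsvm solver named in the abstract.
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for i in order:
            x, y = examples[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 shrinkage on the weights, then a hinge step if the margin is violated.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def partition(examples, k, seed=0):
    """Shuffle the data, then split it into k disjoint partitions."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def map_phase(partitions):
    """'Map' step: train one sub-model per partition.

    The sub-problems are independent, so they could run in parallel.
    """
    return [train_linear_svm(p, seed=i) for i, p in enumerate(partitions)]


def reduce_phase(models, x):
    """'Reduce' step: combine sub-model predictions by majority vote (assumed strategy)."""
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_toy_data(n, seed=42):
    """Two well-separated 2-D clusters; a toy substitute for the MNIST digits."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        x = [2.0 * y + rng.uniform(-0.5, 0.5), 2.0 * y + rng.uniform(-0.5, 0.5)]
        data.append((x, y))
    return data


data = make_toy_data(200)
models = map_phase(partition(data, 5))  # an odd number of voters avoids ties
print(reduce_phase(models, [2.0, 2.0]), reduce_phase(models, [-2.0, -2.0]))
```

In a Hadoop setting, each call inside map_phase would correspond to a mapper training on its own data split, and reduce_phase to a reducer that merges the sub-models. Other combination strategies would change only the reduce step.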