MS Project Presentation: Weiwei Ge
Abstract: This project creates two modules that introduce MapReduce. MapReduce is a programming paradigm designed for situations in which there is too much data for a single computer to store and a sequential solution would require excessive resources (time and space). Its goal is to process massive amounts of data much faster than traditional sequential approaches by dividing the work among many machines that run in parallel. The modules walk through the MapReduce process step by step and use example programs that find similar pairs of files in a file system by calculating the cosine similarity of the frequencies of the words they share.
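The abstract does not say how the modules' example programs are written. The following is a minimal Python sketch, built on our own assumptions, of how the two ideas could fit together. It simulates MapReduce rounds in memory (no cluster or framework) and treats lowercase, whitespace-separated tokens as words. All names (`map_reduce`, `tf_mapper`, `pair_mapper`, `cosine_similarities`) are hypothetical. The first round builds per-word term frequencies for each file. The second round emits a partial dot product for every pair of files that share a word, plus each file's squared norm. A final step divides the dot product by the product of the norms.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def map_reduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:                      # map phase
        for key, value in mapper(record):
            groups[key].append(value)           # shuffle: collect values per key
    results = []
    for key, values in groups.items():          # reduce phase
        results.extend(reducer(key, values))
    return results


# Round 1 (hypothetical): term frequency of each word in each document.
def tf_mapper(record):
    doc_id, text = record
    for word in text.lower().split():
        yield word, doc_id


def tf_reducer(word, doc_ids):
    yield word, Counter(doc_ids)                # {doc_id: count of word in doc}


# Round 2 (hypothetical): partial dot products over shared words, plus squared norms.
def pair_mapper(record):
    word, tfs = record
    for doc_id, count in tfs.items():
        yield ("norm", doc_id), count * count
    # only documents that both contain this word contribute to their dot product
    for d1, d2 in combinations(sorted(tfs), 2):
        yield ("dot", d1, d2), tfs[d1] * tfs[d2]


def sum_reducer(key, values):
    yield key, sum(values)


def cosine_similarities(docs):
    """Return {(doc1, doc2): cosine similarity} for pairs sharing at least one word."""
    word_tfs = map_reduce(docs.items(), tf_mapper, tf_reducer)
    totals = map_reduce(word_tfs, pair_mapper, sum_reducer)
    norms = {key[1]: sqrt(v) for key, v in totals if key[0] == "norm"}
    return {
        (key[1], key[2]): v / (norms[key[1]] * norms[key[2]])
        for key, v in totals
        if key[0] == "dot"
    }


if __name__ == "__main__":
    docs = {
        "a.txt": "the cat sat",
        "b.txt": "the cat ran",
        "c.txt": "dogs bark",
    }
    for pair, score in cosine_similarities(docs).items():
        print(pair, round(score, 4))
```

With these three sample files, only `a.txt` and `b.txt` share words ("the" and "cat"), so only that pair appears, with similarity 2/3. Keying the second round by word means pairs with no common words never generate any work. The trade-off is that a word appearing in many files emits a value for every pair of those files, so very common words dominate the cost.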