Past implementations of the k-means algorithm on the traditional MapReduce framework used heuristics to decide when to stop iterating. However, a heuristic stopping rule is not rigorous enough for an empirical study. A suitable convergence criterion should therefore be used, such as the minimum sum of squared error (MSSE), which aims to minimise the sum of squared errors both within and between clusters. MSSE is an important convergence criterion for k-means because it is computed from the data alone, and thus reduces noise and improves accuracy and reliability. However, computing MSSE as a convergence criterion for k-means has proved difficult to express in the MapReduce framework. In this work we identify the reasons for this difficulty and propose a solution that uses a new complex data structure for the MapReduce key-value pair and introduces a new optimization function in the reduce phase.
Dr. Oyebola Akande
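
The abstract does not spell out the proposed data structure or the optimization function, so the following is only an illustrative, single-machine sketch in Python of the general idea of carrying error information through the key-value pair. It assumes a composite value of the form (point, count, squared error) and a stopping rule based on the decrease in the within-cluster sum of squared error; the names `nearest`, `map_phase`, `reduce_phase`, `kmeans`, and the parameters `tol` and `max_iter` are hypothetical and are not taken from the work itself.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return (index, squared distance) of the centroid closest to point."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
             for centroid in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]


def map_phase(points, centroids):
    """Emit (cluster id, (point, 1, squared error)) for every point.

    The composite value is an assumption made for this sketch.
    """
    for point in points:
        idx, sq_err = nearest(point, centroids)
        yield idx, (point, 1, sq_err)


def reduce_phase(pairs, dim):
    """Fold composite values into per-cluster means and a total SSE."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    sse = 0.0
    for idx, (point, n, sq_err) in pairs:
        sums[idx] = [s + p for s, p in zip(sums[idx], point)]
        counts[idx] += n
        sse += sq_err  # error measured against the centroids used in the map phase
    means = {idx: [s / counts[idx] for s in sums[idx]] for idx in sums}
    return means, sse


def kmeans(points, centroids, tol=1e-9, max_iter=100):
    """Iterate map and reduce until the SSE stops decreasing by more than tol."""
    prev_sse = float("inf")
    sse = prev_sse
    for _ in range(max_iter):
        means, sse = reduce_phase(map_phase(points, centroids), len(points[0]))
        # Keep the previous centroid for any cluster that received no points.
        centroids = [means.get(i, c) for i, c in enumerate(centroids)]
        if prev_sse - sse <= tol:
            break
        prev_sse = sse
    return centroids, sse
```

In this sketch the squared error travels with each point, so the reduce step can produce both the updated centroids and the convergence measure in a single pass. A distributed implementation would additionally need to combine the per-reducer SSE values into one global figure, which this sketch does not attempt.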