Novel Geometric Algorithms for Learning from Big Data


报告题目:Novel Geometric Algorithms for Learning from Big Data

报 告 人:徐金辉 教授,博导,  纽约州立大学布法罗分校

时    间:2017年 7月9 日 下午4:00-5:00  

地    点:计算机学院计B518



报告摘要:Big Data provide great potential for us to improve many aspects of our daily life. However, due to their intrinsic nature, they also impose many obstacles for us to fully utilize them. In this talk, I will address several such obstacles and present our geometric solutions. Particularly, I will use constrained clustering as an example to illustrated our ideas. Constrained clustering is a commonly encountered unsupervised learning problem. Due to the additional constraints, its optimal solution no longer has the so-called locality property, which is a key property required by many existing algorithms for achieving quality guaranteed solutions. To overcome the difficulty caused by the loss of locality, we present a unified framework, called Peeling-and-Enclosing, to iteratively solve two variants of the constrained clustering problems, constrained k-means clustering (k-CMeans) and constrained k-median clustering (k-CMedian). Our framework is based on a standalone geometric technique, called Simplex Lemma, which enables us to efficiently approximate the mean (or median) point of an unknown set of points by searching a small-size grid, independent of the dimensionality of the space, in a simplex (or its surrounding region), and thus can be used to handle high dimensional data. With this technique, our framework generates, in nearly linear time, a small set of candidates for the k means or k medians, and ensures that one of them achieves a  (1 + ε)- approximation. Combining this unified framework with a problem-specific selection algorithm, we obtain a (1 + ε)-approximation for each constrained clustering problem. Our framework improves considerably the best known results for many constrained clustering problems, including some longstanding open problems.