Conference 2019
Top image

 
Home
Program LNMB conference
Registration LNMB Conference
Invited Speakers LNMB Conference
Program PhD presentations
Abstracts PhD presentations
Announcement NGB/LNMB Seminar
Abstracts/Bios NGB/LNMB Seminar
Registration NGB/LNMB Seminar
Conference Office
How to get there
 
Return to LNMB Site
 

Dorit Hochbaum: Combinatorial optimization for image segmentation and large scale data mining

Abstract: The dominant algorithms for machine learning tasks fall most often in the realm of AI or continuous optimization of intractable problems. We present combinatorial algorithms for machine learning, data mining, and image segmentation that, unlike the majority of existing machine learning methods, utilize pairwise similarities. These algorithms are efficient and reduce the classification problem to a network flow problem on a graph. One of these algorithms addresses the problem of finding a cluster that is as dissimilar as possible from the complement, while having as much similarity as possible within the cluster. These two objectives are combined either as a ratio or with linear weights. This problem is a variant of normalized cut, which is intractable. The problem and the polynomial-time algorithm solving it are called HNC (Hochbaum's Normalized Cut). It is demonstrated, via an extensive empirical study, that incorporating the use of pairwise similarities improves accuracy of classification and clustering. However, a drawback of the use of similarities is the quadratic rate of growth in the size of the data. A methodology called "sparse computation" has been devised to address and eliminate this quadratic growth. It is demonstrated that the technique of "sparse computation" enables the scalability of similarity-based algorithms to very large-scale data sets while maintaining high levels of accuracy. We demonstrate several applications of variants of HNC for data mining, medical imaging, and image segmentation tasks, including a recent one in which HNC is among the top performing methods in a benchmark for cell identification in calcium imaging movies for neuroscience brain research.