...
Submodular set functions are characterized by an intuitive
diminishing marginal costs property. Their structure
admits efficient (approximate) optimization algorithms in
many settings.
Unfortunately, not all set functions are submodular, and I
am collecting a
growing list
of functions that may appear
submodular at first look but are actually not.
Many classical combinatorial problems ask to select a set of graph edges that form a particular structure - for example, a graph cut, while minimizing a sum of edge weights. We replace the sum of edge weights by a more general, non-additive submodular set function. The resulting problem is a Cooperative Graph Cut.
Graph cuts have been used as a generic tool in
combinatorial optimization. Replacing the common sum of
edge weights by a submodular function enhances the
representative capabilities of graph cuts. For example,
graph cuts haven been popular for MAP inference in
pairwise Markoc Random Fields, used e.g. for image
segmentation. These have some well-known shortcomings:
the optimal segmentations tend to short-cut fine object
boundaries, in particular when the contrast is
low. Cooperative cuts correspond to high-order potentials
and admit to incorporate information about the object
boundary, leading to much better segmentations.
Cooperative cuts indeed unify and generalize a number
of higher-order energy functions that have been used in
Computer Vision.
S. Jegelka and J. Bilmes. Submodularity beyond submodular energies: coupling edges in graph cuts. CVPR 2011.
In its most general form, the problem of finding a
minimum cooperative cut is extremely hard, and polynomial
lower bounds apply. However, we have explored several
approximation algorithms that in practice work better than
their theoretical worst-case approximation factor. Some of
these are also efficient enough to be applied to large
graphs arising from images.
Our theoretical results complement recent
work in theoretical computer science on related
problems.
S. Jegelka and J. Bilmes. Approximation
bounds for inference using cooperative cut. ICML 2011.
Jegelka and
J. Bilmes. Multi-label
cooperative cuts. CVPR
2011 Workshop on Inference in Graphical Models with Structured
Potentials.
Sequential decision problems ask to repeatedly solve an optimization problem with an unknown, changing cost function. A decision for the current cost must be made based on observing the costs at previous steps, but not the current one. Such problems become challenging when the optimization problem is combinatorial and therefore the decision space exponentially large. We address sequential combinatorial problems and derive the first algorithms that handle nonlinear, submodular cost functions.
S. Jegelka and J. Bilmes. Online submodular minimization for combinatorial structures. ICML 2011.
Graph cuts have particularly been used to represent pseudo-boolean and set functions and thereby as a tool to minimize such functions more efficiently than e.g. with a generic algorithm for minimizing submodular functions. However, not all submodular functions can be (efficiently) represented as graph cuts. However, any submodular function can be represented as a cooperative cut, and this leads to efficient approximation algorithms that work well on several submodular functions.
S. Jegelka, H. Lin and J. Bilmes. On fast approximate submodular minimization. NIPS 2011.
Data in machine learning often arises from noisy measurements. When such data is used in an optimization problem, it is beneficial to know the stability of the optimal solution to perturbations in the data. We show a method for analyzing this stability for LP relaxations of graph partitioning problems. The method can handle the exponential number of constraints and applies to problems such as correlation clustering, clustering aggregation or modularity clustering.
S. Nowozin and S. Jegelka. Solution stability in linear programming relaxations: graph partitioning and unsipervised learning. ICML 2009.
Many clustering criteria aim for clusers that are spatially separated. For example, the popular k-means crtiterion seeks clusers whose means are maximally far apart. If the data is assumed to be samples of a mixture of distributions and we want to recover the underlying distributions, then spatial separation may not be the ideal criterion, e.g. if we have two Gaussians with the same mean but different variance. Using a kernel criterion, however, we can separate distributions by higher-order moments. This observation also explains capabilities of the kernel k-means algorithm for example in separating distributions by moments other than the mean.
S. Jegelka, A. Gretton, B. Schoelkopf, B.K Sriperumbudur and U. von Luxburg. Generalized clustering via kernel embeddings. KI 2009.
The tensor clustering problem generalizes co-custering (also called biclustering) from matrices to tensors. Informally, we aim to partition a given tensor into honogeneous "cubes". Formally, we want to find the closest best low-rank factorization of a particular form. We show the first approximation bounds for tensor clustering with metrics and Bregman divergences. This work also illustrates the limits of ignoring the "co" in co-clustering.
S. Jegelka, S. Sra and A. Banerjee. Approximation algorithms for tensor clustering. ALT 2009.
Most clustering problems correspond to NP-hard optimization problems. Furthermore, even if we could find the optimzal solution, this procedure may fail to be statisticaly consistent. Therefore, we relax computationally hard clustering problems (such as k-means or normalized cut) to formulations that cen be solved exactly in polynomial time and that are statistically consistent and converge to the solution of the given objective as the number of sample points grows.
U. von Luxburg, S. Bubeck, S. Jegelka and M. Kaufmann. Consistent minimization of clustering objective functions. NIPS 2007
H. Shen, S. Jegelka and A. Gretton. Fast Kernel-based Independent Component Analysis. IEEE
Transactions on Signal Processing 57(9),
pp. 3498-3511, 2009.
H. Shen, S. Jegelka and A. Gretton. Fast Kernel ICA
using an Approximate Newton Method. AISTATS, 2007.
S. Jegelka and A. Gretton. Brisk Kernel Independent Component Analysis. In L. Bottou, O. Chapelle,
D. DeCoste, J. Weston, editors. Large Scale Kernel
Machines, pp. 225-250. MIT Press, 2007.