We consider active, semi-supervised learning in an offline transductive setting. We show that a previously proposed error bound for active learning on undirected weighted graphs can be generalized by replacing graph cut with an arbitrary symmetric submodular function. Arbitrary non-symmetric submodular functions can be used via symmetrization. Different choices of submodular functions give different versions of the error bound that are appropriate for different kinds of problems. Moreover, the bound is deterministic and holds for adversarially chosen labels. We show exactly minimizing this error bound is NP-complete. However, we also introduce for any submodular function an associated active semi-supervised learning method that approximately minimizes the corresponding error bound. We show that the error bound is tight in the sense that there is no other bound of the same form which is better. Our theoretical results are supported by experiments on real data.
We introduce a natural generalization of submodular set cover and exact active learning with a finite hypothesis class (query learning). We call this new problem interactive submodular set cover. Applications include advertising in social networks with hidden information. We give an approximation guarantee for a novel greedy algorithm and give a hardness of approximation result which matches up to constant factors. We also discuss negative results for simpler approaches and present encouraging early experimental results.
We analyze the expected cost of a greedy active learning algorithm. Our analysis extends previous work to a more general setting in which different queries have different costs. Moreover, queries may have more than two possible responses and the distribution over hypotheses may be non uniform. Specific applications include active learning with label costs, active learning for multiclass and partial label queries, and batch mode active learning. We also discuss an approximate version of interest when there are very many queries.