Tor Lattimore

PAC Bounds for Discounted MDPs

Feb 17, 2012
Tor Lattimore, Marcus Hutter

We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). For the upper bound we make the assumption that each action leads to at most two possible next-states and prove a new bound for a UCRL-style algorithm on the number of time-steps when it is not Probably Approximately Correct (PAC). The new lower bound strengthens previous work by being both more general (it applies to all policies) and tighter. The upper and lower bounds match up to logarithmic factors.

* Proc. 23rd International Conf. on Algorithmic Learning Theory (ALT 2012) pages 320-334
* 25 LaTeX pages
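
For orientation, one common way to formalise the quantity being bounded is sketched below. This is the standard "sample complexity of exploration" notion, not necessarily the paper's exact definition; the symbols $\gamma$, $\epsilon$, $\delta$, $N$, $s_t$ and $\pi_t$ are assumptions introduced here and do not appear in the abstract.

```latex
% Discounted value of a policy \pi from state s, with discount \gamma \in [0,1):
V^{\pi}(s) = \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_{0} = s,\ \pi \right],
\qquad
V^{*}(s) = \sup_{\pi} V^{\pi}(s).

% An algorithm following policy \pi_t at time t is (\epsilon,\delta)-PAC with
% sample complexity N if, with probability at least 1-\delta, it is more than
% \epsilon-suboptimal on at most N time-steps:
\Pr\!\left( \left|\left\{\, t : V^{*}(s_{t}) - V^{\pi_{t}}(s_{t}) > \epsilon \,\right\}\right| > N \right) \le \delta .
```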

No Free Lunch versus Occam's Razor in Supervised Learning

Nov 16, 2011
Tor Lattimore, Marcus Hutter

The No Free Lunch theorems are often used to argue that domain specific knowledge is required to design successful algorithms. We use algorithmic information theory to argue the case for a universal bias allowing an algorithm to succeed in all interesting problem domains. Additionally, we give a new algorithm for off-line classification, inspired by Solomonoff induction, with good performance on all structured problems under reasonable assumptions. This includes a proof of the efficacy of the well-known heuristic of randomly selecting training data in the hope of reducing misclassification rates.

* 16 LaTeX pages, 1 figure

Asymptotically Optimal Agents

Jul 27, 2011
Tor Lattimore, Marcus Hutter

Artificial general intelligence aims to create agents capable of learning to solve arbitrary interesting problems. We define two versions of asymptotic optimality and prove that no agent can satisfy the strong version, while in some cases, depending on discounting, there does exist a non-computable weak asymptotically optimal agent.

* Proc. 22nd International Conf. on Algorithmic Learning Theory (ALT-2011) pages 368-382
* 21 LaTeX pages

Universal Prediction of Selected Bits

Jul 27, 2011
Tor Lattimore, Marcus Hutter, Vaibhav Gavane

Many learning tasks can be viewed as sequence prediction problems. For example, online classification can be converted to sequence prediction with the sequence being pairs of input/target data and where the goal is to correctly predict the target data given input data and previous input/target pairs. Solomonoff induction is known to solve the general sequence prediction problem, but only if the entire sequence is sampled from a computable distribution. In the case of classification and discriminative learning though, only the targets need be structured (given the inputs). We show that the normalised version of Solomonoff induction can still be used in this case, and more generally that it can detect any recursive sub-pattern (regularity) within an otherwise completely unstructured sequence. It is also shown that the unnormalised version can fail to predict very simple recursive sub-patterns.

* Proc. 22nd International Conf. on Algorithmic Learning Theory (ALT-2011) pages 262-276
* 17 LaTeX pages
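
The conversion from online classification to sequence prediction described in the abstract can be sketched as follows. This is a minimal illustration only: the helper names `interleave` and `prediction_tasks` are hypothetical, and no predictor (Solomonoff or otherwise) is implemented here.

```python
def interleave(pairs):
    """Flatten (input, target) pairs into one sequence x1, y1, x2, y2, ..."""
    sequence = []
    for x, y in pairs:
        sequence.extend([x, y])
    return sequence


def prediction_tasks(pairs):
    """Return (context, target) tasks, one per pair.

    Each context holds all earlier input/target pairs plus the current
    input; the predictor is asked only for the target that follows.
    """
    sequence = interleave(pairs)
    # Targets sit at the odd positions 1, 3, 5, ... of the interleaved sequence.
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence), 2)]


# Hypothetical toy data: inputs are strings, targets are class labels.
pairs = [("a", 0), ("b", 1)]
tasks = prediction_tasks(pairs)
```

Only the targets are ever predicted in this setup, which matches the abstract's point that the inputs themselves need not be structured.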

Time Consistent Discounting

Jul 27, 2011
Tor Lattimore, Marcus Hutter

A possibly immortal agent tries to maximise its summed discounted rewards over time, where discounting is used to avoid infinite utilities and encourage the agent to value current rewards more than future ones. Some commonly used discount functions lead to time-inconsistent behaviour where the agent changes its plan over time. These inconsistencies can lead to very poor behaviour. We generalise the usual discounted utility model to one where the discount function changes with the age of the agent. We then give a simple characterisation of time-(in)consistent discount functions and show the existence of a rational policy for an agent that knows its discount function is time-inconsistent.

* Proc. 22nd International Conf. on Algorithmic Learning Theory (ALT-2011) pages 383-397
* 17 LaTeX pages, 5 figures
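
A small numerical sketch of the kind of plan reversal the abstract refers to is given below. It is an illustration under assumed parameters, not the paper's model or characterisation: the discount functions, the values `gamma=0.9` and `kappa=1.0`, and the two rewards are all chosen here for the example. The agent re-evaluates two future rewards at each time-step, discounting by the delay from the current time.

```python
def geometric(delay, gamma=0.9):
    """Geometric discount: weight gamma**delay for a reward `delay` steps ahead."""
    return gamma ** delay


def hyperbolic(delay, kappa=1.0):
    """Hyperbolic discount: weight 1 / (1 + kappa * delay)."""
    return 1.0 / (1.0 + kappa * delay)


def preferred(discount, now, options):
    """Index of the option with the highest discounted value as seen at time `now`."""
    values = [reward * discount(time - now) for time, reward in options]
    return max(range(len(options)), key=values.__getitem__)


def plan_over_time(discount, options, last_step):
    """The agent's preferred option at each time-step 0..last_step."""
    return [preferred(discount, now, options) for now in range(last_step + 1)]


# Assumed example: a smaller reward at time 10 versus a larger one at time 11.
options = [(10, 1.0), (11, 1.6)]

geometric_plan = plan_over_time(geometric, options, 10)
hyperbolic_plan = plan_over_time(hyperbolic, options, 10)
```

With these numbers the geometric agent prefers the later, larger reward at every step, whereas the hyperbolic agent prefers it at steps 0 to 9 and then switches to the earlier, smaller reward at step 10, so the plan it made earlier is abandoned.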
