Before a novel treatment is made available to the wider public, clinical trials are undertaken to show that the treatment is safe and efficacious. Developing treatments for small populations is particularly challenging because few subjects are available for experimentation. Although any single rare disease affects relatively few patients, it is estimated that between 27 and 36 million people in the European Union alone suffer from at least one so-called rare disease (European Union Committee of Experts on Rare Diseases, 2013). The standard approach to clinical trials, however, requires recruiting a large number of patients.
Bandit models (Gittins, Glazebrook, and Weber, 2011) present an appealing alternative to the standard approach taken in clinical trials. These models provide an idealized mathematical decision-making framework for deciding how to optimally allocate a resource among a number of competing uses, given that the allocation is made sequentially and under randomly evolving conditions. Although their scope is very general, a clinical trial that aims to identify the best treatment while treating patients as effectively as possible is a natural application area.
Consider a clinical trial that tests the effectiveness of several treatments and uses a “rule” to allocate each patient, in sequence, to one of the treatments. Assume that this “rule” maximizes the expected number of patient “successes” (i.e., positive responses to the drug or treatment). Such a procedure would be adaptive, skewing patient allocation towards the better performing treatments, and would be well suited to studying new treatments in rare diseases.
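To make the idea of such a “rule” concrete, the following is a minimal sketch under strong simplifying assumptions that are not part of the project description: two treatments, binary outcomes observed immediately, a known finite number of patients, and independent Beta priors on each success probability. Under these assumptions the rule that maximizes the expected number of successes can be computed by backward induction. The function and parameter names (solve_two_arm_trial, prior_a, prior_b) are hypothetical.

```python
from functools import lru_cache


def solve_two_arm_trial(n_patients, prior_a=1.0, prior_b=1.0):
    """Backward induction for two Bernoulli arms with independent Beta priors.

    Returns (expected_successes, choose_arm), where choose_arm(s0, f0, s1, f1)
    gives the arm (0 or 1) assigned to the next patient, given the successes
    and failures observed so far on each arm.
    """

    def posterior_mean(successes, failures):
        # Mean of the Beta(prior_a + successes, prior_b + failures) posterior.
        return (prior_a + successes) / (prior_a + prior_b + successes + failures)

    @lru_cache(maxsize=None)
    def value(s0, f0, s1, f1):
        # Expected number of future successes under optimal allocation.
        if s0 + f0 + s1 + f1 >= n_patients:
            return 0.0  # no patients left to treat
        return max(arm_values(s0, f0, s1, f1))

    def arm_values(s0, f0, s1, f1):
        # Expected successes if the next patient gets arm 0 or arm 1,
        # followed by optimal allocation for all remaining patients.
        p0 = posterior_mean(s0, f0)
        p1 = posterior_mean(s1, f1)
        v0 = p0 * (1 + value(s0 + 1, f0, s1, f1)) + (1 - p0) * value(s0, f0 + 1, s1, f1)
        v1 = p1 * (1 + value(s0, f0, s1 + 1, f1)) + (1 - p1) * value(s0, f0, s1, f1 + 1)
        return v0, v1

    def choose_arm(s0, f0, s1, f1):
        v0, v1 = arm_values(s0, f0, s1, f1)
        return 0 if v0 >= v1 else 1  # ties go to arm 0

    return value(0, 0, 0, 0), choose_arm


expected, choose_arm = solve_two_arm_trial(n_patients=20)
print(f"Expected successes among 20 patients: {expected:.3f}")
print("Arm after one success on arm 0:", choose_arm(1, 0, 0, 0))
```

The number of states grows quickly with the number of patients and arms, which is one reason index rules are of interest as a computationally cheaper alternative to brute-force dynamic programming.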
This idea of a clinical trial was the initial motivation for (multi-armed) bandit models half a century ago. Gittins and Jones (1974) obtained an optimal, deterministic rule to perform such a task, which is termed the Gittins index rule. This solution has been successfully applied in many fields, such as communications networks and online marketing. Ironically, however, the Gittins index has never actually been used in clinical practice, in part due to the following issues (Villar, Bowden, and Wason, accepted):
Clinical trials are performed on finite samples, while the Gittins index assumes an infinite horizon;
For the vast majority of medical conditions the outcome of interest is observed with some delay after treatment. The Gittins index, however, requires that each patient's outcome is observed before the next patient is assigned;
The Gittins index has usually been studied under a binomial or normal distribution, while other distributions, such as the exponential, are common in clinical trials.
Fortunately, recent generalizations of Gittins' bandit model, so-called “restless bandits”, allow for solutions that overcome the issues of finite horizon and delayed responses. A restless bandit model addressing the issue of finite horizon was proposed in Villar, Bowden, and Wason (accepted).
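The effect of delayed responses on an adaptive rule can be explored with a small simulation. The sketch below is illustrative only: it uses Thompson sampling with Beta posteriors as a simple stand-in adaptive rule, not the Gittins index or a restless bandit index, and it assumes binary outcomes and a fixed delay measured in numbers of subsequently enrolled patients. The name simulate_delayed_trial and its parameters are hypothetical.

```python
import random
from collections import deque


def simulate_delayed_trial(true_rates, n_patients, delay, seed=0):
    """Simulate sequential allocation when outcomes arrive `delay` patients late.

    Returns (allocations per arm, total number of successes).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms    # outcomes already observed, per arm
    failures = [0] * n_arms
    allocations = [0] * n_arms
    pending = deque()           # (time outcome becomes visible, arm, outcome)
    total_successes = 0

    for t in range(n_patients):
        # Reveal every outcome whose delay has elapsed before patient t arrives.
        while pending and pending[0][0] <= t:
            _, arm, outcome = pending.popleft()
            if outcome:
                successes[arm] += 1
            else:
                failures[arm] += 1

        # Thompson sampling: draw from each Beta posterior, pick the largest draw.
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a])
                 for a in range(n_arms)]
        arm = max(range(n_arms), key=draws.__getitem__)

        outcome = rng.random() < true_rates[arm]
        allocations[arm] += 1
        total_successes += int(outcome)
        # With delay=0 the outcome is visible before the next patient arrives.
        pending.append((t + 1 + delay, arm, outcome))

    return allocations, total_successes


for delay in (0, 10, 40):
    allocs, wins = simulate_delayed_trial([0.3, 0.6], n_patients=60, delay=delay)
    print(f"delay={delay:2d}  allocations={allocs}  successes={wins}")
```

When the delay is at least as long as the trial, no outcome is ever observed in time and the rule cannot adapt at all, which illustrates why the delay needs to be built into the model rather than ignored.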
In this project we investigate sequential optimization problems in which several treatments (arms) with unknown efficacy compete for allocation to patients with a certain disease. The objective is to identify which treatment is best, so as both to learn about treatment efficacy (to improve the treatment of future patients) and to treat the patient at hand effectively.
The main idea of this project is thus to utilize restless bandits to move away from the assumption of no response delay, and to contribute to developing a toolbox of models and solutions that are applicable in clinical trials. Specific aims of the project could be:
Modelling of the problem with delayed patient responses as an optimal sequential decision-making problem in the stochastic dynamic programming (restless bandit) framework;
Design of index policies and their comparison to existing approaches in terms of statistical and optimality performance;
Design and study of efficient algorithms for optimal solutions, creation of a software package, and collaboration with statisticians and clinicians to apply the designed solutions in real clinical trials.
European Union Committee of Experts on Rare Diseases (2013). 2013 Report on the state of the art of rare disease activity in Europe. http://www.eucerd.eu/upload/file/Reports/2013ReportStateofArtRDActivitie...
Gittins, J., Glazebrook, K., and Weber, R. (2011). Multi-armed bandit allocation indices. Wiley.
Gittins, J. C. and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Gani, J., Sarkadi, K., and Vincze, I., editors, Progress in Statistics (European Meeting of Statisticians, Budapest, 1972), pages 241-266. North-Holland, Amsterdam, The Netherlands.
Villar, S. S., Bowden, J., and Wason, J. (accepted). Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges. Statistical Science.