
PhD position in Reinforcement Learning for dialogue systems at Orange Labs near Paris

Country/Region : France

Website : http://orange.com

Description

Contact : Romain Laroche (romain.laroche-AT-orange.com)
PhD thesis subject : Reinforcement Learning and coadaptation with users through dialogue
1) Context
Historically, spoken dialogue systems were created to meet the need to automate simple steps of customer relationship management through interactive voice response. Growing interest rapidly led researchers to optimise these services from experience, with reinforcement learning algorithms (Sutton & Barto, 1998). Many research works (Lemon & Pietquin 2007, Laroche 2010, El Asri 2012) have investigated this problem end to end: the design of a dialogue strategy that fits the requirements of the majority of users.
More recently, dialogue systems have been applied to another domain: personal assistants (Siri, Google Now, Cortana, and others). The inevitable rise of these assistants in smart cars and smart homes will only reinforce this trend in the future. These assistants have a particularity: they are repeatedly used and reused by the same person(s).
Consequently, the aforementioned research topic must be revised: the design of a dialogue strategy that fits the requirements of each user. Learning must therefore take into account both the experience gathered with all users and the specific experience gathered with the current user. To our knowledge, there is currently no work on this topic.
Personal assistants raise another crucial issue: co-learning. Indeed, the user learns to use the assistant at the same time as the latter adapts to the former. Co-reinforcement learning was studied a decade ago, when multi-agent systems were popular (Sheppard 1998, Scherrer 2002, Kutschinski 2003), but never in the dialogue context.
2) Scientific objectives
We consider the following context:
- The system is used by many users
- These users have a repeated and daily use of this system
- The users share use-cases, but also have their own specificities
The general objective of the thesis is to adapt as fast as possible to a new user of the system.
The PhD student will have to model and implement a reinforcement learning algorithm dedicated to personal assistants. Three levels of adaptation to the new user will be taken into account:
- Getting familiar with the user: no specific profiling. Use of a generic policy.
- Profiling: the system categorizes the user into a typical profile.
- Personalisation: the user has their own requirements; the system learns to optimise its interactions with this specific user.
Obviously, the frontier between these three levels of progression is fuzzy. The first scientific challenge concerns the combination of these adaptations into a general model.
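One way to picture this combination is as a soft blend of the three layers, where the weight on the user-specific estimate grows with the amount of experience gathered with that user. The sketch below is purely illustrative (the class, the weighting scheme w = n / (n + k), and the threshold are assumptions, not the thesis method):

```python
class LayeredPolicy:
    """Blend generic, profile-level and user-specific Q-tables,
    shifting weight towards the personal estimate as interactions
    with the user accrue. All names and the weighting scheme are
    illustrative assumptions, not a prescribed design."""

    def __init__(self, actions, profile_threshold=20.0):
        self.actions = actions
        self.generic = {}   # Q-values learned over all users
        self.profile = {}   # Q-values for the user's inferred profile
        self.personal = {}  # Q-values for this specific user
        self.n_interactions = 0
        self.k = profile_threshold  # soft switch point (assumption)

    def q(self, table, state, action):
        return table.get((state, action), 0.0)

    def blended_q(self, state, action):
        # Weight on the personal estimate grows with experience:
        # w = n / (n + k), so early on the shared layers dominate.
        w = self.n_interactions / (self.n_interactions + self.k)
        shared = 0.5 * (self.q(self.generic, state, action)
                        + self.q(self.profile, state, action))
        return (1 - w) * shared + w * self.q(self.personal, state, action)

    def act(self, state):
        # Greedy action selection over the blended estimate.
        self.n_interactions += 1
        return max(self.actions, key=lambda a: self.blended_q(state, a))
```

With no interactions logged, the generic and profile layers drive the choice; after many dialogues, the personal table dominates, so the fuzzy frontier between the three levels is handled by a single continuous weight rather than hard switching.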
The second scientific challenge concerns the co-learning dynamics with the user. Technically, it means that the reinforcement learning algorithm must be able:
- to handle the non-stationarity of the user's behaviour
- to anticipate the behavioural evolution of the user thanks to a model of the user's learning
- to make the best of this co-learning by becoming an actor in the user's learning, and by adopting a strategy that optimises it.
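As a concrete illustration of the first requirement, a tabular Q-learning update with a constant (rather than decaying) step size weights recent experience geometrically more than old experience, which lets the estimate track a user whose behaviour drifts over time. A minimal sketch (function name and hyperparameters are illustrative assumptions):

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.95):
    """One tabular Q-learning step with a constant step size alpha.
    A fixed alpha yields a recency-weighted average of returns, so
    the estimate can follow non-stationary user behaviour instead of
    freezing on early experience (hyperparameters are illustrative)."""
    # Greedy bootstrap over the next state's actions.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    td_target = reward + gamma * best_next
    td_error = td_target - q.get((state, action), 0.0)
    q[(state, action)] = q.get((state, action), 0.0) + alpha * td_error
    return q[(state, action)]
```

A decaying step size would converge in a stationary setting but cannot follow a drifting user; the constant step size trades asymptotic convergence for tracking ability, which matches the co-learning setting described above.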

Last modified: 2015-05-25 21:42:38