PhD in Artificial Intelligence
Laboratory: LORIA (University of Lorraine, INRIA, CNRS)
Topic: Developmental Reinforcement Learning
Supervisors: Alain Dutech (Researcher INRIA-LORIA), Yann Boniface (Associate Professor UL-LORIA)
Master in Computer Science
Specialization: Artificial Intelligence and Decision
Research Training: Intelligent Agents, Learning and Decision
Magna cum laude
Bachelor in Computer Science
High School Diploma in Sciences
Study on transfer learning with reward shaping methods within a framework of lifelong learning. Developmental and evolutionary approach. Simulation in C++.
The principle was to first use a direct policy search in the sensorimotor space, i.e. with no pre-defined discrete sets of states or actions, and then to extract discrete actions from the corresponding learning traces and identify the dimensions of the state that are relevant for estimating the value function. Once this is done, the robot can apply reinforcement learning to be more robust to new domains and, if required, to learn faster than with a direct policy search.
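As an illustration of the first stage only, here is a minimal sketch of a direct policy search over continuous parameters. The toy task, the linear policy and the hill-climbing rule are all assumptions made for this example; they are not the method or the simulation used in the project.

```python
import random


def episode_return(params, steps=50):
    """Toy continuous task (hypothetical): drive a 1-D point toward the origin."""
    gain, bias = params
    position, total = 1.0, 0.0
    for _ in range(steps):
        # Continuous action computed directly from the raw state, clipped to [-1, 1].
        action = max(-1.0, min(1.0, gain * position + bias))
        position += 0.1 * action
        total -= position * position  # reward: negative squared distance
    return total


def direct_policy_search(initial, iterations=300, sigma=0.3, seed=0):
    """Keep a Gaussian perturbation of the parameters only if it scores higher."""
    rng = random.Random(seed)
    best, best_return = list(initial), episode_return(initial)
    for _ in range(iterations):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        candidate_return = episode_return(candidate)
        if candidate_return > best_return:
            best, best_return = candidate, candidate_return
    return best, best_return


if __name__ == "__main__":
    params, score = direct_policy_search([0.0, 0.0])
    print("parameters:", params, "return:", score)
```

The search never uses a discrete state or action set: it only compares the returns of whole episodes, which is what makes the learning traces available for the later extraction step.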
Literature review on integrating expert knowledge during learning. C++ implementation of a new idea: the expert is itself an agent, which learns to give advice at opportune moments to several agents that are themselves still learning.
An agent (the “teacher”) advises another one (the “student”) by suggesting actions the latter should take while it learns a specific task in a sequential decision problem; the teacher is limited by a “budget” (the number of times such advice can be given). C++ implementation of a new idea: the teacher is also learning. It learns to give advice to the student at opportune moments, that is, it learns how to teach better. We provided experimental results on the Mountain Car domain showing that our approach outperforms state-of-the-art heuristics.
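The budget mechanism can be sketched as follows. This is a minimal illustration using a fixed heuristic (advise only when the gap between the teacher's best and worst action values exceeds a threshold), which is an assumption chosen for the example; in the project described above the teacher learns when to advise instead. The names `BudgetedTeacher`, `q_values`, `threshold` and `choose_action` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class BudgetedTeacher:
    """Teacher that may give advice at most `budget` times (heuristic baseline)."""

    q_values: Callable[[object], Sequence[float]]  # teacher's action values for a state
    budget: int
    threshold: float  # minimum value gap for a state to count as important

    def advise(self, state) -> Optional[int]:
        """Return an action index, or None when no advice is given."""
        if self.budget <= 0:
            return None
        q = self.q_values(state)
        importance = max(q) - min(q)
        if importance < self.threshold:
            return None  # not worth spending budget here
        self.budget -= 1
        return max(range(len(q)), key=q.__getitem__)


def choose_action(student_action: int, advice: Optional[int]) -> int:
    """The student follows the advice when there is some, else its own choice."""
    return student_action if advice is None else advice


if __name__ == "__main__":
    table = {"flat": [0.0, 0.1, 0.05], "steep": [-5.0, 1.0, 0.0]}
    teacher = BudgetedTeacher(q_values=table.__getitem__, budget=2, threshold=1.0)
    for state in ["flat", "steep", "steep", "steep"]:
        print(state, choose_action(0, teacher.advise(state)), teacher.budget)
```

A learning teacher would replace the threshold test with a policy of its own, trained on how much each piece of advice helped the student.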
Meta-learning in neural networks
Extension of ideas from articles on consciousness and meta-representations, applied to multilayer perceptrons: how can they judge their own performance and improve it? Introduction to research, neural networks, LaTeX and Python.
A first neural network learned a classification task, while a second one, called the higher-order network, learned from the first network's hidden-layer activations to bet on whether its prediction was correct. The higher-order network was indeed able to learn this information, which means that it could predict when the first network was going to fail. We therefore proposed several architectures that combine the two networks to increase the overall prediction quality of the first network. The source code is available here.
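The wagering setup can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the first-order network is an untrained random stand-in, the data is synthetic, and the higher-order network is reduced to a logistic regression on the hidden activations. It is not the project's code.

```python
import math
import random


def tanh_layer(x, weights):
    """Hidden activations of a one-hidden-layer network (no bias, for brevity)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def predict_class(hidden, weights):
    """Class with the largest output score of the first-order network."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_wager(hiddens, targets, epochs=200, lr=0.5):
    """Logistic regression from hidden activations to 'was the prediction correct?'."""
    n, d = len(hiddens), len(hiddens[0])
    w, b = [0.0] * d, 0.0
    losses = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * d, 0.0, 0.0
        for h, t in zip(hiddens, targets):
            p = sigmoid(b + sum(wi * hi for wi, hi in zip(w, h)))
            loss -= t * math.log(p + 1e-12) + (1 - t) * math.log(1 - p + 1e-12)
            err = p - t
            grad_b += err
            for j in range(d):
                grad_w[j] += err * h[j]
        # Full-batch gradient step on the mean cross-entropy.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        losses.append(loss / n)
    return w, b, losses


# Synthetic two-class data and a random (untrained) first-order network.
rng = random.Random(0)
inputs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
labels = [int(x[0] + x[1] > 0) for x in inputs]
W1 = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
W2 = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2)]

hiddens = [tanh_layer(x, W1) for x in inputs]
# Wager target: 1 when the first network's prediction matches the label.
targets = [int(predict_class(h, W2) == y) for h, y in zip(hiddens, labels)]
w, b, losses = train_wager(hiddens, targets)
```

The key point is the training target of the second model: it never sees the class label directly, only whether the first network was right, and it has to infer that from the first network's internal state.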
162 hours Algorithms and Programming in Python (2nd year integrated preparatory cycle).
Practical work on imperative programming in Python (Pyzo IDE): searching, sorting and small games (Reversi, Connect Four, …). I wrote several lab assignments on data structures, Dijkstra's algorithm, artificial intelligence and networking. I developed a first server (in Java) to set up matches between two students, so that they could pit their artificial intelligence agents against each other in a tournament that determined their grades. During the last year, instead of comparing their agents on small games, we decided that the students had to create autonomous trading agents. I therefore developed a second server (also in Java) to simulate a stock exchange. In both cases, the students only had to interface with the server in Python, so they could focus on developing their artificial intelligence. The source code is available here.
8 hours Collaboration and Programming in Java (2nd year integrated preparatory cycle).
I designed and gave seminars on the Linux command line, git, continuous integration, object-oriented design, unit testing, and threads and synchronization in Java. To let students practice collaborative development, I set up a Jenkins instance communicating with a GitHub project.
35 hours Algorithmic and Java Object-Oriented Programming (1st year of engineering school).
Responsible: Vincent Chevrier
Practical work on search algorithms, object-oriented design and Lego robot navigation.
Achille had to compare the features of two new algorithms (Q-Prop and ACER) with ours. He extended one of our actor-critic agents in C++ with off-policy multi-step replay using the Retrace algorithm. For experimental validation, he used the lab's cluster to train deep neural networks on the half-cheetah environment.
Nicolas had to explore whether the PoWER algorithm could be used with neural networks instead of dynamic movement primitives. He implemented PoWER inside our C++ framework using Gaussian mixture policies and validated his agent experimentally on the acrobot environment (double inverted pendulum).
Other: Computer cluster, shell (bash, csh), UML, Lua
Libraries: Boost, Caffe, SFML, CEGUI, GLib, Apache Commons, JFlex, Java CUP, OpenCV
Simulators: TORCS, ODE
Storage: PostgreSQL, Oracle, MySQL, SQLite, XML (Schema, DTD, XPath)
Utilities: KDevelop, Eclipse, NetBeans, Microsoft Visual Studio, Code::Blocks, LaTeX, git
OS: Arch Linux, Debian, Ubuntu
Development: Isometric 2D game in team, Dynamic website with applet-server, Server management
Others: IT News, Free software and Self Hosting, Hardware, Cryptocurrency
Sport: Badminton (5 years in association)
Developmental robotics emerged in the 2000s, when researchers began to equip robots with sophisticated learning algorithms without providing pre-defined representations and with less task-specific knowledge given a priori. This thesis belongs to this line of research and adds the hypothesis that the goal of the agent is to maximize a reward signal: it learns by reinforcement. Its body is situated in a rich and continuous environment; it does not manipulate discrete symbols, and therefore has neither a countable set of actions nor preconceived states. The models learned by the agent are nonlinear (neural networks); it must build its own representations through its many interactions with the environment, without relying on a set of preconceived basis functions. While many reinforcement learning algorithms are discrete or rely on linear models over basis functions, the main question addressed here is how to learn by reinforcement over the long term in a continuous space of states and actions, with nonlinear models and less specific knowledge. To address this problem, a final hypothesis is formulated: the body of the agent, and thus the difficulty of the problem it solves, grows with time, in order to allow a partially guided exploration of the search space.