PhD in Artificial Intelligence

University of Lorraine (UL), France - Nancy
October 2014–January 2018

Laboratory: LORIA (University of Lorraine, INRIA, CNRS)
Topic: Developmental Reinforcement Learning
Supervisors: Alain Dutech (Researcher INRIA-LORIA), Yann Boniface (Associate Professor UL-LORIA)

Master in Computer Science

Pierre and Marie Curie University (UPMC), France - Paris

Specialization: Artificial Intelligence and Decision
Research Training: Intelligent Agents, Learning and Decision
Magna cum laude

Bachelor in Computer Science

University of Lorraine, France - Nancy
Magna cum laude

High School Diploma in Sciences

Lycée Charlemagne, France - Thionville

Professional Experiences

Postdoctoral Researcher

Univ. of Michigan-Shanghai Jiao Tong Univ. Joint Institute, China - Shanghai
2018 (ongoing)
Supervisor: Paul Weng (Assistant Professor)
Developing deep reinforcement learning algorithms.

Research internship

ISIR - UPMC, France - Paris, AMAC
2014 (6 months)
Supervisor: Stéphane Doncieux (Professor UPMC-ISIR)
Study on transfer learning with reward shaping methods within a framework of lifelong learning. Developmental and evolutionary approach. Simulation in C++.
The principle was to first use direct policy search in the sensorimotor space, i.e. with no predefined discrete sets of states or actions, and then to extract discrete actions from the corresponding learning traces and identify the dimensions of the state that are relevant for estimating the value function. Once this is done, the robot can apply reinforcement learning to be more robust to new domains and, if required, to learn faster than with direct policy search.

Research internship

LIP6, France - Paris, DECISION
2013 (3 months)
Supervisors: Paolo Viappiani (Researcher CNRS-LIP6), Paul Weng (Associate Professor UPMC-LIP6)
Literature review and state of the art on integrating knowledge from an expert during learning. Implementation in C++ of a new idea: the expert is itself an agent that learns to give advice at opportune moments to several agents that are themselves still learning.
An agent (the “teacher”) advises another one (the “student”) by suggesting actions the latter should take while learning a specific task in a sequential decision problem; the teacher is limited by a “budget” (the number of times such advice can be given). Implementation in C++ of a new idea: the teacher is also learning. It learns to advise the student at opportune moments, that is, it learns how to teach better. We provided experimental results on the Mountain Car domain, showing that our approach outperforms the state-of-the-art heuristics.
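The budget-limited advising setup described above can be sketched in a few lines of Python. This is only an illustration, not the C++ implementation from the internship: instead of a learned advising policy, it uses a fixed importance heuristic (advise when the gap between the teacher's best and worst action values exceeds a threshold), which is the kind of baseline a learning teacher would be compared against. The names `BudgetedTeacher` and `student_action`, the threshold, and the toy Q-values are all hypothetical.

```python
from typing import Optional, Sequence


class BudgetedTeacher:
    """Teacher that may advise a student at most `budget` times (illustrative)."""

    def __init__(self, budget: int, threshold: float) -> None:
        self.budget = budget        # remaining number of pieces of advice
        self.threshold = threshold  # minimum state importance worth spending advice on

    def advise(self, q_values: Sequence[float]) -> Optional[int]:
        """Return the teacher's greedy action, or None to stay silent."""
        if self.budget <= 0:
            return None
        # Heuristic importance: how much the choice of action matters in this state.
        importance = max(q_values) - min(q_values)
        if importance < self.threshold:
            return None
        self.budget -= 1
        return max(range(len(q_values)), key=lambda a: q_values[a])


def student_action(student_choice: int, advice: Optional[int]) -> int:
    """The student follows the advice when given, otherwise its own choice."""
    return student_choice if advice is None else advice


teacher = BudgetedTeacher(budget=2, threshold=1.0)
# Hypothetical teacher Q-values for four successive states (three actions each).
states_q = [
    [0.1, 0.2, 0.15],  # unimportant state: no advice
    [0.0, 3.0, 0.5],   # important: advise action 1
    [2.0, 0.0, 0.1],   # important: advise action 0, budget is now spent
    [0.0, 0.0, 5.0],   # important, but no budget left
]
actions = [student_action(0, teacher.advise(q)) for q in states_q]
print(actions)  # [0, 1, 0, 0]
```

In the approach described above, the fixed threshold rule would be replaced by a policy that the teacher itself learns.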

Research internship

INRIA - LORIA, France - Nancy, Cortex & Maia
2012 (6 months)
Supervisors: Yann Boniface (UL-LORIA), Alain Dutech (INRIA-LORIA), Nicolas Rougier (Researcher INRIA-LORIA)
Meta-learning in neural networks
Extending ideas developed in articles on consciousness and meta-representations with multilayer perceptrons: how can they judge their own performance and improve it? Introduction to research, neural networks, LaTeX and Python.
A first neural network learned a classification task, while a second one, called the higher-order network, learned to bet on whether the first network's prediction was correct, using that network's hidden-layer neurons as input. The higher-order network was indeed capable of learning such information, which meant that it could predict when the first network was going to fail. We therefore proposed several architectures combining the two networks in order to increase the overall prediction quality of the first network. The source code is available here.


Mathieu Perrein France, France - Waldweistroff
2010 (3 weeks)
C# and WPF development using Microsoft Visual Studio.


Teaching assistant

École Nationale Supérieure of Electricity and Mechanics, France - Nancy
162 hours Algorithms and Programming in Python (2nd year integrated preparatory cycle).
Responsible: Jean-Philippe Mangeot
Practical work on imperative programming in Python (Pyzo IDE): searching, sorting and small games (Reversi, Connect Four, …). I wrote several lab assignments on data structures, Dijkstra's algorithm, artificial intelligence and networking. I developed a first server (in Java) to host matches between two students, so that they could pit their artificial intelligence agents against each other in a tournament that determined their grades. During the last year, instead of comparing their agents on small games, we decided that the students would create autonomous trading agents. Thus, I developed a second server (also in Java) to simulate a stock exchange. In both cases, the students simply had to interface with the server in Python, so they could focus on developing their artificial intelligence. The source code is available here.
8 hours Collaboration and Programming in Java (2nd year integrated preparatory cycle).
I designed and gave seminars on the Linux command line, git, continuous integration, object-oriented design, unit testing, threads and synchronization in Java. To let students practice collaborative development, I set up a Jenkins instance communicating with a GitHub project.

35 hours Algorithms and Java Object-Oriented Programming (1st year of engineering school).
Responsible: Vincent Chevrier
Practical work on search algorithms, object-oriented design and Lego robot navigation.


Achille Fedioun

University of Lorraine, France - Nancy, LORIA
2017 (5 months)
End-of-studies internship (Master in Computer Science and engineering school): Reinforcement learning with continuous state and action spaces using model-free actor-critic algorithms with deep neural networks.
Achille had to compare the features of two new algorithms (Q-Prop and ACER) with ours. He extended one of our actor-critic agents in C++ with off-policy multi-step replay using the Retrace algorithm. As experimental validation, he used the lab's cluster to train deep neural networks on the half-cheetah environment.
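For readers unfamiliar with Retrace, the multi-step off-policy target it computes can be sketched as follows. This is a minimal Python illustration of the commonly stated recursive form of the target, not the C++ agent mentioned above; the function name `retrace_targets`, its argument layout, and the toy numbers are assumptions made for the example.

```python
from typing import List, Sequence


def retrace_targets(
    rewards: Sequence[float],   # r_t
    q_taken: Sequence[float],   # Q(x_t, a_t) for the replayed actions
    v_next: Sequence[float],    # V(x_{t+1}) under the target policy (0 if terminal)
    rho: Sequence[float],       # importance ratios pi(a_t | x_t) / mu(a_t | x_t)
    gamma: float = 0.99,
    lam: float = 1.0,
) -> List[float]:
    """Compute Retrace targets backwards along one replayed trajectory.

    target_t = r_t + gamma * V(x_{t+1})
                   + gamma * c_{t+1} * (target_{t+1} - Q(x_{t+1}, a_{t+1}))
    with truncated importance weights c_t = lam * min(1, rho_t).
    """
    n = len(rewards)
    targets = [0.0] * n
    correction = 0.0  # c_{t+1} * (target_{t+1} - Q_{t+1}); zero past the last step
    for t in reversed(range(n)):
        targets[t] = rewards[t] + gamma * (v_next[t] + correction)
        c_t = lam * min(1.0, rho[t])
        correction = c_t * (targets[t] - q_taken[t])
    return targets


# Toy two-step trajectory with zero value estimates, so the effect of the
# truncated weights is easy to read off.
targets = retrace_targets(
    rewards=[1.0, 1.0], q_taken=[0.0, 0.0], v_next=[0.0, 0.0],
    rho=[1.0, 0.5], gamma=1.0,
)
print(targets)  # [1.5, 1.0]
```

When the replayed actions are at least as likely under the current policy as under the behaviour policy (ratios of 1 or more), the weights are truncated to 1 and the target becomes a full multi-step return; when the ratios are 0, it falls back to a one-step target.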

Nicolas Lefebvre

University of Lorraine, France - Nancy, LORIA
2015 (5 months)
End-of-studies internship (Master in Cognitive Science): Reinforcement learning with continuous state and action spaces using model-free actor-only algorithms.
Nicolas had to investigate whether the PoWER algorithm could be used with neural networks instead of dynamic movement primitives. He implemented the PoWER algorithm inside our C++ framework using Gaussian mixture policies. He experimentally validated his agent on the acrobot environment (double inverted pendulum).

Computer skills

Software: c++, java, python, octave, c, prolog, c#, ocaml
Web: j2ee, php, javascript & ajax, html & css
Other: computer cluster, shell bash & csh, uml, lua
Libraries: boost, caffe, sfml, cegui, glib, apache commons, jflex, java cup, opencv
Simulators: torcs, ode
Storage: postgresql, oracle, mysql, sqlite, xml (schema, dtd, xpath)
Utilities: KDevelop, Eclipse, Netbeans, Microsoft Visual Studio, CodeBlocks, LaTeX, git
OS: archlinux, debian, ubuntu


French : Mother tongue

English : Fluent


Development: Isometric 2D game (team project), dynamic website with applet-server, server management
Others: IT news, free software and self-hosting, hardware, cryptocurrency
Sport: Badminton (5 years in a club)


PhD Context
Developmental robotics emerged in the 2000s, when researchers began to equip robots with sophisticated learning algorithms without providing predefined representations and with less specific knowledge given a priori. This thesis belongs to that line of work and adds the hypothesis that the goal of the agent is to maximize a reward signal: it learns by reinforcement. Its body is situated in a rich and continuous environment; it does not manipulate discrete symbols, and therefore has neither a countable set of actions nor preconceived states. The models learned by the agent are nonlinear (neural networks); it must build its own representations through its many interactions with the environment, without relying on a set of preconceived basis functions. While many reinforcement learning algorithms are discrete or rely on linear models over basis functions, the main question addressed here is how to learn by reinforcement over the long term in continuous state and action spaces, with nonlinear models and less specific knowledge. To address this problem, a final hypothesis is formulated: the body of the agent, and thus the difficulty of the problem it solves, grows with time, in order to allow a partially guided exploration of the search space.