Publications: Zimmer and Weng (2019), Zimmer and Weng (2019), Lin et al. (2019)
Tools: C++, Python, OpenAI Gym, OpenAI Baselines, LaTeX, PyTorch, TensorFlow, Caffe, Roboschool, Octave.
- Development and analysis of deep reinforcement learning algorithms for continuous environments.
- Setup and management of the AAAL team cluster (OAR, IPsec, NFS, NIS, KVM).
- Member of the Program Committee of IJCAI 2019 and IJCAI 2020. Reviewer for ICLR 2019, ICDL 2019, ACML 2020, ICML 2019 and ICRA 2020.
- Collaboration with Prof. Juan Rojas of Guangdong University of Technology, who leads a robotics team. We proposed a way to exploit the symmetries naturally present in robotics problems to improve the data efficiency of goal-based reinforcement learning algorithms.
- Co-supervision of students (12 undergraduates and 1 graduate student) on various research projects and in NeurIPS competition tracks.
Publication: Zimmer and Doncieux (2017)
Tools: C++, C, OpenGL, ODE, Git, Python, Scikit-learn, Octave, Sferes, Bash, LaTeX, FANN, OAR.
Study of transfer learning with reward-shaping methods within a lifelong-learning framework, following a developmental and evolutionary approach. Simulations in C++.
The principle was to first run a direct policy search in the sensorimotor space, i.e. with no predefined discrete sets of states or actions, and then to extract discrete actions from the resulting learning traces and identify the state dimensions relevant for estimating the value function. The robot could then apply reinforcement learning, making it more robust to new domains and, when needed, faster to learn than with direct policy search.
Publication: Zimmer et al. (2014)
Tools: C++, C, Git, TORCS, LaTeX, OAR.
Literature review and state of the art on integrating knowledge from an expert during learning.
An agent (the "teacher") advises another one (the "student") by suggesting actions the latter should take while it learns a specific task in a sequential decision problem; the teacher is limited by a "budget" (the number of times such advice can be given). We implemented a new idea in C++: the teacher is also learning, namely when to give advice to the student, so that it learns how to teach better. We provided experimental results on the Mountain Car domain showing that our approach outperforms state-of-the-art heuristics.
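For context, here is a minimal sketch of one budget-constrained heuristic from the teaching-on-a-budget literature, importance advising, of the kind a learned teacher is compared against. It is not the learned teaching policy described above: the tabular Q-values, the class name `BudgetedTeacher` and the `threshold` parameter are illustrative assumptions. The teacher spends one unit of budget whenever the spread of its own action values in the current state is large enough.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical tabular teacher: one list of Q-values (one per action) per state.
QTable = Dict[str, List[float]]


@dataclass
class BudgetedTeacher:
    """Importance-advising heuristic (a baseline, not a learned teacher)."""
    q: QTable          # teacher's Q-values, assumed already learned
    budget: int        # maximum number of pieces of advice
    threshold: float   # advise only in sufficiently "important" states

    def importance(self, state: str) -> float:
        # Importance = gap between the best and the worst action value.
        values = self.q[state]
        return max(values) - min(values)

    def advise(self, state: str) -> Optional[int]:
        """Return the suggested action index, or None if no advice is given."""
        if self.budget <= 0 or self.importance(state) < self.threshold:
            return None
        self.budget -= 1
        values = self.q[state]
        return values.index(max(values))  # teacher's greedy action
```

With `q={"s0": [0.1, 0.2, 0.15], "s1": [0.0, 5.0, -3.0]}`, `budget=1` and `threshold=1.0`, the teacher stays silent in `s0`, suggests action 1 in `s1`, and stays silent afterwards because the budget is exhausted. A learned teacher replaces the fixed threshold rule with a policy over when to advise.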
Report: Zimmer et al. (2012)
Tools: Python, LaTeX, Git.
Meta-learning in neural networks
Further development of ideas from articles on consciousness and meta-representations, using multilayer perceptrons: how can such networks judge their own performance and improve it? Introduction to research, neural networks, LaTeX and Python.
A first neural network learned a classification task, while a second one, called the higher-order network, learned to bet, from the first network's hidden-layer activations, on whether the first network's prediction was correct. The higher-order network was indeed able to learn this, meaning it could predict when the first network was about to fail. We therefore proposed several architectures combining the two networks to improve the overall prediction quality of the first network.
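The betting setup can be sketched as follows, under simplifying assumptions: the higher-order network is reduced to a single logistic unit (the original used multilayer perceptrons), trained by plain gradient descent on the first network's hidden activations, with target 1 when the first network was right and 0 otherwise. The function names and hyperparameters are hypothetical.

```python
import numpy as np


def betting_targets(first_net_pred: np.ndarray, labels: np.ndarray) -> np.ndarray:
    # 1.0 ("bet high") when the first network's prediction is correct, else 0.0.
    return (first_net_pred == labels).astype(float)


def train_higher_order(hidden: np.ndarray, bets: np.ndarray,
                       lr: float = 0.5, epochs: int = 500):
    """Fit a logistic 'higher-order' unit on hidden activations (n, d)."""
    n, d = hidden.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(hidden @ w + b)))  # predicted P(correct)
        grad = p - bets                              # cross-entropy gradient
        w -= lr * hidden.T @ grad / n
        b -= lr * grad.mean()
    return w, b


def predict_bet(hidden: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    # True where the higher-order unit bets that the first network is right.
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b))) >= 0.5
```

A low bet on a given input is the signal that the first network is likely to fail, which is what makes combining the two networks worthwhile.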
PhD in Artificial Intelligence
Topic: Developmental Reinforcement Learning (Zimmer, 2018)
Supervisors: Alain Dutech (Researcher, INRIA–LORIA), Yann Boniface (Associate Professor, UL–LORIA)
Reviewers: Olivier Pietquin (Professor, University of Lille – DeepMind), Olivier Sigaud (Professor, UPMC – INRIA)
Examiners: Isabelle Debled (Professor, University of Lorraine), Céline Teulière (Associate Professor, Institut Pascal)
Tools: C++, Git, OAR, OpenAI Gym, LaTeX, Python, Jenkins, Caffe, Octave, ODE, Scikit-learn, FANN, OpenGL
Source code: https://github.com/matthieu637/ddrl, https://github.com/matthieu637/lhpo
Master in Computer Science
Research Training: Intelligent Agents, Learning and Decision
Magna cum laude
Bachelor in Computer Science
High School Diploma in Sciences
Qualification - Associate Professor
162 hours Algorithms and Programming in Python (2nd year integrated preparatory cycle).
Practical work on imperative programming in Python (Pyzo IDE): searching, sorting and small games (Reversi, Connect Four, …). I wrote several practical-work assignments on data structures, Dijkstra's algorithm, artificial intelligence and networking. I developed a first server (in Java) that set up games between two students, so they could pit their artificial intelligence agents against each other in a tournament that determined their grades. In the last year, instead of comparing agents on small games, we decided that the students would build autonomous trading agents, so I developed a second server (also in Java) simulating a stock exchange. In both cases, the students only had to interface with the server in Python, so they could focus on developing their artificial intelligence. The source code is available here.
8 hours Collaboration and Programming in Java (2nd year integrated preparatory cycle).
I designed and taught seminars on the Linux command line, Git, continuous integration, object-oriented design, unit testing, and threads and synchronization in Java. To let students practice collaborative development, I set up a Jenkins instance connected to a GitHub project.
35 hours Algorithmics and Object-Oriented Programming in Java (1st year of engineering school).
Course leader: Vincent Chevrier
Practical work on search algorithms, object-oriented design and Lego robot navigation.
Teacher Training
Muhammad Umer Siddique
Adapting deep reinforcement learning algorithms (PPO, A2C and DQN) to optimize a specific multi-objective problem in which each objective is the same measure evaluated for a different user. Muhammad adapted the Python implementations from Stable Baselines to handle vector rewards and applied them to three domains: species conservation, traffic light control and data center control.
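Handling a vector reward starts with keeping one return per objective instead of a single scalar. A minimal sketch, assuming rewards arrive as an array of shape `(T, n_users)` for one episode (the function name is hypothetical); how the per-user returns are then combined into a training signal is not specified here and is left out.

```python
import numpy as np


def vector_discounted_returns(rewards: np.ndarray, gamma: float) -> np.ndarray:
    """Discounted return per user: rewards (T, n_users) -> returns (T, n_users)."""
    returns = np.zeros_like(rewards, dtype=float)
    running = np.zeros(rewards.shape[1])
    # Backward pass: G_t = r_t + gamma * G_{t+1}, computed component-wise.
    for t in range(rewards.shape[0] - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For two users with rewards `[[1, 0], [0, 1]]` and `gamma=0.5`, the returns are `[[1.0, 0.5], [0.0, 1.0]]`: each user keeps their own return rather than having the two summed away.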
Research Projects of Undergraduate Students
- 5 students, Fall 2019–Spring 2020, 8 months, Experimental evaluation of deep reinforcement learning algorithms on HPC, on Atari games and PyBullet environments
PRP – Chenmin Hou, Zhengjie Ji, Shuhyi Zhu, Siwei Ye and Run Peng
- 1 student, Summer 2019, 4 months, Improving DQN with a dynamic discount factor
VE490 – Xinyang Ren
- 1 student, Summer 2019, 4 months, Displaying the landscape of deep neural networks for deep reinforcement learning
VE490 – Yifei Zhang
- 4 students, Spring 2019, 4 months, Deep reinforcement learning for UAV control
PRP – Xingyue Qian, Yunfan He, Chen Zhikai and Gaopeng Song
- 1 student, Spring 2019, 4 months, Model-based reinforcement learning with PILCO on Roboschool
VE490 – Zhenyuan Zhang
Achille compared the features of two new algorithms (Q-Prop and ACER) with ours. He extended one of our C++ actor-critic agents with off-policy multi-step replay based on the Retrace algorithm. For experimental validation, he used the lab's cluster to train deep neural networks on the HalfCheetah environment.
Nicolas investigated whether the PoWER algorithm could be used with neural networks instead of dynamic movement primitives. He implemented PoWER in our C++ framework with Gaussian mixture policies and validated his agent experimentally on the Acrobot environment (double inverted pendulum).
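The Retrace target used for off-policy multi-step replay can be sketched in Python (the original work was in C++). This follows the backward recursion popularized by ACER, with truncated importance weights c_t = λ·min(1, π(a_t|x_t)/μ(a_t|x_t)). The function name and argument layout are assumptions; Q(x_t, a_t) and V(x_t) are taken as already computed by the critic.

```python
import numpy as np


def retrace_targets(rewards, q_taken, values, rho, bootstrap_value,
                    gamma=0.99, lam=1.0):
    """Retrace targets for one replayed trajectory of length T.

    rewards[t]      : r_t
    q_taken[t]      : Q(x_t, a_t) from the critic
    values[t]       : V(x_t) = E_{a~pi} Q(x_t, a)
    rho[t]          : pi(a_t|x_t) / mu(a_t|x_t), mu = behaviour policy
    bootstrap_value : V(x_T) after the last transition (0 if terminal)
    """
    T = len(rewards)
    targets = np.zeros(T)
    q_ret = bootstrap_value
    for t in reversed(range(T)):
        q_ret = rewards[t] + gamma * q_ret       # Q^ret(x_t, a_t)
        targets[t] = q_ret
        c = lam * min(1.0, rho[t])               # truncated trace coefficient
        # Value carried back to step t-1: off-policy corrected estimate at x_t.
        q_ret = c * (q_ret - q_taken[t]) + values[t]
    return targets
```

With ρ = 1 everywhere the targets reduce to multi-step returns; with ρ = 0 the trace is cut and each target becomes a one-step target r_t + γV(x_{t+1}). Truncating the weights at 1 is what keeps the variance bounded when the replayed actions are unlikely under the current policy.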
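The core of PoWER is a reward-weighted average of exploration noise. A simplified, episodic sketch of that update, assuming Gaussian parameter-space exploration with one noise vector per rollout and non-negative returns; Gaussian mixture policies and neural networks are not modeled, and the function name and `n_best` option (keeping only the best rollouts, as PoWER's importance sampling does) are illustrative assumptions.

```python
import numpy as np


def power_update(theta: np.ndarray, epsilons: np.ndarray,
                 returns: np.ndarray, n_best=None) -> np.ndarray:
    """theta: (d,) parameters; epsilons: (n, d) noise per rollout; returns: (n,)."""
    returns = np.asarray(returns, dtype=float)
    if np.any(returns < 0):
        raise ValueError("reward weighting assumes non-negative returns")
    order = np.argsort(returns)[::-1]          # best rollouts first
    if n_best is not None:
        order = order[:n_best]
    w = returns[order]
    if w.sum() == 0:
        return theta.copy()                    # no signal: keep parameters
    # theta' = theta + sum_i R_i * eps_i / sum_i R_i
    return theta + (w[:, None] * epsilons[order]).sum(axis=0) / w.sum()
```

For example, with noise `[[1, 0], [0, 1]]` and returns `[3, 1]`, the parameters move by `[0.75, 0.25]`, towards the perturbation that earned the higher return.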
Machine Learning: Caffe, PyTorch, TensorFlow, scikit-learn, OpenAI Baselines
Libraries: Boost, Caffe, SFML, CEGUI, GLib, Apache Commons, JFlex, Java CUP, OpenCV, Jenkins
Simulators: TORCS, ODE, OpenAI Gym, Roboschool
Storage: PostgreSQL, Oracle, MySQL, SQLite, XML (Schema, DTD, XPath)
Utilities: KDevelop, Eclipse, NetBeans, Microsoft Visual Studio, Code::Blocks, LaTeX, Git, PyCharm, Spyder
OS: Arch Linux, Debian, Ubuntu
Other: computer clusters (Grid'5000, AWS, Google Cloud), Bash and csh shells, UML, Lua
Development: isometric 2D game developed in a team, dynamic website with an applet server, server management
Others: free software and self-hosting, hardware, IT news, cryptocurrency
Awards: NIPS 2017: Learning to Run (top 100); funded to attend the IRCN Course in Neuro-Inspired Computation in Tokyo, 2019; NeurIPS 2019: Learn to Move (top 20)
Sport: badminton (7 years)
Developmental robotics emerged in the 2000s, when researchers began equipping robots with sophisticated learning algorithms without providing predefined representations and with less task-specific knowledge given a priori. This thesis follows this line of work, adding the hypothesis that the agent's goal is to maximize a reward signal: it learns by reinforcement. Its body is situated in a rich, continuous environment; it does not manipulate discrete symbols and therefore has neither a countable set of actions nor preconceived states. The models learned by the agent are nonlinear (neural networks); it must build its own representations through its many interactions with the environment, without relying on a set of preconceived basis functions. While many reinforcement learning algorithms are discrete or rely on linear models over basis functions, the main question addressed here is how to learn by reinforcement over the long term in continuous state and action spaces, with nonlinear models and less specific knowledge. To address this problem, a final hypothesis is formulated: the body of the agent, and thus the difficulty of the problem it solves, grows over time, in order to allow a partially guided exploration of the search space.