Professional Experience

Postdoctoral Researcher

Univ. of Michigan-Shanghai Jiao Tong Univ. Joint Institute, China - Shanghai
June 2018 (ongoing)
Supervisor: Paul Weng (Assistant Professor, UM-SJTU Joint Institute)
Publications: Zimmer and Weng (2019), Zimmer and Weng (2019), Lin et al. (2019)
Tools: C++, Python, OpenAI Gym, OpenAI Baselines, LaTeX, PyTorch, TensorFlow, Caffe, Roboschool, Octave.

  • Development and analysis of deep reinforcement learning algorithms for continuous environments.
  • Setting up and managing the AAAL team cluster (OAR, IPSEC, NFS, NIS, KVM).
  • Member of the Program Committee of IJCAI 2019 and IJCAI 2020. Reviews for ICLR 2019, ICDL 2019, ACML 2020, ICML 2019, ICRA 2020.
  • Collaboration with Prof. Juan Rojas of Guangdong University of Technology, who leads a robotics team. We proposed a way to exploit the symmetry naturally present in robotics problems to improve the data efficiency of goal-based reinforcement learning algorithms.
  • Co-supervision of students (12 undergraduates and 1 graduate) on various research projects and for participation in NeurIPS competition tracks.

Research internship

ISIR - UPMC, France - Paris, AMAC
2014 (6 months)
Supervisor: Stéphane Doncieux (Professor, UPMC/ISIR)
Publication: Zimmer and Doncieux (2017)
Tools: C++, C, OpenGL, ODE, Git, Python, Scikit-learn, Octave, Sferes, Bash, LaTeX, FANN, OAR.
Study on transfer learning with reward shaping methods within a framework of lifelong learning. Developmental and evolutionary approach. Simulation in C++.
The principle was to first use a direct policy search in the sensorimotor space, i.e. with no pre-defined discrete sets of states nor actions, and then extract from the corresponding learning traces discrete actions and identify the relevant dimensions of the state to estimate the value function. Once this is done, the robot can apply reinforcement learning to be more robust to new domains and, if required, to learn faster than a direct policy search.

Research internship

LIP6, France - Paris, DECISION
2013 (3 months)
Supervisors: Paolo Viappiani (Researcher, CNRS/LIP6), Paul Weng (Associate Professor, UPMC/LIP6)
Publication: Zimmer et al. (2014)
Tools: C++, C, Git, TORCS, LaTeX, OAR.
Source Code
Literature review and state of the art on integrating expert knowledge during learning.
An agent (the “teacher”) advises another one (the “student”) by suggesting actions the latter should take while learning a specific task in a sequential decision problem; the teacher is limited by a “budget” (the number of times such advice can be given). Implementation in C++ of a new idea: the teacher is also learning. It learns to give advice to the student at opportune moments, that is, it learns how to teach better. We provided experimental results on the Mountain Car domain, showing that our approach outperforms state-of-the-art heuristics.
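The budgeted advising setting described above can be illustrated with a minimal Python sketch. It is not the C++ implementation from the internship: the class `BudgetedTeacher`, the function `choose_action` and the fixed `threshold` rule are hypothetical names and assumptions introduced here. The fixed rule (advise only when the gap between the best and worst action values is large) merely stands in for the advising policy that the teacher learns in the actual work.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BudgetedTeacher:
    """Hypothetical teacher that may give advice at most `budget` times."""
    q_values: Dict[int, List[float]]  # teacher's action values for each state
    budget: int                       # remaining number of pieces of advice
    threshold: float                  # assumed importance threshold (not from the source)

    def advise(self, state: int) -> Optional[int]:
        """Return the action to suggest, or None if the teacher stays silent."""
        if self.budget <= 0:
            return None
        values = self.q_values[state]
        # A state is considered important when the choice of action matters a lot.
        importance = max(values) - min(values)
        if importance < self.threshold:
            return None
        self.budget -= 1
        return values.index(max(values))


def choose_action(teacher: BudgetedTeacher, state: int, student_action: int) -> int:
    """The student follows the advice when given, otherwise its own choice."""
    advice = teacher.advise(state)
    return student_action if advice is None else advice
```

With a budget of one, the teacher in this sketch stays silent in states where all actions look similar and spends its single piece of advice in the first state that exceeds the threshold.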

Research internship

INRIA - LORIA, France - Nancy, Cortex & Maia
2012 (6 months)
Supervisors: Yann Boniface (UL/LORIA), Alain Dutech (INRIA/LORIA), Nicolas Rougier (Researcher, INRIA/LORIA)
Report: Zimmer et al. (2012)
Tools: Python, LaTeX, Git.
Source Code
Meta-learning in neural networks
Extending ideas developed in articles on consciousness and meta-representations with multilayer perceptrons: how can networks judge their own performance and improve it? Introduction to research, neural networks, LaTeX and Python.
A first neural network learned a classification task, while a second one, called the higher-order network, learned to bet on whether the first network's prediction was correct, using the first network's hidden-layer neurons as input. The higher-order network was indeed capable of learning such information, which meant that it could predict when the first network was going to fail. We therefore proposed several architectures combining the two networks in order to increase the overall prediction quality of the first network.


Mathieu Perrein France, France - Waldweistroff
2010 (3 weeks)
C# and WPF development using Microsoft Visual Studio.


PhD in Artificial Intelligence

University of Lorraine (UL), France - Nancy
October 2014–January 2018
Laboratory: LORIA (University of Lorraine, INRIA, CNRS)
Topic: Developmental Reinforcement Learning (Zimmer, 2018)
Supervisors: Alain Dutech (Researcher, INRIA/LORIA), Yann Boniface (Associate Professor, UL/LORIA)
Reviewers: Olivier Pietquin (Professor, University of Lille – DeepMind), Olivier Sigaud (Professor, UPMC/INRIA)
Examiners: Isabelle Debled (Professor, University of Lorraine), Celine Teulière (Associate Professor, Institut Pascal)
Tools: C++, Git, OAR, OpenAI Gym, LaTeX, Python, Jenkins, Caffe, Octave, ODE, Scikit-learn, FANN, OpenGL
Source Code

Master in Computer Science

Pierre and Marie Curie University (UPMC), France - Paris
Specialization: Artificial Intelligence and Decision
Research Training: Intelligent Agents, Learning and Decision
Magna cum laude

Bachelor in Computer Science

University of Lorraine, France - Nancy
Magna cum laude

High School Diploma in Sciences

Lycée Charlemagne, France - Thionville


Qualification - Associate Professor


Teaching assistant

École Nationale Supérieure of Electricity and Mechanics, France - Nancy

162 hours Algorithms and Programming in Python (2nd year integrated preparatory cycle).
Course coordinator: Jean-Philippe Mangeot
Practical work on imperative programming in Python (Pyzo IDE): searching, sorting and small games (Reversi, Connect Four, …). I wrote several lab assignments on data structures, Dijkstra's algorithm, artificial intelligence and networking. I developed a first server (in Java) to set up matches between two students, so that they could pit their artificial intelligence agents against each other in a tournament that determined their grades. During the last year, instead of comparing their agents on small games, we decided that the students would create autonomous trading agents. I therefore developed a second server (also in Java) to simulate a stock exchange. In both cases, the students simply had to interface with the server in Python, so they could focus on developing their artificial intelligence. The source code is available here.
8 hours Collaboration and Programming in Java (2nd year integrated preparatory cycle).
I designed and gave seminars on the Linux command line, Git, continuous integration, object-oriented design, unit testing, and threads and synchronization in Java. To let students practice collaborative development, I set up a Jenkins instance communicating with a GitHub project.

35 hours Algorithmic and Java Object-Oriented Programming (1st year of engineering school).
Course coordinator: Vincent Chevrier
Practical work on search algorithms, object-oriented design and Lego robot navigation.

Training on Teaching

University of Lorraine, France
20 hours


Muhammad Umer Siddique

SJTU, China - Shanghai, UM-SJTU Joint Institute
2018-2020 (ongoing)
With Paul Weng. Master's student: Fair optimization in deep reinforcement learning.
Adapting deep reinforcement learning algorithms (PPO, A2C and DQN) to optimize a specific multi-objective problem in which each objective represents the same measure for a different user. Muhammad adapted Python algorithms from stable-baselines to handle vector rewards and applied them to three domains: species conservation, traffic light control and data center control.

Research Projects of Undergraduate Students

SJTU, China - Shanghai, UM-SJTU Joint Institute
2018-2020 (ongoing)
With Paul Weng.

  • 5 students, Fall 2019-Spring 2020, 8 months, Experimental Evaluation of deep reinforcement learning algorithms on HPC over Atari games and PyBullet environments
     PRP – Chenmin Hou, Zhengjie Ji, Shuhyi Zhu, Siwei Ye and Run Peng
  • 1 student, Summer 2019, 4 months, Improving DQN with dynamic discount factor
     VE490 – Xinyang Ren
  • 1 student, Summer 2019, 4 months, Displaying the landscape of deep neural networks for deep reinforcement learning
     VE490 – Yifei Zhang
  • 4 students, Spring 2019, 4 months, Deep reinforcement learning for UAV control
     PRP – Xingyue Qian, Yunfan He, Chen Zhikai and Gaopeng Song
  • 1 student, Spring 2019, 4 months, Model-based reinforcement learning with PILCO on Roboschool
     VE490 – Zhenyuan Zhang

Achille Fedioun

University of Lorraine, France - Nancy, LORIA
2017 (5 months)
With Alain Dutech and Yann Boniface. Final-year internship (Master's in Computer Science and engineering school): Reinforcement learning with continuous state and action spaces using model-free actor-critic algorithms with deep neural networks.
Achille had to compare the features of two new algorithms (Q-Prop and ACER) with ours. He extended one of our actor-critic agents with off-policy multi-step replay using the Retrace algorithm in C++. As experimental validation, he used the lab's cluster to train deep neural networks on the half-cheetah environment.

Nicolas Lefebvre

University of Lorraine, France - Nancy, LORIA
2015 (5 months)
With Alain Dutech and Yann Boniface. Final-year internship (Master's in Cognitive Science): Reinforcement learning with continuous state and action spaces using model-free actor-only algorithms.
Nicolas had to investigate whether the PoWER algorithm could be used with neural networks instead of dynamic movement primitives. He implemented the PoWER algorithm in our C++ framework using Gaussian mixture policies. He experimentally validated his agent on the acrobot environment (double inverted pendulum).

Computer skills

Programming languages: C++, Java, Python, Octave, C, Prolog, C#, OCaml
Machine Learning: Caffe, PyTorch, TensorFlow, Scikit-learn, OpenAI Baselines
Web: J2EE, PHP, JavaScript & AJAX, HTML & CSS, WordPress, Laravel
Libraries: Boost, Caffe, SFML, CEGUI, GLib, Apache Commons, JFlex, Java CUP, OpenCV, Jenkins
Simulators: TORCS, ODE, OpenAI Gym, Roboschool
Storage: PostgreSQL, Oracle, MySQL, SQLite, XML (Schema, DTD, XPath)
Utilities: KDevelop, Eclipse, NetBeans, Microsoft Visual Studio, CodeBlocks, LaTeX, Git, PyCharm, Spyder
OS: Arch Linux, Debian, Ubuntu
Other: Computer clusters (Grid'5000, AWS, Google Cloud), Bash and csh shells, UML, Lua


French: Mother tongue

English: Fluent


Development: Isometric 2D game (team project), dynamic website with applet-server, server management
Others: Free software and self-hosting, hardware, IT news, cryptocurrency
Awards: NIPS 2017 – Learning to Run (top 100), funded to attend the IRCN Course in Neuro-Inspired Computation in Tokyo (2019), NeurIPS 2019 – Learn to Move (top 20)

Sport: Badminton (7 years)


PhD Context
Developmental robotics emerged in the 2000s, when researchers began to equip robots with sophisticated learning algorithms without providing pre-defined representations and with less specific knowledge given a priori. This thesis belongs to this line of research and adds the hypothesis that the goal of the agent is to maximize a reward signal: it learns by reinforcement. Its body is situated in a rich, continuous environment; it does not manipulate discrete symbols, and therefore has neither a countable set of actions nor preconceived states. The models learned by the agent are nonlinear (neural networks); it must build its own representations through its many interactions with the environment, without relying on a set of preconceived basis functions. While many reinforcement learning algorithms are discrete or rely on linear models over basis functions, the main question addressed here is how to learn by reinforcement over the long term in continuous state and action spaces, with nonlinear models and less specific knowledge. To address this problem, a final hypothesis is formulated: the body of the agent, and thus the difficulty of the problem it solves, grows with time, in order to allow a partially guided exploration of the search space.