Natasha Jaques 2
Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!
Dr Natasha Jaques is a Senior Research Scientist at Google Brain.
Featured References
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine
Additional References
Dr Natasha Jaques is a Senior Research Scientist at Google Brain.
Featured References
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine
Additional References
- Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019
- Learning to summarize from human feedback, Nisan Stiennon et al 2020
- Training language models to follow instructions with human feedback, Long Ouyang et al 2022
Creators and Guests
Host
Robin Ranjit Singh Chauhan
🌱 Head of Eng @AgFunder 🧠 AI:Reinforcement Learning/ML/DL/NLP🎙️Host @TalkRLPodcast 💳 ex-@Microsoft ecomm PgmMgr 🤖 @UWaterloo CompEng 🇨🇦 🇮🇳