Arash Ahmadian on Rethinking RLHF
Arash Ahmadian is a Researcher at Cohere and Cohere For AI, focused on preference training of large language models. He is also a researcher at the Vector Institute.
Featured Reference
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker
Additional References
- Self-Rewarding Language Models, Yuan et al., 2024
- Reinforcement Learning: An Introduction, Sutton and Barto, 2018
- Learning from Delayed Rewards, Chris Watkins, 1989
- Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams, 1992
Creators and Guests
Host
Robin Ranjit Singh Chauhan
🌱 Head of Eng @AgFunder 🧠 AI:Reinforcement Learning/ML/DL/NLP🎙️Host @TalkRLPodcast 💳 ex-@Microsoft ecomm PgmMgr 🤖 @UWaterloo CompEng 🇨🇦 🇮🇳