Shicong Cen is qualified to endorse.
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen is registered as an author of this paper and can endorse for cs.AI, cs.GT, cs.IT, cs.LG, math.IT, math.OC, and stat.ML.
Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, and Bo Dai are not registered as owners of this paper.