

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

Shicong Cen is registered as an author of this paper and can endorse for cs.AI, cs.GT, cs.IT, cs.LG, math.IT, math.OC, and stat.ML.

Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi and Bo Dai are not registered as owners of this paper.