-
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Authors:
Xuan He,
Dongfu Jiang,
Ge Zhang,
Max Ku,
Achint Soni,
Sherman Siu,
Haonan Chen,
Abhranil Chandra,
Ziyan Jiang,
Aaran Arulraj,
Kai Wang,
Quy Duc Do,
Yuansheng Ni,
Bohan Lyu,
Yaswanth Narsupalli,
Rongqi Fan,
Zhiheng Lyu,
Yuchen Lin,
Wenhu Chen
Abstract:
Recent years have witnessed great advances in video generation. However, the development of automatic video metrics lags significantly behind. None of the existing metrics can provide reliable scores for generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models. We train VideoScore (initialized from Mantis) on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between VideoScore and humans can reach 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench show that VideoScore consistently has a much higher correlation with human judges than other metrics. Given these results, we believe VideoScore can serve as a great proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning with Human Feedback (RLHF) to improve current video generation models.
Submitted 14 October, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark (Published at NeurIPS 2024 Track Datasets and Benchmarks)
Authors:
Yubo Wang,
Xueguang Ma,
Ge Zhang,
Yuansheng Ni,
Abhranil Chandra,
Shiguang Guo,
Weiming Ren,
Aaran Arulraj,
Xuan He,
Ziyan Jiang,
Tianle Li,
Max Ku,
Kai Wang,
Alex Zhuang,
Rongqi Fan,
Xiang Yue,
Wenhu Chen
Abstract:
In the age of large-scale language models, benchmarks like Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in model capabilities. This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. Additionally, MMLU-Pro eliminates the trivial and noisy questions in MMLU. Our experimental results show that MMLU-Pro not only raises the challenge, causing a significant drop in accuracy of 16% to 33% compared to MMLU, but also demonstrates greater stability under varying prompts. With 24 different prompt styles tested, the sensitivity of model scores to prompt variations decreased from 4-5% in MMLU to just 2% in MMLU-Pro. Additionally, we found that models utilizing Chain of Thought (CoT) reasoning achieved better performance on MMLU-Pro than with direct answering, in stark contrast to the findings on the original MMLU, indicating that MMLU-Pro includes more complex reasoning questions. Our assessments confirm that MMLU-Pro is a more discriminative benchmark for tracking progress in the field.
Submitted 7 October, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Pressure Induced Metallization of BaMn2As2
Authors:
A. T. Satya,
Awadhesh Mani,
A. Arulraj,
N. V. Chandra Shekar,
K. Vinod,
C. S. Sundar,
A. Bharathi
Abstract:
Temperature- and pressure-dependent electrical resistivity rho(T,P) studies have been performed on a BaMn2As2 single crystal in the 4.2 to 300 K range up to 8.2 GPa to investigate the evolution of its ground state properties. The rho(T) shows a negative coefficient of resistivity under pressure up to 3.2 GPa. The occurrence of an insulator to metal transition (MIT) at an external P ~4.5 GPa is indicated by a change in the temperature coefficient in the rho(T) data at ~36 K. However, complete metallization over the entire temperature range is seen at P ~5.8 GPa. High pressure XRD studies carried out at room temperature also show an anomaly in the pressure versus volume curve around P ~5 GPa, without a change in crystal structure, indicative of an electronic transition. Further, a clear precipitous drop in rho(T) at ~17 K is seen for P ~5.8 GPa, which suggests the possibility of the system going over to a superconducting ground state.
Submitted 19 November, 2011; v1 submitted 22 October, 2011;
originally announced October 2011.
-
Giant anisotropic magnetostriction in Pr$_{0.5}$Sr$_{0.5}$MnO$_3$
Authors:
R. Mahendiran,
C. Marquina,
M. R. Ibarra,
A. Arulraj,
C. N. R. Rao,
A. Maignan,
B. Raveau
Abstract:
Magnetic, linear thermal expansion (LTE), anisotropic ($λ_t$) and volume ($ω$) magnetostriction properties of Pr$_{0.5}$Sr$_{0.5}$MnO$_3$ were investigated. The LTE decreases smoothly from 300 K without a clear anomaly around either the Curie (T$_C$ = 270 K) or the Néel temperature (T$_N$ = 100 K), and it exhibits hysteresis over a wide temperature range (60 K-270 K) upon warming. An isothermal magnetization study suggests that 13 % of the ferromagnetic phase coexists with 87 % of the antiferromagnetic phase at 25 K. The parallel and perpendicular magnetostrictions undergo rapid changes during the metamagnetic transition. Contrary to the isotropic giant volume magnetostriction reported in manganites so far, this compound exhibits a giant anisotropic magnetostriction ($λ_t \approx 10^{-3}$) and a smaller volume magnetostriction ($ω \approx 10^{-4}$) below T$_N$. We suggest that the field induced antiferromagnetic to ferromagnetic transition is accompanied by a structural transition from the d$_{x^2-y^2}$ orbital ordered antiferromagnetic (orthorhombic) to the orbital disordered ferromagnetic (tetragonal) phase. The metamagnetic transition proceeds through nucleation and growth of the ferromagnetic domains at the expense of the antiferromagnetic phase. The preferential orientation of the ferromagnetic (tetragonal) domains along the field direction increases the linear dimension of the sample in the field direction and decreases it in the orthogonal direction, leading to the observed giant anisotropic magnetostriction effect. Our study also suggests that nanodomains of the low temperature antiferromagnetic phase possibly exist in the temperature region T$_N$ < T < T$_C$.
Submitted 9 August, 2001; v1 submitted 3 August, 2001;
originally announced August 2001.
-
Collapse of the charge ordering gap of Nd_{0.5}Sr_{0.5}MnO_{3} in an applied magnetic field
Authors:
Amlan Biswas,
Anthony Arulraj,
A. K. Raychaudhuri,
C. N. R. Rao
Abstract:
We report results of tunneling studies on the charge ordering compound Nd_{0.5}Sr_{0.5}MnO_{3} in a magnetic field up to 6 T and for temperatures down to 25 K. We show that a gap (2Δ_{CO} \approx 0.5 eV) opens up in the density of states (DOS) at the Fermi level (E_F) on charge ordering (T_{CO}=150 K), which collapses in an applied magnetic field when the charge ordered state melts. There is a clear correspondence between the behavior of the resistivity and the gap formation and its collapse in an applied magnetic field. We conclude that a gap in the DOS at E_F is necessary for the stability of the charge ordered state.
Submitted 27 March, 1999; v1 submitted 5 March, 1999;
originally announced March 1999.