-
GPU Accelerated Sparse Cholesky Factorization
Authors:
M. Ozan Karsavuran,
Esmond G. Ng,
Barry W. Peyton
Abstract:
The solution of sparse symmetric positive definite linear systems is an important computational kernel in large-scale scientific and engineering modeling and simulation. We will solve the linear systems using a direct method, in which a Cholesky factorization of the coefficient matrix is performed using a right-looking approach and the resulting triangular factors are used to compute the solution. Sparse Cholesky factorization is compute-intensive. In this work we investigate techniques for reducing the factorization time in sparse Cholesky factorization by offloading some of the dense matrix operations to a GPU. We will describe the techniques we have considered. We achieved up to 4x speedup compared to the CPU-only version.
Submitted 23 September, 2024; v1 submitted 21 September, 2024;
originally announced September 2024.
-
Some new techniques to use in serial sparse Cholesky factorization algorithms
Authors:
M. Ozan Karsavuran,
Esmond G. Ng,
Barry W. Peyton,
Jonathan L. Peyton
Abstract:
We present a new variant of serial right-looking supernodal sparse Cholesky factorization (RL). Our comparison of RL with the multifrontal method confirms that RL is simpler, slightly faster, and requires slightly less storage. The key to the rest of the work in this paper is recent work on reordering columns within supernodes so that the dense off-diagonal blocks in the factor matrix joining pairs of supernodes are fewer and larger. We present a second new variant of serial right-looking supernodal sparse Cholesky factorization (RLB), where this one is specifically designed to exploit fewer and larger off-diagonal blocks in the factor matrix obtained by reordering within supernodes. A key distinction found in RLB is that it uses no floating-point working storage and performs no assembly operations. Our key finding is that RLB is unequivocally faster than its competitors. Indeed, RLB is consistently, but modestly, faster than its competitors whenever Intel's MKL sequential BLAS are used. More importantly, RLB is substantially faster than its competitors whenever Intel's MKL multithreaded BLAS are used. Finally, RLB using the multithreaded BLAS achieves impressive speedups over RLB using the sequential BLAS.
Submitted 19 September, 2024;
originally announced September 2024.
-
Mica: Automated Differential Testing for OCaml Modules
Authors:
Ernest Ng,
Harrison Goldstein,
Benjamin C. Pierce
Abstract:
Suppose we are given two OCaml modules implementing the same signature. How do we check that they are observationally equivalent -- that is, that they behave the same on all inputs? One established technique is to use a property-based testing (PBT) tool such as QuickCheck. Currently, however, this can require significant amounts of boilerplate code and ad-hoc test harnesses. To address this issue, we present Mica, an automated tool for testing observational equivalence of OCaml modules. Mica is implemented as a PPX compiler extension, allowing users to supply minimal annotations to a module signature. These annotations guide Mica to automatically derive specialized PBT code that checks observational equivalence. We discuss the design of Mica and demonstrate its efficacy as a testing tool on various modules taken from real-world OCaml libraries.
Submitted 26 August, 2024;
originally announced August 2024.
-
Accelerating Eigenvalue Computation for Nuclear Structure Calculations via Perturbative Corrections
Authors:
Dong Min Roh,
Esmond Ng,
Chao Yang,
Dean Lee,
Pieter Maris,
James P. Vary
Abstract:
We present a new method for computing the lowest few eigenvalues and the corresponding eigenvectors of a nuclear many-body Hamiltonian represented in a truncated configuration interaction subspace, i.e., the no-core shell model (NCSM). The method uses the hierarchical structure of the NCSM Hamiltonian to partition the Hamiltonian as the sum of two matrices. The first matrix corresponds to the Hamiltonian represented in a small configuration space, whereas the second is viewed as the perturbation to the first matrix. Eigenvalues and eigenvectors of the first matrix can be computed efficiently. Perturbative corrections to the eigenvectors of the first matrix can be obtained from the solutions of a sequence of linear systems of equations defined in the small configuration space. These correction vectors can be combined with the approximate eigenvectors of the first matrix to construct a subspace from which more accurate approximations of the desired eigenpairs can be obtained. We call this method a Subspace Projection with Perturbative Corrections (SPPC) method. We show by numerical examples that the SPPC method can be more efficient than conventional iterative methods for solving large-scale eigenvalue problems such as the Lanczos, block Lanczos and the locally optimal block preconditioned conjugate gradient (LOBPCG) method. The method can also be combined with other methods to avoid convergence stagnation.
Submitted 11 July, 2024;
originally announced July 2024.
-
Hybrid AM/FM Mode-Locking of Singly-Resonant OPOs
Authors:
Ryan Hamerly,
Evan Laksono,
Marc Jankowski,
Edwin Ng,
Noah Flemens,
Myoung-Gyun Suh,
Hideo Mabuchi
Abstract:
We investigate a new mode-locking regime in the singly-resonant OPO employing simultaneous amplitude- and frequency-modulation of the intracavity field. This OPO exhibits deterministic, "turn-key" formation of a stable, broadband, chirped frequency comb with high conversion efficiency. Comb-forming dynamics follow a simple phase-space dynamical model, governed by cavity dispersion and modulator chirp, which agrees closely with full numerical simulations. The comb exhibits fast, mode-hop-free tuning over the full gain window of the OPA crystal, controlled by the modulator frequency. Conditions for comb stability, and techniques to enhance comb bandwidth through intentional phase-mismatch and chirping, are investigated.
Submitted 7 May, 2024;
originally announced May 2024.
-
Pose Priors from Language Models
Authors:
Sanjay Subramanian,
Evonne Ng,
Lea Müller,
Dan Klein,
Shiry Ginosar,
Trevor Darrell
Abstract:
We present a zero-shot pose optimization method that enforces accurate physical contact constraints when estimating the 3D pose of humans. Our central insight is that since language is often used to describe physical interaction, large pretrained text-based models can act as priors on pose estimation.
We can thus leverage this insight to improve pose estimation by converting natural language descriptors, generated by a large multimodal model (LMM), into tractable losses to constrain the 3D pose optimization. Despite its simplicity, our method produces surprisingly compelling pose reconstructions of people in close contact, correctly capturing the semantics of the social and physical interactions. We demonstrate that our method rivals more complex state-of-the-art approaches that require expensive human annotation of contact points and training specialized models. Moreover, unlike previous approaches, our method provides a unified framework for resolving self-contact and person-to-person contact.
Submitted 6 May, 2024;
originally announced May 2024.
-
Skew-Gaussian model of small-photon-number coherent Ising machines
Authors:
Yoshitaka Inui,
Edwin Ng,
Yoshihisa Yamamoto
Abstract:
A Gaussian quantum theory of bosonic modes has been widely used to describe quantum optical systems, including coherent Ising machines (CIMs) that consist of $χ^{(2)}$ degenerate optical parametric oscillators (DOPOs) as nonlinear elements. However, Gaussian models have been thought to be invalid in the extremely strong-gain-saturation limit. Here, we develop an extended Gaussian model including two third-order fluctuation products, $\langle δ\hat{X}^3\rangle$ and $\langle δ\hat{X}δ\hat{P}^2\rangle$, which we call self-skewness and cross-skewness, respectively. This new model, which we call the skew-Gaussian model, replicates the success probability predicted by the quantum master equation (QME) more precisely than Gaussian models do. We also discuss the impact of skew variables on the performance of CIMs.
Submitted 29 February, 2024;
originally announced March 2024.
-
Ultrafast second-order nonlinear photonics -- from classical physics to non-Gaussian quantum dynamics
Authors:
Marc Jankowski,
Ryotatsu Yanagimoto,
Edwin Ng,
Ryan Hamerly,
Timothy P. McKenna,
Hideo Mabuchi,
M. M. Fejer
Abstract:
Photonic integrated circuits with second-order ($χ^{(2)}$) nonlinearities are rapidly scaling to remarkably low powers. At this time, state-of-the-art devices achieve saturated nonlinear interactions with thousands of photons when driven by continuous-wave lasers, and further reductions in these energy requirements enabled by the use of ultrafast pulses may soon push nonlinear optics into the realm of single-photon nonlinearities. This tutorial reviews these recent developments in ultrafast nonlinear photonics, discusses design strategies for realizing few-photon nonlinear interactions, and presents a unified treatment of ultrafast quantum nonlinear optics using a framework that smoothly interpolates from classical behaviors to the few-photon scale. These emerging platforms for quantum optics fundamentally differ from typical realizations in cavity quantum electrodynamics due to the large number of coupled optical modes. Classically, multimode behaviors have been well studied in nonlinear optics, with famous examples including soliton formation and supercontinuum generation. In contrast, multimode quantum systems exhibit a far greater variety of behaviors, and yet closed-form solutions are even sparser than their classical counterparts. In developing a framework for ultrafast quantum optics, we will identify what behaviors carry over from classical to quantum devices, what intuition must be abandoned, and what new opportunities exist at the intersection of ultrafast and quantum nonlinear optics. While this article focuses on establishing connections between the classical and quantum behaviors of devices with $χ^{(2)}$ nonlinearities, the frameworks developed here are general and are readily extended to the description of dynamical processes based on third-order ($χ^{(3)}$) nonlinearities.
Submitted 17 January, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Authors:
Evonne Ng,
Javier Romero,
Timur Bagautdinov,
Shaojie Bai,
Trevor Darrell,
Angjoo Kanazawa,
Alexander Richard
Abstract:
We present a framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction. Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands. The key behind our method is in combining the benefits of sample diversity from vector quantization with the high-frequency details obtained through diffusion to generate more dynamic, expressive motion. We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures (e.g. sneers and smirks). To facilitate this line of research, we introduce a first-of-its-kind multi-view conversational dataset that allows for photorealistic reconstruction. Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods. Furthermore, our perceptual evaluation highlights the importance of photorealism (vs. meshes) in accurately assessing subtle motion details in conversational gestures. Code and dataset available online.
Submitted 3 January, 2024;
originally announced January 2024.
-
Mesoscopic ultrafast nonlinear optics -- The emergence of multimode quantum non-Gaussian physics
Authors:
Ryotatsu Yanagimoto,
Edwin Ng,
Marc Jankowski,
Rajveer Nehra,
Timothy P. McKenna,
Tatsuhiro Onodera,
Logan G. Wright,
Ryan Hamerly,
Alireza Marandi,
M. M. Fejer,
Hideo Mabuchi
Abstract:
Over the last few decades, nonlinear optics has become significantly more nonlinear, traversing nearly a billionfold improvement in energy efficiency, with ultrafast nonlinear nanophotonics in particular emerging as a frontier for combining both spatial and temporal engineering. At present, cutting-edge experiments in nonlinear nanophotonics place us just above the mesoscopic regime, where a few hundred photons suffice to trigger nonlinear saturation. In contrast to classical or deep-quantum optics, the mesoscale is characterized by dynamical interactions between mean-field, Gaussian, and non-Gaussian quantum features, all within a close hierarchy of scales. When combined with the inherent multimode complexity of optical fields, such hybrid quantum-classical dynamics present theoretical, experimental, and engineering challenges to the contemporary framework of quantum optics. In this review, we highlight the unique physics that emerges in multimode nonlinear optics at the mesoscale and outline key principles for exploiting both classical and quantum features to engineer novel functionalities. We briefly survey the experimental landscape and draw attention to outstanding technical challenges in materials, dispersion engineering, and device design for accessing mesoscopic operation. Finally, we speculate on how these capabilities might usher in some new paradigms in quantum photonics, from quantum-augmented information processing to nonclassical-light-driven dynamics and phenomena to all-optical non-Gaussian measurement and sensing. The physics unlocked at the mesoscale present significant challenges and opportunities in theory and experiment alike, and this review is intended to serve as a guidepost as we begin to navigate this new frontier in ultrafast quantum nonlinear optics.
Submitted 22 November, 2023;
originally announced November 2023.
-
Using system-reservoir methods to derive effective field theories for broadband nonlinear quantum optics: a case study on cascaded quadratic nonlinearities
Authors:
Chris Gustin,
Ryotatsu Yanagimoto,
Edwin Ng,
Tatsuhiro Onodera,
Hideo Mabuchi
Abstract:
In broadband quantum optical systems, nonlinear interactions among a large number of frequency components induce complex dynamics that may defy heuristic analysis. In this work we introduce a perturbative framework for factoring out reservoir degrees of freedom and establishing a concise effective model (effective field theory) for the remaining system. Our approach combines approximate diagonalization of judiciously partitioned subsystems with master equation techniques. We consider cascaded optical $χ^{(2)}$ (quadratic) nonlinearities as an example and show that the dynamics can be construed (to leading order) as self-phase modulations of dressed fundamental modes plus cross-phase modulations of dressed fundamental and second-harmonic modes. We then formally eliminate the second-harmonic degrees of freedom and identify emergent features of the fundamental wave dynamics, such as two-photon loss channels, and examine conditions for accuracy of the reduced model in dispersive and dissipative parameter regimes. Our results highlight the utility of system-reservoir methods for deriving accurate, intuitive reduced models for complex dynamics in broadband nonlinear quantum photonics.
Submitted 6 November, 2023;
originally announced November 2023.
-
Zen: Near-Optimal Sparse Tensor Synchronization for Distributed DNN Training
Authors:
Zhuang Wang,
Zhaozhuo Xu,
Anshumali Shrivastava,
T. S. Eugene Ng
Abstract:
Distributed training is the de facto standard to scale up the training of Deep Neural Networks (DNNs) with multiple GPUs. The performance bottleneck of distributed training lies in communications for gradient synchronization. Recently, practitioners have observed sparsity in gradient tensors, suggesting the potential to reduce the traffic volume in communication and improve end-to-end training efficiency. Yet, the optimal communication scheme to fully leverage sparsity is still missing. This paper aims to address this gap. We first analyze the characteristics of sparse tensors in popular DNN models to understand the fundamentals of sparsity. We then systematically explore the design space of communication schemes for sparse tensors and find the optimal one. We also develop a gradient synchronization system called Zen that approximately realizes it for sparse tensors. We demonstrate that Zen can achieve up to 5.09x speedup in communication time and up to 2.48x speedup in training throughput compared to the state-of-the-art methods.
Submitted 23 September, 2023;
originally announced September 2023.
-
Can Language Models Learn to Listen?
Authors:
Evonne Ng,
Sanjay Subramanian,
Dan Klein,
Angjoo Kanazawa,
Trevor Darrell,
Shiry Ginosar
Abstract:
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words. Given an input transcription of the speaker's words with their timestamps, our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE. Since gesture is a language component, we propose treating the quantized atomic motion elements as additional language token inputs to a transformer-based large language model. Initializing our transformer with the weights of a language model pre-trained only on text results in significantly higher quality listener responses than training a transformer from scratch. We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study. In our evaluation, we analyze the model's ability to utilize temporal and semantic aspects of spoken text. Project page: https://people.eecs.berkeley.edu/~evonne_ng/projects/text2listen/
Submitted 21 August, 2023;
originally announced August 2023.
-
Quantum noise dynamics in nonlinear pulse propagation
Authors:
Edwin Ng,
Ryotatsu Yanagimoto,
Marc Jankowski,
M. M. Fejer,
Hideo Mabuchi
Abstract:
The propagation of ultrafast pulses in dispersion-engineered waveguides, exhibiting strong field confinement in both space and time, is a promising avenue towards single-photon nonlinearities in an all-optical platform. However, quantum engineering in such systems requires new numerical tools and physical insights to harness their complicated multimode and nonlinear quantum dynamics. In this work, we use a self-consistent, multimode Gaussian-state model to capture the nonlinear dynamics of broadband quantum fluctuations and correlations, including entanglement. Notably, despite its parametrization by Gaussian states, our model exhibits nonlinear dynamics in both the mean field and the quantum correlations, giving it a marked advantage over conventional linearized treatments of quantum noise, especially for systems exhibiting gain saturation and strong nonlinearities. Numerically, our approach takes the form of a Gaussian split-step Fourier (GSSF) method, naturally generalizing highly efficient SSF methods used in classical ultrafast nonlinear optics; the equations for GSSF evaluate in $O(M^2\log M)$ time for an $M$-mode system with $O(M^2)$ quantum correlations. To demonstrate the broad applicability of GSSF, we numerically study quantum noise dynamics and multimode entanglement in several ultrafast systems, from canonical soliton propagation in third-order ($χ^{(3)}$) waveguides to saturated $χ^{(2)}$ broadband parametric generation and supercontinuum generation, e.g., as recently demonstrated in thin-film lithium niobate nanophotonics.
Submitted 11 July, 2023;
originally announced July 2023.
-
Diffusion Co-Policy for Synergistic Human-Robot Collaborative Tasks
Authors:
Eley Ng,
Ziang Liu,
Monroe Kennedy III
Abstract:
Modeling multimodal human behavior has been a key barrier to increasing the level of interaction between human and robot, particularly for collaborative tasks. Our key insight is that an effective, learned robot policy used for human-robot collaborative tasks must be able to express a high degree of multimodality, predict actions in a temporally consistent manner, and recognize a wide range of frequencies of human actions in order to seamlessly integrate with a human in the control loop. We present Diffusion Co-policy, a method for planning sequences of actions that synergize well with humans during test time. The co-policy predicts joint human-robot action sequences via a Transformer-based diffusion model, which is trained on a dataset of collaborative human-human demonstrations, and directly executes the robot actions in a receding horizon control framework. We demonstrate in both simulation and real environments that the method outperforms other state-of-the-art learning methods on the task of human-robot table-carrying with a human in the loop. Moreover, we qualitatively highlight compelling robot behaviors that demonstrate evidence of true human-robot collaboration, including mutual adaptation, shared task understanding, leadership switching, and low levels of wasteful interaction forces arising from dissent.
Submitted 12 November, 2023; v1 submitted 20 May, 2023;
originally announced May 2023.
-
Engineering cubic quantum nondemolition Hamiltonian with mesoscopic optical parametric interactions
Authors:
Ryotatsu Yanagimoto,
Rajveer Nehra,
Edwin Ng,
Alireza Marandi,
Hideo Mabuchi
Abstract:
We propose a scheme to realize a cubic quantum nondemolition (QND) Hamiltonian with optical parametric interactions. We show that strongly squeezed fundamental and second harmonic fields propagating in a $χ^{(2)}$ nonlinear medium effectively evolve under a cubic QND Hamiltonian. We highlight the versatility offered by such a Hamiltonian for engineering non-Gaussian quantum states, such as Schrödinger cat states and cubic phase states. We show that our scheme can be highly tolerant against overall detection inefficiency with an auxiliary high-gain phase-sensitive optical amplifier. Our proposal involves parametric interactions in a mesoscopic photon-number regime, significantly enhancing the effective nonlinear coupling from the native single-photon coupling rate while providing powerful means to fight photon propagation loss. Experimental numbers suggest that our scheme might be feasible in the near future, particularly with pulsed nonlinear nanophotonics.
Submitted 4 May, 2023;
originally announced May 2023.
-
Nerfstudio: A Modular Framework for Neural Radiance Field Development
Authors:
Matthew Tancik,
Ethan Weber,
Evonne Ng,
Ruilong Li,
Brent Yi,
Justin Kerr,
Terrance Wang,
Alexander Kristoffersen,
Jake Austin,
Kamyar Salahi,
Abhik Ahuja,
David McAllister,
Angjoo Kanazawa
Abstract:
Neural Radiance Fields (NeRF) are a rapidly growing area of research with wide-ranging applications in computer vision, graphics, robotics, and more. In order to streamline the development and deployment of NeRF research, we propose a modular PyTorch framework, Nerfstudio. Our framework includes plug-and-play components for implementing NeRF-based methods, which make it easy for researchers and practitioners to incorporate NeRF into their projects. Additionally, the modular design enables support for extensive real-time visualization tools, streamlined pipelines for importing captured in-the-wild data, and tools for exporting to video, point cloud and mesh representations. The modularity of Nerfstudio enables the development of Nerfacto, our method that combines components from recent papers to achieve a balance between speed and quality, while also remaining flexible to future modifications. To promote community-driven development, all associated code and data are made publicly available with open-source licensing at https://nerf.studio.
Submitted 16 October, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
Contrastive Learning for Self-Supervised Pre-Training of Point Cloud Segmentation Networks With Image Data
Authors:
Andrej Janda,
Brandon Wagstaff,
Edwin G. Ng,
Jonathan Kelly
Abstract:
Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is particularly important for semantic segmentation tasks involving 3D datasets, which are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on unlabelled data is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point clouds exclusively. While useful, this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in many 3D datasets, our pre-training method only requires a single scan of a scene and can be applied to cases where localization information is unavailable. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.
Submitted 4 September, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Self-Supervised Pre-training of 3D Point Cloud Networks with Image Data
Authors:
Andrej Janda,
Brandon Wagstaff,
Edwin G. Ng,
Jonathan Kelly
Abstract:
Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is especially important for semantic segmentation tasks involving 3D datasets that are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on large unlabelled datasets is one way to reduce the amount of manual annotations needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities, by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which is often included in many 3D datasets, our pre-training method only requires a single scan of a scene. We demonstrate that our pre-training approach, despite using single scans, achieves comparable performance to other multi-scan, point cloud-only methods.
Submitted 16 December, 2022; v1 submitted 21 November, 2022;
originally announced November 2022.
-
It Takes Two: Learning to Plan for Human-Robot Cooperative Carrying
Authors:
Eley Ng,
Ziang Liu,
Monroe Kennedy III
Abstract:
Cooperative table-carrying is a complex task due to the continuous nature of the action and state-spaces, multimodality of strategies, and the need for instantaneous adaptation to other agents. In this work, we present a method for predicting realistic motion plans for cooperative human-robot teams on the task. Using a Variational Recurrent Neural Network (VRNN) to model the variation in the trajectory of a human-robot team across time, we are able to capture the distribution over the team's future states while leveraging information from interaction history. The key to our approach is leveraging human demonstration data to generate trajectories that synergize well with humans during test time in a receding horizon fashion. Comparison between a baseline, sampling-based planner RRT (Rapidly-exploring Random Trees) and the VRNN planner in centralized planning shows that the VRNN generates motion more similar to the distribution of human-human demonstrations than the RRT. Results in a human-in-the-loop user study show that the VRNN planner outperforms decentralized RRT on task-related metrics, and is significantly more likely to be perceived as human than the RRT planner. Finally, we demonstrate the VRNN planner on a real robot paired with a human teleoperating another robot.
Submitted 7 March, 2023; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Quantum nondemolition measurements with optical parametric amplifiers for ultrafast universal quantum information processing
Authors:
Ryotatsu Yanagimoto,
Rajveer Nehra,
Ryan Hamerly,
Edwin Ng,
Alireza Marandi,
Hideo Mabuchi
Abstract:
Realization of a room-temperature ultra-fast photon-number-resolving (PNR) quantum nondemolition (QND) measurement would have significant implications for photonic quantum information processing (QIP), enabling, e.g., deterministic quantum computation in discrete-variable architectures, but the requirement for strong coupling has hampered the development of scalable implementations. In this work, we propose and analyze a nonlinear-optical route to PNR QND using quadratic (i.e., $χ^{(2)}$) nonlinear interactions. We show that the coherent pump field driving a phase-mismatched optical parametric amplifier (OPA) experiences displacements conditioned on the number of signal Bogoliubov excitations. A measurement of the pump displacement thus provides a QND measurement of the signal Bogoliubov excitations, projecting the signal mode to a squeezed photon-number state. We then show how our nonlinear OPA dynamics can be utilized for deterministically generating Gottesman-Kitaev-Preskill states only with additional Gaussian resources, offering an all-optical route for fault-tolerant QIP in continuous-variable systems. Finally, we place these QND schemes into a more traditional context by highlighting analogies between the phase-mismatched optical parametric oscillator and multilevel atom-cavity QED systems, by showing how continuous monitoring of the outcoupled pump quadrature induces conditional localization of the intracavity signal mode onto squeezed photon-number states. Our analysis suggests that our proposal may be viable in near-term $χ^{(2)}$ nonlinear nanophotonics, highlighting the rich potential of OPA as a universal tool for ultrafast non-Gaussian quantum state engineering and quantum computation.
Submitted 2 September, 2022;
originally announced September 2022.
-
Computational Modelling of Plasticity-Led Evolution
Authors:
Eden Tian Hwa Ng,
Akira R. Kinjo
Abstract:
Plasticity-led evolution is a form of evolution where a change in the environment induces novel traits via phenotypic plasticity, after which the novel traits are genetically accommodated over generations under the novel environment. This mode of evolution is expected to resolve the problem of gradualism (i.e., evolution by the slow accumulation of mutations that induce phenotypic variation) implied by the Modern Evolutionary Synthesis, in the face of a large environmental change. While experimental works are essential for validating that plasticity-led evolution indeed happened, we need computational models to gain insight into its underlying mechanisms and make qualitative predictions. Such computational models should include the developmental process and gene-environment interactions in addition to genetics and natural selection. We point out that gene regulatory network models can incorporate all the above notions. In this review, we highlight results from computational modelling of gene regulatory networks that consolidate the criteria of plasticity-led evolution. Since gene regulatory networks are mathematically equivalent to artificial recurrent neural networks, we also discuss their analogies and discrepancies, which may help further understand the mechanisms underlying plasticity-led evolution.
Submitted 18 December, 2022; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Degenerate optical parametric amplification in CMOS silicon
Authors:
David Heydari,
Mircea Catuneanu,
Edwin Ng,
Dodd J. Gray Jr.,
Ryan Hamerly,
Jatadhari Mishra,
Marc Jankowski,
M. M. Fejer,
Kambiz Jamshidi,
Hideo Mabuchi
Abstract:
Silicon is a common material for photonics due to its favorable optical properties in the telecom and mid-wave IR bands, as well as compatibility with a wide range of complementary metal-oxide semiconductor (CMOS) foundry processes. Crystalline inversion symmetry precludes silicon from natively exhibiting second-order nonlinear optical processes. In this work, we build on recent work in silicon photonics that breaks this material symmetry using large bias fields, thereby enabling $χ^{(2)}$ interactions. Using this approach, we demonstrate both second-harmonic generation (with a normalized efficiency of $0.2\,\%\,\mathrm{W^{-1} cm^{-2}}$) and, to our knowledge, the first degenerate $χ^{(2)}$ optical parametric amplifier (with relative gain of $0.02\,\mathrm{dB}$ using $3\,\mathrm{mW}$ of pump power on-chip at a pump wavelength of $1196\,\mathrm{nm}$) using silicon-on-insulator waveguides fabricated in a CMOS-compatible commercial foundry. We expect this technology to enable the integration of novel nonlinear optical devices such as optical parametric amplifiers, oscillators, and frequency converters into large-scale, hybrid photonic-electronic systems by leveraging the extensive ecosystem of CMOS fabrication.
Submitted 15 July, 2022;
originally announced July 2022.
-
ByteComp: Revisiting Gradient Compression in Distributed Training
Authors:
Zhuang Wang,
Haibin Lin,
Yibo Zhu,
T. S. Eugene Ng
Abstract:
Gradient compression (GC) is a promising approach to addressing the communication bottleneck in distributed deep learning (DDL). However, it is challenging to find the optimal compression strategy for applying GC to DDL because of the intricate interactions among tensors. To fully unleash the benefits of GC, two questions must be addressed: 1) How to express all compression strategies and the corresponding interactions among tensors of any DDL training job? 2) How to quickly select a near-optimal compression strategy? In this paper, we propose ByteComp to answer these questions. It first designs a decision tree abstraction to express all the compression strategies and develops empirical models to timeline tensor computation, communication, and compression to enable ByteComp to derive the intricate interactions among tensors. It then designs a compression decision algorithm that analyzes tensor interactions to eliminate and prioritize strategies and optimally offloads compression to CPUs. Experimental evaluations show that ByteComp can improve the training throughput over the state-of-the-art compression-enabled system by up to 77% for representative DDL training jobs. Moreover, the computational time needed to select the compression strategy is measured in milliseconds, and the selected strategy is only a few percent from optimal.
Submitted 6 June, 2022; v1 submitted 28 May, 2022;
originally announced May 2022.
-
Ultra-broadband mid-infrared generation in dispersion-engineered thin-film lithium niobate
Authors:
Jatadhari Mishra,
Marc Jankowski,
Alexander Y. Hwang,
Hubert S. Stokowski,
Timothy P. McKenna,
Carsten Langrock,
Edwin Ng,
David Heydari,
Hideo Mabuchi,
Amir H. Safavi-Naeini,
M. M. Fejer
Abstract:
Thin-film lithium niobate (TFLN) is an emerging platform for compact, low-power nonlinear-optical devices, and has been used extensively for near-infrared frequency conversion. Recent work has extended these devices to mid-infrared wavelengths, where broadly tunable sources may be used for chemical sensing. To this end, we demonstrate efficient and broadband difference frequency generation between a fixed 1-micron pump and a tunable telecom source in uniformly-poled TFLN-on-sapphire by harnessing the dispersion-engineering available in tightly-confining waveguides. We show a simultaneous 1-2 order-of-magnitude improvement in conversion efficiency and ~5-fold enhancement of operating bandwidth for mid-infrared generation when compared to conventional lithium niobate waveguides. We also examine the effects of mid-infrared loss from surface-adsorbed water on the performance of these devices.
Submitted 10 June, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion
Authors:
Evonne Ng,
Hanbyul Joo,
Liwen Hu,
Hao Li,
Trevor Darrell,
Angjoo Kanazawa,
Shiry Ginosar
Abstract:
We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion. We combine the motion and speech audio of the speaker using a motion-audio cross attention transformer. Furthermore, we enable non-deterministic prediction by learning a discrete latent representation of realistic listener motion with a novel motion-encoding VQ-VAE. Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions. Moreover, it produces realistic 3D listener facial motion synchronous with the speaker (see video). We demonstrate that our method outperforms baselines qualitatively and quantitatively via a rich suite of experiments. To facilitate this line of research, we introduce a novel and large in-the-wild dataset of dyadic conversations. Code, data, and videos available at https://evonneng.github.io/learning2listen/.
Submitted 18 April, 2022;
originally announced April 2022.
-
Temporal trapping: a route to strong coupling and deterministic optical quantum computation
Authors:
Ryotatsu Yanagimoto,
Edwin Ng,
Marc Jankowski,
Hideo Mabuchi,
Ryan Hamerly
Abstract:
The realization of deterministic photon-photon gates is a central goal in optical quantum computation and engineering. A longstanding challenge is that optical nonlinearities in scalable, room-temperature material platforms are too weak to achieve the required strong coupling, due to the critical loss-confinement tradeoff in existing photonic structures. In this work, we introduce a novel confinement method, dispersion-engineered temporal trapping, to circumvent the tradeoff, paving a route to all-optical strong coupling. Temporal confinement is imposed by an auxiliary trap pulse via cross-phase modulation, which, combined with the spatial confinement of a waveguide, creates a "flying cavity" that enhances the nonlinear interaction strength by at least an order of magnitude. Numerical simulations confirm that temporal trapping confines the multimode nonlinear dynamics to a single-mode subspace, enabling high-fidelity deterministic quantum gate operations. With realistic dispersion engineering and loss figures, we show that temporally trapped ultrashort pulses could achieve strong coupling on near-term nonlinear nanophotonic platforms. Our results highlight the potential of ultrafast nonlinear optics to become the first scalable, high-bandwidth, and room-temperature platform that achieves a strong coupling, opening a new path to quantum computing, simulation, and light sources.
Submitted 1 December, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion
Authors:
Shruti Agarwal,
Liwen Hu,
Evonne Ng,
Trevor Darrell,
Hao Li,
Anna Rohrbach
Abstract:
In today's era of digital misinformation, we are increasingly faced with new threats posed by video falsification techniques. Such falsifications range from cheapfakes (e.g., lookalikes or audio dubbing) to deepfakes (e.g., sophisticated AI media synthesis methods), which are becoming perceptually indistinguishable from real videos. To tackle this challenge, we propose a multi-modal semantic forensic approach to discover clues that go beyond detecting discrepancies in visual quality, thereby handling both simpler cheapfakes and visually persuasive deepfakes. In this work, our goal is to verify that the purported person seen in the video is indeed themselves by detecting anomalous facial movements corresponding to the spoken words. We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others. We use interpretable Action Units (AUs) to capture a person's face and head movement as opposed to deep CNN features, and we are the first to use word-conditioned facial motion analysis. We further demonstrate our method's effectiveness on a range of fakes not seen in training including those without video manipulation, that were not addressed in prior work.
Submitted 1 December, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Onset of non-Gaussian quantum physics in pulsed squeezing with mesoscopic fields
Authors:
Ryotatsu Yanagimoto,
Edwin Ng,
Atsushi Yamamura,
Tatsuhiro Onodera,
Logan G. Wright,
Marc Jankowski,
M. M. Fejer,
Peter L. McMahon,
Hideo Mabuchi
Abstract:
We study the emergence of non-Gaussian quantum features in pulsed squeezed light generation with a mesoscopic number (i.e., dozens to hundreds) of pump photons. Due to the strong optical nonlinearities necessarily involved in this regime, squeezing occurs alongside significant pump depletion, compromising the predictions made by conventional semiclassical models for squeezing. Furthermore, nonlinear interactions among multiple frequency modes render the system dynamics exponentially intractable in naïve quantum models, requiring a more sophisticated modeling framework. To this end, we construct a nonlinear Gaussian approximation to the squeezing dynamics, defining a "Gaussian interaction frame" (GIF) in which non-Gaussian quantum dynamics can be isolated and concisely described using a few dominant (i.e., principal) supermodes. Numerical simulations of our model reveal non-Gaussian distortions of squeezing in the mesoscopic regime, largely associated with signal-pump entanglement. We argue that the state of the art in nonlinear nanophotonics is quickly approaching this regime, providing an all-optical platform for experimental studies of the semiclassical-to-quantum transition in a rich paradigm of coherent, multimode nonlinear dynamics. Mesoscopic pulsed squeezing thus provides an intriguing case study of the rapid rise in dynamic complexity associated with semiclassical-to-quantum crossover, which we view as a correlate of the emergence of new information-processing capacities in the quantum regime.
Submitted 26 November, 2021;
originally announced November 2021.
-
A parsimonious model of blood glucose homeostasis
Authors:
Eric Ng,
Jaycee Morgan Kaufman,
Lennaert van Veen,
Yan Fossat
Abstract:
The mathematical modelling of biological systems has historically followed one of two approaches: comprehensive and minimal. In comprehensive models, the involved biological pathways are modelled independently, then brought together as an ensemble of equations that represents the system being studied, most often in the form of a large system of coupled differential equations. This approach often contains a very large number of tunable parameters (> 100) where each describes some physical or biochemical subproperty. As a result, such models scale very poorly when assimilation of real world data is needed. Furthermore, condensing model results into simple indicators is challenging, an important difficulty in scenarios where medical diagnosis is required. In this paper, we develop a minimal model of glucose homeostasis with the potential to yield diagnostics for pre-diabetes. We model glucose homeostasis as a closed control system containing a self-feedback mechanism that describes the collective effects of the physiological components involved. The model is analyzed as a planar dynamical system, then tested and verified using data collected with continuous glucose monitors (CGMs) from healthy individuals in four separate studies. We show that, although the model has only a small number (3) of tunable parameters, their values are consistently distributed across subjects for both hyperglycemic and hypoglycemic episodes.
Submitted 13 November, 2021;
originally announced November 2021.
-
Deep Learning and Spectral Embedding for Graph Partitioning
Authors:
Alice Gatti,
Zhixiong Hu,
Tess Smidt,
Esmond G. Ng,
Pieter Ghysels
Abstract:
We present a graph bisection and partitioning algorithm based on graph neural networks. For each node in the graph, the network outputs probabilities for each of the partitions. The graph neural network consists of two modules: an embedding phase and a partitioning phase. The embedding phase is trained first by minimizing a loss function inspired by spectral graph theory. The partitioning module is trained through a loss function that corresponds to the expected value of the normalized cut. Both parts of the neural network rely on SAGE convolutional layers and graph coarsening using heavy edge matching. The multilevel structure of the neural network is inspired by the multigrid algorithm. Our approach generalizes very well to bigger graphs and has partition quality comparable to METIS, Scotch and spectral partitioning, with shorter runtime compared to METIS and spectral partitioning.
Submitted 8 December, 2021; v1 submitted 16 October, 2021;
originally announced October 2021.
-
Dynamically learning the parameters of a chaotic system using partial observations
Authors:
Elizabeth Carlson,
Joshua Hudson,
Adam Larios,
Vincent R. Martinez,
Eunice Ng,
Jared P. Whitehead
Abstract:
Motivated by recent progress in data assimilation, we develop an algorithm to dynamically learn the parameters of a chaotic system from partial observations. Under reasonable assumptions, we rigorously establish the convergence of this algorithm to the correct parameters when the system in question is the classic three-dimensional Lorenz system. Computationally, we demonstrate the efficacy of this algorithm on the Lorenz system by recovering any proper subset of the three non-dimensional parameters of the system, so long as a corresponding subset of the state is observable. We also provide computational evidence that this algorithm works well beyond the hypotheses required in the rigorous analysis, including in the presence of noisy observations, stochastic forcing, and the case where the observations are discrete and sparse in time.
Submitted 18 August, 2021;
originally announced August 2021.
-
MXDAG: A Hybrid Abstraction for Cluster Applications
Authors:
Weitao Wang,
Sushovan Das,
Xinyu Crystal Wu,
Zhuang Wang,
Ang Chen,
T. S. Eugene Ng
Abstract:
Distributed applications, such as database queries and distributed training, consist of both compute and network tasks. DAG-based abstraction primarily targets compute tasks and has no explicit network-level scheduling. In contrast, Coflow abstraction collectively schedules network flows among compute tasks but lacks the end-to-end view of the application DAG. Because of the dependencies and interactions between these two types of tasks, it is sub-optimal to only consider one of them. We argue that co-scheduling of both compute and network tasks can help applications towards the globally optimal end-to-end performance. However, none of the existing abstractions can provide fine-grained information for co-scheduling. We propose MXDAG, an abstraction to treat both compute and network tasks explicitly. It can capture the dependencies and interactions of both compute and network tasks leading to improved application performance.
Submitted 15 July, 2021;
originally announced July 2021.
-
Bayesian Time Varying Coefficient Model with Applications to Marketing Mix Modeling
Authors:
Edwin Ng,
Zhishi Wang,
Athena Dai
Abstract:
Both Bayesian and varying coefficient models are very useful tools in practice as they can be used to model parameter heterogeneity in a generalizable way. Motivated by the need to enhance Marketing Mix Modeling (MMM) at Uber, we propose a Bayesian Time Varying Coefficient model, equipped with a hierarchical Bayesian structure. This model is different from other time varying coefficient models in the sense that the coefficients are weighted over a set of local latent variables following certain probabilistic distributions. Stochastic Variational Inference is used to approximate the posteriors of latent variables and dynamic coefficients. The proposed model also helps address many challenges faced by traditional MMM approaches. We used simulations as well as real world marketing datasets to demonstrate our model's superior performance in terms of both accuracy and interpretability.
Submitted 4 September, 2021; v1 submitted 6 June, 2021;
originally announced June 2021.
-
Benchmark Study of Quantum Algorithms for Combinatorial Optimization: Unitary versus Dissipative
Authors:
Krishanu Sankar,
Artur Scherer,
Satoshi Kako,
Sam Reifenstein,
Navid Ghadermarzy,
Willem B. Krayenhoff,
Yoshitaka Inui,
Edwin Ng,
Tatsuhiro Onodera,
Pooya Ronagh,
Yoshihisa Yamamoto
Abstract:
We study the performance scaling of three quantum algorithms for combinatorial optimization: measurement-feedback coherent Ising machines (MFB-CIM), discrete adiabatic quantum computation (DAQC), and the Dürr-Hoyer algorithm for quantum minimum finding (DH-QMF) that is based on Grover's search. We use MaxCut problems as our reference for comparison, and time-to-solution (TTS) as a practical measure of performance for these optimization algorithms. We empirically observe a $Θ(2^{\sqrt{n}})$ scaling for the median TTS for MFB-CIM, in comparison to the exponential scaling with the exponent $n$ for DAQC and the provable $\widetilde{\mathcal O}\left(\sqrt{2^n}\right)$ scaling for DH-QMF. We conclude that these scaling complexities result in a dramatic performance advantage for MFB-CIM in comparison to the other two algorithms for solving MaxCut problems.
Submitted 7 May, 2021;
originally announced May 2021.
-
Shufflecast: An Optical, Data-rate Agnostic and Low-Power Multicast Architecture for Next-Generation Compute Clusters
Authors:
Sushovan Das,
Afsaneh Rahbar,
Xinyu Crystal Wu,
Zhuang Wang,
Weitao Wang,
Ang Chen,
T. S. Eugene Ng
Abstract:
An optical circuit-switched network core has the potential to overcome the inherent challenges of a conventional electrical packet-switched core of today's compute clusters. As optical circuit switches (OCS) directly handle the photon beams without any optical-electrical-optical (O/E/O) conversion and packet processing, OCS-based network cores have the following desirable properties: a) agnostic to data-rate, b) negligible/zero power consumption, c) no need of transceivers, d) negligible forwarding latency, and e) no need for frequent upgrade. Unfortunately, OCS can only provide point-to-point (unicast) circuits. They do not have built-in support for one-to-many (multicast) communication, yet multicast is fundamental to a plethora of data-intensive applications running on compute clusters nowadays. In this paper, we propose Shufflecast, a novel optical network architecture for next-generation compute clusters that can support high-performance multicast satisfying all the properties of an OCS-based network core. Shufflecast leverages small fanout, inexpensive, passive optical splitters to connect the Top-of-rack (ToR) switch ports, ensuring data-rate agnostic, low-power, physical-layer multicast. We thoroughly analyze Shufflecast's highly scalable data plane, light-weight control plane, and graceful failure handling. Further, we implement a complete prototype of Shufflecast in our testbed and extensively evaluate the network. Shufflecast is more power-efficient than the state-of-the-art multicast mechanisms. Also, Shufflecast is more cost-efficient than a conventional packet-switched network. By adding Shufflecast alongside an OCS-based unicast network, an all-optical network core with the aforementioned desirable properties supporting both unicast and multicast can be realized.
Submitted 19 April, 2021;
originally announced April 2021.
-
Mid-infrared nonlinear optics in thin-film lithium niobate on sapphire
Authors:
Jatadhari Mishra,
Timothy P. McKenna,
Edwin Ng,
Hubert S. Stokowski,
Marc Jankowski,
Carsten Langrock,
David Heydari,
Hideo Mabuchi,
M. M. Fejer,
Amir H. Safavi-Naeini
Abstract:
Periodically poled thin-film lithium niobate (TFLN) waveguides have emerged as a leading platform for highly efficient frequency conversion in the near-infrared. However, the commonly used silica bottom-cladding results in high absorption loss at wavelengths beyond 2.5 $μ$m. In this work, we demonstrate efficient frequency conversion in a TFLN-on-sapphire platform, which features high transparency up to 4.5 $μ$m. In particular, we report generating mid-infrared light up to 3.66 $μ$m via difference-frequency generation of a fixed 1-$μ$m source and a tunable telecom source, with normalized efficiencies up to 200%/W-cm$^2$. These results show TFLN-on-sapphire to be a promising platform for integrated nonlinear nanophotonics in the mid-infrared.
Submitted 13 April, 2021;
originally announced April 2021.
-
Graph Partitioning and Sparse Matrix Ordering using Reinforcement Learning and Graph Neural Networks
Authors:
Alice Gatti,
Zhixiong Hu,
Tess Smidt,
Esmond G. Ng,
Pieter Ghysels
Abstract:
We present a novel method for graph partitioning, based on reinforcement learning and graph convolutional neural networks. Our approach is to recursively partition coarser representations of a given graph. The neural network is implemented using SAGE graph convolution layers, and trained using an advantage actor critic (A2C) agent. We present two variants, one for finding an edge separator that minimizes the normalized cut or quotient cut, and one that finds a small vertex separator. The vertex separators are then used to construct a nested dissection ordering to permute a sparse matrix so that its triangular factorization will incur less fill-in. The partitioning quality is compared with partitions obtained using METIS and SCOTCH, and the nested dissection ordering is evaluated in the sparse solver SuperLU. Our results show that the proposed method achieves similar partitioning quality as METIS and SCOTCH. Furthermore, the method generalizes across different classes of graphs, and works well on a variety of graphs from the SuiteSparse sparse matrix collection.
Submitted 28 June, 2021; v1 submitted 8 April, 2021;
originally announced April 2021.
-
Pushing the Limits of Non-Autoregressive Speech Recognition
Authors:
Edwin G. Ng,
Chung-Cheng Chiu,
Yu Zhang,
William Chan
Abstract:
We combine recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition. We push the limits of non-autoregressive state-of-the-art results for multiple datasets: LibriSpeech, Fisher+Switchboard and Wall Street Journal. Key to our recipe, we leverage CTC on giant Conformer neural network architectures with SpecAugment and wav2vec2 pre-training. We achieve 1.8%/3.6% WER on LibriSpeech test/test-other sets, 5.1%/9.8% WER on Switchboard, and 3.4% on the Wall Street Journal, all without a language model.
Submitted 11 September, 2021; v1 submitted 7 April, 2021;
originally announced April 2021.
-
MergeComp: A Compression Scheduler for Scalable Communication-Efficient Distributed Training
Authors:
Zhuang Wang,
Xinyu Wu,
T. S. Eugene Ng
Abstract:
Large-scale distributed training is increasingly becoming communication bound. Many gradient compression algorithms have been proposed to reduce the communication overhead and improve scalability. However, it has been observed that in some cases gradient compression may even harm the performance of distributed training.
In this paper, we propose MergeComp, a compression scheduler to optimize the scalability of communication-efficient distributed training. It automatically schedules the compression operations to optimize the performance of compression algorithms without the knowledge of model architectures or system parameters. We have applied MergeComp to nine popular compression algorithms. Our evaluations show that MergeComp can improve the performance of compression algorithms by up to 3.83x without losing accuracy. It can even achieve a scaling factor of distributed training up to 99% over high-speed networks.
Submitted 28 March, 2021;
originally announced March 2021.
-
Stabilizing multiple topological fermions on a quantum computer
Authors:
Jin Ming Koh,
Tommy Tai,
Yong Han Phee,
Wei En Ng,
Ching Hua Lee
Abstract:
In classical and single-particle settings, non-trivial band topology always gives rise to robust boundary modes. For quantum many-body systems, however, multiple topological fermions are not always able to coexist, since Pauli exclusion prevents additional fermions from occupying the limited number of available topological modes. In this work, we show, through IBM quantum computers, how one can robustly stabilize more fermions than the number of topological modes through specially designed 2-fermion interactions. Our demonstration hinges on the realization of BDI- and D-class topological Hamiltonians of unprecedented complexity on transmon-based quantum hardware, and crucially relies on tensor network-aided circuit recompilation approaches beyond conventional trotterization. We also achieved the full reconstruction of multiple-fermion topological band structures through iterative quantum phase estimation (IQPE). All in all, our work showcases how advances in quantum algorithm implementation enable NISQ-era quantum computers to be exploited for topological stabilization beyond the context of single-particle topological invariants.
Submitted 25 March, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Efficient sampling of ground and low-energy Ising spin configurations with a coherent Ising machine
Authors:
Edwin Ng,
Tatsuhiro Onodera,
Satoshi Kako,
Peter L. McMahon,
Hideo Mabuchi,
Yoshihisa Yamamoto
Abstract:
We show that the nonlinear stochastic dynamics of a measurement-feedback-based coherent Ising machine (MFB-CIM) in the presence of quantum noise can be exploited to sample degenerate ground and low-energy spin configurations of the Ising model. We formulate a general discrete-time Gaussian-state model of the MFB-CIM which faithfully captures the nonlinear dynamics present at and above system threshold. This model overcomes the limitations of both mean-field models, which neglect quantum noise, and continuous-time models, which assume long photon lifetimes. Numerical simulations of our model show that when the MFB-CIM is operated in a quantum-noise-dominated regime with short photon lifetimes (i.e., low cavity finesse), homodyne monitoring of the system can efficiently produce samples of low-energy Ising spin configurations, requiring many fewer roundtrips to sample than suggested by established high-finesse, continuous-time models. We find that sampling performance is robust to, or even improved by, turning off or altogether reversing the sign of the parametric drive, but performance is critically reduced in the absence of optical nonlinearity. For the class of MAX-CUT problems with binary-signed edge weights, the number of roundtrips sufficient to fully sample all spin configurations up to the first-excited Ising energy, including all degeneracies, scales as $1.08^N$. At a problem size of $N = 100$ with a few dozen (median of 20) such desired configurations per instance, we have found median sufficient sampling times of $6\times10^6$ roundtrips; in an experimental implementation of an MFB-CIM with a 10 GHz repetition rate, this corresponds to a wall-clock sampling time of 60 ms.
Submitted 27 January, 2022; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Towards an Engineering Framework for Ultrafast Quantum Nonlinear Optics
Authors:
Ryotatsu Yanagimoto,
Edwin Ng,
Tatsuhiro Onodera,
Hideo Mabuchi
Abstract:
The advent of dispersion-engineered and highly nonlinear nanophotonics is expected to open up an all-optical path towards the strong-interaction regime of quantum optics by combining high transverse field confinement with ultra-short-pulse operation. Obtaining a full understanding of photon dynamics in such broadband devices, however, poses major challenges in the modeling and simulation of multimode non-Gaussian quantum physics, highlighting the need for sophisticated reduced models that facilitate efficient numerical study while providing useful physical insight. In this manuscript, we review our recent efforts in modeling broadband optical systems at varying levels of abstraction and generality, ranging from multimode extensions of quantum input-output theory for sync-pumped oscillators to the development of numerical methods based on a field-theoretic description of nonlinear waveguides. We expect our work not only to guide ongoing theoretical and experimental efforts towards next-generation quantum devices but also to uncover essential physics of broadband quantum photonics.
Submitted 17 February, 2021;
originally announced February 2021.
-
Efficient simulation of ultrafast quantum nonlinear optics with matrix product states
Authors:
Ryotatsu Yanagimoto,
Edwin Ng,
Logan G. Wright,
Tatsuhiro Onodera,
Hideo Mabuchi
Abstract:
Ultra-short pulses propagating in nonlinear nanophotonic waveguides can simultaneously leverage both temporal and spatial field confinement, promising a route towards single-photon nonlinearities in an all-photonic platform. In this multimode quantum regime, however, faithful numerical simulations of pulse dynamics naïvely require a representation of the state in an exponentially large Hilbert space. Here, we employ a time-domain, matrix product state (MPS) representation to enable efficient simulations by exploiting the entanglement structure of the system. In order to extract physical insight from these simulations, we develop an algorithm to unravel the MPS quantum state into constituent temporal supermodes, enabling, e.g., access to the phase-space portraits of arbitrary pulse waveforms. As a demonstration, we perform exact numerical simulations of a Kerr soliton in the quantum regime. We observe the development of non-classical Wigner-function negativity in the solitonic mode as well as quantum corrections to the semiclassical dynamics of the pulse. A similar analysis of $χ^{(2)}$ simultons reveals a unique entanglement structure between the fundamental and second harmonic. Our approach is also readily compatible with quantum trajectory theory, allowing full quantum treatment of propagation loss and decoherence. We expect this work to establish the MPS technique as part of a unified engineering framework for the emerging field of broadband quantum photonics.
Submitted 11 February, 2021;
originally announced February 2021.
-
Understanding Guided Image Captioning Performance across Domains
Authors:
Edwin G. Ng,
Bo Pang,
Piyush Sharma,
Radu Soricut
Abstract:
Image captioning models generally lack the capability to take into account user interest, and usually default to global descriptions that try to balance readability, informativeness, and information overload. On the other hand, VQA models generally lack the ability to provide long descriptive answers, while expecting the textual question to be quite precise. We present a method to control the concepts that an image caption should focus on, using an additional input called the guiding text that refers to either groundable or ungroundable concepts in the image. Our model consists of a Transformer-based multimodal encoder that uses the guiding text together with global and object-level image features to derive early-fusion representations used to generate the guided caption. While models trained on Visual Genome data have an in-domain advantage of fitting well when guided with automatic object labels, we find that guided captioning models trained on Conceptual Captions generalize better on out-of-domain images and guiding texts. Our human-evaluation results indicate that attempting in-the-wild guided image captioning requires access to large, unrestricted-domain training datasets, and that increased style diversity (even without increasing the number of unique tokens) is a key factor for improved performance.
Submitted 10 November, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
DCFIT: Initial Trigger-Based PFC Deadlock Detection in the Data Plane
Authors:
Xinyu Crystal Wu,
T. S. Eugene Ng
Abstract:
Recent data center applications rely on lossless networks to achieve high network performance. Lossless networks, however, can suffer from in-network deadlocks induced by hop-by-hop flow control protocols like PFC. Once deadlocks occur, large parts of the network could be blocked. Existing solutions mainly center on a deadlock avoidance strategy; unfortunately, they are not foolproof. Thus, deadlock detection is a necessary last resort. In this paper, we propose DCFIT, a new mechanism performed entirely in the data plane to detect and solve deadlocks for arbitrary network topologies and routing protocols. Unique to DCFIT is the use of deadlock initial triggers, which contribute to efficient deadlock detection and deadlock recurrence prevention. Preliminary results indicate that DCFIT can detect deadlocks quickly with minimal overhead and mitigate the recurrence of the same deadlocks effectively. This work does not raise any ethical issues.
Submitted 28 September, 2020;
originally announced September 2020.
-
Broadband Parametric Downconversion as a Discrete-Continuum Fano Interaction
Authors:
Ryotatsu Yanagimoto,
Edwin Ng,
Marc P. Jankowski,
Tatsuhiro Onodera,
Martin M. Fejer,
Hideo Mabuchi
Abstract:
We introduce a theoretical framework based on Fano's theory of discrete-continuum interactions to analyze the quantum dynamics of broadband parametric downconversion (PDC) in the few-pump-photon regime of nonlinear quantum nanophotonics. Applying this unified analytic approach to 1D $χ^{(2)}$-nonlinear waveguides, we find a host of remarkable dynamical features due to the coupling of a discrete pump state to the signal continuum, from unit-efficiency (i.e., complete) downconversion when the coupling is dissipative, to Rabi-like oscillations with sub-exponential decay when it is dispersive. The theory provides a straightforward way to analytically compute a full characterization of the PDC dynamics, including the complete eigensystem of the continuum Hamiltonian and expressions for the signal biphoton correlation function. We also apply the theory to study a pair of linearly coupled $χ^{(2)}$ waveguides, where two discrete pump states simultaneously downconvert into a common-mode signal continuum, resulting in Fano interference that critically affects the PDC rate. Under appropriate conditions, the theory predicts characteristic Fano lineshapes and even complete destructive interference resulting in the full suppression of PDC, due to the formation of a bound pump state in the continuum. Generalizing further, we show that the framework can also be applied to higher-order parametric processes such as parametric three-photon generation, and we also find numerical signatures that Fano-type interactions occur even for multi-photon PDC under stronger pumping. Our results establish broadband PDC as yet another physical system natively exhibiting Fano-type interactions and advance a theoretical framework in which to understand the complicated quantum dynamics of strongly nonlinear broadband quantum optics.
Submitted 3 September, 2020;
originally announced September 2020.
-
Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics
Authors:
Evonne Ng,
Shiry Ginosar,
Trevor Darrell,
Hanbyul Joo
Abstract:
We propose a novel learned deep prior of body motion for 3D hand shape synthesis and estimation in the domain of conversational gestures. Our model builds upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings. We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone. Trained with 3D pose estimations obtained from a large-scale dataset of internet videos, our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input. We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation. We demonstrate that our method outperforms previous state-of-the-art approaches and can generalize beyond the monologue-based training data to multi-person conversations. Video results are available at http://people.eecs.berkeley.edu/~evonne_ng/projects/body2hands/.
Submitted 7 April, 2021; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Self-Evolving Adaptive Learning for Personalized Education
Authors:
Junhua Liu,
Lionell Loh,
Ernest Ng,
Yijia Chen,
Kristin L. Wood,
Kwan Hui Lim
Abstract:
Primary and secondary education is a crucial stage for building a strong foundation before diving deep into specialised subjects in colleges and universities. To excel in the current education system, students are required to have a deep understanding of the knowledge covered by standardized curricula and syllabi, as well as exam-related problem-solving skills. In current school settings, this learning normally occurs in large classes of 30-40 students per class. Such a "one size fits all" approach may not be effective, as different students learn in different ways and at different paces. To address this problem, we propose the Self-Evolving Adaptive Learning (SEAL) system for personalized education at scale.
Submitted 28 August, 2020; v1 submitted 25 April, 2020;
originally announced May 2020.
-
Orbit: Probabilistic Forecast with Exponential Smoothing
Authors:
Edwin Ng,
Zhishi Wang,
Huigang Chen,
Steve Yang,
Slawek Smyl
Abstract:
Time series forecasting is an active research topic in academia as well as industry. Although machine learning methods are increasingly adopted to solve some of those forecasting challenges, statistical methods remain powerful when dealing with low granularity data. This paper introduces a refined Bayesian exponential smoothing model with the help of probabilistic programming languages including Stan. Our model refinements include additional global trend, transformation for multiplicative form, noise distribution and choice of priors. A benchmark study is conducted on a rich set of time-series data sets for our models along with other well-known time series models.
Submitted 22 January, 2021; v1 submitted 17 April, 2020;
originally announced April 2020.