-
Harnessing magnetic octupole Hall effect to induce torque in altermagnets
Authors:
Seungyun Han,
Daegeun Jo,
Insu Baek,
Peter M. Oppeneer,
Hyun-Woo Lee
Abstract:
d-wave altermagnets have magnetic octupoles as their order parameters [Phys. Rev. X 14, 011019 (2024)]. We theoretically show that magnetic octupoles injected from outside generate torque on the d-wave altermagnets. The injection can be achieved by the magnetic octupole Hall effect in an adjacent layer. We calculate the magnetic octupole Hall conductivity of the heavy metal Pt and find a sizable value comparable to its spin Hall conductivity. Our work generalizes the spin Hall phenomenology (generation by heavy metals and detection by torque in ferromagnets) to the magnetic octupole Hall phenomenology (generation by heavy metals and detection by torque in altermagnets), which can be utilized to electrically control magnetic configurations of altermagnets.
Submitted 22 September, 2024;
originally announced September 2024.
-
TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback
Authors:
Eunseop Yoon,
Hee Suk Yoon,
SooHwan Eom,
Gunsoo Han,
Daniel Wontae Nam,
Daejin Jo,
Kyoung-Woon On,
Mark A. Hasegawa-Johnson,
Sungwoong Kim,
Chang D. Yoo
Abstract:
Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and tokens, which are autoregressively generated from the language model. Although several recent approaches have tried to provide token-level (i.e., dense) rewards for each individual token, these typically rely on predefined discrete reward values (e.g., positive: +1, negative: -1, neutral: 0), failing to account for varying degrees of preference inherent to each token. To address this limitation, we introduce TLCR (Token-Level Continuous Reward) for RLHF, which incorporates a discriminator trained to distinguish positive and negative tokens, and the confidence of the discriminator is used to assign continuous rewards to each token considering the context. Extensive experiments show that our proposed TLCR leads to consistent performance improvements over previous sequence-level or token-level discrete rewards on open-ended generation benchmarks.
Submitted 23 July, 2024;
originally announced July 2024.
-
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
Authors:
Dongwon Jo,
Taesu Kim,
Yulhwa Kim,
Jae-Joon Kim
Abstract:
Binarization, which converts weight parameters to binary values, has emerged as an effective strategy to reduce the size of large language models (LLMs). However, typical binarization techniques significantly diminish linguistic effectiveness of LLMs. To address this issue, we introduce a novel binarization technique called Mixture of Scales (BinaryMoS). Unlike conventional methods, BinaryMoS employs multiple scaling experts for binary weights, dynamically merging these experts for each token to adaptively generate scaling factors. This token-adaptive approach boosts the representational power of binarized LLMs by enabling contextual adjustments to the values of binary weights. Moreover, because this adaptive process only involves the scaling factors rather than the entire weight matrix, BinaryMoS maintains compression efficiency similar to traditional static binarization methods. Our experimental results reveal that BinaryMoS surpasses conventional binarization techniques in various natural language processing tasks and even outperforms 2-bit quantization methods, all while maintaining similar model size to static binarization techniques.
Submitted 18 June, 2024;
originally announced June 2024.
-
Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference
Authors:
Donghyeon Joo,
Ramyad Hadidi,
Soheil Feizi,
Bahar Asgari
Abstract:
The increasing size of large language models (LLMs) challenges their usage on resource-constrained platforms. For example, memory on modern GPUs is insufficient to hold LLMs that are hundreds of Gigabytes in size. Offloading is a popular method to escape this constraint by storing weights of an LLM model to host CPU memory and SSD, then loading each weight to GPU before every use. In our case study of offloaded inference, we found that due to the low bandwidth between storage devices and GPU, the latency of transferring large model weights from its offloaded location to GPU memory becomes the critical bottleneck with actual compute taking nearly 0% of runtime. To effectively reduce the weight transfer latency, we propose a novel sparse format that compresses the unstructured sparse pattern of pruned LLM weights to non-zero values with high compression ratio and low decompression overhead. Endor achieves this by expressing the positions of non-zero elements with a bitmap. Compared to offloaded inference using the popular Huggingface Accelerate, applying Endor accelerates OPT-66B by 1.70x and Llama2-70B by 1.78x. When direct weight transfer from SSD to GPU is leveraged, Endor achieves 2.25x speedup on OPT-66B and 2.37x speedup on Llama2-70B.
Submitted 17 June, 2024;
originally announced June 2024.
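The bitmap idea mentioned in the Endor abstract above (storing only the non-zero values and marking their positions with one bit per element) can be illustrated with a minimal sketch. This is not Endor's actual on-disk format or decompression kernel: the function names `bitmap_compress` and `bitmap_decompress` are hypothetical, and NumPy is assumed only for illustration.

```python
import numpy as np

def bitmap_compress(weights):
    """Illustrative only: split a sparse array into a bitmap and its non-zero values."""
    flat = weights.ravel()
    mask = flat != 0                      # True where an element is non-zero
    bitmap = np.packbits(mask)            # one bit per element, padded to whole bytes
    return weights.shape, bitmap, flat[mask]

def bitmap_decompress(shape, bitmap, values):
    """Rebuild the dense array by scattering the values to the marked positions."""
    n = int(np.prod(shape))
    mask = np.unpackbits(bitmap, count=n).astype(bool)  # drop the padding bits
    flat = np.zeros(n, dtype=values.dtype)
    flat[mask] = values
    return flat.reshape(shape)

# Hypothetical pruned weight matrix: 3 non-zero values out of 9.
w = np.array([[0.0, 1.5, 0.0],
              [0.0, 0.0, -2.0],
              [3.0, 0.0, 0.0]], dtype=np.float32)
shape, bitmap, values = bitmap_compress(w)
restored = bitmap_decompress(shape, bitmap, values)
```

In this toy case the nine positions fit in a two-byte bitmap, so the stored payload is three float values plus two bytes instead of nine floats. The saving depends on the sparsity ratio.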
-
Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples
Authors:
Dae Ung Jo,
Kyuewang Lee,
JaeHo Chung,
Jin Young Choi
Abstract:
Securing a sufficient amount of paired data is important to train an image-text retrieval (ITR) model, but collecting paired data is very expensive. To address this issue, in this paper, we propose an active learning algorithm for ITR that can collect paired data cost-efficiently. Previous studies assume that image-text pairs are given and that the annotator is asked for their category labels. However, in recent ITR studies, the importance of category labels has decreased, since a retrieval model can be trained with only image-text pairs. For this reason, we set up an active learning scenario where unpaired images (or texts) are given and the annotator provides corresponding texts (or images) to make paired data. The key idea of the proposed AL algorithm is to select unpaired images (or texts) that can be hard negative samples for existing texts (or images). To this end, we introduce a novel scoring function to choose hard negative samples. We validate the effectiveness of the proposed method on Flickr30K and MS-COCO datasets.
Submitted 25 May, 2024;
originally announced May 2024.
-
Direct observation of nanometer-scale orbital angular momentum accumulation
Authors:
Juan Carlos Idrobo,
Ján Rusz,
Gopal Datt,
Daegeun Jo,
Sanaz Alikhah,
David Muradas,
Ulrich Noumbe,
M. Venkata Kamalakar,
Peter M. Oppeneer
Abstract:
Conversion of charge to orbital angular momentum through the orbital Hall effect (OHE) holds transformative potential for the development of orbital-based electronics; however, it is challenging to directly observe the electrically generated orbital accumulation. Here, we detect the OHE by directly quantifying the orbital accumulation along the edges of a titanium thin film using a scanning transmission electron microscope. We measure the Ti L-edge using electron energy-loss spectroscopy with nanometer resolution and find a sizable orbital accumulation at the sample's outer perimeters, consistent with all signatures expected for the OHE, and determine an orbital diffusion length $\ell_o \approx 7.3$ nm. Our data point to a surprising dependence of the orbital diffusion length on the nano-structural morphology.
Submitted 14 March, 2024;
originally announced March 2024.
-
High Q-factor diamond optomechanical resonators with silicon vacancy centers at millikelvin temperatures
Authors:
Graham D. Joe,
Cleaven Chia,
Benjamin Pingault,
Michael Haas,
Michelle Chalupnik,
Eliza Cornell,
Kazuhiro Kuruma,
Bartholomeus Machielse,
Neil Sinclair,
Srujan Meesala,
Marko Lončar
Abstract:
Phonons are envisioned as coherent intermediaries between different types of quantum systems. Engineered nanoscale devices such as optomechanical crystals (OMCs) provide a platform to utilize phonons as quantum information carriers. Here we demonstrate OMCs in diamond designed for strong interactions between phonons and a silicon vacancy (SiV) spin. Using optical measurements at millikelvin temperatures, we measure a linewidth of 13 kHz (Q-factor of ~440,000) for 6 GHz acoustic modes, a record for diamond in the GHz frequency range and within an order of magnitude of state-of-the-art linewidths for OMCs in silicon. We investigate SiV optical and spin properties in these devices and outline a path towards a coherent spin-phonon interface.
Submitted 28 October, 2023;
originally announced October 2023.
-
Hexa: Self-Improving for Knowledge-Grounded Dialogue System
Authors:
Daejin Jo,
Daniel Wontae Nam,
Gunsoo Han,
Kyoung-Woon On,
Taehwan Kwon,
Seungeun Rho,
Sungwoong Kim
Abstract:
A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the generative performances of intermediate steps without the ground truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves the performances on the task of knowledge-grounded dialogue generation.
Submitted 2 April, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Engineering Phonon-Qubit Interactions using Phononic Crystals
Authors:
Kazuhiro Kuruma,
Benjamin Pingault,
Cleaven Chia,
Michael Haas,
Graham D Joe,
Daniel Rimoli Assumpcao,
Sophie Weiyi Ding,
Chang Jin,
C. J. Xin,
Matthew Yeh,
Neil Sinclair,
Marko Lončar
Abstract:
The ability to control phonons in solids is key for diverse quantum applications, ranging from quantum information processing to sensing. Often, phonons are sources of noise and decoherence, since they can interact with a variety of solid-state quantum systems. To mitigate this, quantum systems typically operate at milli-Kelvin temperatures to reduce the number of thermal phonons. Here we demonstrate an alternative approach that relies on engineering phononic density of states, drawing inspiration from photonic bandgap structures that have been used to control the spontaneous emission of quantum emitters. We design and fabricate diamond phononic crystals with a complete phononic bandgap spanning 50 - 70 gigahertz, tailored to suppress interactions of a single silicon-vacancy color center with resonant phonons of the thermal bath. At 4 Kelvin, we demonstrate a reduction of the phonon-induced orbital relaxation rate of the color center by a factor of 18 compared to bulk. Furthermore, we show that the phononic bandgap can efficiently suppress phonon-color center interactions up to 20 Kelvin. In addition to enabling operation of quantum memories at higher temperatures, the ability to engineer qubit-phonon interactions may enable new functionalities for quantum science and technology, where phonons are used as carriers of quantum information.
Submitted 9 October, 2023;
originally announced October 2023.
-
Oxide layer dependent orbital torque efficiency in ferromagnet/Cu/Oxide heterostructures
Authors:
Junyeon Kim,
Jun Uzuhashi,
Masafumi Horio,
Tomoaki Senoo,
Dongwook Go,
Daegeun Jo,
Toshihide Sumi,
Tetsuya Wada,
Iwao Matsuda,
Tadakatsu Ohkubo,
Seiji Mitani,
Hyun-Woo Lee,
YoshiChika Otani
Abstract:
The utilization of orbital transport provides a versatile and efficient spin manipulation mechanism. As interest in orbital-mediated spin manipulation grows, we face a new challenge: identifying the underlying physics that determines the efficiency of orbital torque (OT). In this study, we systematically investigate the variation of OT governed by the orbital Rashba-Edelstein effect at the Cu/Oxide interface as we change the Oxide material. We find that OT varies by a factor of ~2, depending on the Oxide. Our results suggest that the active electronic interatomic interaction (hopping) between Cu and oxygen atoms is critical in determining OT. This also indicates which material factors are critical in forming a chiral orbital Rashba texture at the Cu/Oxide interface.
Submitted 19 July, 2023;
originally announced July 2023.
-
Squeezing Large-Scale Diffusion Models for Mobile
Authors:
Jiwoong Choi,
Minkyu Kim,
Daehyun Ahn,
Taesu Kim,
Yulhwa Kim,
Dongwon Jo,
Hyesung Jeon,
Jae-Joon Kim,
Hyungjun Kim
Abstract:
The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research. With the active adoption of the model in various real-world applications, the need for on-device deployment has grown considerably. However, deploying large diffusion models such as Stable Diffusion, with more than one billion parameters, to mobile devices poses distinctive challenges due to the limited computational and memory resources, which may vary according to the device. In this paper, we present the challenges and solutions for deploying Stable Diffusion on mobile devices with the TensorFlow Lite framework, which supports both iOS and Android devices. The resulting Mobile Stable Diffusion achieves an inference latency of less than 7 seconds for 512x512 image generation on Android devices with mobile GPUs.
Submitted 3 July, 2023;
originally announced July 2023.
-
Effortless Integration of Memory Management into Open-Domain Conversation Systems
Authors:
Eunbi Choi,
Kyoung-Woon On,
Gunsoo Han,
Sungwoong Kim,
Daniel Wontae Nam,
Daejin Jo,
Seung Eun Rho,
Taehwan Kwon,
Minjoon Seo
Abstract:
Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. One of the limitations of such systems, however, is the absence of management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exists for this purpose, we propose an automated dataset creation method for memory management. Our method 1) requires little cost for data construction, 2) does not affect performance in other tasks, and 3) reduces external memory. We show that our proposed model BlenderBot3-M^3, which is multi-task trained with memory management, outperforms BlenderBot3 with a relative 4% performance gain in terms of F1 score.
Submitted 23 May, 2023;
originally announced May 2023.
-
MAGVLT: Masked Generative Vision-and-Language Transformer
Authors:
Sungwoong Kim,
Daejin Jo,
Donghoon Lee,
Jongmin Kim
Abstract:
While generative modeling on multimodal image-text data has been actively developed with large-scale paired datasets, there have been limited attempts to generate both image and text data by a single model rather than a generation of one fixed modality conditioned on the other modality. In this paper, we explore a unified generative vision-and-language (VL) model that can produce both images and text sequences. Especially, we propose a generative VL transformer based on the non-autoregressive mask prediction, named MAGVLT, and compare it with an autoregressive generative VL transformer (ARGVLT). In comparison to ARGVLT, the proposed MAGVLT enables bidirectional context encoding, fast decoding by parallel token predictions in an iterative refinement, and extended editing capabilities such as image and text infilling. For rigorous training of our MAGVLT with image-text pairs from scratch, we combine the image-to-text, text-to-image, and joint image-and-text mask prediction tasks. Moreover, we devise two additional tasks based on the step-unrolled mask prediction and the selective prediction on the mixture of two image-text pairs. Experimental results on various downstream generation tasks of VL benchmarks show that our MAGVLT outperforms ARGVLT by a large margin even with significant inference speedup. Particularly, MAGVLT achieves competitive results on both zero-shot image-to-text and text-to-image generation tasks from MS-COCO by one moderate-sized model (fewer than 500M parameters) even without the use of monomodal data and networks.
Submitted 21 March, 2023;
originally announced March 2023.
-
LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward
Authors:
Daejin Jo,
Sungwoong Kim,
Daniel Wontae Nam,
Taehwan Kwon,
Seungeun Rho,
Jongmin Kim,
Donghoon Lee
Abstract:
Episodic count has been widely used to design a simple yet effective intrinsic motivation for reinforcement learning with a sparse reward. However, the use of episodic count in a high-dimensional state space as well as over a long episode time requires thorough state compression and fast hashing, which hinders rigorous exploitation of it in such hard and complex exploration environments. Moreover, the interference from task-irrelevant observations in the episodic count may cause its intrinsic motivation to overlook important task-related changes of states, and the novelty in an episodic manner can lead to repeatedly revisiting familiar states across episodes. In order to resolve these issues, in this paper, we propose a learnable hash-based episodic count, which we name LECO, that efficiently performs as a task-specific intrinsic reward in hard exploration problems. In particular, the proposed intrinsic reward consists of the episodic novelty and the task-specific modulation, where the former employs a vector quantized variational autoencoder to automatically obtain the discrete state codes for fast counting while the latter regulates the episodic novelty by learning a modulator to optimize the task-specific extrinsic reward. The proposed LECO specifically enables the automatic transition from exploration to exploitation during reinforcement learning. We experimentally show that, in contrast to the previous exploration methods, LECO successfully solves hard exploration problems and also scales to large state spaces through the most difficult tasks in MiniGrid and DMLab environments.
Submitted 11 October, 2022;
originally announced October 2022.
-
Selective Token Generation for Few-shot Natural Language Generation
Authors:
Daejin Jo,
Taehwan Kwon,
Eun-Sol Kim,
Sungwoong Kim
Abstract:
Natural language modeling with limited training data is a challenging problem, and many algorithms make use of large-scale pretrained language models (PLMs) for this because of their strong generalization ability. Among them, additive learning that incorporates a task-specific adapter on top of the fixed large-scale PLM has been popularly used in the few-shot setting. However, this added adapter can still easily disregard the knowledge of the PLM, especially for few-shot natural language generation (NLG), since an entire sequence is usually generated by only the newly trained adapter. Therefore, in this work, we develop a novel additive learning algorithm based on reinforcement learning (RL) that selectively outputs language tokens between the task-general PLM and the task-specific adapter during both training and inference. This output token selection over the two generators allows the adapter to take into account solely the task-relevant parts in sequence generation, and therefore makes it more robust to overfitting as well as more stable in RL training. In addition, to obtain the complementary adapter from the PLM for each few-shot task, we exploit a separate selecting module that is also simultaneously trained using RL. Experimental results on various few-shot NLG tasks including question answering, data-to-text generation and text summarization demonstrate that the proposed selective token generation significantly outperforms the previous additive learning algorithms based on the PLMs.
Submitted 16 September, 2022;
originally announced September 2022.
-
The PWLR Graph Representation: A Persistent Weisfeiler-Lehman scheme with Random Walks for Graph Classification
Authors:
Sun Woo Park,
Yun Young Choi,
Dosang Joe,
U Jin Choi,
Youngho Woo
Abstract:
This paper presents the Persistent Weisfeiler-Lehman Random walk scheme (abbreviated as PWLR) for graph representations, a novel mathematical framework which produces a collection of explainable low-dimensional representations of graphs with discrete and continuous node features. The proposed scheme effectively incorporates normalized Weisfeiler-Lehman procedure, random walks on graphs, and persistent homology. We thereby integrate three distinct properties of graphs, which are local topological features, node degrees, and global topological invariants, while preserving stability from graph perturbations. This generalizes many variants of Weisfeiler-Lehman procedures, which are primarily used to embed graphs with discrete node labels. Empirical results suggest that these representations can be efficiently utilized to produce comparable results to state-of-the-art techniques in classifying graphs with discrete node labels, and enhanced performances in classifying those with continuous node features.
Submitted 29 August, 2022;
originally announced August 2022.
-
Insights From the NeurIPS 2021 NetHack Challenge
Authors:
Eric Hambro,
Sharada Mohanty,
Dmitrii Babaev,
Minwoo Byeon,
Dipam Chakraborty,
Edward Grefenstette,
Minqi Jiang,
Daejin Jo,
Anssi Kanervisto,
Jongmin Kim,
Sungwoong Kim,
Robert Kirk,
Vitaly Kurin,
Heinrich Küttler,
Taehwon Kwon,
Donghoon Lee,
Vegard Mella,
Nantas Nardelli,
Ivan Nazarov,
Nikita Ovsov,
Jack Parker-Holder,
Roberta Raileanu,
Karolis Ramanauskas,
Tim Rocktäschel,
Danielle Rothermel
, et al. (4 additional authors not shown)
Abstract:
In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., 'ascend' in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challenge showcased community-driven progress in AI with many diverse approaches significantly beating the previously best results on NetHack. Furthermore, it served as a direct comparison between neural (e.g., deep RL) and symbolic AI, as well as hybrid systems, demonstrating that on NetHack symbolic bots currently outperform deep RL by a large margin. Lastly, no agent got close to winning the game, illustrating NetHack's suitability as a long-term benchmark for AI research.
Submitted 22 March, 2022;
originally announced March 2022.
-
Observation of long-range orbital transport and giant orbital torque
Authors:
Hiroki Hayashi,
Daegeun Jo,
Dongwook Go,
Tenghua Gao,
Satoshi Haku,
Yuriy Mokrousov,
Hyun-Woo Lee,
Kazuya Ando
Abstract:
Modern spintronics relies on the generation of spin currents through spin-orbit coupling. The spin-current generation has been believed to be triggered by current-induced orbital dynamics, which governs the angular momentum transfer from the lattice to the electrons in solids. The fundamental role of the orbital response in the angular momentum dynamics suggests the importance of the orbital counterpart of spin currents: orbital currents. However, evidence for its existence has been elusive. Here, we demonstrate the generation of giant orbital currents and uncover fundamental features of the orbital response. We experimentally and theoretically show that orbital currents propagate over longer distances than spin currents by more than an order of magnitude in a ferromagnet and nonmagnets. Furthermore, we find that the orbital current enables electric manipulation of magnetization with efficiencies significantly higher than the spin counterpart. These findings open the door to orbitronics that exploits orbital transport and spin-orbital coupled dynamics in solid-state devices.
Submitted 6 February, 2023; v1 submitted 28 February, 2022;
originally announced February 2022.
-
Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth
Authors:
Doyeon Kim,
Woonghyun Ka,
Pyungwhan Ahn,
Donggyu Joo,
Sehwan Chun,
Junmo Kim
Abstract:
Depth estimation from a single image is an important task that can be applied to various fields in computer vision, and has grown rapidly with the development of convolutional neural networks. In this paper, we propose a novel structure and training strategy for monocular depth estimation to further improve the prediction accuracy of the network. We deploy a hierarchical transformer encoder to capture and convey the global context, and design a lightweight yet powerful decoder to generate an estimated depth map while considering local connectivity. By constructing connected paths between multi-scale local features and the global decoding stream with our proposed selective feature fusion module, the network can integrate both representations and recover fine details. In addition, the proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity. Furthermore, we improve the depth-specific augmentation method by utilizing an important observation in depth estimation to enhance the model. Our network achieves state-of-the-art performance over the challenging depth dataset NYU Depth V2. Extensive experiments have been conducted to validate and show the effectiveness of the proposed approach. Finally, our model shows better generalisation ability and robustness than other comparative models.
Submitted 29 October, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
Observation of the orbital Hall effect in a light metal Ti
Authors:
Young-Gwan Choi,
Daegeun Jo,
Kyung-Hun Ko,
Dongwook Go,
Kyung-Han Kim,
Hee Gyum Park,
Changyoung Kim,
Byoung-Chul Min,
Gyung-Min Choi,
Hyun-Woo Lee
Abstract:
The orbital angular momentum is a core ingredient of orbital magnetism, spin Hall effect, giant Rashba spin splitting, orbital Edelstein effect, and spin-orbit torque. However, its experimental detection is tricky. In particular, direct detection of the orbital Hall effect remains elusive despite its importance for electrical control of magnetic nanodevices. Here we report the direct observation of the orbital Hall effect in a light metal Ti. The Kerr rotation by the accumulated orbital magnetic moment is measured at Ti surfaces, whose result agrees with theoretical calculations semiquantitatively and is supported by the orbital torque measurement in Ti-based magnetic heterostructures. The results confirm the electron orbital angular momentum as an essential dynamic degree of freedom, which may provide a novel mechanism for the electric control of magnetism. The results may also deepen the understanding of spin, valley, phonon, and magnon dynamics coupled with orbital dynamics.
Submitted 30 September, 2021;
originally announced September 2021.
-
Gigantic current control of coercive field and magnetic memory based on nm-thin ferromagnetic van der Waals Fe3GeTe2
Authors:
Kaixuan Zhang,
Seungyun Han,
Youjin Lee,
Matthew J. Coak,
Junghyun Kim,
Inho Hwang,
Suhan Son,
Jeacheol Shin,
Mijin Lim,
Daegeun Jo,
Kyoo Kim,
Dohun Kim,
Hyun-Woo Lee,
Je-Geun Park
Abstract:
Controlling magnetic states by a small current is essential for the next-generation of energy-efficient spintronic devices. However, it invariably requires considerable energy to change a magnetic ground state of intrinsically quantum nature governed by fundamental Hamiltonian, once stabilized below a phase transition temperature. We report that surprisingly an in-plane current can tune the magnetic state of nm-thin van der Waals ferromagnet Fe3GeTe2 from a hard magnetic state to a soft magnetic state. It is the direct demonstration of the current-induced substantial reduction of the coercive field. This surprising finding is possible because the in-plane current produces a highly unusual type of gigantic spin-orbit torque for Fe3GeTe2. And we further demonstrate a working model of a new nonvolatile magnetic memory based on the principle of our discovery in Fe3GeTe2, controlled by a tiny current. Our findings open up a new window of exciting opportunities for magnetic van der Waals materials with potentially huge impacts on the future development of spintronic and magnetic memory.
Submitted 1 September, 2021; v1 submitted 27 August, 2021;
originally announced August 2021.
-
Orbitronics: Orbital Currents in Solids
Authors:
Dongwook Go,
Daegeun Jo,
Hyun-Woo Lee,
Mathias Kläui,
Yuriy Mokrousov
Abstract:
In solids, electronic Bloch states are formed by atomic orbitals. While it is natural to expect that orbital composition and information about Bloch states can be manipulated and transported, in analogy to the spin degree of freedom extensively studied in past decades, it has been assumed that orbital quenching by the crystal field prevents significant dynamics of orbital degrees of freedom. However, recent studies reveal that an orbital current, given by the flow of electrons with a finite orbital angular momentum, can be electrically generated and transported in wide classes of materials despite the effect of orbital quenching in the ground state. Orbital currents also play a fundamental role in the mechanisms of other transport phenomena such as spin Hall effect and valley Hall effect. Most importantly, it has been proposed that orbital currents can be used to induce magnetization dynamics, which is one of the most pivotal and explored aspects of magnetism. Here, we give an overview of recent progress and the current status of research on orbital currents. We review proposed physical mechanisms for generating orbital currents and discuss candidate materials where orbital currents are manifest. We review recent experiments on orbital current generation and transport and discuss various experimental methods to quantify this elusive object at the heart of $orbitronics$ $-$ an area which exploits the orbital degree of freedom as an information carrier in solid-state devices.
Submitted 18 July, 2021;
originally announced July 2021.
-
Beyond 5G URLLC Evolution: New Service Modes and Practical Considerations
Authors:
Hirley Alves,
Gweon Do Jo,
JaeSheung Shin,
Choongil Yeh,
Nurul Huda Mahmood,
Carlos Lima,
Chanho Yoon,
Nandana Rahatheva,
Ok-Sun Park,
Seokki Kim,
Eunah Kim,
Ville Niemelä,
Hyeon Woo Lee,
Ari Pouttu,
Hyun Kyu Chung,
Matti Latva-aho
Abstract:
Ultra-reliable low latency communications (URLLC) arose to serve industrial IoT (IIoT) use cases within the 5G. Currently, it has inherent limitations to support future services. Based on state-of-the-art research and practical deployment experience, in this article, we introduce and advocate for three variants: broadband, scalable and extreme URLLC. We discuss use cases and key performance indicators and identify technology enablers for the new service modes. We bring practical considerations from the IIoT testbed and provide an outlook toward some new research directions.
Submitted 16 June, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Long-Range Orbital Magnetoelectric Torque in Ferromagnets
Authors:
Dongwook Go,
Daegeun Jo,
Kyoung-Whan Kim,
Soogil Lee,
Min-Gu Kang,
Byong-Guk Park,
Stefan Blügel,
Hyun-Woo Lee,
Yuriy Mokrousov
Abstract:
While it is often assumed that the orbital response is suppressed and short-ranged due to strong crystal field potential and orbital quenching, we show that the orbital magnetoelectric response can be remarkably long-ranged in ferromagnets. In a bilayer consisting of a nonmagnet and a ferromagnet, spin injection from the interface results in spin accumulation and torque in the ferromagnet, which rapidly oscillate and decay by spin dephasing. In contrast, we find that even when an external electric field is applied only on the nonmagnet, the orbital magnetoelectric response in the ferromagnet is substantially long-ranged and can go far beyond the spin dephasing length. This unusual feature is attributed to nearly degenerate orbital characters imposed by the crystal symmetry, which form hotspots for the intrinsic orbital response. Because only the states near the hotspots contribute dominantly, the induced orbital angular momentum does not exhibit destructive interference among states with different momentum as in the case of the spin dephasing. This gives rise to a distinct type of orbital torque on the magnetization, increasing with the thickness of the ferromagnet. Such behavior may serve as critical long-sought evidence of orbital transport to be directly tested in experiments. Our findings open the possibility of using long-range orbital magnetoelectric effect in orbitronic device applications.
Submitted 16 May, 2022; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Influential Rank: A New Perspective of Post-training for Robust Model against Noisy Labels
Authors:
Seulki Park,
Hwanjun Song,
Daeho Um,
Dae Ung Jo,
Sangdoo Yun,
Jin Young Choi
Abstract:
Deep neural network can easily overfit to even noisy labels due to its high capacity, which degrades the generalization performance of a model. To overcome this issue, we propose a new approach for learning from noisy labels (LNL) via post-training, which can significantly improve the generalization performance of any pre-trained model on noisy label data. To this end, we rather exploit the overfitting property of a trained model to identify mislabeled samples. Specifically, our post-training approach gradually removes samples with high influence on the decision boundary and refines the decision boundary to improve generalization performance. Our post-training approach creates great synergies when combined with the existing LNL methods. Experimental results on various real-world and synthetic benchmark datasets demonstrate the validity of our approach in diverse realistic scenarios.
Submitted 19 April, 2023; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Low dimensional flow polytopes and their toric ideals
Authors:
Mátyás Domokos,
Dániel Joó
Abstract:
The toric ideal of a $d$-dimensional flow polytope has an initial ideal generated by square-free monomials of degree at most $d$. The toric ideal of a flow polytope of dimension at most four has an initial ideal generated by square-free monomials of degree at most two, with the only exception of the four-dimensional Birkhoff polytope, whose toric ideal has an initial ideal generated by a square-free cubic monomial. The proof is based on a method to classify certain compressed flow polytopes, and a construction of a quadratic pulling triangulation of them. Along the way compressed flow polytopes are classified up to dimension four, and their Ehrhart polynomials are computed.
Submitted 9 May, 2021;
originally announced May 2021.
-
Orbital Rashba effect in surface oxidized Cu film
Authors:
Dongwook Go,
Daegeun Jo,
Tenghua Gao,
Kazuya Ando,
Stefan Blügel,
Hyun-Woo Lee,
Yuriy Mokrousov
Abstract:
Recent experimental observation of unexpectedly large current-induced spin-orbit torque in surface oxidized Cu on top of a ferromagnet suggested a possible role of the orbital Rashba effect (ORE). With this motivation, we investigate the ORE from first principles by considering an oxygen monolayer on top of a Cu(111) film. We show that surface oxidization of Cu film leads to gigantic enhancement of the ORE for states near the Fermi surface. The resulting chiral orbital texture in the momentum space is exceptionally strong, reaching $\sim 0.5\hbar$ in magnitude. We find that resonant hybridization between O $p$-states and Cu $d$-states is responsible for the emergence of the ORE. We demonstrate that application of an external electric field generates huge orbital Hall current, which is an order of magnitude larger than the spin Hall current found in heavy metals. This implies that "orbital torque" mechanism may be significant in surface oxidized Cu/ferromagnet structures. It also encourages experimental verification of the orbital texture in surface oxidized Cu through optical measurements such as angle-resolved photoemission spectroscopy.
Submitted 17 November, 2020;
originally announced November 2020.
-
TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator
Authors:
Doyeon Kim,
Donggyu Joo,
Junmo Kim
Abstract:
Advances in technology have led to the development of methods that can create desired visual multimedia. In particular, image generation using deep learning has been extensively studied across diverse fields. In comparison, video generation, especially on conditional inputs, remains a challenging and less explored area. To narrow this gap, we aim to train our model to produce a video corresponding to a given text description. We propose a novel training framework, Text-to-Image-to-Video Generative Adversarial Network (TiVGAN), which evolves frame-by-frame and finally produces a full-length video. In the first phase, we focus on creating a high-quality single video frame while learning the relationship between the text and an image. As the steps proceed, our model is trained gradually on a larger number of consecutive frames. This step-by-step learning process helps stabilize the training and enables the creation of high-resolution video based on conditional text descriptions. Qualitative and quantitative experimental results on various datasets demonstrate the effectiveness of the proposed method.
Submitted 27 June, 2021; v1 submitted 4 September, 2020;
originally announced September 2020.
-
Class-Attentive Diffusion Network for Semi-Supervised Classification
Authors:
Jongin Lim,
Daeho Um,
Hyung Jin Chang,
Dae Ung Jo,
Jin Young Choi
Abstract:
Recently, graph neural networks for semi-supervised classification have been widely studied. However, existing methods only use the information of limited neighbors and do not deal with the inter-class connections in graphs. In this paper, we propose Adaptive aggregation with Class-Attentive Diffusion (AdaCAD), a new aggregation scheme that adaptively aggregates nodes probably of the same class among K-hop neighbors. To this end, we first propose a novel stochastic process, called Class-Attentive Diffusion (CAD), that strengthens attention to intra-class nodes and attenuates attention to inter-class nodes. In contrast to the existing diffusion methods with a transition matrix determined solely by the graph structure, CAD considers both the node features and the graph structure with the design of our class-attentive transition matrix that utilizes a classifier. Then, we further propose an adaptive update scheme that leverages different reflection ratios of the diffusion result for each node depending on the local class-context. As the main advantage, AdaCAD alleviates the problem of undesired mixing of inter-class features caused by discrepancies between node labels and the graph topology. Built on AdaCAD, we construct a simple model called Class-Attentive Diffusion Network (CAD-Net). Extensive experiments on seven benchmark datasets consistently demonstrate the efficacy of the proposed method and our CAD-Net significantly outperforms the state-of-the-art methods. Code is available at https://github.com/ljin0429/CAD-Net.
Submitted 29 December, 2020; v1 submitted 17 June, 2020;
originally announced June 2020.
-
Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses
Authors:
Chander Chandak,
Zeynab Raeesy,
Ariya Rastrow,
Yuzong Liu,
Xiangyang Huang,
Siyu Wang,
Dong Kwon Joo,
Roland Maas
Abstract:
This paper presents our modeling and architecture approaches for building a highly accurate low-latency language identification system to support multilingual spoken queries for voice assistants. A common approach to solve multilingual speech recognition is to run multiple monolingual ASR systems in parallel and rely on a language identification (LID) component that detects the input language. Conventionally, LID relies on acoustic only information to detect input language. We propose an approach that learns and combines acoustic level representations with embeddings estimated on ASR hypotheses resulting in up to 50% relative reduction of identification error rate, compared to a model that uses acoustic only features. Furthermore, to reduce the processing cost and latency, we exploit a streaming architecture to identify the spoken language early when the system reaches a predetermined confidence level, alleviating the need to run multiple ASR systems until the end of input query. The combined acoustic and text LID, coupled with our proposed streaming runtime architecture, results in an average of 1500ms early identification for more than 50% of utterances, with almost no degradation in accuracy. We also show improved results by adopting a semi-supervised learning (SSL) technique using the newly proposed model architecture as a teacher model.
Submitted 1 June, 2020;
originally announced June 2020.
-
Token Manipulation Generative Adversarial Network for Text Generation
Authors:
DaeJin Jo
Abstract:
MaskGAN opens the query for the conditional language model by filling in the blanks between the given tokens. In this paper, we focus on addressing the limitations caused by having to specify blanks to be filled. We decompose conditional text generation problem into two tasks, make-a-blank and fill-in-the-blank, and extend the former to handle more complex manipulations on the given tokens. We cast these tasks as a hierarchical multi agent RL problem and introduce a conditional adversarial learning that allows the agents to reach a goal, producing realistic texts, in cooperative setting. We show that the proposed model not only addresses the limitations but also provides good results without compromising the performance in terms of quality and diversity.
Submitted 11 May, 2020; v1 submitted 6 May, 2020;
originally announced May 2020.
-
Continual Learning with Extended Kronecker-factored Approximate Curvature
Authors:
Janghyeon Lee,
Hyeong Gwon Hong,
Donggyu Joo,
Junmo Kim
Abstract:
We propose a quadratic penalty method for continual learning of neural networks that contain batch normalization (BN) layers. The Hessian of a loss function represents the curvature of the quadratic penalty function, and a Kronecker-factored approximate curvature (K-FAC) is used widely to practically compute the Hessian of a neural network. However, the approximation is not valid if there is dependence between examples, typically caused by BN layers in deep network architectures. We extend the K-FAC method so that the inter-example relations are taken into account and the Hessian of deep neural networks can be properly approximated under practical assumptions. We also propose a method of weight merging and reparameterization to properly handle statistical parameters of BN, which plays a critical role for continual learning with BN, and a method that selects hyperparameters without source task data. Our method shows better performance than baselines in the permuted MNIST task with BN layers and in sequential learning from the ImageNet classification task to fine-grained classification tasks with ResNet-50, without any explicit or implicit use of source task data for hyperparameter selection.
Submitted 16 April, 2020;
originally announced April 2020.
-
Residual Continual Learning
Authors:
Janghyeon Lee,
Donggyu Joo,
Hyeong Gwon Hong,
Junmo Kim
Abstract:
We propose a novel continual learning method called Residual Continual Learning (ResCL). Our method can prevent the catastrophic forgetting phenomenon in sequential learning of multiple tasks, without any source task information except the original network. ResCL reparameterizes network parameters by linearly combining each layer of the original network and a fine-tuned network; therefore, the size of the network does not increase at all. To apply the proposed method to general convolutional neural networks, the effects of batch normalization layers are also considered. By utilizing residual-learning-like reparameterization and a special weight decay loss, the trade-off between source and target performance is effectively controlled. The proposed method exhibits state-of-the-art performance in various continual learning scenarios.
Submitted 17 February, 2020;
originally announced February 2020.
-
Non-trivial charge-to-spin conversion in ferromagnetic metal/Cu/Al2O3 by orbital transport
Authors:
Junyeon Kim,
Dongwook Go,
Hanshen Tsai,
Daegeun Jo,
Kouta Kondou,
Hyun-Woo Lee,
YoshiChika Otani
Abstract:
Efficient spin/charge interconversion is desired to develop innovative spin-based devices. So far, the interconversion has been performed by using heavy atomic elements, strong spin-orbit interaction of which realizes the interconversion through the spin Hall effect and the Edelstein effect. We demonstrate highly efficient charge-to-spin conversion in a ferromagnetic metal/Cu/Al2O3 trilayers, which do not contain any heavy element. The resulting spin torque efficiency is higher than those of conventional spin Hall and Rashba systems consisting of heavy elements such as Pt and Bi. Our experimental results qualitatively deviate from typical behaviors arising from spin transport. However, they are surprisingly consistent with the behaviors arising from the orbital transport. Our results thus demonstrate a new direction for efficient charge-to-spin conversion through the orbital transport.
Submitted 25 February, 2020; v1 submitted 3 February, 2020;
originally announced February 2020.
-
Cross-modal Variational Auto-encoder with Distributed Latent Spaces and Associators
Authors:
Dae Ung Jo,
ByeongJu Lee,
Jongwon Choi,
Haanju Yoo,
Jin Young Choi
Abstract:
In this paper, we propose a novel structure for cross-modal data association, inspired by recent research on the associative learning structure of the brain. We formulate the cross-modal association in a Bayesian inference framework realized by a deep neural network with multiple variational auto-encoders and variational associators. The variational associators transfer the latent spaces between auto-encoders that represent different modalities. The proposed structure successfully associates even heterogeneous modal data and easily incorporates an additional modality into the entire network via the proposed cross-modal associator. Furthermore, the proposed structure can be trained with only a small amount of paired data, since the auto-encoders can be trained in an unsupervised manner. Through experiments, the effectiveness of the proposed structure is validated on various datasets including visual and auditory data.
Submitted 30 May, 2019;
originally announced May 2019.
-
Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification
Authors:
Youngmin Ro,
Jongwon Choi,
Dae Ung Jo,
Byeongho Heo,
Jongin Lim,
Jin Young Choi
Abstract:
In person re-identification (ReID) task, because of its shortage of trainable dataset, it is common to utilize fine-tuning method using a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the gradient vanishing problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained weights. Our strategy alleviates the problem of gradient vanishing in low-level layers and robustly trains the low-level layers to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only vanilla deep convolutional neural network architecture.
Submitted 18 January, 2019;
originally announced January 2019.
-
A Proof of the Beierle-Kranz-Leander Conjecture related to Lightweight Multiplication in $\mathds{F}_{2^n}$
Authors:
Sihem Mesnager,
Kwang Ho Kim,
Dujin Jo,
Junyop Choe,
Munhyon Han,
Dok Nam Lee
Abstract:
Lightweight cryptography is a key tool for building strong security solutions for pervasive devices with limited resources. Due to the stringent cost constraints inherent in extremely large applications (ranging from RFIDs and smart cards to mobile devices), the efficient implementation of cryptographic hardware and software algorithms is of utmost importance to realize the vision of generalized computing.
At CRYPTO 2016, Beierle, Kranz and Leander considered lightweight multiplication in $\mathds{F}_{2^n}$. Specifically, they considered the fundamental question of optimizing finite field multiplications with one fixed element and investigated which field representation, that is, which choice of basis, allows for an optimal implementation. They left open a conjecture related to an XOR-count of two. Using linear algebra, we prove in the present paper that their conjecture is correct. Consequently, the proved conjecture can be used as a reference for further developing and implementing cryptographic algorithms in lightweight devices.
Submitted 23 December, 2018;
originally announced December 2018.
-
Gigantic intrinsic orbital Hall effects in weakly spin-orbit coupled metals
Authors:
Daegeun Jo,
Dongwook Go,
Hyun-Woo Lee
Abstract:
A recent paper [Go $\textit{et al}$., Phys. Rev. Lett. $\textbf{121}$, 086602 (2018)] proposed that the intrinsic orbital Hall effect (OHE) can emerge from momentum-space orbital texture in centrosymmetric materials. In searching for real materials with strong OHE, we investigate the intrinsic OHE in metals with small spin-orbit coupling (SOC) in face-centered cubic and body-centered cubic structures (Li, Al, V, Cr, Mn, Ni, and Cu). We find that orbital Hall conductivities (OHCs) in these materials are gigantic, $\sim 10^3-10^4\ (\hbar/e)(\Omega\cdot\mathrm{cm})^{-1}$, which is comparable to or larger than the spin Hall conductivity (SHC) of Pt. Although SHCs in these materials are smaller than OHCs due to small SOC, we find that SHCs are still sizable and the spin Hall angles may be of the order of 0.1. We discuss implications for recent spin-charge interconversion experiments on materials having small SOC.
Submitted 15 November, 2018; v1 submitted 16 August, 2018;
originally announced August 2018.
-
Generating a Fusion Image: One's Identity and Another's Shape
Authors:
Donggyu Joo,
Doyeon Kim,
Junmo Kim
Abstract:
Generating a novel image by manipulating two input images is an interesting research problem in the study of generative adversarial networks (GANs). We propose a new GAN-based network that generates a fusion image with the identity of input image $x$ and the shape of input image $y$. Our network can simultaneously train on more than two image datasets in an unsupervised manner. We define an identity loss $L_I$ to catch the identity of image $x$ and a shape loss $L_S$ to get the shape of $y$. In addition, we propose a novel training method called Min-Patch training to focus the generator on crucial parts of an image, rather than its entirety. We show qualitative results on the VGG Youtube Pose dataset, Eye dataset (MPIIGaze and UnityEyes), and the Photo-Sketch-Cartoon dataset.
Submitted 25 January, 2022; v1 submitted 20 April, 2018;
originally announced April 2018.
-
Intrinsic Spin and Orbital Hall Effects from Orbital Texture
Authors:
Dongwook Go,
Daegeun Jo,
Changyoung Kim,
Hyun-Woo Lee
Abstract:
We show theoretically that both intrinsic spin Hall effect (SHE) and orbital Hall effect (OHE) can arise in centrosymmetric systems through momentum-space orbital texture, which is ubiquitous even in centrosymmetric systems unlike spin texture. OHE occurs even without spin-orbit coupling (SOC) and is converted into SHE through SOC. The resulting spin Hall conductivity is large (comparable to that of Pt) but depends on the SOC strength in a nonmonotonic way. This mechanism is stable against orbital quenching. This work suggests a path for an ongoing search for materials with stronger SHE. It also calls for experimental efforts to probe orbital degrees of freedom in OHE and SHE. Possible ways for experimental detection are briefly discussed.
Submitted 12 July, 2018; v1 submitted 5 April, 2018;
originally announced April 2018.
-
Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras
Authors:
Keunwoo Choi,
Deokjin Joo,
Juho Kim
Abstract:
We introduce Kapre, Keras layers for audio and music signal preprocessing. Music research using deep neural networks requires a heavy and tedious preprocessing stage, for which audio processing parameters are often ignored in parameter optimisation. To solve this problem, Kapre implements time-frequency conversions, normalisation, and data augmentation as Keras layers. We report simple benchmark results, showing real-time on-GPU preprocessing adds a reasonable amount of computation.
Submitted 19 June, 2017;
originally announced June 2017.
-
Automatic Content-aware Projection for 360° Videos
Authors:
Yeong Won Kim,
Dae-Yong Jo,
Chang-Ryeol Lee,
Hyeok-Jae Choi,
Yong Hoon Kwon,
Kuk-Jin Yoon
Abstract:
To watch 360° videos on normal 2D displays, we need to project the selected part of the 360° image onto the 2D display plane. In this paper, we propose a fully automated framework for generating content-aware 2D normal-view perspective videos from 360° videos. In particular, we focus on a projection step that preserves important image content and reduces image distortion. Our projection method is based on the Pannini projection model. First, salient content such as linear structures and salient regions in the image is preserved by optimizing a single Pannini projection model. Then, multiple Pannini projection models at salient regions are interpolated to suppress image distortion globally. Finally, temporal consistency of the image projection is enforced to produce temporally stable normal-view videos. Our proposed projection method does not require any user interaction and is much faster than previous content-preserving methods. It can be applied not only to images but also to videos, taking the temporal consistency of projection into account. Experiments on various 360° videos show the superiority of the proposed projection method quantitatively and qualitatively.
Submitted 10 September, 2017; v1 submitted 24 April, 2017;
originally announced April 2017.
-
Toric quiver cells
Authors:
M. Domokos,
Dániel Joó
Abstract:
It is shown that up to dimension four, the toric ideal of a quiver polytope is generated in degree two, with the only exception of the four-dimensional Birkhoff polytope. As a consequence, Bøgvad's conjecture holds for quiver polytopes of dimension at most four. In arbitrary dimension, the toric ideal of a compressed polytope is generated in degree two if the polytope has no neighbouring singular vertices. Furthermore, the toric ideal of a compressed polytope with at most one singular vertex has a quadratic Gröbner basis.
Submitted 3 July, 2017; v1 submitted 12 September, 2016;
originally announced September 2016.
-
On the dimension of polynomial semirings
Authors:
Dániel Joó,
Kalina Mincheva
Abstract:
In our previous work, motivated by the study of tropical polynomials, a definition for prime congruences was given for an arbitrary commutative semiring. It was shown that for additively idempotent semirings this class exhibits some analogous properties to prime ideals in ring theory. The current paper focuses on the resulting notion of Krull dimension, which is defined as the length of the longest chain of prime congruences. Our main result states that for any additively idempotent semiring $A$, the semiring of polynomials $A[x]$ and the semiring of Laurent polynomials $A(x)$, we have $\dim A[x] = \dim A(x) = \dim A + 1$.
Submitted 8 October, 2015;
originally announced October 2015.
-
Prime congruences of idempotent semirings and a Nullstellensatz for tropical polynomials
Authors:
Dániel Joó,
Kalina Mincheva
Abstract:
A new definition of prime congruences in additively idempotent semirings is given using twisted products. This class turns out to exhibit some analogous properties to the prime ideals of commutative rings. In order to establish a good notion of radical congruences it is shown that the intersection of all primes of a semiring can be characterized by certain twisted power formulas. A complete description of prime congruences is given in the polynomial and Laurent polynomial semirings over the tropical semifield ${\pmb T}$, the semifield $\mathbb{Z}_{max}$ and the two element semifield $\mathbb{B}$. The minimal primes of these semirings correspond to monomial orderings, and their intersection is the congruence that identifies polynomials that have the same Newton polytope. It is then shown that every finitely generated congruence in each of these cases is an intersection of prime congruences with quotients of Krull dimension $1$. An improvement of a result of A. Bertram and R. Easton from 2013 is proven which can be regarded as a Nullstellensatz for tropical polynomials.
Submitted 13 September, 2017; v1 submitted 17 August, 2014;
originally announced August 2014.
-
On the equations and classification of toric quiver varieties
Authors:
M. Domokos,
Dániel Joó
Abstract:
Toric quiver varieties (moduli spaces of quiver representations) are studied. Given a quiver and a weight there is an associated quasiprojective toric variety together with a canonical embedding into projective space. It is shown that for a quiver with no oriented cycles the homogeneous ideal of this embedded projective variety is generated by elements of degree at most $3$. In each fixed dimension $d$ up to isomorphism there are only finitely many $d$-dimensional toric quiver varieties. A procedure for their classification is outlined.
Submitted 20 February, 2014;
originally announced February 2014.
-
Jacobian Conjecture in two dimension
Authors:
Dosang Joe
Abstract:
Let $(P, Q)$ be a pair of Jacobian polynomials. We can show that $\langle P, Q\rangle + l + 2g(P) - 2 = 0 = \langle P, [P,Q]\rangle$, where $\langle f, g\rangle$ is the intersection number of $f, g\in \mathbb{C}[x, y]$ in the affine plane, $l$ is the number of branches at the point at infinity, and $g(P)$ is the geometric genus of the affine curve defined by $P$. Hence we can show that every Jacobian polynomial defines a smooth rational curve with one point at infinity. This is sufficient to settle the Jacobian conjecture in dimension two by the Abhyankar theorem or the Abhyankar-Moh-Suzuki theorem.
Submitted 13 September, 2013; v1 submitted 14 June, 2013;
originally announced June 2013.
-
Structure analysis of single- and multi-frequency subspace migrations in inverse scattering problems
Authors:
Young Deuk Jo,
Young Mi Kwon,
Joo Young Huh,
Won-Kwang Park
Abstract:
In this paper, we carefully investigate the structure of single- and multi-frequency imaging functions that are usually employed in inverse scattering problems. Based on patterns of the singular vectors of the Multi-Static Response (MSR) matrix, we establish a relationship between imaging functions and the Bessel function. This relationship indicates certain properties of imaging functions and explains why multiple frequencies enhance imaging performance. Several numerical simulations with a large amount of noisy data are performed in order to support our investigation.
Submitted 9 January, 2013; v1 submitted 2 August, 2012;
originally announced August 2012.
-
Synchronous imaging for rapid visualization of complex vibration profiles in electromechanical microresonators
Authors:
Yoav Linzon,
Daniel J. Joe,
Slava Krylov,
Bojan Ilic,
Juraj Topolancik,
Jeevak M. Parpia,
Halrod G. Craighead
Abstract:
Synchronous imaging is used in dynamic space-domain vibration profile studies of capacitively driven, thin n+ doped poly-silicon microbridges oscillating at rf frequencies. Fast and high-resolution actuation profile measurements of micromachined resonators are useful when significant device nonlinearities are present. For example, bridges under compressive stress near the critical Euler value often reveal complex dynamics stemming from a state close to the onset of buckling. This leads to enhanced sensitivity of the vibration modes to external conditions, such as pressure, temperatures, and chemical composition, the global behavior of which is conveniently evaluated using synchronous imaging combined with spectral measurements. We performed an experimental study of the effects of high drive amplitude and ambient pressure on the resonant vibration profiles in electrically-driven microbridges near critical buckling. Numerical analysis of electrostatically driven post-buckled microbridges supports the richness of complex vibration dynamics that are possible in such micro-electromechanical devices.
Submitted 31 October, 2011;
originally announced October 2011.
-
Complete intersection quiver settings with one dimensional vertices
Authors:
Dániel Joó
Abstract:
We describe the class of quiver settings with one-dimensional vertices whose semi-simple representations are parametrized by a complete intersection variety. We show that these quivers can be reduced to a one-vertex quiver by certain combinatorial reduction steps. We also show that this class consists of the quivers from which we cannot obtain two specific non-complete-intersection quivers via contracting strongly connected components and deleting subquivers. The class of coregular quiver settings with arbitrary dimension vector, which has been described by an earlier result via reduction steps, can also be characterized by not containing a specific subquiver in the above sense.
Submitted 17 July, 2011; v1 submitted 16 May, 2011;
originally announced May 2011.