60 results for au:Christensen_H in:cs

Enhancing Online Road Network Perception and Reasoning with Standard Definition Maps
Hengyuan Zhang, David Paz, Yuliang Guo, Arun Das, Xinyu Huang, Karsten Haug, Henrik I. Christensen, Liu Ren
Aug 06 2024 cs.CV cs.RO arXiv:2408.01471v1

@misc{2408.01471, author = {Hengyuan Zhang and David Paz and Yuliang Guo and Arun Das and Xinyu Huang and Karsten Haug and Henrik I.~Christensen and Liu Ren}, title = {{E}nhancing {O}nline {R}oad {N}etwork {P}erception and {R}easoning with {S}tandard {D}efinition {M}aps}, year = {2024}, eprint = {2408.01471}, note = {arXiv:2408.01471v1} }
Autonomous driving for urban and highway driving applications often requires High Definition (HD) maps to generate a navigation plan. Nevertheless, various challenges arise when generating and maintaining HD maps at scale. While recent online mapping methods have started to emerge, their performance, especially for longer ranges, is limited by heavy occlusion in dynamic environments. With these considerations in mind, our work focuses on leveraging lightweight and scalable priors, namely Standard Definition (SD) maps, in the development of online vectorized HD map representations. We first examine the integration of prototypical rasterized SD map representations into various online mapping architectures. Furthermore, to identify lightweight strategies, we extend the OpenLane-V2 dataset with OpenStreetMaps and evaluate the benefits of graphical SD map representations. A key finding from designing SD map integration components is that SD map encoders are model agnostic and can be quickly adapted to new architectures that utilize bird's eye view (BEV) encoders. Our results show that making use of SD maps as priors for the online mapping task can significantly speed up convergence and boost the performance of the online centerline perception task by 30% (mAP). Furthermore, we show that the introduction of the SD maps leads to a reduction of the number of parameters in the perception and reasoning task by leveraging SD map graphs while improving the overall performance. Project Page: https://henryzhangzhy.github.io/sdhdmap/.
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning
Abdulaziz Almuzairee, Nicklas Hansen, Henrik I. Christensen
May 28 2024 cs.LG cs.CV cs.RO arXiv:2405.17416v2

@misc{2405.17416, author = {Abdulaziz Almuzairee and Nicklas Hansen and Henrik I.~Christensen}, title = {{A} {R}ecipe for {U}nbounded {D}ata {A}ugmentation in {V}isual {R}einforcement {L}earning}, year = {2024}, eprint = {2405.17416}, note = {arXiv:2405.17416v2} }
Q-learning algorithms are appealing for real-world applications due to their data-efficiency, but they are very prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation, and find an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing these limitations, we propose a generalized recipe, SADA, that works with wider varieties of augmentations. We benchmark its effectiveness on DMC-GB2 - our proposed extension of the popular DMControl Generalization Benchmark - as well as tasks from Meta-World and the Distracting Control Suite, and find that our method, SADA, greatly improves training stability and generalization of RL agents across a diverse set of augmentations. For visualizations, code and benchmark: see https://aalmuzairee.github.io/SADA/
Defining error accumulation in ML atmospheric simulators
Raghul Parthipan, Mohit Anand, Hannah M. Christensen, J. Scott Hosking, Damon J. Wischik
May 24 2024 cs.LG arXiv:2405.14714v1

@misc{2405.14714, author = {Raghul Parthipan and Mohit Anand and Hannah M.~Christensen and J.~Scott Hosking and Damon J.~Wischik}, title = {{D}efining error accumulation in {ML} atmospheric simulators}, year = {2024}, eprint = {2405.14714}, note = {arXiv:2405.14714v1} }
Machine learning (ML) has recently shown significant promise in modelling atmospheric systems, such as the weather. Many of these ML models are autoregressive, and error accumulation in their forecasts is a key problem. However, there is no clear definition of what 'error accumulation' actually entails. In this paper, we propose a definition and an associated metric to measure it. Our definition distinguishes between errors which are due to model deficiencies, which we may hope to fix, and those due to the intrinsic properties of atmospheric systems (chaos, unobserved variables), which are not fixable. We illustrate the usefulness of this definition by proposing a simple regularization loss penalty inspired by it. This approach shows performance improvements (according to RMSE and spread/skill) in a selection of atmospheric systems, including the real-world weather prediction task.
SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations
Narayanan Elavathur Ranganatha, Hengyuan Zhang, Shashank Venkatramani, Jing-Yan Liao, Henrik I. Christensen
May 02 2024 cs.CV cs.RO arXiv:2405.00250v1

@misc{2405.00250, author = {Narayanan Elavathur Ranganatha and Hengyuan Zhang and Shashank Venkatramani and Jing-Yan Liao and Henrik I.~Christensen}, title = {{S}em{V}ec{N}et: {G}eneralizable {V}ector {M}ap {G}eneration for {A}rbitrary {S}ensor {C}onfigurations}, year = {2024}, eprint = {2405.00250}, note = {arXiv:2405.00250v1} }
Vector maps are essential in autonomous driving for tasks like localization and planning, yet their creation and maintenance are notably costly. While recent advances in online vector map generation for autonomous vehicles are promising, current models lack adaptability to different sensor configurations. They tend to overfit to specific sensor poses, leading to decreased performance and higher retraining costs. This limitation hampers their practical use in real-world applications. In response to this challenge, we propose a modular pipeline for vector map generation with improved generalization to sensor configurations. The pipeline leverages probabilistic semantic mapping to generate a bird's-eye-view (BEV) semantic map as an intermediate representation. This intermediate representation is then converted to a vector map using the MapTRv2 decoder. By adopting a BEV semantic map robust to different sensor configurations, our proposed approach significantly improves the generalization performance. We evaluate the model on datasets with sensor configurations not used during training. Our evaluation sets include larger public datasets and smaller-scale private data collected on our platform. Our model generalizes significantly better than the state-of-the-art methods.
Robust Surgical Tool Tracking with Pixel-based Probabilities for Projected Geometric Primitives
Christopher D'Ambrosia, Florian Richter, Zih-Yun Chiu, Nikhil Shinde, Fei Liu, Henrik I. Christensen, Michael C. Yip
Mar 11 2024 cs.RO cs.CV arXiv:2403.04971v1

@misc{2403.04971, author = {Christopher D'Ambrosia and Florian Richter and Zih-Yun Chiu and Nikhil Shinde and Fei Liu and Henrik I.~Christensen and Michael C.~Yip}, title = {{R}obust {S}urgical {T}ool {T}racking with {P}ixel-based {P}robabilities for {P}rojected {G}eometric {P}rimitives}, year = {2024}, eprint = {2403.04971}, note = {arXiv:2403.04971v1} }
Controlling robotic manipulators via visual feedback requires a known coordinate frame transformation between the robot and the camera. Uncertainties in mechanical systems as well as camera calibration create errors in this coordinate frame transformation. These errors result in poor localization of robotic manipulators and create a significant challenge for applications that rely on precise interactions between manipulators and the environment. In this work, we estimate the camera-to-base transform and joint angle measurement errors for surgical robotic tools using an image-based insertion-shaft detection algorithm and probabilistic models. We apply our proposed approach in both a structured and an unstructured environment to demonstrate the efficacy of our methods.
Machine Learning for Stochastic Parametrisation
Hannah M. Christensen, Salah Kouhen, Greta Miller, Raghul Parthipan
Feb 16 2024 cs.LG arXiv:2402.09471v1

@misc{2402.09471, author = {Hannah M.~Christensen and Salah Kouhen and Greta Miller and Raghul Parthipan}, title = {{M}achine {L}earning for {S}tochastic {P}arametrisation}, year = {2024}, eprint = {2402.09471}, note = {arXiv:2402.09471v1} }
Atmospheric models used for weather and climate prediction are traditionally formulated in a deterministic manner. In other words, given a particular state of the resolved scale variables, the most likely forcing from the sub-grid scale processes is estimated and used to predict the evolution of the large-scale flow. However, the lack of scale-separation in the atmosphere means that this approach is a large source of error in forecasts. Over recent years, an alternative paradigm has developed: the use of stochastic techniques to characterise uncertainty in small-scale processes. These techniques are now widely used across weather, sub-seasonal, seasonal, and climate timescales. In parallel, recent years have also seen significant progress in replacing parametrisation schemes using machine learning (ML). This has the potential to both speed up and improve our numerical models. However, the focus to date has largely been on deterministic approaches. In this position paper, we bring together these two key developments, and discuss the potential for data-driven approaches for stochastic parametrisation. We highlight early studies in this area, and draw attention to the novel challenges that remain.
Household navigation and manipulation for everyday object rearrangement tasks
Shrutheesh R. Iyer, Anwesan Pal, Jiaming Hu, Akanimoh Adeleye, Aditya Aggarwal, Henrik I. Christensen
Dec 12 2023 cs.RO arXiv:2312.06129v1

@misc{2312.06129, author = {Shrutheesh R.~Iyer and Anwesan Pal and Jiaming Hu and Akanimoh Adeleye and Aditya Aggarwal and Henrik I.~Christensen}, title = {{H}ousehold navigation and manipulation for everyday object rearrangement tasks}, year = {2023}, eprint = {2312.06129}, note = {arXiv:2312.06129v1} }
We consider the problem of building an assistive robotic system that can help humans in daily household cleanup tasks. Creating such an autonomous system in real-world environments is inherently quite challenging, as a general solution may not suit the preferences of a particular customer. Moreover, such a system consists of multi-objective tasks comprising: (i) detection of misplaced objects and prediction of their potentially correct placements, (ii) fine-grained manipulation for stable object grasping, and (iii) room-to-room navigation for transferring objects in unseen environments. This work systematically tackles each component and integrates them into a complete object rearrangement pipeline. To validate our proposed system, we conduct multiple experiments on a real robotic platform involving multi-room object transfer, user preference-based placement, and complex pick-and-place tasks. Project page: https://sites.google.com/eng.ucsd.edu/home-robot
OSM vs HD Maps: Map Representations for Trajectory Prediction
Jing-Yan Liao, Parth Doshi, Zihan Zhang, David Paz, Henrik Christensen
Nov 07 2023 cs.CV cs.AI cs.RO arXiv:2311.02305v1

@misc{2311.02305, author = {Jing-Yan Liao and Parth Doshi and Zihan Zhang and David Paz and Henrik Christensen}, title = {{OSM} vs {HD} {M}aps: {M}ap {R}epresentations for {T}rajectory {P}rediction}, year = {2023}, eprint = {2311.02305}, note = {arXiv:2311.02305v1} }
While High Definition (HD) Maps have long been favored for their precise depictions of static road elements, their accessibility constraints and susceptibility to rapid environmental changes impede the widespread deployment of autonomous driving, especially in the motion forecasting task. In this context, we propose to leverage OpenStreetMap (OSM) as a promising alternative to HD Maps for long-term motion forecasting. The contributions of this work are threefold. First, we extend the application of OSM to long-horizon forecasting, doubling the forecasting horizon compared to previous studies. Second, through an expanded receptive field and the integration of intersection priors, our OSM-based approach exhibits competitive performance, narrowing the gap with HD Map-based models. Lastly, we conduct an exhaustive context-aware analysis, providing deeper insights into motion forecasting across diverse scenarios as well as conducting class-aware comparisons. This research not only advances long-term motion forecasting with coarse map representations but also offers a potentially scalable solution within the domain of autonomous driving.
Occlusion-Aware 2D and 3D Centerline Detection for Urban Driving via Automatic Label Generation
David Paz, Narayanan E. Ranganatha, Srinidhi K. Srinivas, Yunchao Yao, Henrik I. Christensen
Nov 06 2023 cs.CV arXiv:2311.02044v1

@misc{2311.02044, author = {David Paz and Narayanan E.~Ranganatha and Srinidhi K.~Srinivas and Yunchao Yao and Henrik I.~Christensen}, title = {{O}cclusion-{A}ware 2{D} and 3{D} {C}enterline {D}etection for {U}rban {D}riving via {A}utomatic {L}abel {G}eneration}, year = {2023}, eprint = {2311.02044}, note = {arXiv:2311.02044v1} }
This research work seeks to explore and identify strategies that can determine road topology information in 2D and 3D under highly dynamic urban driving scenarios. To facilitate this exploration, we introduce a substantial dataset comprising nearly one million automatically labeled data frames. A key contribution of our research lies in developing an automatic label-generation process and an occlusion handling strategy. This strategy is designed to model a wide range of occlusion scenarios, from mild disruptions to severe blockages. Furthermore, we present a comprehensive ablation study wherein multiple centerline detection methods are developed and evaluated. This analysis not only benchmarks the performance of various approaches but also provides valuable insights into the interpretability of these methods. Finally, we demonstrate the practicality of our methods and assess their adaptability across different sensor configurations, highlighting their versatility and relevance in real-world scenarios. Our dataset and experimental models are publicly available.
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, et al (272)
Oct 16 2023 cs.RO arXiv:2310.08864v8

@misc{2310.08864, author = {Open X-Embodiment Collaboration and Abby O'Neill and Abdul Rehman and Abhinav Gupta and Abhiram Maddukuri and Abhishek Gupta and Abhishek Padalkar and Abraham Lee and Acorn Pooley and Agrim Gupta and Ajay Mandlekar and Ajinkya Jain and Albert Tung and Alex Bewley and Alex Herzog and Alex Irpan and Alexander Khazatsky and Anant Rai and Anchit Gupta and Andrew Wang and Andrey Kolobov and Anikait Singh and Animesh Garg and Aniruddha Kembhavi and Annie Xie and Anthony Brohan and Antonin Raffin and Archit Sharma and Arefeh Yavary and Arhan Jain and Ashwin Balakrishna and Ayzaan Wahid and Ben Burgess-Limerick and Beomjoon Kim and Bernhard Schölkopf and Blake Wulfe and Brian Ichter and Cewu Lu and Charles Xu and Charlotte Le and Chelsea Finn and Chen Wang and Chenfeng Xu and Cheng Chi and Chenguang Huang and Christine Chan and Christopher Agia and Chuer Pan and Chuyuan Fu and Coline Devin and Danfei Xu and Daniel Morton and Danny Driess and Daphne Chen and Deepak Pathak and Dhruv Shah and Dieter Büchler and Dinesh Jayaraman and Dmitry Kalashnikov and Dorsa Sadigh and Edward Johns and Ethan Foster and Fangchen Liu and Federico Ceola and Fei Xia and Feiyu Zhao and Felipe Vieira Frujeri and Freek Stulp and Gaoyue Zhou and Gaurav S.~Sukhatme and Gautam Salhotra and Ge Yan and Gilbert Feng and Giulio Schiavi and Glen Berseth and Gregory Kahn and Guangwen Yang and Guanzhi Wang and Hao Su and Hao-Shu Fang and Haochen Shi and Henghui Bao and Heni Ben Amor and Henrik I Christensen and Hiroki Furuta and Homanga Bharadhwaj and Homer Walke and Hongjie Fang and Huy Ha and Igor Mordatch and Ilija Radosavovic and Isabel Leal and Jacky Liang and Jad Abou-Chakra and Jaehyung Kim and Jaimyn Drake and Jan Peters and Jan Schneider and Jasmine Hsu and Jay Vakil and Jeannette Bohg and Jeffrey Bingham and Jeffrey Wu and Jensen Gao and Jiaheng Hu and Jiajun Wu and Jialin Wu and Jiankai Sun and Jianlan Luo and Jiayuan Gu and Jie Tan and Jihoon Oh and Jimmy Wu and 
Jingpei Lu and Jingyun Yang and Jitendra Malik and João Silvério and Joey Hejna and Jonathan Booher and Jonathan Tompson and Jonathan Yang and Jordi Salvador and Joseph J.~Lim and Junhyek Han and Kaiyuan Wang and Kanishka Rao and Karl Pertsch and Karol Hausman and Keegan Go and Keerthana Gopalakrishnan and Ken Goldberg and Kendra Byrne and Kenneth Oslund and Kento Kawaharazuka and Kevin Black and Kevin Lin and Kevin Zhang and Kiana Ehsani and Kiran Lekkala and Kirsty Ellis and Krishan Rana and Krishnan Srinivasan and Kuan Fang and Kunal Pratap Singh and Kuo-Hao Zeng and Kyle Hatch and Kyle Hsu and Laurent Itti and Lawrence Yunliang Chen and Lerrel Pinto and Li Fei-Fei and Liam Tan and Linxi "Jim" Fan and Lionel Ott and Lisa Lee and Luca Weihs and Magnum Chen and Marion Lepert and Marius Memmel and Masayoshi Tomizuka and Masha Itkina and Mateo Guaman Castro and Max Spero and Maximilian Du and Michael Ahn and Michael C.~Yip and Mingtong Zhang and Mingyu Ding and Minho Heo and Mohan Kumar Srirama and Mohit Sharma and Moo Jin Kim and Naoaki Kanazawa and Nicklas Hansen and Nicolas Heess and Nikhil J Joshi and Niko Suenderhauf and Ning Liu and Norman Di Palo and Nur Muhammad Mahi Shafiullah and Oier Mees and Oliver Kroemer and Osbert Bastani and Pannag R Sanketi and Patrick "Tree" Miller and Patrick Yin and Paul Wohlhart and Peng Xu and Peter David Fagan and Peter Mitrano and Pierre Sermanet and Pieter Abbeel and Priya Sundaresan and Qiuyu Chen and Quan Vuong and Rafael Rafailov and Ran Tian and Ria Doshi and Roberto Martín-Martín and Rohan Baijal and Rosario Scalise and Rose Hendrix and Roy Lin and Runjia Qian and Ruohan Zhang and Russell Mendonca and Rutav Shah and Ryan Hoque and Ryan Julian and Samuel Bustamante and Sean Kirmani and Sergey Levine and Shan Lin and Sherry Moore and Shikhar Bahl and Shivin Dass and Shubham Sonawani and Shubham Tulsiani and Shuran Song and Sichun Xu and Siddhant Haldar and Siddharth Karamcheti and Simeon Adebola and Simon Guist and 
Soroush Nasiriany and Stefan Schaal and Stefan Welker and Stephen Tian and Subramanian Ramamoorthy and Sudeep Dasari and Suneel Belkhale and Sungjae Park and Suraj Nair and Suvir Mirchandani and Takayuki Osa and Tanmay Gupta and Tatsuya Harada and Tatsuya Matsushima and Ted Xiao and Thomas Kollar and Tianhe Yu and Tianli Ding and Todor Davchev and Tony Z.~Zhao and Travis Armstrong and Trevor Darrell and Trinity Chung and Vidhi Jain and Vikash Kumar and Vincent Vanhoucke and Wei Zhan and Wenxuan Zhou and Wolfram Burgard and Xi Chen and Xiangyu Chen and Xiaolong Wang and Xinghao Zhu and Xinyang Geng and Xiyuan Liu and Xu Liangwei and Xuanlin Li and Yansong Pang and Yao Lu and Yecheng Jason Ma and Yejin Kim and Yevgen Chebotar and Yifan Zhou and Yifeng Zhu and Yilin Wu and Ying Xu and Yixuan Wang and Yonatan Bisk and Yongqiang Dou and Yoonyoung Cho and Youngwoon Lee and Yuchen Cui and Yue Cao and Yueh-Hua Wu and Yujin Tang and Yuke Zhu and Yunchu Zhang and Yunfan Jiang and Yunshuang Li and Yunzhu Li and Yusuke Iwasawa and Yutaka Matsuo and Zehan Ma and Zhuo Xu and Zichen Jeff Cui and Zichen Zhang and Zipeng Fu and Zipeng Lin}, title = {{O}pen {X}-{E}mbodiment: {R}obotic {L}earning {D}atasets and {RT}-{X} {M}odels}, year = {2023}, eprint = {2310.08864}, note = {arXiv:2310.08864v8} }
Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.
An Experience-based TAMP Framework for Foliated Manifolds
Jiaming Hu, Shrutheesh R. Iyer, Henrik I. Christensen
Oct 13 2023 cs.RO arXiv:2310.08494v1

@misc{2310.08494, author = {Jiaming Hu and Shrutheesh R.~Iyer and Henrik I.~Christensen}, title = {{A}n {E}xperience-based {TAMP} {F}ramework for {F}oliated {M}anifolds}, year = {2023}, eprint = {2310.08494}, note = {arXiv:2310.08494v1} }
Due to their complexity, foliated structure problems often pose intricate challenges to task and motion planning in robotic manipulation. To counter this, our study presents the "Foliated Repetition Roadmap." This roadmap assists task and motion planners by transforming the complex foliated structure problem into a more accessible graph format. By leveraging query experiences from different foliated manifolds, our framework can dynamically and efficiently update this graph. The refined graph can generate distribution sets, optimizing motion planning performance in foliated structure problems. In our paper, we lay down the theoretical groundwork and illustrate its practical applications through real-world examples.
Multi-Modal Planning on Regrasping for Stable Manipulation
Jiaming Hu, Zhao Tang, Henrik I. Christensen
Sep 28 2023 cs.RO arXiv:2309.15283v1

@misc{2309.15283, author = {Jiaming Hu and Zhao Tang and Henrik I.~Christensen}, title = {{M}ulti-{M}odal {P}lanning on {R}egrasping for {S}table {M}anipulation}, year = {2023}, eprint = {2309.15283}, note = {arXiv:2309.15283v1} }
A number of grasping algorithms have been proposed that can predict candidate grasp poses, even for unseen objects. This enables a robotic manipulator to pick and place such objects. However, some of the predicted grasp poses that would stably lift a target object may not be directly approachable due to workspace limitations. In such cases, the robot will need to re-grasp the desired object to grasp it successfully. This involves planning a sequence of continuous actions such as sliding, re-grasping, and transferring. To address this multi-modal problem, we propose a Markov Decision Process-based multi-modal planner that can rearrange the object into a position suitable for stable manipulation. We demonstrate improved performance in both simulation and the real world for pick-and-place tasks.
FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory
Anwesan Pal, Sahil Wadhwa, Ayush Jaiswal, Xu Zhang, Yue Wu, Rakesh Chada, Pradeep Natarajan, Henrik I. Christensen
Aug 22 2023 cs.CV cs.CL arXiv:2308.10170v1

@misc{2308.10170, author = {Anwesan Pal and Sahil Wadhwa and Ayush Jaiswal and Xu Zhang and Yue Wu and Rakesh Chada and Pradeep Natarajan and Henrik I.~Christensen}, title = {{F}ashion{NTM}: {M}ulti-turn {F}ashion {I}mage {R}etrieval via {C}ascaded {M}emory}, year = {2023}, eprint = {2308.10170}, note = {arXiv:2308.10170v1} }
Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns to retrieve new images, for a given turn. Unlike vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation results show that our proposed method outperforms the previous state-of-the-art algorithm by 50.5%, on Multi-turn FashionIQ -- the only existing multi-turn fashion dataset currently, in addition to having a relative improvement of 12.6% on Multi-turn Shoes -- an extension of the single-turn Shoes dataset that we created in this work. Further analysis of the model in a real-world interactive setting demonstrates two important capabilities of our model -- memory retention across turns, and agnosticity to turn order for non-contradictory feedback. Finally, user study results show that images retrieved by FashionNTM were favored by 83.1% over other multi-turn models. Project page: https://sites.google.com/eng.ucsd.edu/fashionntm
3D Scene Graph Prediction on Point Clouds Using Knowledge Graphs
Yiding Qiu, Henrik I. Christensen
Aug 15 2023 cs.RO cs.CV arXiv:2308.06719v1

@misc{2308.06719, author = {Yiding Qiu and Henrik I.~Christensen}, title = {3{D} {S}cene {G}raph {P}rediction on {P}oint {C}louds {U}sing {K}nowledge {G}raphs}, year = {2023}, eprint = {2308.06719}, note = {arXiv:2308.06719v1} }
3D scene graph prediction is a task that aims to concurrently predict object classes and their relationships within a 3D environment. As these environments are primarily designed by and for humans, incorporating commonsense knowledge regarding objects and their relationships can significantly constrain and enhance the prediction of the scene graph. In this paper, we investigate the application of commonsense knowledge graphs for 3D scene graph prediction on point clouds of indoor scenes. Through experiments conducted on a real-world indoor dataset, we demonstrate that integrating external commonsense knowledge via the message-passing method leads to a 15.0% improvement in scene graph prediction accuracy with external knowledge and 7.96% with internal knowledge when compared to state-of-the-art algorithms. We also tested in the real world with 10 frames per second for scene graph generation to show the usage of the model in a more realistic robotics setting.
CLiNet: Joint Detection of Road Network Centerlines in 2D and 3D
David Paz, Srinidhi Kalgundi Srinivas, Yunchao Yao, Henrik I. Christensen
Feb 07 2023 cs.CV arXiv:2302.02259v1

@misc{2302.02259, author = {David Paz and Srinidhi Kalgundi Srinivas and Yunchao Yao and Henrik I.~Christensen}, title = {{CL}i{N}et: {J}oint {D}etection of {R}oad {N}etwork {C}enterlines in 2{D} and 3{D}}, year = {2023}, eprint = {2302.02259}, note = {arXiv:2302.02259v1} }
This work introduces a new approach for joint detection of centerlines based on image data by localizing the features jointly in 2D and 3D. In contrast to existing work that focuses on detection of visual cues, we explore feature extraction methods that are directly amenable to the urban driving task. To develop and evaluate our approach, a large urban driving dataset dubbed AV Breadcrumbs is automatically labeled by leveraging vector map representations and projective geometry to annotate over 900,000 images. Our results demonstrate potential for dynamic scene modeling across various urban driving scenarios. Our model achieves an F1 score of 0.684 and an average normalized depth error of 2.083. The code and data annotations are publicly available.
Robust Human Identity Anonymization using Pose Estimation
Hengyuan Zhang, Jing-Yan Liao, David Paz, Henrik I. Christensen
Jan 12 2023 cs.CV arXiv:2301.04243v1

@misc{2301.04243, author = {Hengyuan Zhang and Jing-Yan Liao and David Paz and Henrik I.~Christensen}, title = {{R}obust {H}uman {I}dentity {A}nonymization using {P}ose {E}stimation}, year = {2023}, eprint = {2301.04243}, howpublished = {2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), Mexico City, Mexico, 2022, pp. 619-626}, doi = {10.1109/CASE49997.2022.9926568}, note = {arXiv:2301.04243v1} }
Many outdoor autonomous mobile platforms require more human identity anonymized data to power their data-driven algorithms. The human identity anonymization should be robust so that less manual intervention is needed, which remains a challenge for current face detection and anonymization systems. In this paper, we propose to use the skeleton generated from the state-of-the-art human pose estimation model to help localize human heads. We develop criteria to evaluate the performance and compare it with the face detection approach. We demonstrate that the proposed algorithm can reduce missed faces and thus better protect the identity information for the pedestrians. We also develop a confidence-based fusion method to further improve the performance.
A Real2Sim2Real Method for Robust Object Grasping with Neural Surface Reconstruction
Luobin Wang, Runlin Guo, Quan Vuong, Yuzhe Qin, Hao Su, Henrik Christensen
Oct 07 2022 cs.RO arXiv:2210.02685v3

@misc{2210.02685, author = {Luobin Wang and Runlin Guo and Quan Vuong and Yuzhe Qin and Hao Su and Henrik Christensen}, title = {{A} {R}eal2{S}im2{R}eal {M}ethod for {R}obust {O}bject {G}rasping with {N}eural {S}urface {R}econstruction}, year = {2022}, eprint = {2210.02685}, note = {arXiv:2210.02685v3} }
Recent 3D-based manipulation methods either directly predict the grasp pose using 3D neural networks, or solve the grasp pose using similar objects retrieved from shape databases. However, the former faces generalizability challenges when testing with new robot arms or unseen objects, and the latter assumes that similar objects exist in the databases. We hypothesize that recent 3D modeling methods provide a path towards building a digital replica of the evaluation scene that affords physical simulation and supports robust manipulation algorithm learning. We propose to reconstruct high-quality meshes from real-world point clouds using a state-of-the-art neural surface reconstruction method (the Real2Sim step). Because most simulators take meshes for fast simulation, the reconstructed meshes enable grasp pose label generation without human effort. The generated labels can train a grasp network that performs robustly in the real evaluation scene (the Sim2Real step). In synthetic and real experiments, we show that the Real2Sim2Real pipeline performs better than baseline grasp networks trained with a large dataset and a grasp sampling method with retrieval-based reconstruction. The benefit of the Real2Sim2Real pipeline comes from 1) decoupling scene modeling and grasp sampling into sub-problems, and 2) solving both sub-problems with sufficiently high quality using recent 3D learning algorithms and mesh-based physical simulation techniques.
Role of reward shaping in object-goal navigation
Srirangan Madhavan, Anwesan Pal, Henrik I. Christensen
Jul 19 2022 cs.RO arXiv:2207.08021v1

@misc{2207.08021, author = {Srirangan Madhavan and Anwesan Pal and Henrik I.~Christensen}, title = {{R}ole of reward shaping in object-goal navigation}, year = {2022}, eprint = {2207.08021}, note = {arXiv:2207.08021v1} }
Deep reinforcement learning approaches have been a popular method for visual navigation tasks in the computer vision and robotics community of late. In most cases, the reward function has a binary structure, i.e., a large positive reward is provided when the agent reaches the goal state, and a negative step penalty is assigned for every other state in the environment. A sparse signal like this makes the learning process challenging, especially in big environments, where a large number of sequential actions need to be taken to reach the target. We introduce a reward shaping mechanism which gradually adjusts the reward signal based on distance to the goal. Detailed experiments conducted using the AI2-THOR simulation environment demonstrate the efficacy of the proposed approach for object-goal navigation tasks.
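The contrast between a binary reward and a distance-based one can be illustrated with a minimal sketch. The abstract does not give the authors' exact shaping rule, so the function below is only one plausible form (a progress term proportional to the reduction in distance to the goal); the name `shaped_reward` and all constants are hypothetical.

```python
def shaped_reward(prev_dist, curr_dist, reached_goal,
                  goal_reward=10.0, step_penalty=-0.01, shaping_scale=1.0):
    """Per-step reward: the sparse signal plus a distance-based progress term.

    All constants are illustrative assumptions, not values from the paper.
    """
    if reached_goal:
        # Large positive reward at the goal state, as in the binary scheme.
        return goal_reward
    # Positive when the agent moved closer to the goal, negative when it moved away.
    progress = prev_dist - curr_dist
    return step_penalty + shaping_scale * progress


if __name__ == "__main__":
    print(shaped_reward(5.0, 4.0, reached_goal=False))  # moved closer
    print(shaped_reward(5.0, 6.0, reached_goal=False))  # moved away
```

With `shaping_scale=0.0` the function falls back to the binary structure described in the abstract, which makes it easy to compare the two settings in the same training loop.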
Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities
Bahman Mirheidari, André Bittar, Nicholas Cummins, Johnny Downs, Helen L. Fisher, Heidi Christensen
Apr 01 2022 cs.SD cs.LG eess.AS arXiv:2203.17242v1

@misc{2203.17242, author = {Bahman Mirheidari and André Bittar and Nicholas Cummins and Johnny Downs and Helen L.~Fisher and Heidi Christensen}, title = {{A}utomatic {D}etection of {E}xpressed {E}motion from {F}ive-{M}inute {S}peech {S}amples: {C}hallenges and {O}pportunities}, year = {2022}, eprint = {2203.17242}, note = {arXiv:2203.17242v1} }
We present a novel feasibility study on the automatic recognition of Expressed Emotion (EE), a family environment concept based on caregivers speaking freely about their relative/family member. We describe an automated approach for determining the \textit{degree of warmth}, a key component of EE, from acoustic and text features acquired from a sample of 37 recorded interviews. These recordings, collected over 20 years ago, are derived from a nationally representative birth cohort of 2,232 British twin children and were manually coded for EE. We outline the core steps of extracting usable information from recordings with highly variable audio quality and assess the efficacy of four machine learning approaches trained with different combinations of acoustic and text features. Despite the challenges of working with this legacy data, we demonstrated that the degree of warmth can be predicted with an $F_{1}$-score of \textbf{61.5\%}. In this paper, we summarise our learning and provide recommendations for future work using real-world speech samples.
Using Probabilistic Machine Learning to Better Model Temporal Patterns in Parameterizations: a case study with the Lorenz 96 model
Raghul Parthipan, Hannah M. Christensen, J. Scott Hosking, Damon J. Wischik
Mar 30 2022 cs.LG physics.ao-ph arXiv:2203.14814v4

@misc{2203.14814, author = {Raghul Parthipan and Hannah M.~Christensen and J.~Scott Hosking and Damon J.~Wischik}, title = {{U}sing {P}robabilistic {M}achine {L}earning to {B}etter {M}odel {T}emporal {P}atterns in {P}arameterizations: a case study with the {L}orenz 96 model}, year = {2022}, eprint = {2203.14814}, note = {arXiv:2203.14814v4} }
The modelling of small-scale processes is a major source of error in climate models, hindering the accuracy of low-cost models which must approximate such processes through parameterization. Red noise is essential to many operational parameterization schemes, helping model temporal correlations. We show how to build on the successes of red noise by combining the known benefits of stochasticity with machine learning. This is done using a physically-informed recurrent neural network within a probabilistic framework. Our model is competitive with, and often superior to, both a bespoke baseline and an existing probabilistic machine learning approach (GAN) when applied to the Lorenz 96 atmospheric simulation. This is due to its superior ability to model temporal patterns compared to standard first-order autoregressive schemes. It also generalises to unseen scenarios. We evaluate across a number of metrics from the literature, and also discuss the benefits of using the probabilistic metric of hold-out likelihood.
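For readers unfamiliar with the red-noise baseline mentioned above, the following is a minimal sketch of a first-order autoregressive (AR(1)) process, the standard way to generate temporally correlated noise. It is not the authors' model; the function names and the parameters `phi` and `sigma` are illustrative assumptions.

```python
import random


def ar1_red_noise(n_steps, phi=0.9, sigma=1.0, seed=0):
    """Generate n_steps (>= 1) samples of AR(1) noise: x[t] = phi * x[t-1] + e[t].

    phi is the lag-1 autocorrelation and sigma the stationary standard
    deviation; both are illustrative values.
    """
    rng = random.Random(seed)
    # Scale the innovations so the stationary variance stays at sigma**2.
    innovation_std = sigma * (1.0 - phi ** 2) ** 0.5
    x = rng.gauss(0.0, sigma)
    series = [x]
    for _ in range(n_steps - 1):
        x = phi * x + rng.gauss(0.0, innovation_std)
        series.append(x)
    return series


def lag1_autocorrelation(series):
    """Sample autocorrelation of the series at lag 1."""
    mean = sum(series) / len(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((a - mean) ** 2 for a in series)
    return num / den


if __name__ == "__main__":
    samples = ar1_red_noise(20000, phi=0.9)
    print(lag1_autocorrelation(samples))  # should be close to phi
```

Setting `phi=0.0` recovers uncorrelated white noise, which is a quick way to see what the temporal correlation adds.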
TridentNetV2: Lightweight Graphical Global Plan Representations for Dynamic Trajectory Generation
David Paz, Hao Xiang, Andrew Liang, Henrik I. Christensen
Mar 29 2022 cs.RO arXiv:2203.14019v1

@misc{2203.14019, author = {David Paz and Hao Xiang and Andrew Liang and Henrik I.~Christensen}, title = {{T}rident{N}et{V}2: {L}ightweight {G}raphical {G}lobal {P}lan {R}epresentations for {D}ynamic {T}rajectory {G}eneration}, year = {2022}, eprint = {2203.14019}, note = {arXiv:2203.14019v1} }
We present a framework for dynamic trajectory generation for autonomous navigation, which does not rely on HD maps as the underlying representation. High Definition (HD) maps have become a key component in most autonomous driving frameworks; they include complete road network information annotated at centimeter level, including traversable waypoints, lane information, and traffic signals. Instead, the presented approach models the distributions of feasible ego-centric trajectories in real-time given a nominal graph-based global plan and a lightweight scene representation. By embedding contextual information, such as crosswalks, stop signs, and traffic signals, our approach achieves low errors across multiple urban navigation datasets that include diverse intersection maneuvers, while maintaining real-time performance and reducing network complexity. Underlying datasets introduced are available online.
Single RGB-D Camera Teleoperation for General Robotic Manipulation
Quan Vuong, Yuzhe Qin, Runlin Guo, Xiaolong Wang, Hao Su, Henrik Christensen
Jun 29 2021 cs.RO cs.AI arXiv:2106.14396v3

@misc{2106.14396, author = {Quan Vuong and Yuzhe Qin and Runlin Guo and Xiaolong Wang and Hao Su and Henrik Christensen}, title = {{S}ingle {RGB}-{D} {C}amera {T}eleoperation for {G}eneral {R}obotic {M}anipulation}, year = {2021}, eprint = {2106.14396}, note = {arXiv:2106.14396v3} }
We propose a teleoperation system that uses a single RGB-D camera as the human motion capture device. Our system can perform general manipulation tasks such as cloth folding, hammering and 3mm clearance peg in hole. We propose the use of a non-Cartesian oblique coordinate frame, dynamic motion scaling and repositioning of operator frames to increase the flexibility of our teleoperation system. We hypothesize that lowering the barrier of entry to teleoperation will allow for wider deployment of supervised autonomy systems, which will in turn generate realistic datasets that unlock the potential of machine learning for robotic manipulation. Demos of our system are available online at https://sites.google.com/view/manipulation-teleop-with-rgbd
Lessons Learned Developing an Assembly System for WRS 2020 Assembly Challenge
Aayush Naik, Priyam Parashar, Jiaming Hu, Henrik I. Christensen
Mar 30 2021 cs.RO arXiv:2103.15236v1

@misc{2103.15236, author = {Aayush Naik and Priyam Parashar and Jiaming Hu and Henrik I.~Christensen}, title = {{L}essons {L}earned {D}eveloping an {A}ssembly {S}ystem for {WRS} 2020 {A}ssembly {C}hallenge}, year = {2021}, eprint = {2103.15236}, note = {arXiv:2103.15236v1} }
The World Robot Summit (WRS) 2020 Assembly Challenge is designed to allow teams to demonstrate how one can build flexible, robust systems for assembly of machined objects. We present our approach to assembly based on integration of machine vision, robust planning and execution using behavior trees and a hierarchy of recovery strategies to ensure robust operation. Our system was selected for the WRS 2020 Assembly Challenge finals based on robust performance in the qualifying rounds. We present the systems approach adopted for the challenge.
Meta-Modeling of Assembly Contingencies and Planning for Repair
Priyam Parashar, Aayush Naik, Jiaming Hu, Henrik I. Christensen
Mar 16 2021 cs.RO cs.AI arXiv:2103.07544v1

@misc{2103.07544, author = {Priyam Parashar and Aayush Naik and Jiaming Hu and Henrik I.~Christensen}, title = {{M}eta-{M}odeling of {A}ssembly {C}ontingencies and {P}lanning for {R}epair}, year = {2021}, eprint = {2103.07544}, note = {arXiv:2103.07544v1} }
The World Robotics Challenge (2018 & 2020) was designed to challenge teams to design systems that are easy to adapt to new tasks and to ensure robust operation in a semi-structured environment. We present a layered strategy to transform missions into tasks and actions and provide a set of strategies to address simple and complex failures. We propose a model for characterizing failures and use it to discuss repairs. Simple failures are by far the most common in our WRC system and we also present how we repaired them.
TridentNet: A Conditional Generative Model for Dynamic Trajectory Generation
David Paz, Hengyuan Zhang, Henrik I. Christensen
Jan 19 2021 cs.RO arXiv:2101.06374v4

@misc{2101.06374, author = {David Paz and Hengyuan Zhang and Henrik I.~Christensen}, title = {{T}rident{N}et: {A} {C}onditional {G}enerative {M}odel for {D}ynamic {T}rajectory {G}eneration}, year = {2021}, eprint = {2101.06374}, note = {arXiv:2101.06374v4} }
In recent years, various state-of-the-art autonomous vehicle systems and architectures have been introduced. These methods include planners that depend on high-definition (HD) maps and models that learn an autonomous agent's controls in an end-to-end fashion. While end-to-end models are geared towards solving the scalability constraints from HD maps, they do not generalize for different vehicles and sensor configurations. To address these shortcomings, we introduce an approach that leverages lightweight map representations, explicitly enforcing geometric constraints, and learns feasible trajectories using a conditional generative model. Additional contributions include a new dataset that is used to verify our proposed models quantitatively. The results indicate low relative errors that can potentially translate to traversable trajectories. The dataset created as part of this work has been made available online.
Robotics Enabling the Workforce
Henrik Christensen, Maria Gini, Odest Chadwicke Jenkins, Holly Yanco
Dec 18 2020 cs.RO cs.CY arXiv:2012.09309v1

@misc{2012.09309, author = {Henrik Christensen and Maria Gini and Odest Chadwicke Jenkins and Holly Yanco}, title = {{R}obotics {E}nabling the {W}orkforce}, year = {2020}, eprint = {2012.09309}, note = {arXiv:2012.09309v1} }
Robotics has the potential to magnify the skilled workforce of the nation by complementing our workforce with automation: teams of people and robots will be able to do more than either could alone. The economic engine of the U.S. runs on the productivity of our people. The rise of automation offers new opportunities to enhance the work of our citizens and drive the innovation and prosperity of our industries. Most critically, we need research to understand how future robot technologies can best complement our workforce to get the best of both human and automated labor in a collaborative team. Investments made in robotics research and workforce development will lead to increased GDP, an increased export-import ratio, a growing middle class of skilled workers, and a U.S.-based supply chain that can withstand global pandemics and other disruptions. In order to make the United States a leader in robotics, we need to invest in basic research, technology development, K-16 education, and lifelong learning.
Pose Estimation of Specular and Symmetrical Objects
Jiaming Hu, Hongyi Ling, Priyam Parashar, Aayush Naik, Henrik Christensen
Nov 03 2020 cs.RO cs.CV arXiv:2011.00372v1

@misc{2011.00372, author = {Jiaming Hu and Hongyi Ling and Priyam Parashar and Aayush Naik and Henrik Christensen}, title = {{P}ose {E}stimation of {S}pecular and {S}ymmetrical {O}bjects}, year = {2020}, eprint = {2011.00372}, note = {arXiv:2011.00372v1} }
In the robotic industry, specular and textureless metallic components are ubiquitous. The 6D pose estimation of such objects with only a monocular RGB camera is difficult because of the absence of rich texture features. Furthermore, the appearance of specularity heavily depends on the camera viewpoint and environmental light conditions, making traditional methods, like template matching, fail. In the last 30 years, pose estimation of specular objects has been a consistent challenge, and most related works require massive knowledge modeling effort for light setups, environment, or the object surface. On the other hand, recent works exhibit the feasibility of 6D pose estimation on a monocular camera with convolutional neural networks (CNNs); however, they mostly use opaque objects for evaluation. This paper provides a data-driven solution to estimate the 6D pose of specular objects for grasping them, proposes a cost function for handling symmetry, and demonstrates experimental results showing the system's feasibility.
Auto-calibration Method Using Stop Signs for Urban Autonomous Driving Applications
Yunhai Han, Yuhan Liu, David Paz, Henrik Christensen
Oct 16 2020 cs.CV cs.RO arXiv:2010.07441v2

@misc{2010.07441, author = {Yunhai Han and Yuhan Liu and David Paz and Henrik Christensen}, title = {{A}uto-calibration {M}ethod {U}sing {S}top {S}igns for {U}rban {A}utonomous {D}riving {A}pplications}, year = {2020}, eprint = {2010.07441}, note = {arXiv:2010.07441v2} }
Calibration of sensors is fundamental to robust performance for intelligent vehicles. In natural environments, disturbances can easily challenge calibration. One possibility is to use natural objects of known shape to recalibrate sensors. An approach based on recognition of traffic signs, such as stop signs, and use of them for recalibration of cameras is presented. The approach is based on detection, geometry estimation, calibration, and recursive updating. Results from natural environments are presented that clearly show convergence and improved performance.
Probabilistic Semantic Mapping for Urban Autonomous Driving Applications
David Paz, Hengyuan Zhang, Qinru Li, Hao Xiang, Henrik Christensen
Jun 11 2020 cs.CV arXiv:2006.04894v2

@misc{2006.04894, author = {David Paz and Hengyuan Zhang and Qinru Li and Hao Xiang and Henrik Christensen}, title = {{P}robabilistic {S}emantic {M}apping for {U}rban {A}utonomous {D}riving {A}pplications}, year = {2020}, eprint = {2006.04894}, note = {arXiv:2006.04894v2} }
Recent advancements in statistical learning and computational abilities have enabled autonomous vehicle technology to develop at a much faster rate. While many of the architectures previously introduced are capable of operating under highly dynamic environments, many of these are constrained to smaller-scale deployments, require constant maintenance due to the associated scalability cost with high-definition (HD) maps, and involve tedious manual labeling. As an attempt to tackle this problem, we propose to fuse image and pre-built point cloud map information to perform automatic and accurate labeling of static landmarks such as roads, sidewalks, crosswalks, and lanes. The method performs semantic segmentation on 2D images, associates the semantic labels with point cloud maps to accurately localize them in the world, and leverages the confusion matrix formulation to construct a probabilistic semantic map in bird's eye view from semantic point clouds. Experiments from data collected in an urban environment show that this model is able to predict most road features and can be extended for automatically incorporating road features into HD maps with potential future work directions.
Autonomous Vehicle Benchmarking using Unbiased Metrics
David Paz, Po-jung Lai, Nathan Chan, Yuqing Jiang, Henrik I. Christensen
Jun 05 2020 cs.RO arXiv:2006.02518v2

@misc{2006.02518, author = {David Paz and Po-jung Lai and Nathan Chan and Yuqing Jiang and Henrik I.~Christensen}, title = {{A}utonomous {V}ehicle {B}enchmarking using {U}nbiased {M}etrics}, year = {2020}, eprint = {2006.02518}, note = {arXiv:2006.02518v2} }
With the recent development of autonomous vehicle technology, there have been active efforts on the deployment of this technology at different scales that include urban and highway driving. While many of the prototypes showcased have been shown to operate under specific cases, little effort has been made to better understand their shortcomings and generalizability to new areas. Distance, uptime and number of manual disengagements performed during autonomous driving provide a high-level idea of the performance of an autonomous system, but without proper data normalization, testing location information, and the number of vehicles involved in testing, the disengagement reports alone do not fully encompass system performance and robustness. Thus, in this study a complete set of metrics is applied for benchmarking autonomous vehicle systems in a variety of scenarios that can be extended for comparison with human drivers and other autonomous vehicle systems. These metrics have been used to benchmark UC San Diego's autonomous vehicle platforms during early deployments for micro-transit and autonomous mail delivery applications.
Data augmentation using generative networks to identify dementia
Bahman Mirheidari, Yilin Pan, Daniel Blackburn, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen
Apr 14 2020 eess.AS cs.CL cs.LG cs.SD arXiv:2004.05989v1

@misc{2004.05989, author = {Bahman Mirheidari and Yilin Pan and Daniel Blackburn and Ronan O'Malley and Traci Walker and Annalena Venneri and Markus Reuber and Heidi Christensen}, title = {{D}ata augmentation using generative networks to identify dementia}, year = {2020}, eprint = {2004.05989}, note = {arXiv:2004.05989v1} }
Data limitation is one of the most common issues in training machine learning classifiers for medical applications. Due to ethical concerns and data privacy, the number of people that can be recruited to such experiments is generally smaller than the number of participants contributing to non-healthcare datasets. Recent research showed that generative models can be used as an effective approach for data augmentation, which can ultimately help to train more robust classifiers in sparse data domains. A number of studies proved that this data augmentation technique works for image and audio data sets. In this paper, we investigate the application of a similar approach to different types of speech and audio-based features extracted from interactions recorded with our automatic dementia detection system. Using two generative models we show how the generated synthesized samples can improve the performance of a DNN based classifier. The variational autoencoder increased the F-score of a four-way classifier distinguishing the typical patient groups seen in memory clinics from 58% to around 74%, a 16% absolute improvement.
Learning hierarchical relationships for object-goal navigation
Yiding Qiu, Anwesan Pal, Henrik I. Christensen
Mar 17 2020 cs.RO cs.CV cs.LG arXiv:2003.06749v2

@misc{2003.06749, author = {Yiding Qiu and Anwesan Pal and Henrik I.~Christensen}, title = {{L}earning hierarchical relationships for object-goal navigation}, year = {2020}, eprint = {2003.06749}, note = {arXiv:2003.06749v2} }
Direct search for objects as part of navigation poses a challenge for small items. Utilizing context in the form of object-object relationships enables efficient hierarchical search for targets. Most of the current approaches tend to directly incorporate sensory input into a reward-based learning approach, without learning about object relationships in the natural environment, and thus generalize poorly across domains. We present Memory-utilized Joint hierarchical Object Learning for Navigation in Indoor Rooms (MJOLNIR), a target-driven navigation algorithm, which considers the inherent relationship between target objects and the more salient contextual objects occurring in their surroundings. Extensive experiments conducted across multiple environment settings show an $82.9\%$ and $93.5\%$ gain over existing state-of-the-art navigation methods in terms of the success rate (SR), and success weighted by path length (SPL), respectively. We also show that our model learns to converge much faster than other algorithms, without suffering from the well-known overfitting problem. Additional details regarding the supplementary material and code are available at https://sites.google.com/eng.ucsd.edu/mjolnir.
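The two metrics named in the abstract, SR and SPL, can be sketched as follows. The abstract does not spell out the formulas, so this assumes the definition commonly used in embodied navigation, where each successful episode contributes the ratio of the shortest path length to the path length actually taken. The function names and the episode tuple layout are hypothetical.

```python
def success_rate(episodes):
    """Fraction of episodes in which the agent reached the target."""
    episodes = list(episodes)
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def success_weighted_path_length(episodes):
    """SPL over (success, shortest_path_len, agent_path_len) tuples.

    Assumes shortest_path_len > 0 for every episode.
    """
    episodes = list(episodes)
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # A successful episode scores 1.0 only if the agent's path is optimal.
            total += shortest / max(taken, shortest)
    return total / len(episodes)


if __name__ == "__main__":
    runs = [(True, 2.0, 4.0), (True, 3.0, 3.0), (False, 1.0, 5.0)]
    print(success_rate(runs))
    print(success_weighted_path_length(runs))
```

SPL is never larger than SR, since each successful episode contributes at most 1.0; the gap between the two indicates how much longer than optimal the successful paths were.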
Looking at the right stuff: Guided semantic-gaze for autonomous driving
Anwesan Pal, Sayan Mondal, Henrik I. Christensen
Nov 26 2019 cs.CV cs.RO arXiv:1911.10455v2

@misc{1911.10455, author = {Anwesan Pal and Sayan Mondal and Henrik I.~Christensen}, title = {{L}ooking at the right stuff: {G}uided semantic-gaze for autonomous driving}, year = {2019}, eprint = {1911.10455}, note = {arXiv:1911.10455v2} }
In recent years, predicting a driver's focus of attention has been a very active area of research in the autonomous driving community. Unfortunately, existing state-of-the-art techniques achieve this by relying only on human gaze information, thereby ignoring scene semantics. We propose a novel Semantics Augmented GazE (SAGE) detection approach that captures driving-specific contextual information, in addition to the raw gaze. Such a combined attention mechanism serves as a powerful tool to focus on the relevant regions in an image frame in order to make driving both safe and efficient. Using this, we design a complete saliency prediction framework - SAGE-Net, which modifies the initial prediction from SAGE by taking into account vital aspects such as distance to objects (depth), ego vehicle speed, and pedestrian crossing intent. Exhaustive experiments conducted through four popular saliency algorithms show that in $\mathbf{49/56\text{ }(87.5\%)}$ cases - considering both the overall dataset and crucial driving scenarios, SAGE outperforms existing techniques without any additional computational overhead during the training process. The augmented dataset along with the relevant code are available as part of the supplementary material.
Detecting Alzheimer's Disease by estimating attention and elicitation path through the alignment of spoken picture descriptions with the picture prompt
Bahman Mirheidari, Yilin Pan, Traci Walker, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen
Oct 02 2019 cs.CL cs.LG arXiv:1910.00515v1

@misc{1910.00515, author = {Bahman Mirheidari and Yilin Pan and Traci Walker and Markus Reuber and Annalena Venneri and Daniel Blackburn and Heidi Christensen}, title = {{D}etecting {A}lzheimer's {D}isease by estimating attention and elicitation path through the alignment of spoken picture descriptions with the picture prompt}, year = {2019}, eprint = {1910.00515}, note = {arXiv:1910.00515v1} }
Cognitive decline is a sign of Alzheimer's disease (AD), and there is evidence that tracking a person's eye movement, using eye tracking devices, can be used for the automatic identification of early signs of cognitive decline. However, such devices are expensive and may not be easy to use for people with cognitive problems. In this paper, we present a new way of capturing similar visual features, by using the speech of people describing the Cookie Theft picture - a common cognitive testing task - to identify regions in the picture prompt that will have caught the speaker's attention and elicited their speech. After aligning the automatically recognised words with different regions of the picture prompt, we extract information inspired by eye tracking metrics such as coordinates of the areas of interest (AOIs), time spent in AOI, time to reach the AOI, and the number of AOI visits. Using the DementiaBank dataset we train a binary classifier (AD vs. healthy control) using 10-fold cross-validation and achieve an 80% F1-score using the timing information from the forced alignments of the automatic speech recogniser (ASR); this achieved around 72% using the timing information from the ASR outputs.
Multi-task Batch Reinforcement Learning with Metric Learning
Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Keith Ross, Henrik Iskov Christensen, Hao Su
Sep 26 2019 cs.LG cs.AI stat.ML arXiv:1909.11373v6

@misc{1909.11373, author = {Jiachen Li and Quan Vuong and Shuang Liu and Minghua Liu and Kamil Ciosek and Keith Ross and Henrik Iskov Christensen and Hao Su}, title = {{M}ulti-task {B}atch {R}einforcement {L}earning with {M}etric {L}earning}, year = {2019}, eprint = {1909.11373}, note = {arXiv:1909.11373v6} }
We tackle the Multi-task Batch Reinforcement Learning problem. Given multiple datasets collected from different tasks, we train a multi-task policy to perform well in unseen tasks sampled from the same distribution. The task identities of the unseen tasks are not provided. To perform well, the policy must infer the task identity from collected transitions by modelling its dependency on states, actions and rewards. Because the different datasets may have state-action distributions with large divergence, the task inference module can learn to ignore the rewards and spuriously correlate $\textit{only}$ state-action pairs to the task identity, leading to poor test time performance. To robustify task inference, we propose a novel application of the triplet loss. To mine hard negative examples, we relabel the transitions from the training tasks by approximating their reward functions. When we allow further training on the unseen tasks, using the trained policy as an initialization leads to significantly faster convergence compared to randomly initialized policies (up to $80\%$ improvement and across 5 different Mujoco task distributions). We name our method $\textbf{MBML}$ ($\textbf{M}\text{ulti-task}$ $\textbf{B}\text{atch}$ RL with $\textbf{M}\text{etric}$ $\textbf{L}\text{earning}$).
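As background for the triplet loss mentioned above, here is a minimal sketch of its standard form on embedding vectors. It does not reproduce how MBML builds its triplets from relabelled transitions; the function names, the use of squared Euclidean distance, and the margin value are illustrative assumptions.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on three embedding vectors.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (in squared distance).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    positive = [0.0, 1.0]
    print(triplet_loss(anchor, positive, [3.0, 0.0]))  # easy negative
    print(triplet_loss(anchor, positive, [1.0, 0.0]))  # hard negative
```

Hard negatives are those that still yield a non-zero loss, which is why mining them, as the abstract describes, tends to give a stronger training signal than sampling negatives at random.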
Machine Learning for Stochastic Parameterization: Generative Adversarial Networks in the Lorenz '96 Model
David John Gagne II, Hannah M. Christensen, Aneesh C. Subramanian, Adam H. Monahan
Sep 12 2019 physics.ao-ph cs.LG nlin.CD stat.ML arXiv:1909.04711v1

@misc{1909.04711, author = {David John Gagne II and Hannah M.~Christensen and Aneesh C.~Subramanian and Adam H.~Monahan}, title = {{M}achine {L}earning for {S}tochastic {P}arameterization: {G}enerative {A}dversarial {N}etworks in the {L}orenz '96 {M}odel}, year = {2019}, eprint = {1909.04711}, note = {arXiv:1909.04711v1} }
Stochastic parameterizations account for uncertainty in the representation of unresolved sub-grid processes by sampling from the distribution of possible sub-grid forcings. Some existing stochastic parameterizations utilize data-driven approaches to characterize uncertainty, but these approaches require significant structural assumptions that can limit their scalability. Machine learning models, including neural networks, are able to represent a wide range of distributions and build optimized mappings between a large number of inputs and sub-grid forcings. Recent research on machine learning parameterizations has focused only on deterministic parameterizations. In this study, we develop a stochastic parameterization using the generative adversarial network (GAN) machine learning framework. The GAN stochastic parameterization is trained and evaluated on output from the Lorenz '96 model, which is a common baseline model for evaluating both parameterization and data assimilation techniques. We evaluate different ways of characterizing the input noise for the model and perform model runs with the GAN parameterization at weather and climate timescales. Some of the GAN configurations perform better than a baseline bespoke parameterization at both timescales, and the networks closely reproduce the spatio-temporal correlations and regimes of the Lorenz '96 system. We also find that in general those models which produce skillful forecasts are also associated with the best climate simulations.
Network Reconnaissance and Vulnerability Excavation of Secure DDS Systems
Ruffin White, Gianluca Caiazza, Chenxu Jiang, Xinyue Ou, Zhiyue Yang, Agostino Cortesi, Henrik Christensen
Aug 16 2019 cs.CR cs.NI arXiv:1908.05310v1

@misc{1908.05310, author = {Ruffin White and Gianluca Caiazza and Chenxu Jiang and Xinyue Ou and Zhiyue Yang and Agostino Cortesi and Henrik Christensen}, title = {{N}etwork {R}econnaissance and {V}ulnerability {E}xcavation of {S}ecure {DDS} {S}ystems}, year = {2019}, eprint = {1908.05310}, howpublished = {Workshop on Software Security for Internet of Things (SSIoT) at IEEE EuroS\&P 2019}, note = {arXiv:1908.05310v1} }
Data Distribution Service (DDS) is a realtime peer-to-peer protocol that serves as a scalable middleware between distributed networked systems found in many Industrial IoT domains such as automotive, medical, energy, and defense. Since the initial ratification of the standard, specifications have introduced a Security Model and Service Plugin Interface (SPI) architecture, facilitating authenticated encryption and data centric access control while preserving interoperable data exchange. However, as of Secure DDS v1.1, the default plugin specifications presently exchange digitally signed capability lists of both participants in the clear during the crypto handshake for permission attestation, thus breaching confidentiality of the context of the connection. In this work, we present an attacker model that makes use of network reconnaissance afforded by this leaked context in conjunction with formal verification and model checking to arbitrarily reason about the underlying topology and reachability of information flow, enabling targeted attacks such as selective denial of service, adversarial partitioning of the data bus, or vulnerability excavation of vendor implementations.
DEDUCE: Diverse scEne Detection methods in Unseen Challenging Environments
Anwesan Pal, Carlos Nieto-Granda, Henrik I. Christensen
Aug 02 2019 cs.RO cs.CV arXiv:1908.00191v1

@misc{1908.00191, author = {Anwesan Pal and Carlos Nieto-Granda and Henrik I.~Christensen}, title = {{DEDUCE}: {D}iverse sc{E}ne {D}etection methods in {U}nseen {C}hallenging {E}nvironments}, year = {2019}, eprint = {1908.00191}, note = {arXiv:1908.00191v1} }
In recent years, there has been a rapid increase in the number of service robots deployed for aiding people in their daily activities. Unfortunately, most of these robots require human input for training in order to do tasks in indoor environments. Successful domestic navigation often requires access to semantic information about the environment, which can be learned without human guidance. In this paper, we propose a set of DEDUCE - Diverse scEne Detection methods in Unseen Challenging Environments algorithms which incorporate deep fusion models derived from scene recognition systems and object detectors. The five methods described here have been evaluated on several popular recent image datasets, as well as real-world videos acquired through multiple mobile platforms. The final results show an improvement over the existing state-of-the-art visual place recognition systems.
How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?
Quan Vuong, Sharad Vikram, Hao Su, Sicun Gao, Henrik I. Christensen
Mar 29 2019 cs.LG cs.AI stat.ML arXiv:1903.11774v1

@misc{1903.11774, author = {Quan Vuong and Sharad Vikram and Hao Su and Sicun Gao and Henrik I.~Christensen}, title = {{H}ow to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?}, year = {2019}, eprint = {1903.11774}, note = {arXiv:1903.11774v1} }
Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising successes in applying RL algorithms directly on real systems, their performance on more complex systems remains bottle-necked by the relative data inefficiency of RL algorithms. Domain randomization is a promising direction of research that has demonstrated impressive results using RL algorithms to control real robots. At a high level, domain randomization works by training a policy on a distribution of environmental conditions in simulation. If the environments are diverse enough, then the policy trained on this distribution will plausibly generalize to the real world. A human-specified design choice in domain randomization is the form and parameters of the distribution of simulated environments. It is unclear how best to pick the form and parameters of this distribution, and prior work uses hand-tuned distributions. This extended abstract demonstrates that the choice of the distribution plays a major role in the performance of the trained policies in the real world and that the parameters of this distribution can be optimized to maximize the performance of the trained policies in the real world.
The relationship between linguistic expression and symptoms of depression, anxiety, and suicidal thoughts: A longitudinal study of blog content
B. ODea, T.W. Boonstra, M.E. Larsen, T. Nguyen, S. Venkatesh, H. Christensen
Nov 08 2018 cs.CL stat.AP arXiv:1811.02750v1

@misc{1811.02750, author = {B.~ODea and T.W.~Boonstra and M.E.~Larsen and T.~Nguyen and S.~Venkatesh and H.~Christensen}, title = {{T}he relationship between linguistic expression and symptoms of depression, anxiety, and suicidal thoughts: {A} longitudinal study of blog content}, year = {2018}, eprint = {1811.02750}, note = {arXiv:1811.02750v1} }
Due to its popularity and availability, social media data may present a new way to identify individuals who are experiencing mental illness. By analysing blog content, this study aimed to investigate the associations between linguistic features and symptoms of depression, generalised anxiety, and suicidal ideation. This study utilised a longitudinal study design. Individuals who blogged were invited to participate in a study in which they completed fortnightly mental health questionnaires including the PHQ9 and GAD7 for a period of 36 weeks. Linguistic features were extracted from blog data using the LIWC tool. Bivariate and multivariate analyses were performed to investigate the correlations between the linguistic features and mental health scores between subjects. We then used the multivariate regression model to predict longitudinal changes in mood within subjects. A total of 153 participants consented to taking part, with 38 participants completing the required number of questionnaires and blog posts during the study period. Between-subject analysis revealed that several linguistic features, including tentativeness and non-fluencies, were significantly associated with depression and anxiety symptoms, but not suicidal thoughts. Within-subject analysis showed no robust correlations between linguistic features and changes in mental health score. This study provides further support for the relationship between linguistic features within social media data and symptoms of depression and anxiety. The lack of robust within-subject correlations indicates that the relationship observed at the group level may not generalise to individual changes over time.
Procedurally Provisioned Access Control for Robotic Systems
Ruffin White, Gianluca Caiazza, Henrik I. Christensen, Agostino Cortesi
Oct 19 2018 cs.RO cs.CR arXiv:1810.08125v1

@misc{1810.08125, author = {Ruffin White and Gianluca Caiazza and Henrik I.~Christensen and Agostino Cortesi}, title = {{P}rocedurally {P}rovisioned {A}ccess {C}ontrol for {R}obotic {S}ystems}, year = {2018}, eprint = {1810.08125}, howpublished = {2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)}, note = {arXiv:1810.08125v1} }
Security of robotics systems, as well as of the related middleware infrastructures, is a critical issue for industrial and domestic IoT, and it needs to be continuously assessed throughout the whole development lifecycle. The next generation open source robotic software stack, ROS2, is now targeting support for Secure DDS, providing the community with valuable tools for secure real world robotic deployments. In this work, we introduce a framework for procedurally provisioning access control policies for robotic software, as well as for verifying the compliance of generated transport artifacts and decision point implementations.
Purely Geometric Scene Association and Retrieval - A Case for Macro Scale 3D Geometry
Rahul Sawhney, Fuxin Li, Henrik I. Christensen, Charles L. Isbell
Aug 07 2018 cs.CV cs.RO arXiv:1808.01343v1

@misc{1808.01343, author = {Rahul Sawhney and Fuxin Li and Henrik I.~Christensen and Charles L.~Isbell}, title = {{P}urely {G}eometric {S}cene {A}ssociation and {R}etrieval - {A} {C}ase for {M}acro {S}cale 3{D} {G}eometry}, year = {2018}, eprint = {1808.01343}, note = {arXiv:1808.01343v1} }
We address the problems of measuring geometric similarity between 3D scenes, represented through point clouds or range data frames, and associating them. Our approach leverages macro-scale 3D structural geometry - the relative configuration of arbitrary surfaces and relationships among structures that are potentially far apart. We express such discriminative information in a viewpoint-invariant feature space. These are subsequently encoded in a frame-level signature that can be utilized to measure geometric similarity. Such a characterization is robust to noise, incomplete and partially overlapping data besides viewpoint changes. We show how it can be employed to select a diverse set of data frames which have structurally similar content, and how to validate whether views with similar geometric content are from the same scene. The problem is formulated as one of general purpose retrieval from an unannotated, spatio-temporally unordered database. Empirical analysis indicates that the presented approach thoroughly outperforms baselines on depth / range data. Its depth-only performance is competitive with state-of-the-art approaches with RGB or RGB-D inputs, including ones based on deep learning. Experiments show retrieval performance to hold up well with much sparser databases, which is indicative of the approach's robustness. The approach generalized well - it did not require dataset specific training, and scaled up in our experiments. Finally, we also demonstrate how geometrically diverse selection of views can result in richer 3D reconstructions.
Using mobile phone sensor technology for mental health research: Integrated analysis to identify hidden challenges and potential solutions
Tjeerd W Boonstra, Jennifer Nicholas, Quincy JJ Wong, Frances Shaw, Samuel Townsend, Helen Christensen
May 24 2018 cs.CY arXiv:1805.09158v2

@misc{1805.09158, author = {Tjeerd W Boonstra and Jennifer Nicholas and Quincy JJ Wong and Frances Shaw and Samuel Townsend and Helen Christensen}, title = {{U}sing mobile phone sensor technology for mental health research: {I}ntegrated analysis to identify hidden challenges and potential solutions}, year = {2018}, eprint = {1805.09158}, note = {arXiv:1805.09158v2} }
Background: Mobile phone sensor technology has great potential in providing behavioral markers of mental health. However, this promise has not yet been brought to fruition. Objective: The objective of our study was to examine challenges involved in developing an app to extract behavioral markers of mental health from passive sensor data. Methods: Both technical challenges and acceptability of passive data collection for mental health research were assessed based on literature review and results obtained from a feasibility study. Socialise, a mobile phone app developed at the Black Dog Institute, was used to collect sensor data (Bluetooth, global positioning system, and battery status) and investigate views and experiences of a group of people with lived experience of mental health challenges (N=32). Results: On average, sensor data were obtained for 55% (Android) and 45% (iPhone OS) of scheduled scans. Battery life was reduced from 21.3 hours to 18.8 hours when scanning every 5 minutes with a reduction of 2.5 hours or 12%. Despite this relatively small reduction, most participants reported that the app had a noticeable effect on their battery life. In addition to battery life, the purpose of data collection, trust in the organization that collects data, and perceived impact on privacy were identified as main factors for acceptability. Conclusions: Based on the findings of the feasibility study and literature review, we recommend a commitment to open science and transparent reporting and stronger partnerships and communication with users. Sensing technology has the potential to greatly enhance the delivery and impact of mental health care. Realizing this requires all aspects of mobile phone sensor technology to be rigorously assessed.
Efficient Hierarchical Graph-Based Segmentation of RGBD Videos
Steven Hickson, Stan Birchfield, Irfan Essa, Henrik Christensen
Jan 30 2018 cs.CV arXiv:1801.08981v1

@misc{1801.08981, author = {Steven Hickson and Stan Birchfield and Irfan Essa and Henrik Christensen}, title = {{E}fficient {H}ierarchical {G}raph-{B}ased {S}egmentation of {RGBD} {V}ideos}, year = {2018}, eprint = {1801.08981}, doi = {10.1109/CVPR.2014.51}, note = {arXiv:1801.08981v1} }
We present an efficient and scalable algorithm for segmenting 3D RGBD point clouds by combining depth, color, and temporal information using a multistage, hierarchical graph-based approach. Our algorithm processes a moving window over several point clouds to group similar regions over a graph, resulting in an initial over-segmentation. These regions are then merged to yield a dendrogram using agglomerative clustering via a minimum spanning tree algorithm. Bipartite graph matching at a given level of the hierarchical tree yields the final segmentation of the point clouds by maintaining region identities over arbitrarily long periods of time. We show that a multistage segmentation with depth then color yields better results than a linear combination of depth and color. Due to its incremental processing, our algorithm can process videos of any length and in a streaming pipeline. The algorithm's ability to produce robust, efficient segmentation is demonstrated with numerous experimental results on challenging sequences from our own as well as public RGBD data sets.
Context Aware Robot Navigation using Interactively Built Semantic Maps
Akansel Cosgun, Henrik Christensen
Oct 25 2017 cs.RO arXiv:1710.08682v2

@misc{1710.08682, author = {Akansel Cosgun and Henrik Christensen}, title = {{C}ontext {A}ware {R}obot {N}avigation using {I}nteractively {B}uilt {S}emantic {M}aps}, year = {2017}, eprint = {1710.08682}, note = {arXiv:1710.08682v2} }
We discuss the process of building semantic maps, how to interactively label entities in them, and how to use them to enable context-aware navigation behaviors in human environments. We utilize planar surfaces, such as walls and tables, and static objects, such as door signs, as features for our semantic mapping approach. Users can interactively annotate these features by having the robot follow them, entering the label through a mobile app, and performing a pointing gesture toward the landmark of interest. Our gesture based approach can reliably estimate which object is being pointed at and detect ambiguous gestures with probabilistic modeling. Our person following method attempts to maximize future utility by searching over future actions, assuming a constant velocity model for the human. We describe a method to extract metric goals from a semantic map landmark and to plan a human aware path that takes into account the personal spaces of people. Finally, we demonstrate context-awareness for person following in two scenarios: interactive labeling and door passing. We believe that future navigation approaches and service robotics applications can be made more effective by further exploiting the structure of human environments.
Semantic Instance Labeling Leveraging Hierarchical Segmentation
Steven Hickson, Irfan Essa, Henrik Christensen
Aug 04 2017 cs.CV arXiv:1708.00946v1

@misc{1708.00946, author = {Steven Hickson and Irfan Essa and Henrik Christensen}, title = {{S}emantic {I}nstance {L}abeling {L}everaging {H}ierarchical {S}egmentation}, year = {2017}, eprint = {1708.00946}, note = {arXiv:1708.00946v1} }
Most of the approaches for indoor RGBD semantic labeling focus on using pixels or superpixels to train a classifier. In this paper, we implement a higher level segmentation using a hierarchy of superpixels to obtain a better segmentation for training our classifier. By focusing on meaningful segments that conform more directly to objects, regardless of size, we train a random forest of decision trees as a classifier using simple features such as the 3D size, LAB color histogram, width, height, and shape as specified by a histogram of surface normals. We test our method on the NYU V2 depth dataset, a challenging dataset of cluttered indoor environments. Our experiments using the NYU V2 depth dataset show that our method achieves state of the art results on both a general semantic labeling introduced by the dataset (floor, structure, furniture, and objects) and a more object specific semantic labeling. We show that training a classifier on a segmentation from a hierarchy of super pixels yields better results than training directly on super pixels, patches, or pixels as in previous work.
Validation of a smartphone app to map social networks of proximity
Tjeerd W. Boonstra, Mark E. Larsen, Samuel Townsend, Helen Christensen
Jun 28 2017 cs.SI arXiv:1706.08777v2

@misc{1706.08777, author = {Tjeerd W.~Boonstra and Mark E.~Larsen and Samuel Townsend and Helen Christensen}, title = {{V}alidation of a smartphone app to map social networks of proximity}, year = {2017}, eprint = {1706.08777}, note = {arXiv:1706.08777v2} }
Social network analysis is a prominent approach to investigate interpersonal relationships. Most studies use self-report data to quantify the connections between participants and construct social networks. In recent years smartphones have been used as an alternative to map networks by assessing the proximity between participants based on Bluetooth and GPS data. While most studies have handed out specially programmed smartphones to study participants, we developed an application for iOS and Android to collect Bluetooth data from participants' own smartphones. In this study, we compared the networks estimated with the smartphone app to those obtained from sociometric badges and self-report data. Participants (n=21) installed the app on their phone and wore a sociometric badge during office hours. Proximity data was collected for 4 weeks. A contingency table revealed a significant association between proximity data (rho = 0.17, p<0.0001), but the marginal odds were higher for the app (8.6%) than for the badges (1.3%), indicating that dyads were more often detected by the app. We then compared the networks that were estimated using the proximity and self-report data. All three networks were significantly correlated, although the correlation with self-reported data was lower for the app (rho = 0.25) than for badges (rho = 0.67). The scanning rates of the app varied considerably between devices and were lower on iOS than on Android. The association between the app and the badges increased when the network was estimated between participants whose app recorded more regularly. These findings suggest that the accuracy of proximity networks can be further improved by reducing missing data and restricting the interpersonal distance at which interactions are detected.
Distributed Mapping with Privacy and Communication Constraints: Lightweight Algorithms and Object-based Models
Siddharth Choudhary, Luca Carlone, Carlos Nieto, John Rogers, Henrik I. Christensen, Frank Dellaert
Feb 14 2017 cs.RO cs.CV arXiv:1702.03435v1

@misc{1702.03435, author = {Siddharth Choudhary and Luca Carlone and Carlos Nieto and John Rogers and Henrik I.~Christensen and Frank Dellaert}, title = {{D}istributed {M}apping with {P}rivacy and {C}ommunication {C}onstraints: {L}ightweight {A}lgorithms and {O}bject-based {M}odels}, year = {2017}, eprint = {1702.03435}, note = {arXiv:1702.03435v1} }
We consider the following problem: a team of robots is deployed in an unknown environment and it has to collaboratively build a map of the area without a reliable infrastructure for communication. The backbone for modern mapping techniques is pose graph optimization, which estimates the trajectory of the robots, from which the map can be easily built. The first contribution of this paper is a set of distributed algorithms for pose graph optimization: rather than sending all sensor data to a remote sensor fusion server, the robots exchange very partial and noisy information to reach an agreement on the pose graph configuration. Our approach can be considered as a distributed implementation of the two-stage approach of Carlone et al., where we use the Successive Over-Relaxation (SOR) and the Jacobi Over-Relaxation (JOR) as workhorses to split the computation among the robots. As a second contribution, we extend the proposed distributed algorithms to work with object-based map models. The use of object-based models avoids the exchange of raw sensor measurements (e.g., point clouds), further reducing the communication burden. Our third contribution is an extensive experimental evaluation of the proposed techniques, including tests in realistic Gazebo simulations and field experiments in a military test facility. Abundant experimental evidence suggests that one of the proposed algorithms (the Distributed Gauss-Seidel method or DGS) has excellent performance. The DGS requires minimal information exchange, has an anytime flavor, scales well to large teams, is robust to noise, and is easy to implement. Our field tests show that the combined use of our distributed algorithms and object-based models reduces the communication requirements by several orders of magnitude and enables distributed mapping with large teams of robots in real-world problems.
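For readers unfamiliar with the Jacobi Over-Relaxation (JOR) iteration named in this abstract, here is a minimal sketch on a small linear system with scalar unknowns. It is a textbook version of JOR, not the paper's distributed pose graph solver; the function name, the example matrix, and the choice `omega=0.9` are assumptions made for illustration. The relevant property is that each component update reads only the previous iterate, so in principle every component could be updated by a different robot in parallel.

```python
def jacobi_over_relaxation(A, b, omega=0.9, iterations=100):
    # Solve A x = b iteratively. Each new x[i] depends only on the
    # previous iterate, so the components can be computed independently.
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = []
        for i in range(n):
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            jacobi_i = (b[i] - off_diag) / A[i][i]
            # Blend the plain Jacobi update with the previous value.
            x_new.append((1.0 - omega) * x[i] + omega * jacobi_i)
        x = x_new
    return x


# Small diagonally dominant example; the exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = jacobi_over_relaxation(A, b)
```

With `omega = 1.0` this reduces to the plain Jacobi method. A Gauss-Seidel style scheme, such as the DGS method the abstract mentions, would instead use already-updated components within the same sweep.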
Smartphone app to investigate the relationship between social connectivity and mental health
Tjeerd W. Boonstra, Aliza Werner-Seidler, Bridianne O'Dea, Mark E. Larsen, Helen Christensen
Feb 10 2017 cs.SI cs.CY arXiv:1702.02644v2

@misc{1702.02644, author = {Tjeerd W.~Boonstra and Aliza Werner-Seidler and Bridianne O'Dea and Mark E.~Larsen and Helen Christensen}, title = {{S}martphone app to investigate the relationship between social connectivity and mental health}, year = {2017}, eprint = {1702.02644}, note = {arXiv:1702.02644v2} }
Interpersonal relationships are necessary for successful daily functioning and wellbeing. Numerous studies have demonstrated the importance of social connectivity for mental health, both through direct peer-to-peer influence and by the location of individuals within their social network. Passive monitoring using smartphones provides an advanced tool to map social networks based on the proximity between individuals. This study investigates the feasibility of using a smartphone app to measure and assess the relationship between social network metrics and mental health. The app collected Bluetooth and mental health data in 63 participants. Social networks of proximity were estimated from Bluetooth data and 95% of the edges were scanned at least every 30 minutes. The majority of participants found this method of data collection acceptable and reported that they would be likely to participate in future studies using this app. These findings demonstrate the feasibility of using a smartphone app that participants can install on their own phone to investigate the relationship between social connectivity and mental health.
StuffNet: Using 'Stuff' to Improve Object Detection
Samarth Brahmbhatt, Henrik I. Christensen, James Hays
Oct 20 2016 cs.CV arXiv:1610.05861v2

@misc{1610.05861, author = {Samarth Brahmbhatt and Henrik I.~Christensen and James Hays}, title = {{S}tuff{N}et: {U}sing '{S}tuff' to {I}mprove {O}bject {D}etection}, year = {2016}, eprint = {1610.05861}, note = {arXiv:1610.05861v2} }
We propose a Convolutional Neural Network (CNN) based algorithm - StuffNet - for object detection. In addition to the standard convolutional features trained for region proposal and object detection [31], StuffNet uses convolutional features trained for segmentation of objects and 'stuff' (amorphous categories such as ground and water). Through experiments on Pascal VOC 2010, we show the importance of features learnt from stuff segmentation for improving object detection performance. StuffNet improves performance from 18.8% mAP to 23.9% mAP for small objects. We also devise a method to train StuffNet on datasets that do not have stuff segmentation labels. Through experiments on Pascal VOC 2007 and 2012, we demonstrate the effectiveness of this method and show that StuffNet also significantly improves object detection performance on such datasets.
Toward a Science of Autonomy for Physical Systems: Service
Peter Allen, Henrik I. Christensen
Sep 20 2016 cs.CY arXiv:1609.05818v1

@misc{1609.05818, author = {Peter Allen and Henrik I.~Christensen}, title = {{T}oward a {S}cience of {A}utonomy for {P}hysical {S}ystems: {S}ervice}, year = {2016}, eprint = {1609.05818}, note = {arXiv:1609.05818v1} }
A recent study by the Robotic Industries Association has highlighted how service robots are increasingly broadening our horizons beyond the factory floor. From robotic vacuums, bomb retrievers, exoskeletons and drones, to robots used in surgery, space exploration, agriculture, home assistance and construction, service robots are building a formidable resume. In just the last few years we have seen service robots deliver room service meals, assist shoppers in finding items in a large home improvement store, check in customers and store their luggage at hotels, and pour drinks on cruise ships. Personal robots are here to educate, assist and entertain at home. These domestic robots can perform daily chores, assist people with disabilities and serve as companions or pets for entertainment. By all accounts, the growth potential for service robotics is quite large.
Next Generation Robotics
Henrik I Christensen, Allison Okamura, Maja Mataric, Vijay Kumar, Greg Hager, Howie Choset
Jun 30 2016 cs.CY cs.RO arXiv:1606.09205v1

@misc{1606.09205, author = {Henrik I Christensen and Allison Okamura and Maja Mataric and Vijay Kumar and Greg Hager and Howie Choset}, title = {{N}ext {G}eneration {R}obotics}, year = {2016}, eprint = {1606.09205}, note = {arXiv:1606.09205v1} }
The National Robotics Initiative (NRI) was launched in 2011 and is about to celebrate its 5-year anniversary. In parallel with the NRI, the robotics community, with support from the Computing Community Consortium, engaged in a series of road-mapping exercises. The first version of the roadmap appeared in September 2009; a second updated version appeared in 2013. While not directly aligned with the NRI, these road-mapping documents have provided both a useful charting of the robotics research space and a metric by which to measure progress. This report sets forth a perspective of progress in robotics over the past five years, and provides a set of recommendations for the future. The NRI has in its formulation a strong emphasis on co-robots, i.e., robots that work directly with people. An obvious question is whether this should continue to be the focus going forward. To assess the main trends, what has happened in the last 5 years, and what may be promising directions for the future, a small CCC-sponsored study was launched with two workshops, one in Washington DC (March 5th, 2016) and another in San Francisco, CA (March 11th, 2016). In this report we briefly summarize some of the main discussions and observations from those workshops. We will present a variety of background information in Section 2, and outline various issues related to progress over the last 5 years in Section 3. In Section 4 we will outline a number of opportunities for moving forward. Finally, we will summarize the main points in Section 5.
Toward a Science of Autonomy for Physical Systems
Gregory D. Hager, Daniela Rus, Vijay Kumar, Henrik Christensen
Apr 12 2016 cs.CY arXiv:1604.02979v1

@misc{1604.02979, author = {Gregory D.~Hager and Daniela Rus and Vijay Kumar and Henrik Christensen}, title = {{T}oward a {S}cience of {A}utonomy for {P}hysical {S}ystems}, year = {2016}, eprint = {1604.02979}, note = {arXiv:1604.02979v1} }
Our lives have been immensely improved by decades of automation research -- we are more comfortable, more productive and safer than ever before. Just imagine a world where familiar automation technologies have failed. In that world, thermostats don't work -- you have to monitor your home heating system manually. Cruise control for your car doesn't exist. Every elevator has to have a human operator to hit the right floor, most manufactured products are assembled by hand, and you have to wash your own dishes. Who would willingly adopt that world -- the world of last century -- today? Physical systems -- elevators, cars, home appliances, manufacturing equipment -- were more troublesome, more time consuming, less safe, and far less convenient. Now, suppose we put ourselves in the place of someone 20 years in the future, a future of autonomous systems. A future where transportation is largely autonomous, more efficient, and far safer; a future where dangerous occupations like mining or disaster response are performed by autonomous systems supervised remotely by humans; a future where manufacturing and healthcare are twice as productive per person-hour by having smart monitoring and readily re-tasked autonomous physical agents; a future where the elderly and infirm have 24 hour in-home autonomous support for the basic activities, both physical and social, of daily life. In a future world where these capabilities are commonplace, why would someone come back to today's world, where someone has to put their life at risk to do a menial job, lose time to mindless activities that have no intrinsic value, or be consumed with worry that a loved one is at risk in their own home? In what follows, and in a series of associated essays, we expand on these ideas, and frame both the opportunities and challenges posed by autonomous physical systems.
Multi-modal Tracking for Object based SLAM
Prateek Singhal, Ruffin White, Henrik Christensen
Mar 15 2016 cs.CV arXiv:1603.04117v1

@misc{1603.04117, author = {Prateek Singhal and Ruffin White and Henrik Christensen}, title = {{M}ulti-modal {T}racking for {O}bject based {SLAM}}, year = {2016}, eprint = {1603.04117}, note = {arXiv:1603.04117v1} }
We present an on-line 3D visual object tracking framework for monocular cameras by incorporating spatial knowledge and uncertainty from semantic mapping along with high frequency measurements from visual odometry. Using a tightly integrated combination of vision and odometry, we can increase the overall performance of object based tracking for semantic mapping. We present a framework for integrating the two data sources through information based fusion/arbitration. We demonstrate the framework in the context of OmniMapper [1] and present results on 6 challenging sequences over multiple objects compared to data obtained from a motion capture system. We are able to achieve a mean error of 0.23m for per frame tracking, showing 9% less relative error than the state of the art tracker.
Grasping for a Purpose: Using Task Goals for Efficient Manipulation Planning
Ana Huaman Quispe, Heni Ben Amor, Henrik Christensen, Mike Stilman
Mar 15 2016 cs.RO arXiv:1603.04338v1

@misc{1603.04338, author = {Ana Huaman Quispe and Heni Ben Amor and Henrik Christensen and Mike Stilman}, title = {{G}rasping for a {P}urpose: {U}sing {T}ask {G}oals for {E}fficient {M}anipulation {P}lanning}, year = {2016}, eprint = {1603.04338}, note = {arXiv:1603.04338v1} }
In this paper we propose an approach for efficient grasp selection for manipulation tasks of unknown objects. Even for simple tasks such as pick-and-place, a unique solution rarely exists. Rather, multiple candidate grasps must be considered and (potentially) tested until a successful, kinematically feasible path is found. To make this process efficient, the grasps should be ordered such that those more likely to succeed are tested first. We propose to use grasp manipulability as a metric to prioritize grasps. We present results of simulation experiments which demonstrate the usefulness of our metric. Additionally, we present experiments with our physical robot performing simple manipulation tasks with a small set of different household objects.
Predicting Daily Activities From Egocentric Images Using Deep Learning
Daniel Castro, Steven Hickson, Vinay Bettadapura, Edison Thomaz, Gregory Abowd, Henrik Christensen, Irfan Essa
Oct 07 2015 cs.CV arXiv:1510.01576v1

@misc{1510.01576, author = {Daniel Castro and Steven Hickson and Vinay Bettadapura and Edison Thomaz and Gregory Abowd and Henrik Christensen and Irfan Essa}, title = {{P}redicting {D}aily {A}ctivities {F}rom {E}gocentric {I}mages {U}sing {D}eep {L}earning}, year = {2015}, eprint = {1510.01576}, howpublished = {ISWC '15 Proceedings of the 2015 ACM International Symposium on Wearable Computers - Pages 75-82}, doi = {10.1145/2802083.2808398}, note = {arXiv:1510.01576v1} }
We present a method to analyze images taken from a passive egocentric wearable camera along with the contextual information, such as time and day of week, to learn and predict everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a 6 month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities. Classification is conducted using a Convolutional Neural Network (CNN) with a classification method we introduce called a late fusion ensemble. This late fusion ensemble incorporates relevant contextual information and increases our classification accuracy. Our technique achieves an overall accuracy of 83.07% in predicting a person's activity across the 19 activity classes. We also demonstrate some promising results from two additional users by fine-tuning the classifier with one day of training data.
Using Bluetooth Low Energy in smartphones to map social networks
Samuel Townsend, Mark E. Larsen, Tjeerd W. Boonstra, Helen Christensen
Aug 18 2015 cs.SI cs.CY arXiv:1508.03938v1

@misc{1508.03938, author = {Samuel Townsend and Mark E.~Larsen and Tjeerd W.~Boonstra and Helen Christensen}, title = {{U}sing {B}luetooth {L}ow {E}nergy in smartphones to map social networks}, year = {2015}, eprint = {1508.03938}, note = {arXiv:1508.03938v1} }
Social networks have an important role in an individual's health, with the propagation of health-related features through a network, and correlations between network structures and symptomatology. Using Bluetooth-enabled smartphones to measure social connectivity is an alternative to traditional paper-based data collection; however studies employing this technology have been restricted to limited sets of homogenous handsets. We investigated the feasibility of using the Bluetooth Low Energy (BLE) protocol, present on users' own smartphones, to measure social connectivity. A custom application was designed for Android and iOS handsets. The app was configured to simultaneously broadcast via BLE and perform periodic discovery scans for other nearby devices. The app was installed on two Android handsets and two iOS handsets, and each combination of devices was tested in the foreground, background and locked states. Connectivity was successfully measured in all test cases, except between two iOS devices when both were in a locked state with their screens off. As smartphones are in a locked state for the majority of a day, this severely limits the ability to measure social connectivity on users' own smartphones. It is not currently feasible to use Bluetooth Low Energy to map social networks, due to the inability of iOS devices to detect another iOS device when both are in a locked state. While the technology was successfully implemented on Android devices, this represents a smaller market share of partially or fully compatible devices.
Occlusion-Aware Object Localization, Segmentation and Pose Estimation
Samarth Brahmbhatt, Heni Ben Amor, Henrik Christensen
Jul 29 2015 cs.CV arXiv:1507.07882v1

@misc{1507.07882, author = {Samarth Brahmbhatt and Heni Ben Amor and Henrik Christensen}, title = {{O}cclusion-{A}ware {O}bject {L}ocalization, {S}egmentation and {P}ose {E}stimation}, year = {2015}, eprint = {1507.07882}, note = {arXiv:1507.07882v1} }
We present a learning approach for localization and segmentation of objects in an image in a manner that is robust to partial occlusion. Our algorithm produces a bounding box around the full extent of the object and labels pixels in the interior that belong to the object. Like existing segmentation aware detection approaches, we learn an appearance model of the object and consider regions that do not fit this model as potential occlusions. However, in addition to the established use of pairwise potentials for encouraging local consistency, we use higher order potentials which capture information at the level of image segments. We also propose an efficient loss function that targets both localization and segmentation performance. Our algorithm achieves 13.52% segmentation error and 0.81 area under the false-positive per image vs. recall curve on average over the challenging CMU Kitchen Occlusion Dataset. This is a 42.44% decrease in segmentation error and a 16.13% increase in localization performance compared to the state-of-the-art. Finally, we show that the visibility labelling produced by our algorithm can make full 3D pose estimation from a single image robust to occlusion.
GASP : Geometric Association with Surface Patches
Rahul Sawhney, Fuxin Li, Henrik I. Christensen
Nov 18 2014 cs.CV cs.GR cs.RO arXiv:1411.4098v1

@misc{1411.4098, author = {Rahul Sawhney and Fuxin Li and Henrik I.~Christensen}, title = {{GASP} : {G}eometric {A}ssociation with {S}urface {P}atches}, year = {2014}, eprint = {1411.4098}, note = {arXiv:1411.4098v1} }
A fundamental challenge to sensory processing tasks in perception and robotics is the problem of obtaining data associations across views. We present a robust solution for ascertaining potentially dense surface patch (superpixel) associations, requiring just range information. Our approach involves decomposition of a view into regularized surface patches. We represent them as sequences expressing geometry invariantly over their superpixel neighborhoods, as uniquely consistent partial orderings. We match these representations through an optimal sequence comparison metric based on the Damerau-Levenshtein distance - enabling robust association with quadratic complexity (in contrast to hitherto employed joint matching formulations which are NP-complete). The approach is able to perform under wide baselines, heavy rotations, partial overlaps, significant occlusions and sensor noise. The technique does not require any priors -- motion or otherwise, and does not make restrictive assumptions on scene structure and sensor movement. It does not require appearance -- is hence more widely applicable than appearance reliant methods, and invulnerable to related ambiguities such as textureless or aliased content. We present promising qualitative and quantitative results under diverse settings, along with comparatives with popular approaches based on range as well as RGB-D data.
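The abstract above names the Damerau-Levenshtein distance as the basis of its sequence comparison. As a rough illustration only, the sketch below computes the restricted variant (optimal string alignment) with a standard dynamic program that runs in time proportional to the product of the two sequence lengths. It is not the paper's metric or its patch representation; the function name `osa_distance` and the unit edit costs are assumptions made for this example.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and transpositions of
    adjacent items, each at unit cost. Works on any indexable sequences
    whose items support ==, for example strings or lists of labels.
    """
    m, n = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i items
    for j in range(n + 1):
        d[0][j] = j  # insert all j items
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent items.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

For instance, `osa_distance("ab", "ba")` counts one transposition, whereas plain Levenshtein distance would count two substitutions.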
Anisotropic Agglomerative Adaptive Mean-Shift
Rahul Sawhney, Henrik I. Christensen, Gary R. Bradski
Nov 18 2014 cs.CV cs.LG arXiv:1411.4102v1

@misc{1411.4102, author = {Rahul Sawhney and Henrik I.~Christensen and Gary R.~Bradski}, title = {{A}nisotropic {A}gglomerative {A}daptive {M}ean-{S}hift}, year = {2014}, eprint = {1411.4102}, note = {arXiv:1411.4102v1} }
Mean Shift is widely used today for mode detection and clustering. The technique, though, is challenged in practice due to assumptions of isotropicity and homoscedasticity. We present an adaptive Mean Shift methodology that allows for full anisotropic clustering, through unsupervised local bandwidth selection. The bandwidth matrices evolve naturally, adapting locally through agglomeration, and in turn guiding further agglomeration. The online methodology is practical and effective for low-dimensional feature spaces, preserving better detail and clustering salience. Additionally, conventional Mean Shift either critically depends on a per instance choice of bandwidth, or relies on offline methods which are inflexible and/or again data instance specific. The presented approach, due to its adaptive design, also alleviates this issue - with a default form performing generally well. The methodology, though, allows for effective tuning of results.
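For context on the baseline this abstract contrasts itself with, the following is a minimal sketch of conventional fixed-bandwidth Mean Shift on 1-D data with a Gaussian kernel. It shows the single, globally chosen bandwidth that the abstract describes as a limitation. It does not implement the anisotropic, agglomerative method of the paper; the function name `mean_shift_1d` and its parameters are assumptions made for this example.

```python
import math


def mean_shift_1d(points, bandwidth, iters=50, tol=1e-6):
    """Fixed-bandwidth Gaussian mean shift on 1-D data.

    Every point is shifted repeatedly to the kernel-weighted mean of the
    data until it moves by less than `tol` or `iters` steps have run.
    Returns the final position of each input point, in input order.
    """
    modes = []
    for x in points:
        for _ in range(iters):
            # Gaussian weights with one shared (isotropic) bandwidth.
            weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2)
                       for p in points]
            new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
            moved = abs(new_x - x)
            x = new_x
            if moved < tol:
                break
        modes.append(x)
    return modes


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    print(mean_shift_1d(data, bandwidth=0.5))
```

Points that end up at nearly the same position can be grouped into one cluster. The result depends directly on the `bandwidth` argument, which is the per-instance choice the abstract refers to.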