16 results for au:Thomaz_E in:cs

Development and Evaluation of Three Chatbots for Postpartum Mood and Anxiety Disorders
Xuewen Yao, Miriam Mikhelson, S. Craig Watkins, Eunsol Choi, Edison Thomaz, Kaya de Barbaro
Aug 16 2023 cs.CL cs.HC arXiv:2308.07407v1

@misc{2308.07407, author = {Xuewen Yao and Miriam Mikhelson and S.~Craig Watkins and Eunsol Choi and Edison Thomaz and Kaya de Barbaro}, title = {{D}evelopment and {E}valuation of {T}hree {C}hatbots for {P}ostpartum {M}ood and {A}nxiety {D}isorders}, year = {2023}, eprint = {2308.07407}, note = {arXiv:2308.07407v1} }
In collaboration with Postpartum Support International (PSI), a non-profit organization dedicated to supporting caregivers with postpartum mood and anxiety disorders, we developed three chatbots to provide context-specific empathetic support to postpartum caregivers, leveraging both rule-based and generative models. We present and evaluate the performance of our chatbots using both machine-based metrics and human-based questionnaires. Overall, our rule-based model achieves the best performance, with outputs that are close to ground truth reference and contain the highest levels of empathy. Human users prefer the rule-based chatbot over the generative chatbot for its context-specific and human-like replies. Our generative chatbot also produced empathetic responses and was described by human users as engaging. However, limitations in the training dataset often result in confusing or nonsensical responses. We conclude by discussing practical benefits of rule-based vs. generative models for supporting individuals with mental health challenges. In light of the recent surge of ChatGPT and BARD, we also discuss the possibilities and pitfalls of large language models for digital mental healthcare.
Cheating off your neighbors: Improving activity recognition through corroboration
Haoxiang Yu, Jingyi An, Evan King, Edison Thomaz, Christine Julien
Jun 12 2023 cs.CV cs.HC cs.LG eess.SP arXiv:2306.06078v1

@misc{2306.06078, author = {Haoxiang Yu and Jingyi An and Evan King and Edison Thomaz and Christine Julien}, title = {{C}heating off your neighbors: {I}mproving activity recognition through corroboration}, year = {2023}, eprint = {2306.06078}, note = {arXiv:2306.06078v1} }
Understanding the complexity of human activities solely through an individual's data can be challenging. However, in many situations, surrounding individuals are likely performing similar activities, while existing human activity recognition approaches focus almost exclusively on individual measurements and largely ignore the context of the activity. Consider two activities: attending a small group meeting and working at an office desk. From solely an individual's perspective, it can be difficult to differentiate between these activities as they may appear very similar, even though they are markedly different. Yet, by observing others nearby, it can be possible to distinguish between these activities. In this paper, we propose an approach to enhance the prediction accuracy of an individual's activities by incorporating insights from surrounding individuals. We have collected a real-world dataset from 20 participants with over 58 hours of data including activities such as attending lectures, having meetings, working in the office, and eating together. Compared to observing a single person in isolation, our proposed approach significantly improves accuracy. We regard this work as a first step in collaborative activity recognition, opening new possibilities for understanding human activity in group settings.
Understanding Postpartum Parents' Experiences via Two Digital Platforms
Xuewen Yao, Miriam Mikhelson, Megan Micheletti, Eunsol Choi, S Craig Watkins, Edison Thomaz, Kaya De Barbaro
Dec 23 2022 cs.CL arXiv:2212.11455v1

@misc{2212.11455, author = {Xuewen Yao and Miriam Mikhelson and Megan Micheletti and Eunsol Choi and S Craig Watkins and Edison Thomaz and Kaya De Barbaro}, title = {{U}nderstanding {P}ostpartum {P}arents' {E}xperiences via {T}wo {D}igital {P}latforms}, year = {2022}, eprint = {2212.11455}, note = {arXiv:2212.11455v1} }
Digital platforms, including online forums and helplines, have emerged as avenues of support for caregivers suffering from postpartum mental health distress. Understanding support seekers' experiences as shared on these platforms could provide crucial insight into caregivers' needs during this vulnerable time. In the current work, we provide a descriptive analysis of the concerns, psychological states, and motivations shared by healthy and distressed postpartum support seekers on two digital platforms, a one-on-one digital helpline and a publicly available online forum. Using a combination of human annotations, dictionary models and unsupervised techniques, we find stark differences between the experiences of distressed and healthy mothers. Distressed mothers described interpersonal problems and a lack of support, with 8.60% - 14.56% reporting severe symptoms including suicidal ideation. In contrast, the majority of healthy mothers described childcare issues, such as questions about breastfeeding or sleeping, and reported no severe mental health concerns. Across the two digital platforms, we found that distressed mothers shared similar content. However, the patterns of speech and affect shared by distressed mothers differed between the helpline vs. the online forum, suggesting the design of these platforms may shape meaningful measures of their support-seeking experiences. Our results provide new insight into the experiences of caregivers suffering from postpartum mental health distress. We conclude by discussing methodological considerations for understanding content shared by support seekers and design considerations for the next generation of support tools for postpartum parents.
Dynamic Speech Endpoint Detection with Regression Targets
Dawei Liang, Hang Su, Tarun Singh, Jay Mahadeokar, Shanil Puri, Jiedan Zhu, Edison Thomaz, Mike Seltzer
Oct 27 2022 cs.SD eess.AS arXiv:2210.14252v1

@misc{2210.14252, author = {Dawei Liang and Hang Su and Tarun Singh and Jay Mahadeokar and Shanil Puri and Jiedan Zhu and Edison Thomaz and Mike Seltzer}, title = {{D}ynamic {S}peech {E}ndpoint {D}etection with {R}egression {T}argets}, year = {2022}, eprint = {2210.14252}, note = {arXiv:2210.14252v1} }
Interactive voice assistants have been widely used as input interfaces in various scenarios, e.g., on smart home devices, wearables, and AR devices. Detecting the end of a speech query, i.e., speech end-pointing, is an important task for voice assistants to interact with users. Traditionally, speech end-pointing is based on pure classification methods along with arbitrary binary targets. In this paper, we propose a novel regression-based speech end-pointing model, which enables an end-pointer to adjust its detection behavior based on the context of user queries. Specifically, we present a pause modeling method and show its effectiveness for dynamic end-pointing. Based on our experiments with vendor-collected smartphone and wearables speech queries, our strategy shows a better trade-off between end-pointing latency and accuracy, compared to the traditional classification-based method. We further discuss the benefits of this model and generalization of the framework in the paper.
Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach
Dawei Liang, Zifan Xu, Yinuo Chen, Rebecca Adaimi, David Harwath, Edison Thomaz
Mar 23 2022 cs.SD arXiv:2203.11294v1

@misc{2203.11294, author = {Dawei Liang and Zifan Xu and Yinuo Chen and Rebecca Adaimi and David Harwath and Edison Thomaz}, title = {{A}utomated detection of foreground speech with wearable sensing in everyday home environments: {A} transfer learning approach}, year = {2022}, eprint = {2203.11294}, note = {arXiv:2203.11294v1} }
Acoustic sensing has proved effective as a foundation for numerous applications in health and human behavior analysis. In this work, we focus on the problem of detecting in-person social interactions in naturalistic settings from audio captured by a smartwatch. As a first step towards detecting social interactions, it is critical to distinguish the speech of the individual wearing the watch from all other sounds nearby, such as speech from other individuals and ambient sounds. This is very challenging in realistic settings, where interactions take place spontaneously and supervised models cannot be trained a priori to recognize the full complexity of dynamic social environments. In this paper, we introduce a transfer learning-based approach to detect foreground speech of users wearing a smartwatch. A highlight of the method is that it does not depend on the collection of voice samples to build user-specific models. Instead, the approach is based on knowledge transfer from general-purpose speaker representations derived from public datasets. Our experiments demonstrate that our approach performs comparably to a fully supervised model, with 80% F1 score. To evaluate the method, we collected a dataset of 31 hours of smartwatch-recorded audio in 18 homes with a total of 39 participants performing various semi-controlled tasks.
Lifelong Adaptive Machine Learning for Sensor-based Human Activity Recognition Using Prototypical Networks
Rebecca Adaimi, Edison Thomaz
Mar 14 2022 cs.LG eess.SP arXiv:2203.05692v1

@misc{2203.05692, author = {Rebecca Adaimi and Edison Thomaz}, title = {{L}ifelong {A}daptive {M}achine {L}earning for {S}ensor-based {H}uman {A}ctivity {R}ecognition {U}sing {P}rototypical {N}etworks}, year = {2022}, eprint = {2203.05692}, note = {arXiv:2203.05692v1} }
Continual learning, also known as lifelong learning, is an emerging research topic that has been attracting increasing interest in the field of machine learning. With human activity recognition (HAR) playing a key role in enabling numerous real-world applications, an essential step towards the long-term deployment of such recognition systems is to extend the activity model to dynamically adapt to changes in people's everyday behavior. Continual learning in the HAR domain is still under-explored, with researchers exploring existing methods developed for computer vision in HAR. Moreover, analysis has so far focused on task-incremental or class-incremental learning paradigms where task boundaries are known. This impedes the applicability of such methods for real-world systems since data is presented in a randomly streaming fashion. To push this field forward, we build on recent advances in the area of continual machine learning and design a lifelong adaptive learning framework using Prototypical Networks, LAPNet-HAR, that processes sensor-based data streams in a task-free data-incremental fashion and mitigates catastrophic forgetting using experience replay and continual prototype adaptation. Online learning is further facilitated using contrastive loss to enforce inter-class separation. LAPNet-HAR is evaluated on 5 publicly available activity datasets in terms of the framework's ability to acquire new information while preserving previous knowledge. Our extensive empirical results demonstrate the effectiveness of LAPNet-HAR in task-free continual learning and uncover useful insights for future challenges.
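The prototype idea mentioned in the abstract above can be illustrated with a small sketch. This is not LAPNet-HAR: the embedding network, experience replay, and contrastive loss are all omitted, and the class name `PrototypeMemory`, the running-mean update rule, and the toy 2-D embeddings are assumptions made purely for illustration. It shows only how a classifier can keep one prototype per class, update it as samples stream in, and label new samples by the nearest prototype.

```python
import math


class PrototypeMemory:
    """Hypothetical helper: one running-mean prototype per class label."""

    def __init__(self):
        self.prototypes = {}  # label -> mean embedding (list of floats)
        self.counts = {}      # label -> number of samples seen so far

    def update(self, embedding, label):
        """Fold one streamed sample into its class prototype."""
        if label not in self.prototypes:
            # A class seen for the first time starts at the sample itself.
            self.prototypes[label] = list(embedding)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        proto = self.prototypes[label]
        for i, x in enumerate(embedding):
            # Incremental mean: new_mean = old_mean + (x - old_mean) / n
            proto[i] += (x - proto[i]) / n

    def predict(self, embedding):
        """Return the label whose prototype is closest in Euclidean distance."""
        return min(
            self.prototypes,
            key=lambda label: math.dist(embedding, self.prototypes[label]),
        )


memory = PrototypeMemory()
memory.update([0.0, 0.0], "sit")
memory.update([2.0, 0.0], "sit")
memory.update([10.0, 10.0], "walk")
prediction_near_sit = memory.predict([1.5, 0.5])
prediction_near_walk = memory.predict([9.0, 8.0])
```

Because new classes are added the first time their label appears, the sketch needs no task boundaries, which is the property the abstract describes as task-free; how the actual framework adapts its prototypes is described in the paper itself.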
HAR-GCNN: Deep Graph CNNs for Human Activity Recognition From Highly Unlabeled Mobile Sensor Data
Abduallah Mohamed, Fernando Lejarza, Stephanie Cahail, Christian Claudel, Edison Thomaz
Mar 08 2022 cs.CV cs.AI cs.HC cs.LG arXiv:2203.03087v1

@misc{2203.03087, author = {Abduallah Mohamed and Fernando Lejarza and Stephanie Cahail and Christian Claudel and Edison Thomaz}, title = {{HAR}-{GCNN}: {D}eep {G}raph {CNN}s for {H}uman {A}ctivity {R}ecognition {F}rom {H}ighly {U}nlabeled {M}obile {S}ensor {D}ata}, year = {2022}, eprint = {2203.03087}, howpublished = {IEEE PerCom Workshop on Context and Activity Modeling and Recognition (CoMoReA), 2022}, note = {arXiv:2203.03087v1} }
The problem of human activity recognition from mobile sensor data applies to multiple domains, such as health monitoring, personal fitness, daily life logging, and senior care. A critical challenge for training human activity recognition models is data quality. Acquiring balanced datasets containing accurate activity labels requires humans to correctly annotate and potentially interfere with the subjects' normal activities in real-time. Despite the likelihood of incorrect annotation or lack thereof, there is often an inherent chronology to human behavior. For example, we take a shower after we exercise. This implicit chronology can be used to learn unknown labels and classify future activities. In this work, we propose HAR-GCNN, a deep graph CNN model that leverages the correlation between chronologically adjacent sensor measurements to predict the correct labels for unclassified activities that have at least one activity label. We propose a new training strategy enforcing that the model predicts the missing activity labels by leveraging the known ones. HAR-GCNN shows superior performance relative to previously used baseline methods, improving classification accuracy by about 25% and up to 68% on different datasets. Code is available at https://github.com/abduallahmohamed/HAR-GCNN.
Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study
Dawei Liang, Yangyang Shi, Yun Wang, Nayan Singhal, Alex Xiao, Jonathan Shaw, Edison Thomaz, Ozlem Kalinli, Mike Seltzer
Oct 08 2021 cs.SD cs.AI eess.AS arXiv:2110.03174v1

@misc{2110.03174, author = {Dawei Liang and Yangyang Shi and Yun Wang and Nayan Singhal and Alex Xiao and Jonathan Shaw and Edison Thomaz and Ozlem Kalinli and Mike Seltzer}, title = {{T}ransferring {V}oice {K}nowledge for {A}coustic {E}vent {D}etection: {A}n {E}mpirical {S}tudy}, year = {2021}, eprint = {2110.03174}, note = {arXiv:2110.03174v1} }
Detection of common events and scenes from audio is useful for extracting and understanding human contexts in daily life. Prior studies have shown that leveraging knowledge from a relevant domain is beneficial for a target acoustic event detection (AED) process. Inspired by the observation that many human-centered acoustic events in daily life involve voice elements, this paper investigates the potential of transferring high-level voice representations extracted from a public speaker dataset to enrich an AED pipeline. Towards this end, we develop a dual-branch neural network architecture for the joint learning of voice and acoustic features during an AED process and conduct thorough empirical studies to examine the performance on the public AudioSet [1] with different types of inputs. Our main observations are that: 1) Joint learning of audio and voice inputs improves the AED performance (mean average precision) for both a CNN baseline (0.292 vs 0.134 mAP) and a TALNet [2] baseline (0.361 vs 0.351 mAP); 2) Augmenting the extra voice features is critical to maximize the model performance with dual inputs.
MagSurface: Wireless 2D Finger Tracking Leveraging Magnetic Fields
Sarnab Bhattacharya, Keum San Chun, Edison Thomaz
May 04 2021 cs.HC cs.SY eess.SY arXiv:2105.00543v1

@misc{2105.00543, author = {Sarnab Bhattacharya and Keum San Chun and Edison Thomaz}, title = {{M}ag{S}urface: {W}ireless 2{D} {F}inger {T}racking {L}everaging {M}agnetic {F}ields}, year = {2021}, eprint = {2105.00543}, note = {arXiv:2105.00543v1} }
With the ubiquity of touchscreens, touch input has become a popular mode of interaction. However, current touchscreen technology is limiting in its design as it restricts touch interactions to specially instrumented touch surfaces. Surface contaminants like water can also hinder proper interactions. In this paper, we propose the use of magnetic field sensing to enable finger tracking on a surface with minimal instrumentation. Our system, MagSurface, turns everyday surfaces into a touch medium, thus allowing more flexibility in the types of touch surfaces. The evaluation of our system consists of quantifying the accuracy of the system in locating an object on 2D flat surfaces. We test our system on three different surface materials to validate its usage scenarios. A qualitative user experience study was also conducted to get feedback on the ease of use and comfort of the system. Localization error as low as a few millimeters was achieved.
Improving Prediction of Real-Time Loneliness and Companionship Type Using Geosocial Features of Personal Smartphone Data
Congyu Wu, Amanda N. Barczyk, R. Cameron Craddock, Gabriella M. Harari, Edison Thomaz, Jason D. Shumake, Christopher G. Beevers, Samuel D. Gosling, David M. Schnyer
Oct 21 2020 cs.HC arXiv:2010.09807v1

@misc{2010.09807, author = {Congyu Wu and Amanda N.~Barczyk and R.~Cameron Craddock and Gabriella M.~Harari and Edison Thomaz and Jason D.~Shumake and Christopher G.~Beevers and Samuel D.~Gosling and David M.~Schnyer}, title = {{I}mproving {P}rediction of {R}eal-{T}ime {L}oneliness and {C}ompanionship {T}ype {U}sing {G}eosocial {F}eatures of {P}ersonal {S}martphone {D}ata}, year = {2020}, eprint = {2010.09807}, note = {arXiv:2010.09807v1} }
Loneliness is a widely affecting mental health symptom and can be mediated by and co-vary with patterns of social exposure. Using momentary survey and smartphone sensing data collected from 129 Android-using college student participants over three weeks, we (1) investigate and uncover the relations between momentary loneliness experience and companionship type and (2) propose and validate novel geosocial features of smartphone-based Bluetooth and GPS data for predicting loneliness and companionship type in real time. We base our features on intuitions characterizing the quantity and spatiotemporal predictability of an individual's Bluetooth encounters and GPS location clusters to capture personal significance of social exposure scenarios conditional on their temporal distribution and geographic patterns. We examine our features' statistical correlation with momentary loneliness through regression analyses and evaluate their predictive power using a sliding window prediction procedure. Our features achieved significant performance improvement compared to baseline for predicting both momentary loneliness and companionship type, with the effect stronger for the loneliness prediction task. As such we recommend incorporation and further evaluation of our geosocial features proposed in this study in future mental health sensing and context-aware computing applications.
Multi-Modal Data Collection for Measuring Health, Behavior, and Living Environment of Large-Scale Participant Cohorts: Conceptual Framework and Findings from Deployments
Congyu Wu, Hagen Fritz, Zoltan Nagy, Juan P. Maestre, Edison Thomaz, Christine Julien, Darla M. Castelli, Kaya de Barbaro, Gabriella M. Harari, R. Cameron Craddock, Kerry A. Kinney, Samuel D. Gosling, David M. Schnyer
Oct 19 2020 cs.HC arXiv:2010.08457v1

@misc{2010.08457, author = {Congyu Wu and Hagen Fritz and Zoltan Nagy and Juan P.~Maestre and Edison Thomaz and Christine Julien and Darla M.~Castelli and Kaya de Barbaro and Gabriella M.~Harari and R.~Cameron Craddock and Kerry A.~Kinney and Samuel D.~Gosling and David M.~Schnyer}, title = {{M}ulti-{M}odal {D}ata {C}ollection for {M}easuring {H}ealth, {B}ehavior, and {L}iving {E}nvironment of {L}arge-{S}cale {P}articipant {C}ohorts: {C}onceptual {F}ramework and {F}indings from {D}eployments}, year = {2020}, eprint = {2010.08457}, note = {arXiv:2010.08457v1} }
As mobile technologies become ever more sensor-rich, portable, and ubiquitous, data captured by smart devices are lending rich insights into users' daily lives with unprecedented comprehensiveness, unobtrusiveness, and ecological validity. A number of human-subject studies have been conducted in the past decade to examine the use of mobile sensing to uncover individual behavioral patterns and health outcomes. While understanding health and behavior is the focus for most of these studies, we find that minimal attention has been placed on measuring personal environments, especially together with other human-centric data modalities. Moreover, the participant cohort size in most existing studies falls well below a few hundred, leaving questions open about the reliability of findings on the relations between mobile sensing signals and human outcomes. To address these limitations, we developed a home environment sensor kit for continuous indoor air quality tracking and deployed it in conjunction with established mobile sensing and experience sampling techniques in a cohort study of up to 1584 student participants per data type for 3 weeks at a major research university in the United States. In this paper, we begin by proposing a conceptual framework that systematically organizes human-centric data modalities by their temporal coverage and spatial freedom. Then we report our study design and procedure, technologies and methods deployed, descriptive statistics of the collected data, and results from our extensive exploratory analyses. Our novel data, conceptual development, and analytical findings provide important guidance for data collection and hypothesis generation in future human-centric sensing studies.
Infant Crying Detection in Real-World Environments
Xuewen Yao, Megan Micheletti, Mckensey Johnson, Edison Thomaz, Kaya de Barbaro
May 15 2020 eess.AS cs.LG cs.SD stat.ML arXiv:2005.07036v6

@misc{2005.07036, author = {Xuewen Yao and Megan Micheletti and Mckensey Johnson and Edison Thomaz and Kaya de Barbaro}, title = {{I}nfant {C}rying {D}etection in {R}eal-{W}orld {E}nvironments}, year = {2020}, eprint = {2005.07036}, note = {arXiv:2005.07036v6} }
Most existing cry detection models have been tested with data collected in controlled settings. Thus, the extent to which they generalize to noisy and lived environments is unclear. In this paper, we evaluate several established machine learning approaches including a model leveraging both deep spectrum and acoustic features. This model was able to recognize crying events with F1 score 0.613 (Precision: 0.672, Recall: 0.552), showing improved external validity over existing methods at cry detection in everyday real-world settings. As part of our evaluation, we collect and annotate a novel dataset of infant crying compiled from over 780 hours of labeled real-world audio data, captured via recorders worn by infants in their homes, which we make publicly available. Our findings confirm that a cry detection model trained on in-lab data underperforms when presented with real-world data (in-lab test F1: 0.656, real-world test F1: 0.236), highlighting the value of our new dataset and model.
Quantifying the Chaos Level of Infants' Environment via Unsupervised Learning
Priyanka Khante, Mai Lee Chang, Domingo Martinez, Kaya de Barbaro, Edison Thomaz
Dec 11 2019 eess.AS cs.LG arXiv:1912.04844v1

@misc{1912.04844, author = {Priyanka Khante and Mai Lee Chang and Domingo Martinez and Kaya de Barbaro and Edison Thomaz}, title = {{Q}uantifying the {C}haos {L}evel of {I}nfants' {E}nvironment via {U}nsupervised {L}earning}, year = {2019}, eprint = {1912.04844}, note = {arXiv:1912.04844v1} }
Acoustic environments vary dramatically within the home setting. They can be a source of comfort and tranquility or chaos that can lead to less optimal cognitive development in children. Research to date has only subjectively measured household chaos. In this work, we use three unsupervised machine learning techniques to quantify household chaos in infants' homes. These unsupervised techniques include hierarchical clustering using K-Means, clustering using self-organizing map (SOM) and deep learning. We evaluated these techniques using data from 9 participants, totaling 197 hours. Results show that these techniques are promising for quantifying household chaos.
Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos
Dawei Liang, Edison Thomaz
Oct 23 2018 cs.HC cs.LG cs.SD eess.AS arXiv:1810.08691v2

@misc{1810.08691, author = {Dawei Liang and Edison Thomaz}, title = {{A}udio-{B}ased {A}ctivities of {D}aily {L}iving ({ADL}) {R}ecognition with {L}arge-{S}cale {A}coustic {E}mbeddings from {O}nline {V}ideos}, year = {2018}, eprint = {1810.08691}, note = {arXiv:1810.08691v2} }
Over the years, activity sensing and recognition has been shown to play a key enabling role in a wide range of applications, from sustainability and human-computer interaction to health care. While many recognition tasks have traditionally employed inertial sensors, acoustic-based methods offer the benefit of capturing rich contextual information, which can be useful when discriminating complex activities. Given the emergence of deep learning techniques and leveraging new, large-scale multimedia datasets, this paper revisits the opportunity of training audio-based classifiers without the onerous and time-consuming task of annotating audio data. We propose a framework for audio-based activity recognition that makes use of millions of embedding features from public online video sound clips. Based on the combination of oversampling and deep learning approaches, our framework does not require further feature processing or outlier filtering as in prior work. We evaluated our approach in the context of Activities of Daily Living (ADL) by recognizing 15 everyday activities with 14 participants in their own homes, achieving 64.2% and 83.6% averaged within-subject accuracy in terms of top-1 and top-3 classification respectively. Individual class performance was also examined in the paper to further study the co-occurrence characteristics of the activities and the robustness of the framework.
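The top-1 and top-3 figures quoted in the abstract above are instances of top-k accuracy: a prediction counts as correct when the true class is among the k highest-scoring classes. The sketch below illustrates that metric only; the function name `top_k_accuracy` and the toy scores and labels are assumptions for illustration and are unrelated to the paper's data or code.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top-scoring classes.

    scores: list of per-sample score lists, one score per class index.
    labels: list of true class indices, aligned with scores.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 4 samples, 4 classes (made-up numbers).
scores = [
    [0.6, 0.3, 0.1, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.1, 0.2, 0.3, 0.4],
    [0.4, 0.3, 0.2, 0.1],
]
labels = [0, 2, 0, 3]

top1 = top_k_accuracy(scores, labels, k=1)  # 0.25
top3 = top_k_accuracy(scores, labels, k=3)  # 0.5
```

Top-3 accuracy is never lower than top-1 on the same predictions, which is consistent with the 64.2% and 83.6% figures reported above.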
Leveraging Context to Support Automated Food Recognition in Restaurants
Vinay Bettadapura, Edison Thomaz, Aman Parnami, Gregory Abowd, Irfan Essa
Oct 08 2015 cs.CV arXiv:1510.02078v1

@misc{1510.02078, author = {Vinay Bettadapura and Edison Thomaz and Aman Parnami and Gregory Abowd and Irfan Essa}, title = {{L}everaging {C}ontext to {S}upport {A}utomated {F}ood {R}ecognition in {R}estaurants}, year = {2015}, eprint = {1510.02078}, note = {arXiv:1510.02078v1} }
The pervasiveness of mobile cameras has resulted in a dramatic increase in food photos, which are pictures reflecting what people eat. In this paper, we study how taking pictures of what we eat in restaurants can be used for the purpose of automating food journaling. We propose to leverage the context of where the picture was taken, with additional information about the restaurant, available online, coupled with state-of-the-art computer vision techniques to recognize the food being consumed. To this end, we demonstrate image-based recognition of foods eaten in restaurants by training a classifier with images from restaurants' online menu databases. We evaluate the performance of our system in unconstrained, real-world settings with food images taken in 10 restaurants across 5 different types of food (American, Indian, Italian, Mexican and Thai).
Predicting Daily Activities From Egocentric Images Using Deep Learning
Daniel Castro, Steven Hickson, Vinay Bettadapura, Edison Thomaz, Gregory Abowd, Henrik Christensen, Irfan Essa
Oct 07 2015 cs.CV arXiv:1510.01576v1

@misc{1510.01576, author = {Daniel Castro and Steven Hickson and Vinay Bettadapura and Edison Thomaz and Gregory Abowd and Henrik Christensen and Irfan Essa}, title = {{P}redicting {D}aily {A}ctivities {F}rom {E}gocentric {I}mages {U}sing {D}eep {L}earning}, year = {2015}, eprint = {1510.01576}, howpublished = {ISWC '15 Proceedings of the 2015 ACM International Symposium on Wearable Computers - Pages 75-82}, doi = {10.1145/2802083.2808398}, note = {arXiv:1510.01576v1} }
We present a method to analyze images taken from a passive egocentric wearable camera along with the contextual information, such as time and day of week, to learn and predict everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a 6 month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities. Classification is conducted using a Convolutional Neural Network (CNN) with a classification method we introduce called a late fusion ensemble. This late fusion ensemble incorporates relevant contextual information and increases our classification accuracy. Our technique achieves an overall accuracy of 83.07% in predicting a person's activity across the 19 activity classes. We also demonstrate some promising results from two additional users by fine-tuning the classifier with one day of training data.