Boundless: Generating Photorealistic Synthetic Data for Object Detection in Urban Streetscapes

Authors: Mehmet Kerem Turkcan, Yuyang Li, Chengbo Zang, Javad Ghaderi, Gil Zussman, Zoran Kostic

Abstract: We introduce Boundless, a photo-realistic synthetic data generation system for enabling highly accurate object detection in dense urban streetscapes. Boundless can replace massive real-world data collection and manual ground-truth object annotation (labeling) with an automated and configurable process. Boundless is based on the Unreal Engine 5 (UE5) City Sample project with improvements enabling accurate collection of 3D bounding boxes across different lighting and scene variability conditions. We evaluate the performance of object detection models trained on the dataset generated by Boundless when used for inference on a real-world dataset acquired from medium-altitude cameras. We compare the performance of the Boundless-trained model against the CARLA-trained model and observe an improvement of 7.8 mAP. The results we achieved support the premise that synthetic data generation is a credible methodology for training/fine-tuning scalable object detection models for urban scenes.

Submitted 26 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

arXiv:2408.00943

Data-Driven Traffic Simulation for an Intersection in a Metropolis

Authors: Chengbo Zang, Mehmet Kerem Turkcan, Gil Zussman, Javad Ghaderi, Zoran Kostic

Abstract: We present a novel data-driven simulation environment for modeling traffic in metropolitan street intersections. Using real-world tracking data collected over an extended period of time, we train trajectory forecasting models to learn agent interactions and environmental constraints that are difficult to capture conventionally. Trajectories of new agents are first coarsely generated by sampling from the spatial and temporal generative distributions, then refined using state-of-the-art trajectory forecasting models. The simulation can run either autonomously, or under explicit human control conditioned on the generative distributions. We present the experiments for a variety of model configurations. Under an iterative prediction scheme, the way-point-supervised TrajNet++ model obtained 0.36 Final Displacement Error (FDE) in 20 FPS on an NVIDIA A100 GPU.

Submitted 1 August, 2024; originally announced August 2024.

Comments: CVPR 2024 Workshop POETS Oral

arXiv:2404.16944

Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection

Authors: Mehmet Kerem Turkcan, Sanjeev Narasimhan, Chengbo Zang, Gyung Hyun Je, Bo Yu, Mahshid Ghasemi, Javad Ghaderi, Gil Zussman, Zoran Kostic

Abstract: We introduce Constellation, a dataset of 13K images suitable for research on detection of objects in dense urban streetscapes observed from high-elevation cameras, collected for a variety of temporal conditions. The dataset addresses the need for curated data to explore problems in small object detection exemplified by the limited pixel footprint of pedestrians observed tens of meters from above. It enables the testing of object detection models for variations in lighting, building shadows, weather, and scene dynamics. We evaluate contemporary object detection architectures on the dataset, observing that state-of-the-art methods have lower performance in detecting small pedestrians compared to vehicles, corresponding to a 10% difference in average precision (AP). Using structurally similar datasets for pretraining the models results in an increase of 1.8% mean AP (mAP). We further find that incorporating domain-specific data augmentations helps improve model performance. Using pseudo-labeled data, obtained from inference outcomes of the best-performing models, improves the performance of the models. Finally, comparing the models trained using the data collected in two different time intervals, we find a performance drift in models due to the changes in intersection conditions over time. The best-performing model achieves a pedestrian AP of 92.0% with 11.5 ms inference time on NVIDIA A100 GPUs, and an mAP of 95.4%.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2311.15372

doi 10.1109/TVCG.2023.3332647

Exploring Mid-Air Hand Interaction in Data Visualization

Authors: Zona Kostic, Catherine Dumas, Sarah Pratt, Johanna Beyer

Abstract: Interacting with data visualizations without an instrument or touch surface is typically characterized by the use of mid-air hand gestures. While mid-air expressions can be quite intuitive for interacting with digital content at a distance, they frequently lack precision and necessitate a different way of expressing users' data-related intentions. In this work, we aim to identify new designs for mid-air hand gesture manipulations that can facilitate instrument-free, touch-free, and embedded interactions with visualizations, while utilizing the three-dimensional (3D) interaction space that mid-air gestures afford. We explore mid-air hand gestures for data visualization by searching for natural means to interact with content. We employ three studies - an Elicitation Study, a User Study, and an Expert Study, to provide insight into the users' mental models, explore the design space, and suggest considerations for future mid-air hand gesture design. In addition to forming strong associations with physical manipulations, we discovered that mid-air hand gestures can: promote space-multiplexed interaction, which allows for a greater degree of expression; play a functional role in visual cognition and comprehension; and enhance creativity and engagement. We further highlight the challenges that designers in this field may face to help set the stage for developing effective gestures for a wide range of touchless interactions with visualizations.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2310.00491

StreetNav: Leveraging Street Cameras to Support Precise Outdoor Navigation for Blind Pedestrians

Authors: Gaurav Jain, Basel Hindi, Zihao Zhang, Koushik Srinivasula, Mingyu Xie, Mahshid Ghasemi, Daniel Weiner, Sophie Ana Paris, Xin Yi Therese Xu, Michael Malcolm, Mehmet Turkcan, Javad Ghaderi, Zoran Kostic, Gil Zussman, Brian A. Smith

Abstract: Blind and low-vision (BLV) people rely on GPS-based systems for outdoor navigation. GPS's inaccuracy, however, causes them to veer off track, run into obstacles, and struggle to reach precise destinations. While prior work has made precise navigation possible indoors via hardware installations, enabling this outdoors remains a challenge. Interestingly, many outdoor environments are already instrumented with hardware such as street cameras. In this work, we explore the idea of repurposing existing street cameras for outdoor navigation. Our community-driven approach considers both technical and sociotechnical concerns through engagements with various stakeholders: BLV users, residents, business owners, and Community Board leadership. The resulting system, StreetNav, processes a camera's video feed using computer vision and gives BLV pedestrians real-time navigation assistance. Our evaluations show that StreetNav guides users more precisely than GPS, but its technical performance is sensitive to environmental occlusions and distance from the camera. We discuss future implications for deploying such systems at scale.

Submitted 30 July, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

arXiv:2207.07243

LineCap: Line Charts for Data Visualization Captioning Models

Authors: Anita Mahinpei, Zona Kostic, Chris Tanner

Abstract: Data visualization captions help readers understand the purpose of a visualization and are crucial for individuals with visual impairments. The prevalence of poor figure captions and the successful application of deep learning approaches to image captioning motivate the use of similar techniques for automated figure captioning. However, research in this field has been stunted by the lack of suitable datasets. We introduce LineCap, a novel figure captioning dataset of 3,528 figures, and we provide insights from curating this dataset and using end-to-end deep learning models for automated figure captioning.

Submitted 14 July, 2022; originally announced July 2022.

arXiv:2205.01686

Smart City Intersections: Intelligence Nodes for Future Metropolises

Authors: Zoran Kostić, Alex Angus, Zhengye Yang, Zhuoxu Duan, Ivan Seskar, Gil Zussman, Dipankar Raychaudhuri

Abstract: Traffic intersections are the most suitable locations for the deployment of computing, communications, and intelligence services for smart cities of the future. The abundance of data to be collected and processed, in combination with privacy and security concerns, motivates the use of the edge-computing paradigm which aligns well with physical intersections in metropolises. This paper focuses on high-bandwidth, low-latency applications, and in that context it describes: (i) system design considerations for smart city intersection intelligence nodes; (ii) key technological components including sensors, networking, edge computing, low latency design, and AI-based intelligence; and (iii) applications such as privacy preservation, cloud-connected vehicles, a real-time "radar-screen", traffic management, and monitoring of pedestrian behavior during pandemics. The results of the experimental studies performed on the COSMOS testbed located in New York City are illustrated. Future challenges in designing human-centered smart city intersections are summarized.

Submitted 13 May, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2202.07296

Roomsemble: Progressive web application for intuitive property search

Authors: Chris Kottmyer, Kevin Zhao, Zona Kostic, Aleksandar Jevremovic

Abstract: A successful real estate search process involves locating a property that meets a user's search criteria subject to an allocated budget and time constraints. Many studies have investigated modeling housing prices over time. However, little is known about how a user's tastes influence their real estate search and purchase decisions. It is unknown what house a user would choose taking into account an individual's personal tastes, behaviors, and constraints, and, therefore, creating an algorithm that finds the perfect match. In this paper, we investigate the first step in understanding a user's tastes by building a system to capture personal preferences. We concentrated our research on real estate photos, being inspired by house aesthetics, which often motivates prospective buyers into considering a property as a candidate for purchase. We designed a system that takes a user-provided photo representing that person's personal taste and recommends properties similar to the photo available on the market. The user can additionally filter the recommendations by budget and location when conducting a property search. The paper describes the application's overall layout including frontend design and backend processes for locating a desired property. The proposed model, which serves as the application's core, was tested with 25 users, and the study's findings, as well as some key conclusions, are detailed in this paper.

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2112.07159

Birds Eye View Social Distancing Analysis System

Authors: Zhengye Yang, Mingfei Sun, Hongzhe Ye, Zihao Xiong, Gil Zussman, Zoran Kostic

Abstract: Social distancing can reduce the infection rates in respiratory pandemics such as COVID-19. Traffic intersections are particularly suitable for monitoring and evaluation of social distancing behavior in metropolises. We propose and evaluate a privacy-preserving social distancing analysis system (B-SDA), which uses bird's-eye view video recordings of pedestrians who cross traffic intersections. We devise algorithms for video pre-processing, object detection and tracking which are rooted in the known computer-vision and deep learning techniques, but modified to address the problem of detecting very small objects/pedestrians captured by a highly elevated camera. We propose a method for incorporating pedestrian grouping for detection of social distancing violations. B-SDA is used to compare pedestrian behavior based on pre-pandemic and pandemic videos in a major metropolitan area. The accomplished pedestrian detection performance is $63.0\%$ $AP_{50}$ and the tracking performance is $47.6\%$ MOTA. The social distancing violation rate of $15.6\%$ during the pandemic is notably lower than $31.4\%$ pre-pandemic baseline, indicating that pedestrians followed CDC-prescribed social distancing recommendations. The proposed system is suitable for deployment in real-world applications.

Submitted 9 February, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2107.07148

doi 10.1109/TMM.2020.2966890

What Image Features Boost Housing Market Predictions?

Authors: Zona Kostic, Aleksandar Jevremovic

Abstract: The attractiveness of a property is one of the most interesting, yet challenging, categories to model. Image characteristics are used to describe certain attributes, and to examine the influence of visual factors on the price or timeframe of the listing. In this paper, we propose a set of techniques for the extraction of visual features for efficient numerical inclusion in modern-day predictive algorithms. We discuss techniques such as Shannon's entropy, calculating the center of gravity, employing image segmentation, and using Convolutional Neural Networks. After comparing these techniques as applied to a set of property-related images (indoor, outdoor, and satellite), we conclude the following: (i) the entropy is the most efficient single-digit visual measure for housing price prediction; (ii) image segmentation is the most important visual feature for the prediction of housing lifespan; and (iii) deep image features can be used to quantify interior characteristics and contribute to captivation modeling. The set of 40 image features selected here carries a significant amount of predictive power and outperforms some of the strongest metadata predictors. Without any need to replace a human expert in a real-estate appraisal process, we conclude that the techniques presented in this paper can efficiently describe visible characteristics, thus introducing perceived attractiveness as a quantitative measure into the predictive modeling of housing.

Submitted 15 July, 2021; originally announced July 2021.

arXiv:2011.00329

Visual Companion for Booklovers

Authors: Zona Kostic, Jared Jessup, Jeffrey Baglioni, Nathan Weeks, Johann Philipp Dreessen, Ning Chen, Tianyu Liu

Abstract: An innumerable number of individual choices go into discovering a new book. There are unmistakably two groups of booklovers: those who like to search online, follow other people's latest readings, or simply react to a system's recommendations; and those who love to wander between library stacks, lose themselves behind bookstore shelves, or simply hide behind piles of (un)organized books. Depending on which group a person may fall into, there are two distinct and corresponding mediums that inform his or her choices: digital, that provides efficient retrieval of information online, and physical, a more tactile pursuit that leads to unexpected discoveries and promotes serendipity. How could we possibly bridge the gap between these seemingly disparate mediums into an integrated system that can amplify the benefits they both offer? In this paper, we present the BookVIS application, which uses book-related data and generates personalized visualizations to follow users in their quest for a new book. In this new redesigned version, the app brings associative visual connections to support intuitive exploration of easily retrieved digital information and its relationship with the physical book in hand. BookVIS keeps track of the user's reading preferences and generates a dataSelfie as an individual snapshot of a personal taste that grows over time.
Usability testing has also been conducted and has demonstrated the app's ability to identify distinguishable patterns in readers' tastes that could be further used to communicate personal preferences in new "shelf-browsing" iterations. By efficiently supplementing the user's cognitive information needs while still supporting the spontaneity and enjoyment of the book browsing experience, BookVIS bridges the gap between real and online realms, and maximizes the engagement of personalized mobile visual clues.

Submitted 31 October, 2020; originally announced November 2020.

Comments: 9 pages, 8 figures

Showing 1–11 of 11 results for author: Kostic, Z