subscribe to arXiv mailings

Upper-Body Pose-based Gaze Estimation for Privacy-Preserving 3D Gaze Target Detection

Authors: Andrea Toaiari, Vittorio Murino, Marco Cristani, Cigdem Beyan

Abstract: Gaze Target Detection (GTD), i.e., determining where a person is looking within a scene from an external viewpoint, is a challenging task, particularly in 3D space. Existing approaches heavily rely on analyzing the person's appearance, primarily focusing on their face to predict the gaze target. This paper presents a novel approach to tackle this problem by utilizing the person's upper-body pose a… ▽ More Gaze Target Detection (GTD), i.e., determining where a person is looking within a scene from an external viewpoint, is a challenging task, particularly in 3D space. Existing approaches heavily rely on analyzing the person's appearance, primarily focusing on their face to predict the gaze target. This paper presents a novel approach to tackle this problem by utilizing the person's upper-body pose and available depth maps to extract a 3D gaze direction and employing a multi-stage or an end-to-end pipeline to predict the gazed target. When predicted accurately, the human body pose can provide valuable information about the head pose, which is a good approximation of the gaze direction, as well as the position of the arms and hands, which are linked to the activity the person is performing and the objects they are likely focusing on. Consequently, in addition to performing gaze estimation in 3D, we are also able to perform GTD simultaneously. We demonstrate state-of-the-art results on the most comprehensive publicly accessible 3D gaze target detection dataset without requiring images of the person's face, thus promoting privacy preservation in various application contexts. The code is available at https://github.com/intelligolabs/privacy-gtd-3D. △ Less

Submitted 26 September, 2024; originally announced September 2024.

Comments: Accepted in the T-CAP workshop at ECCV 2024

arXiv:2409.03879 [pdf, other]

Multi-Camera Industrial Open-Set Person Re-Identification and Tracking

Authors: Federico Cunico, Marco Cristani

Abstract: In recent years, the development of deep learning approaches for the task of person re-identification led to impressive results. However, this comes with a limitation for industrial and practical real-world applications. Firstly, most of the existing works operate on closed-world scenarios, in which the people to re-identify (probes) are compared to a closed-set (gallery). Real-world scenarios oft… ▽ More In recent years, the development of deep learning approaches for the task of person re-identification led to impressive results. However, this comes with a limitation for industrial and practical real-world applications. Firstly, most of the existing works operate on closed-world scenarios, in which the people to re-identify (probes) are compared to a closed-set (gallery). Real-world scenarios often are open-set problems in which the gallery is not known a priori, but the number of open-set approaches in the literature is significantly lower. Secondly, challenges such as multi-camera setups, occlusions, real-time requirements, etc., further constrain the applicability of off-the-shelf methods. This work presents MICRO-TRACK, a Modular Industrial multi-Camera Re_identification and Open-set Tracking system that is real-time, scalable, and easy to integrate into existing industrial surveillance scenarios. Furthermore, we release a novel Re-ID and tracking dataset acquired in an industrial manufacturing facility, dubbed Facility-ReID, consisting of 18-minute videos captured by 8 surveillance cameras. △ Less

Submitted 5 September, 2024; originally announced September 2024.

Comments: Accepted at T-CAP workshop at ECCV 2024

arXiv:2409.00774 [pdf, other]

SITUATE: Indoor Human Trajectory Prediction through Geometric Features and Self-Supervised Vision Representation

Authors: Luigi Capogrosso, Andrea Toaiari, Andrea Avogaro, Uzair Khan, Aditya Jivoji, Franco Fummi, Marco Cristani

Abstract: Patterns of human motion in outdoor and indoor environments are substantially different due to the scope of the environment and the typical intentions of people therein. While outdoor trajectory forecasting has received significant attention, indoor forecasting is still an underexplored research area. This paper proposes SITUATE, a novel approach to cope with indoor human trajectory prediction by… ▽ More Patterns of human motion in outdoor and indoor environments are substantially different due to the scope of the environment and the typical intentions of people therein. While outdoor trajectory forecasting has received significant attention, indoor forecasting is still an underexplored research area. This paper proposes SITUATE, a novel approach to cope with indoor human trajectory prediction by leveraging equivariant and invariant geometric features and a self-supervised vision representation. The geometric learning modules model the intrinsic symmetries and human movements inherent in indoor spaces. This concept becomes particularly important because self-loops at various scales and rapid direction changes often characterize indoor trajectories. On the other hand, the vision representation module is used to acquire spatial-semantic information about the environment to predict users' future locations more accurately. We evaluate our method through comprehensive experiments on the two most famous indoor trajectory forecasting datasets, i.e., THÖR and Supermarket, obtaining state-of-the-art performance. Furthermore, we also achieve competitive results in outdoor scenarios, showing that indoor-oriented forecasting models generalize better than outdoor-oriented ones. The source code is available at https://github.com/intelligolabs/SITUATE. △ Less

Submitted 1 September, 2024; originally announced September 2024.

Comments: Accepted at the 27th International Conference on Pattern Recognition (ICPR 2024)

arXiv:2408.11561 [pdf, other]

Self-Supervised Iterative Refinement for Anomaly Detection in Industrial Quality Control

Authors: Muhammad Aqeel, Shakiba Sharifi, Marco Cristani, Francesco Setti

Abstract: This study introduces the Iterative Refinement Process (IRP), a robust anomaly detection methodology designed for high-stakes industrial quality control. The IRP enhances defect detection accuracy through a cyclic data refinement strategy, iteratively removing misleading data points to improve model performance and robustness. We validate the IRP's effectiveness using two benchmark datasets, Kolek… ▽ More This study introduces the Iterative Refinement Process (IRP), a robust anomaly detection methodology designed for high-stakes industrial quality control. The IRP enhances defect detection accuracy through a cyclic data refinement strategy, iteratively removing misleading data points to improve model performance and robustness. We validate the IRP's effectiveness using two benchmark datasets, Kolektor SDD2 (KSDD2) and MVTec AD, covering a wide range of industrial products and defect types. Our experimental results demonstrate that the IRP consistently outperforms traditional anomaly detection models, particularly in environments with high noise levels. This study highlights the IRP's potential to significantly enhance anomaly detection processes in industrial settings, effectively managing the challenges of sparse and noisy data. △ Less

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2407.11763 [pdf, other]

Enhancing Split Computing and Early Exit Applications through Predefined Sparsity

Authors: Luigi Capogrosso, Enrico Fraccaroli, Giulio Petrozziello, Francesco Setti, Samarjit Chakraborty, Franco Fummi, Marco Cristani

Abstract: In the past decade, Deep Neural Networks (DNNs) achieved state-of-the-art performance in a broad range of problems, spanning from object classification and action recognition to smart building and healthcare. The flexibility that makes DNNs such a pervasive technology comes at a price: the computational requirements preclude their deployment on most of the resource-constrained edge devices availab… ▽ More In the past decade, Deep Neural Networks (DNNs) achieved state-of-the-art performance in a broad range of problems, spanning from object classification and action recognition to smart building and healthcare. The flexibility that makes DNNs such a pervasive technology comes at a price: the computational requirements preclude their deployment on most of the resource-constrained edge devices available today to solve real-time and real-world tasks. This paper introduces a novel approach to address this challenge by combining the concept of predefined sparsity with Split Computing (SC) and Early Exit (EE). In particular, SC aims at splitting a DNN with a part of it deployed on an edge device and the rest on a remote server. Instead, EE allows the system to stop using the remote server and rely solely on the edge device's computation if the answer is already good enough. Specifically, how to apply such a predefined sparsity to a SC and EE paradigm has never been studied. This paper studies this problem and shows how predefined sparsity significantly reduces the computational, storage, and energy burdens during the training and inference phases, regardless of the hardware platform. This makes it a valuable approach for enhancing the performance of SC and EE applications. Experimental results showcase reductions exceeding 4x in storage and computational complexity without compromising performance. The source code is available at https://github.com/intelligolabs/sparsity_sc_ee. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted at the 27th Forum on specification and Design Languages (FDL 2024)

arXiv:2407.05982 [pdf, other]

MTL-Split: Multi-Task Learning for Edge Devices using Split Computing

Authors: Luigi Capogrosso, Enrico Fraccaroli, Samarjit Chakraborty, Franco Fummi, Marco Cristani

Abstract: Split Computing (SC), where a Deep Neural Network (DNN) is intelligently split with a part of it deployed on an edge device and the rest on a remote server is emerging as a promising approach. It allows the power of DNNs to be leveraged for latency-sensitive applications that do not allow the entire DNN to be deployed remotely, while not having sufficient computation bandwidth available locally. I… ▽ More Split Computing (SC), where a Deep Neural Network (DNN) is intelligently split with a part of it deployed on an edge device and the rest on a remote server is emerging as a promising approach. It allows the power of DNNs to be leveraged for latency-sensitive applications that do not allow the entire DNN to be deployed remotely, while not having sufficient computation bandwidth available locally. In many such embedded systems scenarios, such as those in the automotive domain, computational resource constraints also necessitate Multi-Task Learning (MTL), where the same DNN is used for multiple inference tasks instead of having dedicated DNNs for each task, which would need more computing bandwidth. However, how to partition such a multi-tasking DNN to be deployed within a SC framework has not been sufficiently studied. This paper studies this problem, and MTL-Split, our novel proposed architecture, shows encouraging results on both synthetic and real-world data. The source code is available at https://github.com/intelligolabs/MTL-Split. △ Less

Submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted at the 61st Design Automation Conference (DAC 2024)

arXiv:2407.03961 [pdf, other]

Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection

Authors: Federico Girella, Ziyue Liu, Franco Fummi, Francesco Setti, Marco Cristani, Luigi Capogrosso

Abstract: Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts to normal s… ▽ More Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts to normal samples to mitigate problems related to unbalanced training data. These techniques often produce out-of-distribution images, resulting in systems that learn what is not a normal sample but cannot accurately identify what a defect looks like. In this work, we introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation. Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model through text descriptions and region localization of the possible anomalies. This strategic shift enhances the interpretability of results and fosters a more robust human feedback loop, facilitating iterative improvements of the generated outputs. Remarkably, our approach operates in a zero-shot manner, avoiding time-consuming fine-tuning procedures while achieving superior performance. We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset, with an improvement in AP of approximately 18% when positive samples are available and 28% when they are missing. The source code is available at https://github.com/intelligolabs/DIAG. △ Less

Submitted 11 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted at the 21st International Conference on Content-Based Multimedia Indexing (CBMI 2024)

arXiv:2406.05080 [pdf, other]

I2EDL: Interactive Instruction Error Detection and Localization

Authors: Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang

Abstract: In the Vision-and-Language Navigation in Continuous Environments (VLN-CE) task, the human user guides an autonomous agent to reach a target goal via a series of low-level actions following a textual instruction in natural language. However, most existing methods do not address the likely case where users may make mistakes when providing such instruction (e.g. "turn left" instead of "turn right").… ▽ More In the Vision-and-Language Navigation in Continuous Environments (VLN-CE) task, the human user guides an autonomous agent to reach a target goal via a series of low-level actions following a textual instruction in natural language. However, most existing methods do not address the likely case where users may make mistakes when providing such instruction (e.g. "turn left" instead of "turn right"). In this work, we address a novel task of Interactive VLN in Continuous Environments (IVLN-CE), which allows the agent to interact with the user during the VLN-CE navigation to verify any doubts regarding the instruction errors. We propose an Interactive Instruction Error Detector and Localizer (I2EDL) that triggers the user-agent interaction upon the detection of instruction errors during the navigation. We leverage a pre-trained module to detect instruction errors and pinpoint them in the instruction by cross-referencing the textual input and past observations. In such way, the agent is able to query the user for a timely correction, without demanding the user's cognitive load, as we locate the probable errors to a precise part of the instruction. We evaluate the proposed I2EDL on a dataset of instructions containing errors, and further devise a novel metric, the Success weighted by Interaction Number (SIN), to reflect both the navigation performance and the interaction effectiveness. We show how the proposed method can ask focused requests for corrections to the user, which in turn increases the navigation success, while minimizing the interactions. △ Less

Submitted 23 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted at IEEE RO-MAN 2024

arXiv:2406.00501 [pdf, other]

Diffusion-based Image Generation for In-distribution Data Augmentation in Surface Defect Detection

Authors: Luigi Capogrosso, Federico Girella, Francesco Taioli, Michele Dalla Chiara, Muhammad Aqeel, Franco Fummi, Francesco Setti, Marco Cristani

Abstract: In this study, we show that diffusion models can be used in industrial scenarios to improve the data augmentation procedure in the context of surface defect detection. In general, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. For these reas… ▽ More In this study, we show that diffusion models can be used in industrial scenarios to improve the data augmentation procedure in the context of surface defect detection. In general, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. For these reasons, state-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts to normal samples. This leads to out-of-distribution augmented data so that the classification system learns what is not a normal sample but does not know what a defect really is. We show that diffusion models overcome this situation, providing more realistic in-distribution defects so that the model can learn the defect's genuine appearance. We propose a novel approach for data augmentation that mixes out-of-distribution with in-distribution samples, which we call In&Out. The approach can deal with two data augmentation setups: i) when no defects are available (zero-shot data augmentation) and ii) when defects are available, which can be in a small number (few-shot) or a large one (full-shot). We focus the experimental part on the most challenging benchmark in the state-of-the-art, i.e., the Kolektor Surface-Defect Dataset 2, defining the new state-of-the-art classification AP score under weak supervision of .782. The code is available at https://github.com/intelligolabs/in_and_out. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: Accepted at the 19th International Conference on Computer Vision Theory and Applications (VISAPP 2024)

arXiv:2403.14447 [pdf, other]

Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset

Authors: Andrea Avogaro, Andrea Toaiari, Federico Cunico, Xiangmin Xu, Haralambos Dafas, Alessandro Vinciarelli, Emma Li, Marco Cristani

Abstract: We introduce HARPER, a novel dataset for 3D body pose estimation and forecast in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because being close to the ground captures humans only partially. The… ▽ More We introduce HARPER, a novel dataset for 3D body pose estimation and forecast in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because being close to the ground captures humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users. The Corpus contains not only the recordings of the built-in stereo cameras of Spot, but also those of a 6-camera OptiTrack system (all recordings are synchronized). This leads to ground-truth skeletal representations with a precision lower than a millimeter. In addition, the Corpus includes reproducible benchmarks on 3D Human Pose Estimation, Human Pose Forecasting, and Collision Prediction, all based on publicly available baseline approaches. This enables future HARPER users to rigorously compare their results with those we provide in this work. △ Less

Submitted 23 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.10700 [pdf, other]

Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation

Authors: Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang

Abstract: Vision-and-Language Navigation in Continuous Environments (VLN-CE) is one of the most intuitive yet challenging embodied AI tasks. Agents are tasked to navigate towards a target goal by executing a set of low-level actions, following a series of natural language instructions. All VLN-CE methods in the literature assume that language instructions are exact. However, in practice, instructions given… ▽ More Vision-and-Language Navigation in Continuous Environments (VLN-CE) is one of the most intuitive yet challenging embodied AI tasks. Agents are tasked to navigate towards a target goal by executing a set of low-level actions, following a series of natural language instructions. All VLN-CE methods in the literature assume that language instructions are exact. However, in practice, instructions given by humans can contain errors when describing a spatial environment due to inaccurate memory or confusion. Current VLN-CE benchmarks do not address this scenario, making the state-of-the-art methods in VLN-CE fragile in the presence of erroneous instructions from human users. For the first time, we propose a novel benchmark dataset that introduces various types of instruction errors considering potential human causes. This benchmark provides valuable insight into the robustness of VLN systems in continuous environments. We observe a noticeable performance drop (up to -25%) in Success Rate when evaluating the state-of-the-art VLN-CE methods on our benchmark. Moreover, we formally define the task of Instruction Error Detection and Localization, and establish an evaluation protocol on top of our benchmark dataset. We also propose an effective method, based on a cross-modal transformer architecture, that achieves the best performance in error detection and localization, compared to baselines. Surprisingly, our proposed method has revealed errors in the validation set of the two commonly used datasets for VLN-CE, i.e., R2R-CE and RxR-CE, demonstrating the utility of our technique in other tasks. Code and dataset will be made available upon acceptance at https://intelligolabs.github.io/R2RIE-CE △ Less

Submitted 15 March, 2024; originally announced March 2024.

Comments: 3 figures, 8 pages

arXiv:2310.09278 [pdf, other]

Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning

Authors: Geri Skenderi, Luigi Capogrosso, Andrea Toaiari, Matteo Denitto, Franco Fummi, Simone Melzi, Marco Cristani

Abstract: In deep learning, auxiliary objectives are often used to facilitate learning in situations where data is scarce, or the principal task is extremely complex. This idea is primarily inspired by the improved generalization capability induced by solving multiple tasks simultaneously, which leads to a more robust shared representation. Nevertheless, finding optimal auxiliary tasks that give rise to the… ▽ More In deep learning, auxiliary objectives are often used to facilitate learning in situations where data is scarce, or the principal task is extremely complex. This idea is primarily inspired by the improved generalization capability induced by solving multiple tasks simultaneously, which leads to a more robust shared representation. Nevertheless, finding optimal auxiliary tasks that give rise to the desired improvement is a crucial problem that often requires hand-crafted solutions or expensive meta-learning approaches. In this paper, we propose a novel framework, dubbed Detaux, whereby a weakly supervised disentanglement procedure is used to discover new unrelated classification tasks and the associated labels that can be exploited with the principal task in any Multi-Task Learning (MTL) model. The disentanglement procedure works at a representation level, isolating a subspace related to the principal task, plus an arbitrary number of orthogonal subspaces. In the most disentangled subspaces, through a clustering procedure, we generate the additional classification tasks, and the associated labels become their representatives. Subsequently, the original data, the labels associated with the principal task, and the newly discovered ones can be fed into any MTL framework. Extensive validation on both synthetic and real data, along with various ablation studies, demonstrate promising results, revealing the potential in what has been, so far, an unexplored connection between learning disentangled representations and MTL. The code will be made publicly available upon acceptance. △ Less

Submitted 13 October, 2023; originally announced October 2023.

Comments: Under review in Pattern Recognition Letters

arXiv:2309.16014 [pdf, other]

Graph-level Representation Learning with Joint-Embedding Predictive Architectures

Authors: Geri Skenderi, Hang Li, Jiliang Tang, Marco Cristani

Abstract: Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal y from the latent representation of a context signal x. JEPAs bypass the need for negative and positive samples, traditionally required by contrastive… ▽ More Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal y from the latent representation of a context signal x. JEPAs bypass the need for negative and positive samples, traditionally required by contrastive learning while avoiding the overfitting issues associated with generative pretraining. In this paper, we show that graph-level representations can be effectively modeled using this paradigm by proposing a Graph Joint-Embedding Predictive Architecture (Graph-JEPA). In particular, we employ masked modeling and focus on predicting the latent representations of masked subgraphs starting from the latent representation of a context subgraph. To endow the representations with the implicit hierarchy that is often present in graph-level concepts, we devise an alternative prediction objective that consists of predicting the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane. Through multiple experimental evaluations, we show that Graph-JEPA can learn highly semantic and expressive representations, as shown by the downstream performance in graph classification, regression, and distinguishing non-isomorphic graphs. The code will be made available upon acceptance. △ Less

Submitted 23 June, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Preprint. Under Review

arXiv:2309.11932 [pdf, other]

A Machine Learning-oriented Survey on Tiny Machine Learning

Authors: Luigi Capogrosso, Federico Cunico, Dong Seon Cheng, Franco Fummi, Marco Cristani

Abstract: The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused… ▽ More The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey wishes to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature survey. In particular, firstly we will examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Thirdly, this survey will present the distinct features of hardware devices and software tools that represent the current state-of-the-art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions. △ Less

Submitted 26 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: Article currently under review at IEEE Access

arXiv:2308.08854 [pdf, other]

Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language

Authors: Francesco Taioli, Federico Cunico, Federico Girella, Riccardo Bologna, Alessandro Farinelli, Marco Cristani

Abstract: We present Le-RNR-Map, a Language-enhanced Renderable Neural Radiance map for Visual Navigation with natural language query prompts. The recently proposed RNR-Map employs a grid structure comprising latent codes positioned at each pixel. These latent codes, which are derived from image observation, enable: i) image rendering given a camera pose, since they are converted to Neural Radiance Field; i… ▽ More We present Le-RNR-Map, a Language-enhanced Renderable Neural Radiance map for Visual Navigation with natural language query prompts. The recently proposed RNR-Map employs a grid structure comprising latent codes positioned at each pixel. These latent codes, which are derived from image observation, enable: i) image rendering given a camera pose, since they are converted to Neural Radiance Field; ii) image navigation and localization with astonishing accuracy. On top of this, we enhance RNR-Map with CLIP-based embedding latent codes, allowing natural language search without additional label data. We evaluate the effectiveness of this map in single and multi-object searches. We also investigate its compatibility with a Large Language Model as an "affordance query resolver". Code and videos are available at https://intelligolabs.github.io/Le-RNR-Map/ △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: Accepted at ICCVW23 VLAR

arXiv:2307.06975 [pdf, other]

Neuro-symbolic Empowered Denoising Diffusion Probabilistic Models for Real-time Anomaly Detection in Industry 4.0

Authors: Luigi Capogrosso, Alessio Mascolini, Federico Girella, Geri Skenderi, Sebastiano Gaiardelli, Nicola Dall'Ora, Francesco Ponzio, Enrico Fraccaroli, Santa Di Cataldo, Sara Vinco, Enrico Macii, Franco Fummi, Marco Cristani

Abstract: Industry 4.0 involves the integration of digital technologies, such as IoT, Big Data, and AI, into manufacturing and industrial processes to increase efficiency and productivity. As these technologies become more interconnected and interdependent, Industry 4.0 systems become more complex, which brings the difficulty of identifying and stopping anomalies that may cause disturbances in the manufactu… ▽ More Industry 4.0 involves the integration of digital technologies, such as IoT, Big Data, and AI, into manufacturing and industrial processes to increase efficiency and productivity. As these technologies become more interconnected and interdependent, Industry 4.0 systems become more complex, which brings the difficulty of identifying and stopping anomalies that may cause disturbances in the manufacturing process. This paper aims to propose a diffusion-based model for real-time anomaly prediction in Industry 4.0 processes. Using a neuro-symbolic approach, we integrate industrial ontologies in the model, thereby adding formal knowledge on smart manufacturing. Finally, we propose a simple yet effective way of distilling diffusion models through Random Fourier Features for deployment on an embedded system for direct integration into the manufacturing process. To the best of our knowledge, this approach has never been explored before. △ Less

Submitted 18 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: Accepted at the 26th Forum on specification and Design Languages (FDL 2023)

arXiv:2305.00039 [pdf, other]

HermesBDD: A Multi-Core and Multi-Platform Binary Decision Diagram Package

Authors: Luigi Capogrosso, Luca Geretti, Marco Cristani, Franco Fummi, Tiziano Villa

Abstract: BDDs are representations of a Boolean expression in the form of a directed acyclic graph. BDDs are widely used in several fields, particularly in model checking and hardware verification. There are several implementations for BDD manipulation, where each package differs depending on the application. This paper presents HermesBDD: a novel multi-core and multi-platform binary decision diagram packag… ▽ More BDDs are representations of a Boolean expression in the form of a directed acyclic graph. BDDs are widely used in several fields, particularly in model checking and hardware verification. There are several implementations for BDD manipulation, where each package differs depending on the application. This paper presents HermesBDD: a novel multi-core and multi-platform binary decision diagram package focused on high performance and usability. HermesBDD supports a static and dynamic memory management mechanism, the possibility to exploit lock-free hash tables, and a simple parallel implementation of the If-Then-Else procedure based on a higher-level wrapper for threads and futures. HermesBDD is completely written in C++ with no need to rely on external libraries and is developed according to software engineering principles for reliability and easy maintenance over time. We provide experimental results on the n-Queens problem, the de-facto SAT solver benchmark for BDDs, demonstrating a significant speedup of 18.73x over our non-parallel baselines, and a remarkable performance boost w.r.t. other state-of-the-art BDDs packages. △ Less

Submitted 22 March, 2023; originally announced May 2023.

Comments: 26th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS)

arXiv:2304.05956 [pdf, other]

OO-dMVMT: A Deep Multi-view Multi-task Classification Framework for Real-time 3D Hand Gesture Classification and Segmentation

Authors: Federico Cunico, Federico Girella, Andrea Avogaro, Marco Emporio, Andrea Giachetti, Marco Cristani

Abstract: Continuous mid-air hand gesture recognition based on captured hand pose streams is fundamental for human-computer interaction, particularly in AR / VR. However, many of the methods proposed to recognize heterogeneous hand gestures are tested only on the classification task, and the real-time low-latency gesture segmentation in a continuous stream is not well addressed in the literature. For this t… ▽ More Continuous mid-air hand gesture recognition based on captured hand pose streams is fundamental for human-computer interaction, particularly in AR / VR. However, many of the methods proposed to recognize heterogeneous hand gestures are tested only on the classification task, and the real-time low-latency gesture segmentation in a continuous stream is not well addressed in the literature. For this task, we propose the On-Off deep Multi-View Multi-Task paradigm (OO-dMVMT). The idea is to exploit multiple time-local views related to hand pose and movement to generate rich gesture descriptions, along with using heterogeneous tasks to achieve high accuracy. OO-dMVMT extends the classical MVMT paradigm, where all of the multiple tasks have to be active at each time, by allowing specific tasks to switch on/off depending on whether they can apply to the input. We show that OO-dMVMT defines the new SotA on continuous/online 3D skeleton-based gesture recognition in terms of gesture classification accuracy, segmentation accuracy, false positives, and decision latency while maintaining real-time operation. △ Less

Submitted 12 April, 2023; originally announced April 2023.

Comments: Accepted to the Computer Vision for Mixed Reality workshop at CVPR 2023

arXiv:2303.12524 [pdf, other]

Split-Et-Impera: A Framework for the Design of Distributed Deep Learning Applications

Authors: Luigi Capogrosso, Federico Cunico, Michele Lora, Marco Cristani, Franco Fummi, Davide Quaglia

Abstract: Many recent pattern recognition applications rely on complex distributed architectures in which sensing and computational nodes interact together through a communication network. Deep neural networks (DNNs) play an important role in this scenario, furnishing powerful decision mechanisms, at the price of a high computational effort. Consequently, powerful state-of-the-art DNNs are frequently split… ▽ More Many recent pattern recognition applications rely on complex distributed architectures in which sensing and computational nodes interact together through a communication network. Deep neural networks (DNNs) play an important role in this scenario, furnishing powerful decision mechanisms, at the price of a high computational effort. Consequently, powerful state-of-the-art DNNs are frequently split over various computational nodes, e.g., a first part stays on an embedded device and the rest on a server. Deciding where to split a DNN is a challenge in itself, making the design of deep learning applications even more complicated. Therefore, we propose Split-Et-Impera, a novel and practical framework that i) determines the set of the best-split points of a neural network based on deep network interpretability principles without performing a tedious try-and-test approach, ii) performs a communication-aware simulation for the rapid evaluation of different neural network rearrangements, and iii) suggests the best match between the quality of service requirements of the application and the performance in terms of accuracy and latency time. △ Less

Submitted 22 March, 2023; originally announced March 2023.

Comments: 26th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS)

arXiv:2303.03155 [pdf, other]

Unsupervised Active Visual Search with Monte Carlo planning under Uncertain Detections

Authors: Francesco Taioli, Francesco Giuliari, Yiming Wang, Riccardo Berra, Alberto Castellini, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti

Abstract: We propose a solution for Active Visual Search of objects in an environment, whose 2D floor map is the only known information. Our solution has three key features that make it more plausible and robust to detector failures compared to state-of-the-art methods: (i) it is unsupervised as it does not need any training sessions. (ii) During the exploration, a probability distribution on the 2D floor m… ▽ More We propose a solution for Active Visual Search of objects in an environment, whose 2D floor map is the only known information. Our solution has three key features that make it more plausible and robust to detector failures compared to state-of-the-art methods: (i) it is unsupervised as it does not need any training sessions. (ii) During the exploration, a probability distribution on the 2D floor map is updated according to an intuitive mechanism, while an improved belief update increases the effectiveness of the agent's exploration. (iii) We incorporate the awareness that an object detector may fail into the aforementioned probability modelling by exploiting the success statistics of a specific detector. Our solution is dubbed POMP-BE-PD (Pomcp-based Online Motion Planning with Belief by Exploration and Probabilistic Detection). It uses the current pose of an agent and an RGB-D observation to learn an optimal search policy, exploiting a POMDP solved by a Monte-Carlo planning approach. On the Active Vision Database benchmark, we increase the average success rate over all the environments by a significant 35% while decreasing the average path length by 4% with respect to competing methods. Thus, our results are state-of-the-art, even without using any training procedure. △ Less

Submitted 6 March, 2023; originally announced March 2023.

Comments: 12 pages,8 figures. Submitted for review at IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: text overlap with arXiv:2009.08140

arXiv:2211.13061 [pdf, other]

doi 10.1007/978-3-031-37660-3_4

A Masked Face Classification Benchmark on Low-Resolution Surveillance Images

Authors: Federico Cunico, Andrea Toaiari, Marco Cristani

Abstract: We propose a novel image dataset focused on tiny faces wearing face masks for mask classification purposes, dubbed Small Face MASK (SF-MASK), composed of a collection made from 20k low-resolution images exported from diverse and heterogeneous datasets, ranging from 7 x 7 to 64 x 64 pixel resolution. An accurate visualization of this collection, through counting grids, made it possible to highlight… ▽ More We propose a novel image dataset focused on tiny faces wearing face masks for mask classification purposes, dubbed Small Face MASK (SF-MASK), composed of a collection made from 20k low-resolution images exported from diverse and heterogeneous datasets, ranging from 7 x 7 to 64 x 64 pixel resolution. An accurate visualization of this collection, through counting grids, made it possible to highlight gaps in the variety of poses assumed by the heads of the pedestrians. In particular, faces filmed by very high cameras, in which the facial features appear strongly skewed, are absent. To address this structural deficiency, we produced a set of synthetic images which resulted in a satisfactory covering of the intra-class variance. Furthermore, a small subsample of 1701 images contains badly worn face masks, opening to multi-class classification challenges. Experiments on SF-MASK focus on face mask classification using several classifiers. Results show that the richness of SF-MASK (real + synthetic images) leads all of the tested classifiers to perform better than exploiting comparative face mask datasets, on a fixed 1077 images testing set. Dataset and evaluation code are publicly available here: https://github.com/HumaticsLAB/sf-mask △ Less

Submitted 3 August, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: 15 pages, 7 figures. Accepted at T-CAP workshop @ ICPR 2022

arXiv:2211.04798 [pdf, other]

On the use of learning-based forecasting methods for ameliorating fashion business processes: A position paper

Authors: Geri Skenderi, Christian Joppi, Matteo Denitto, Marco Cristani

Abstract: The fashion industry is one of the most active and competitive markets in the world, manufacturing millions of products and reaching large audiences every year. A plethora of business processes are involved in this large-scale industry, but due to the generally short life-cycle of clothing items, supply-chain management and retailing strategies are crucial for good market performance. Correctly un… ▽ More The fashion industry is one of the most active and competitive markets in the world, manufacturing millions of products and reaching large audiences every year. A plethora of business processes are involved in this large-scale industry, but due to the generally short life-cycle of clothing items, supply-chain management and retailing strategies are crucial for good market performance. Correctly understanding the wants and needs of clients, managing logistic issues and marketing the correct products are high-level problems with a lot of uncertainty associated to them given the number of influencing factors, but most importantly due to the unpredictability often associated with the future. It is therefore straightforward that forecasting methods, which generate predictions of the future, are indispensable in order to ameliorate all the various business processes that deal with the true purpose and meaning of fashion: having a lot of people wear a particular product or style, rendering these items, people and consequently brands fashionable. In this paper, we provide an overview of three concrete forecasting tasks that any fashion company can apply in order to improve their industrial and market impact. We underline advances and issues in all three tasks and argue about their importance and the impact they can have at an industrial level. Finally, we highlight issues and directions of future work, reflecting on how learning-based forecasting methods can further aid the fashion industry. △ Less

Submitted 9 November, 2022; originally announced November 2022.

Comments: 2nd International Workshop on Industrial Machine Learning @ ICPR 2022

arXiv:2211.00562 [pdf, other]

Leveraging commonsense for object localisation in partial scenes

Authors: Francesco Giuliari, Geri Skenderi, Marco Cristani, Alessio Del Bue, Yiming Wang

Abstract: We propose an end-to-end solution to address the problem of object localisation in partial scenes, where we aim to estimate the position of an object in an unknown area given only a partial 3D scan of the scene. We propose a novel scene representation to facilitate the geometric reasoning, Directed Spatial Commonsense Graph (D-SCG), a spatial scene graph that is enriched with additional concept no… ▽ More We propose an end-to-end solution to address the problem of object localisation in partial scenes, where we aim to estimate the position of an object in an unknown area given only a partial 3D scan of the scene. We propose a novel scene representation to facilitate the geometric reasoning, Directed Spatial Commonsense Graph (D-SCG), a spatial scene graph that is enriched with additional concept nodes from a commonsense knowledge base. Specifically, the nodes of D-SCG represent the scene objects and the edges are their relative positions. Each object node is then connected via different commonsense relationships to a set of concept nodes. With the proposed graph-based scene representation, we estimate the unknown position of the target object using a Graph Neural Network that implements a novel attentional message passing mechanism. The network first predicts the relative positions between the target object and each visible object by learning a rich representation of the objects via aggregating both the object nodes and the concept nodes in D-SCG. These relative positions then are merged to obtain the final position. We evaluate our method using Partial ScanNet, improving the state-of-the-art by 5.9% in terms of the localisation accuracy at a 8x faster training speed. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2203.05380

arXiv:2209.12655 [pdf, ps, other]

Deontic Meta-Rules

Authors: Francesco Olivieri, Guido Governatori, Matteo Cristani, Antonino Rotolo, Abdul Sattar

Abstract: The use of meta-rules in logic, i.e., rules whose content includes other rules, has recently gained attention in the setting of non-monotonic reasoning: a first logical formalisation and efficient algorithms to compute the (meta)-extensions of such theories were proposed in Olivieri et al (2021) This work extends such a logical framework by considering the deontic aspect. The resulting logic will… ▽ More The use of meta-rules in logic, i.e., rules whose content includes other rules, has recently gained attention in the setting of non-monotonic reasoning: a first logical formalisation and efficient algorithms to compute the (meta)-extensions of such theories were proposed in Olivieri et al (2021) This work extends such a logical framework by considering the deontic aspect. The resulting logic will not just be able to model policies but also tackle well-known aspects that occur in numerous legal systems. The use of Defeasible Logic (DL) to model meta-rules in the application area we just alluded to has been investigated. Within this line of research, the study mentioned above was not focusing on the general computational properties of meta-rules. This study fills this gap with two major contributions. First, we introduce and formalise two variants of Defeasible Deontic Logic with Meta-Rules to represent (1) defeasible meta-theories with deontic modalities, and (2) two different types of conflicts among rules: Simple Conflict Defeasible Deontic Logic, and Cautious Conflict Defeasible Deontic Logic. Second, we advance efficient algorithms to compute the extensions for both variants. △ Less

Submitted 23 September, 2022; originally announced September 2022.

arXiv:2209.11770 [pdf, other]

Toward Smart Doors: A Position Paper

Authors: Luigi Capogrosso, Geri Skenderi, Federico Girella, Franco Fummi, Marco Cristani

Abstract: Conventional automatic doors cannot distinguish between people wishing to pass through the door and people passing by the door, so they often open unnecessarily. This leads to the need to adopt new systems in both commercial and non-commercial environments: smart doors. In particular, a smart door system predicts the intention of people near the door based on the social context of the surrounding… ▽ More Conventional automatic doors cannot distinguish between people wishing to pass through the door and people passing by the door, so they often open unnecessarily. This leads to the need to adopt new systems in both commercial and non-commercial environments: smart doors. In particular, a smart door system predicts the intention of people near the door based on the social context of the surrounding environment and then makes rational decisions about whether or not to open the door. This work proposes the first position paper related to smart doors, without bells and whistles. We first point out that the problem not only concerns reliability, climate control, safety, and mode of operation. Indeed, a system to predict the intention of people near the door also involves a deeper understanding of the social context of the scene through a complex combined analysis of proxemics and scene reasoning. Furthermore, we conduct an exhaustive literature review about automatic doors, providing a novel system formulation. Also, we present an analysis of the possible future application of smart doors, a description of the ethical shortcomings, and legislative issues. △ Less

Submitted 23 September, 2022; originally announced September 2022.

Comments: 2nd International Workshop on Industrial Machine Learning @ ICPR 2022

arXiv:2209.11607 [pdf, other]

doi 10.1109/ICPR56361.2022.9956625

I-SPLIT: Deep Network Interpretability for Split Computing

Authors: Federico Cunico, Luigi Capogrosso, Francesco Setti, Damiano Carra, Franco Fummi, Marco Cristani

Abstract: This work makes a substantial step in the field of split computing, i.e., how to split a deep neural network to host its early part on an embedded device and the rest on a server. So far, potential split locations have been identified exploiting uniquely architectural aspects, i.e., based on the layer sizes. Under this paradigm, the efficacy of the split in terms of accuracy can be evaluated only… ▽ More This work makes a substantial step in the field of split computing, i.e., how to split a deep neural network to host its early part on an embedded device and the rest on a server. So far, potential split locations have been identified exploiting uniquely architectural aspects, i.e., based on the layer sizes. Under this paradigm, the efficacy of the split in terms of accuracy can be evaluated only after having performed the split and retrained the entire pipeline, making an exhaustive evaluation of all the plausible splitting points prohibitive in terms of time. Here we show that not only the architecture of the layers does matter, but the importance of the neurons contained therein too. A neuron is important if its gradient with respect to the correct class decision is high. It follows that a split should be applied right after a layer with a high density of important neurons, in order to preserve the information flowing until then. Upon this idea, we propose Interpretable Split (I-SPLIT): a procedure that identifies the most suitable splitting points by providing a reliable prediction on how well this split will perform in terms of classification accuracy, beforehand of its effective implementation. As a further major contribution of I-SPLIT, we show that the best choice for the splitting point on a multiclass categorization problem depends also on which specific classes the network has to deal with. Exhaustive experiments have been carried out on two networks, VGG16 and ResNet-50, and three datasets, Tiny-Imagenet-200, notMNIST, and Chest X-Ray Pneumonia. The source code is available at https://github.com/vips4/I-Split. △ Less

Submitted 23 September, 2022; originally announced September 2022.

Comments: ICPR 2022

arXiv:2208.07308 [pdf, other]

Pose Forecasting in Industrial Human-Robot Collaboration

Authors: Alessio Sampieri, Guido D'Amely, Andrea Avogaro, Federico Cunico, Geri Skenderi, Francesco Setti, Marco Cristani, Fabio Galasso

Abstract: Pushing back the frontiers of collaborative robots in industrial environments, we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for pose forecasting. For the first time, SeS-GCN bottlenecks the interaction of the spatial, temporal and channel-wise dimensions in GCNs, and it learns sparse adjacency matrices by a teacher-student framework. Compared to the state-of-the-art, it… ▽ More Pushing back the frontiers of collaborative robots in industrial environments, we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for pose forecasting. For the first time, SeS-GCN bottlenecks the interaction of the spatial, temporal and channel-wise dimensions in GCNs, and it learns sparse adjacency matrices by a teacher-student framework. Compared to the state-of-the-art, it only uses 1.72% of the parameters and it is ~4 times faster, while still performing comparably in forecasting accuracy on Human3.6M at 1 second in the future, which enables cobots to be aware of human operators. As a second contribution, we present a new benchmark of Cobots and Humans in Industrial COllaboration (CHICO). CHICO includes multi-view videos, 3D poses and trajectories of 20 human operators and cobots, engaging in 7 realistic industrial actions. Additionally, it reports 226 genuine collisions, taking place during the human-cobot interaction. We test SeS-GCN on CHICO for two important perception tasks in robotics: human pose forecasting, where it reaches an average error of 85.3 mm (MPJPE) at 1 sec in the future with a run time of 2.3 msec, and collision detection, by comparing the forecasted human motion with the known cobot motion, obtaining an F1-score of 0.64. △ Less

Submitted 24 July, 2022; originally announced August 2022.

Comments: ECCV 2022

arXiv:2207.11001 [pdf, other]

POP: Mining POtential Performance of new fashion products via webly cross-modal query expansion

Authors: Christian Joppi, Geri Skenderi, Marco Cristani

Abstract: We propose a data-centric pipeline able to generate exogenous observation data for the New Fashion Product Performance Forecasting (NFPPF) problem, i.e., predicting the performance of a brand-new clothing probe with no available past observations. Our pipeline manufactures the missing past starting from a single, available image of the clothing probe. It starts by expanding textual tags associated… ▽ More We propose a data-centric pipeline able to generate exogenous observation data for the New Fashion Product Performance Forecasting (NFPPF) problem, i.e., predicting the performance of a brand-new clothing probe with no available past observations. Our pipeline manufactures the missing past starting from a single, available image of the clothing probe. It starts by expanding textual tags associated with the image, querying related fashionable or unfashionable images uploaded on the web at a specific time in the past. A binary classifier is robustly trained on these web images by confident learning, to learn what was fashionable in the past and how much the probe image conforms to this notion of fashionability. This compliance produces the POtential Performance (POP) time series, indicating how performing the probe could have been if it were available earlier. POP proves to be highly predictive for the probe's future performance, ameliorating the sales forecasts of all state-of-the-art models on the recent VISUELLE fast-fashion dataset. We also show that POP reflects the ground-truth popularity of new styles (ensembles of clothing items) on the Fashion Forward benchmark, demonstrating that our webly-learned signal is a truthful expression of popularity, accessible by everyone and generalizable to any time of analysis. Forecasting code, data and the POP time series are available at: https://github.com/HumaticsLAB/POP-Mining-POtential-Performance △ Less

Submitted 22 July, 2022; originally announced July 2022.

Comments: ECCV 2022

arXiv:2207.06706 [pdf, other]

SHREC 2022 Track on Online Detection of Heterogeneous Gestures

Authors: Ariel Caputo, Marco Emporio, Andrea Giachetti, Marco Cristani, Guido Borghi, Andrea D'Eusanio, Minh-Quan Le, Hai-Dang Nguyen, Minh-Triet Tran, F. Ambellan, M. Hanik, E. Nava-Yazdani, C. von Tycowicz

Abstract: This paper presents the outcomes of a contest organized to evaluate methods for the online recognition of heterogeneous gestures from sequences of 3D hand poses. The task is the detection of gestures belonging to a dictionary of 16 classes characterized by different pose and motion features. The dataset features continuous sequences of hand tracking data where the gestures are interleaved with non… ▽ More This paper presents the outcomes of a contest organized to evaluate methods for the online recognition of heterogeneous gestures from sequences of 3D hand poses. The task is the detection of gestures belonging to a dictionary of 16 classes characterized by different pose and motion features. The dataset features continuous sequences of hand tracking data where the gestures are interleaved with non-significant motions. The data have been captured using the Hololens 2 finger tracking system in a realistic use-case of mixed reality interaction. The evaluation is based not only on the detection performances but also on the latency and the false positives, making it possible to understand the feasibility of practical interaction tools based on the algorithms proposed. The outcomes of the contest's evaluation demonstrate the necessity of further research to reduce recognition errors, while the computational cost of the algorithms proposed is sufficiently low. △ Less

Submitted 22 July, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

Comments: Accepted on Computer & Graphics journal

MSC Class: 68T10 ACM Class: I.5.2

arXiv:2204.06972 [pdf, other]

The multi-modal universe of fast-fashion: the Visuelle 2.0 benchmark

Authors: Geri Skenderi, Christian Joppi, Matteo Denitto, Berniero Scarpa, Marco Cristani

Abstract: We present Visuelle 2.0, the first dataset useful for facing diverse prediction problems that a fast-fashion company has to manage routinely. Furthermore, we demonstrate how the use of computer vision is substantial in this scenario. Visuelle 2.0 contains data for 6 seasons / 5355 clothing products of Nuna Lie, a famous Italian company with hundreds of shops located in different areas within the c… ▽ More We present Visuelle 2.0, the first dataset useful for facing diverse prediction problems that a fast-fashion company has to manage routinely. Furthermore, we demonstrate how the use of computer vision is substantial in this scenario. Visuelle 2.0 contains data for 6 seasons / 5355 clothing products of Nuna Lie, a famous Italian company with hundreds of shops located in different areas within the country. In particular, we focus on a specific prediction problem, namely short-observation new product sale forecasting (SO-fore). SO-fore assumes that the season has started and a set of new products is on the shelves of the different stores. The goal is to forecast the sales for a particular horizon, given a short, available past (few weeks), since no earlier statistics are available. To be successful, SO-fore approaches should capture this short past and exploit other modalities or exogenous data. To these aims, Visuelle 2.0 is equipped with disaggregated data at the item-shop level and multi-modal information for each clothing item, allowing computer vision approaches to come into play. The main message that we deliver is that the use of image data with deep networks boosts performances obtained when using the time series in long-term forecasting scenarios, ameliorating the WAPE and MAE by up to 5.48% and 7% respectively compared to competitive baseline methods. The dataset is available at https://humaticslab.github.io/forecasting/visuelle △ Less

Submitted 30 November, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted at the 5th Workshop on Computer Vision for Fashion, Art, and Design @ CVPR22

arXiv:2203.11878 [pdf, other]

Under the Hood of Transformer Networks for Trajectory Forecasting

Authors: Luca Franco, Leonardo Placidi, Francesco Giuliari, Irtiza Hasan, Marco Cristani, Fabio Galasso

Abstract: Transformer Networks have established themselves as the de-facto state-of-the-art for trajectory forecasting but there is currently no systematic study on their capability to model the motion patterns of people, without interactions with other individuals nor the social context. This paper proposes the first in-depth study of Transformer Networks (TF) and Bidirectional Transformers (BERT) for the… ▽ More Transformer Networks have established themselves as the de-facto state-of-the-art for trajectory forecasting but there is currently no systematic study on their capability to model the motion patterns of people, without interactions with other individuals nor the social context. This paper proposes the first in-depth study of Transformer Networks (TF) and Bidirectional Transformers (BERT) for the forecasting of the individual motion of people, without bells and whistles. We conduct an exhaustive evaluation of input/output representations, problem formulations and sequence modeling, including a novel analysis of their capability to predict multi-modal futures. Out of comparative evaluation on the ETH+UCY benchmark, both TF and BERT are top performers in predicting individual motions, definitely overcoming RNNs and LSTMs. Furthermore, they remain within a narrow margin wrt more complex techniques, which include both social interactions and scene contexts. Source code will be released for all conducted experiments. △ Less

Submitted 22 March, 2022; originally announced March 2022.

Comments: Under review in Pattern Recognition journal

arXiv:2203.05380 [pdf, other]

Spatial Commonsense Graph for Object Localisation in Partial Scenes

Authors: Francesco Giuliari, Geri Skenderi, Marco Cristani, Yiming Wang, Alessio Del Bue

Abstract: We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object (e.g. where is the bag?) given a partial 3D scan of a scene. The proposed solution is based on a novel scene graph model, the Spatial Commonsense Graph (SCG), where objects are the nodes and edges define pairwise distances between them, enriched by concept nodes and relationships from a co… ▽ More We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object (e.g. where is the bag?) given a partial 3D scan of a scene. The proposed solution is based on a novel scene graph model, the Spatial Commonsense Graph (SCG), where objects are the nodes and edges define pairwise distances between them, enriched by concept nodes and relationships from a commonsense knowledge base. This allows SCG to better generalise its spatial inference over unknown 3D scenes. The SCG is used to estimate the unknown position of the target object in two steps: first, we feed the SCG into a novel Proximity Prediction Network, a graph neural network that uses attention to perform distance prediction between the node representing the target object and the nodes representing the observed objects in the SCG; second, we propose a Localisation Module based on circular intersection to estimate the object position using all the predicted pairwise distances in order to be independent of any reference system. We create a new dataset of partially reconstructed scenes to benchmark our method and baselines for object localisation in partial scenes, where our proposed method achieves the best localisation performance. △ Less

Submitted 14 March, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: Accepted to CVPR 2022, project website: http://fgiuliari.github.io/projects/SpatialCommonsenseGraph/

arXiv:2110.02627 [pdf, other]

MovingFashion: a Benchmark for the Video-to-Shop Challenge

Authors: Marco Godi, Christian Joppi, Geri Skenderi, Marco Cristani

Abstract: Retrieving clothes which are worn in social media videos (Instagram, TikTok) is the latest frontier of e-fashion, referred to as "video-to-shop" in the computer vision literature. In this paper we present MovingFashion, the first publicly available dataset to cope with this challenge. MovingFashion is composed of 14855 social videos, each one of them associated to e-commerce "shop" images where th… ▽ More Retrieving clothes which are worn in social media videos (Instagram, TikTok) is the latest frontier of e-fashion, referred to as "video-to-shop" in the computer vision literature. In this paper we present MovingFashion, the first publicly available dataset to cope with this challenge. MovingFashion is composed of 14855 social videos, each one of them associated to e-commerce "shop" images where the corresponding clothing items are clearly portrayed. In addition, we present a network for retrieving the shop images in this scenario, dubbed SEAM Match-RCNN. The model is trained by image-to-video domain adaptation, allowing to use video sequences where only their association with a shop image is given, eliminating the need of millions of annotated bounding boxes. SEAM Match-RCNN builds an embedding, where an attention-based weighted sum of few frames (10) of a social video is enough to individuate the correct product within the first 5 retrieved items in a 14K+ shop element gallery with an accuracy of 80%. This provides the best performance on MovingFashion, comparing exhaustively against the related state-of-the-art approaches and alternative baselines. △ Less

Submitted 14 October, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Accepted at WACV 2022

arXiv:2109.09824 [pdf, other]

doi 10.1002/for.3104

Well Googled is Half Done: Multimodal Forecasting of New Fashion Product Sales with Image-based Google Trends

Authors: Geri Skenderi, Christian Joppi, Matteo Denitto, Marco Cristani

Abstract: New fashion product sales forecasting is a challenging problem that involves many business dynamics and cannot be solved by classical forecasting approaches. In this paper, we investigate the effectiveness of systematically probing exogenous knowledge in the form of Google Trends time series and combining it with multi-modal information related to a brand-new fashion item, in order to effectively… ▽ More New fashion product sales forecasting is a challenging problem that involves many business dynamics and cannot be solved by classical forecasting approaches. In this paper, we investigate the effectiveness of systematically probing exogenous knowledge in the form of Google Trends time series and combining it with multi-modal information related to a brand-new fashion item, in order to effectively forecast its sales despite the lack of past data. In particular, we propose a neural network-based approach, where an encoder learns a representation of the exogenous time series, while the decoder forecasts the sales based on the Google Trends encoding and the available visual and metadata information. Our model works in a non-autoregressive manner, avoiding the compounding effect of large first-step errors. As a second contribution, we present VISUELLE, a publicly available dataset for the task of new fashion product sales forecasting, containing multimodal information for 5577 real, new products sold between 2016-2019 from Nunalie, an Italian fast-fashion company. The dataset is equipped with images of products, metadata, related sales, and associated Google Trends. We use VISUELLE to compare our approach against state-of-the-art alternatives and several baselines, showing that our neural network-based approach is the most accurate in terms of both percentage and absolute error. It is worth noting that the addition of exogenous knowledge boosts the forecasting accuracy by 1.5% in terms of Weighted Absolute Percentage Error (WAPE), revealing the importance of exploiting informative external information. The code and dataset are both available at https://github.com/HumaticsLAB/GTM-Transformer. △ Less

Submitted 14 January, 2024; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: Accepted by the Wiley Journal of Forecasting

arXiv:2107.00914 [pdf, other]

POMP++: Pomcp-based Active Visual Search in unknown indoor environments

Authors: Francesco Giuliari, Alberto Castellini, Riccardo Berra, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti, Yiming Wang

Abstract: In this paper we focus on the problem of learning online an optimal policy for Active Visual Search (AVS) of objects in unknown indoor environments. We propose POMP++, a planning strategy that introduces a novel formulation on top of the classic Partially Observable Monte Carlo Planning (POMCP) framework, to allow training-free online policy learning in unknown environments. We present a new belie… ▽ More In this paper we focus on the problem of learning online an optimal policy for Active Visual Search (AVS) of objects in unknown indoor environments. We propose POMP++, a planning strategy that introduces a novel formulation on top of the classic Partially Observable Monte Carlo Planning (POMCP) framework, to allow training-free online policy learning in unknown environments. We present a new belief reinvigoration strategy which allows to use POMCP with a dynamically growing state space to address the online generation of the floor map. We evaluate our method on two public benchmark datasets, AVD that is acquired by real robotic platforms and Habitat ObjectNav that is rendered from real 3D scene scans, achieving the best success rate with an improvement of >10% over the state-of-the-art methods. △ Less

Submitted 5 November, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: Accepted at 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2009.08140 [pdf, other]

POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments

Authors: Yiming Wang, Francesco Giuliari, Riccardo Berra, Alberto Castellini, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Francesco Setti

Abstract: In this paper we focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup. Our POMP method uses as input the current pose of an agent (e.g. a robot) and a RGB-D frame. The task is to plan the next move that brings the agent closer to the target object. We model this problem as a Partially Observable Markov Decisi… ▽ More In this paper we focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup. Our POMP method uses as input the current pose of an agent (e.g. a robot) and a RGB-D frame. The task is to plan the next move that brings the agent closer to the target object. We model this problem as a Partially Observable Markov Decision Process solved by a Monte-Carlo planning approach. This allows us to make decisions on the next moves by iterating over the known scenario at hand, exploring the environment and searching for the object at the same time. Differently from the current state of the art in Reinforcement Learning, POMP does not require extensive and expensive (in time and computation) labelled data so being very agile in solving AVS in small and medium real scenarios. We only require the information of the floormap of the environment, an information usually available or that can be easily extracted from an a priori single exploration run. We validate our method on the publicly available AVD benchmark, achieving an average success rate of 0.76 with an average path length of 17.1, performing close to the state of the art but without any training needed. Additionally, we show experimentally the robustness of our method when the quality of the object detection goes from ideal to faulty. △ Less

Submitted 17 September, 2020; originally announced September 2020.

Comments: Accepted at BMVC2020

arXiv:2006.08184 [pdf, other]

doi 10.1109/TPAMI.2020.3002843

Infinite Feature Selection: A Graph-based Feature Filtering Approach

Authors: Giorgio Roffo, Simone Melzi, Umberto Castellani, Alessandro Vinciarelli, Marco Cristani

Abstract: We propose a filtering feature selection framework that considers subsets of features as paths in a graph, where a node is a feature and an edge indicates pairwise (customizable) relations among features, dealing with relevance and redundancy principles. By two different interpretations (exploiting properties of power series of matrices and relying on Markov chains fundamentals) we can evaluate th… ▽ More We propose a filtering feature selection framework that considers subsets of features as paths in a graph, where a node is a feature and an edge indicates pairwise (customizable) relations among features, dealing with relevance and redundancy principles. By two different interpretations (exploiting properties of power series of matrices and relying on Markov chains fundamentals) we can evaluate the values of paths (i.e., feature subsets) of arbitrary lengths, eventually go to infinite, from which we dub our framework Infinite Feature Selection (Inf-FS). Going to infinite allows to constrain the computational complexity of the selection process, and to rank the features in an elegant way, that is, considering the value of any path (subset) containing a particular feature. We also propose a simple unsupervised strategy to cut the ranking, so providing the subset of features to keep. In the experiments, we analyze diverse settings with heterogeneous features, for a total of 11 benchmarks, comparing against 18 widely-known comparative approaches. The results show that Inf-FS behaves better in almost any situation, that is, when the number of features to keep are fixed a priori, or when the decision of the subset cardinality is part of the process. △ Less

Submitted 15 June, 2020; originally announced June 2020.

Comments: TPAMI PREPRINT 2020

Report number: TPAMI-2019-08-0679.R1

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (2020),

arXiv:2005.04813 [pdf, other]

The Visual Social Distancing Problem

Authors: Marco Cristani, Alessio Del Bue, Vittorio Murino, Francesco Setti, Alessandro Vinciarelli

Abstract: One of the main and most effective measures to contain the recent viral outbreak is the maintenance of the so-called Social Distancing (SD). To comply with this constraint, workplaces, public institutions, transports and schools will likely adopt restrictions over the minimum inter-personal distance between people. Given this actual scenario, it is crucial to massively measure the compliance to su… ▽ More One of the main and most effective measures to contain the recent viral outbreak is the maintenance of the so-called Social Distancing (SD). To comply with this constraint, workplaces, public institutions, transports and schools will likely adopt restrictions over the minimum inter-personal distance between people. Given this actual scenario, it is crucial to massively measure the compliance to such physical constraint in our life, in order to figure out the reasons of the possible breaks of such distance limitations, and understand if this implies a possible threat given the scene context. All of this, complying with privacy policies and making the measurement acceptable. To this end, we introduce the Visual Social Distancing (VSD) problem, defined as the automatic estimation of the inter-personal distance from an image, and the characterization of the related people aggregations. VSD is pivotal for a non-invasive analysis to whether people comply with the SD restriction, and to provide statistics about the level of safety of specific areas whenever this constraint is violated. We then discuss how VSD relates with previous literature in Social Signal Processing and indicate which existing Computer Vision methods can be used to manage such problem. We conclude with future challenges related to the effectiveness of VSD systems, ethical implications and future application scenarios. △ Less

Submitted 10 May, 2020; originally announced May 2020.

Comments: 9 pages, 5 figures. All the authors equally contributed to this manuscript and they are listed by alphabetical order. Under submission

arXiv:2004.08346 [pdf, other]

An integrated light management system with real-time light measurement and human perception

Authors: Theodore Tsesmelis, Irtiza Hasan, Marco Cristani, Alessio Del Bue, Fabio Galasso

Abstract: Illumination is important for well-being, productivity and safety across several environments, including offices, retail shops and industrial warehouses. Current techniques for setting up lighting require extensive and expert support and need to be repeated if the scene changes. Here we propose the first fully-automated light management system (LMS) which measures lighting in real-time, leveraging… ▽ More Illumination is important for well-being, productivity and safety across several environments, including offices, retail shops and industrial warehouses. Current techniques for setting up lighting require extensive and expert support and need to be repeated if the scene changes. Here we propose the first fully-automated light management system (LMS) which measures lighting in real-time, leveraging an RGBD sensor and a radiosity-based light propagation model. Thanks to the integration of light distribution and perception curves into the radiosity, we outperform a commercial software (Relux) on a newly introduced dataset. Furthermore, our proposed LMS is the first to estimate both the presence and the attention of the people in the environment, as well as their light perception. Our new LMS adapts therefore lighting to the scene and human activity and it is capable of saving up to 66%, as we experimentally quantify,without compromising the lighting quality. △ Less

Submitted 17 April, 2020; originally announced April 2020.

arXiv:2003.08111 [pdf, other]

Transformer Networks for Trajectory Forecasting

Authors: Francesco Giuliari, Irtiza Hasan, Marco Cristani, Fabio Galasso

Abstract: Most recent successes on forecasting the people motion are based on LSTM models and all most recent progress has been achieved by modelling the social interaction among people and the people interaction with the scene. We question the use of the LSTM models and propose the novel use of Transformer Networks for trajectory forecasting. This is a fundamental switch from the sequential step-by-step pr… ▽ More Most recent successes on forecasting the people motion are based on LSTM models and all most recent progress has been achieved by modelling the social interaction among people and the people interaction with the scene. We question the use of the LSTM models and propose the novel use of Transformer Networks for trajectory forecasting. This is a fundamental switch from the sequential step-by-step processing of LSTMs to the only-attention-based memory mechanisms of Transformers. In particular, we consider both the original Transformer Network (TF) and the larger Bidirectional Transformer (BERT), state-of-the-art on all natural language processing tasks. Our proposed Transformers predict the trajectories of the individual people in the scene. These are "simple" model because each person is modelled separately without any complex human-human nor scene interaction terms. In particular, the TF model without bells and whistles yields the best score on the largest and most challenging trajectory forecasting benchmark of TrajNet. Additionally, its extension which predicts multiple plausible future trajectories performs on par with more engineered techniques on the 5 datasets of ETH + UCY. Finally, we show that Transformers may deal with missing observations, as it may be the case with real sensor data. Code is available at https://github.com/FGiuliari/Trajectory-Transformer. △ Less

Submitted 21 October, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

Comments: To appear in International Conference on Pattern Recognition (ICPR) 2020

arXiv:1908.11127 [pdf, other]

Texel-Att: Representing and Classifying Element-based Textures by Attributes

Authors: Marco Godi, Christian Joppi, Andrea Giachetti, Fabio Pellacini, Marco Cristani

Abstract: Element-based textures are a kind of texture formed by nameable elements, the texels [1], distributed according to specific statistical distributions; it is of primary importance in many sectors, namely textile, fashion and interior design industry. State-of-theart texture descriptors fail to properly characterize element-based texture, so we present Texel-Att to fill this gap. Texel-Att is the fi… ▽ More Element-based textures are a kind of texture formed by nameable elements, the texels [1], distributed according to specific statistical distributions; it is of primary importance in many sectors, namely textile, fashion and interior design industry. State-of-theart texture descriptors fail to properly characterize element-based texture, so we present Texel-Att to fill this gap. Texel-Att is the first fine-grained, attribute-based representation and classification framework for element-based textures. It first individuates texels, characterizing them with individual attributes; subsequently, texels are grouped and characterized through layout attributes, which give the Texel-Att representation. Texels are detected by a Mask-RCNN, trained on a brand-new element-based texture dataset, ElBa, containing 30K texture images with 3M fully-annotated texels. Examples of individual and layout attributes are exhibited to give a glimpse on the level of achievable graininess. In the experiments, we present detection results to show that texels can be precisely individuated, even on textures "in the wild"; to this sake, we individuate the element-based classes of the Describable Texture Dataset (DTD), where almost 900K texels have been manually annotated, leading to the Element-based DTD (E-DTD). Subsequently, classification and ranking results demonstrate the expressivity of Texel-Att on ElBa and E-DTD, overcoming the alternative features and relative attributes, doubling the best performance in some cases; finally, we report interactive search results on ElBa and E-DTD: with Texel-Att on the E-DTD dataset we are able to individuate within 10 iterations the desired texture in the 90% of cases, against the 71% obtained with a combination of the finest existing attributes so far. Dataset and code is available at https://github.com/godimarcovr/Texel-Att △ Less

Submitted 30 August, 2019; v1 submitted 29 August, 2019; originally announced August 2019.

Comments: Accepted as oral at BMVC 2019

arXiv:1908.11111 [pdf, other]

Texture Retrieval in the Wild through detection-based attributes

Authors: Christian Joppi, Marco Godi, Andrea Giachetti, Fabio Pellacini, Marco Cristani

Abstract: Capturing the essence of a textile image in a robust way is important to retrieve it in a large repository, especially if it has been acquired in the wild (by taking a photo of the textile of interest). In this paper we show that a texel-based representation fits well with this task. In particular, we refer to Texel-Att, a recent texel-based descriptor which has shown to capture fine grained varia… ▽ More Capturing the essence of a textile image in a robust way is important to retrieve it in a large repository, especially if it has been acquired in the wild (by taking a photo of the textile of interest). In this paper we show that a texel-based representation fits well with this task. In particular, we refer to Texel-Att, a recent texel-based descriptor which has shown to capture fine grained variations of a texture, for retrieval purposes. After a brief explanation of Texel-Att, we will show in our experiments that this descriptor is robust to distortions resulting from acquisitions in the wild by setting up an experiment in which textures from the ElBa (an Element-Based texture dataset) are artificially distorted and then used to retrieve the original image. We compare our approach with existing descriptors using a simple ranking framework based on distance functions. Results show that even under extreme conditions (such as a down-sampling with a factor of 10), we perform better than alternative approaches. △ Less

Submitted 4 October, 2019; v1 submitted 29 August, 2019; originally announced August 2019.

Comments: ICIAP - International Conference on Image Analysis and Processing

arXiv:1908.05737 [pdf, other]

doi 10.4204/EPTCS.298.1

Applications of Linear Defeasible Logic: combining resource consumption and exceptions to energy management and business processes

Authors: Francesco Olivieri, Guido Governatori, Claudio Tomazzoli, Matteo Cristani

Abstract: Linear Logic and Defeasible Logic have been adopted to formalise different features of knowledge representation: consumption of resources, and non monotonic reasoning in particular to represent exceptions. Recently, a framework to combine sub-structural features, corresponding to the consumption of resources, with defeasibility aspects to handle potentially conflicting information, has been discu… ▽ More Linear Logic and Defeasible Logic have been adopted to formalise different features of knowledge representation: consumption of resources, and non monotonic reasoning in particular to represent exceptions. Recently, a framework to combine sub-structural features, corresponding to the consumption of resources, with defeasibility aspects to handle potentially conflicting information, has been discussed in literature, by some of the authors. Two applications emerged that are very relevant: energy management and business process management. We illustrate a set of guide lines to determine how to apply linear defeasible logic to those contexts. △ Less

Submitted 13 August, 2019; originally announced August 2019.

Comments: In Proceedings DICE-FOPARA 2019, arXiv:1908.04478. arXiv admin note: substantial text overlap with arXiv:1809.03656

Journal ref: EPTCS 298, 2019, pp. 1-14

arXiv:1904.07092 [pdf, other]

SIMCO: SIMilarity-based object COunting

Authors: Marco Godi, Christian Joppi, Andrea Giachetti, Marco Cristani

Abstract: We present SIMCO, the first agnostic multi-class object counting approach. SIMCO starts by detecting foreground objects through a novel Mask RCNN-based architecture trained beforehand (just once) on a brand-new synthetic 2D shape dataset, InShape; the idea is to highlight every object resembling a primitive 2D shape (circle, square, rectangle, etc.). Each object detected is described by a low-dime… ▽ More We present SIMCO, the first agnostic multi-class object counting approach. SIMCO starts by detecting foreground objects through a novel Mask RCNN-based architecture trained beforehand (just once) on a brand-new synthetic 2D shape dataset, InShape; the idea is to highlight every object resembling a primitive 2D shape (circle, square, rectangle, etc.). Each object detected is described by a low-dimensional embedding, obtained from a novel similarity-based head branch; this latter implements a triplet loss, encouraging similar objects (same 2D shape + color and scale) to map close. Subsequently, SIMCO uses this embedding for clustering, so that different types of objects can emerge and be counted, making SIMCO the very first multi-class unsupervised counter. Experiments show that SIMCO provides state-of-the-art scores on counting benchmarks and that it can also help in many challenging image understanding tasks. △ Less

Submitted 13 October, 2020; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: Accepted at ICPR 2020

arXiv:1901.10772 [pdf, other]

Human-centric light sensing and estimation from RGBD images: The invisible light switch

Authors: Theodore Tsesmelis, Irtiza Hasan, Marco Cristani, Alessio Del Bue, Fabio Galasso

Abstract: Lighting design in indoor environments is of primary importance for at least two reasons: 1) people should perceive an adequate light; 2) an effective lighting design means consistent energy saving. We present the Invisible Light Switch (ILS) to address both aspects. ILS dynamically adjusts the room illumination level to save energy while maintaining constant the light level perception of the user… ▽ More Lighting design in indoor environments is of primary importance for at least two reasons: 1) people should perceive an adequate light; 2) an effective lighting design means consistent energy saving. We present the Invisible Light Switch (ILS) to address both aspects. ILS dynamically adjusts the room illumination level to save energy while maintaining constant the light level perception of the users. So the energy saving is invisible to them. Our proposed ILS leverages a radiosity model to estimate the light level which is perceived by a person within an indoor environment, taking into account the person position and her/his viewing frustum (head pose). ILS may therefore dim those luminaires, which are not seen by the user, resulting in an effective energy saving, especially in large open offices (where light may otherwise be ON everywhere for a single person). To quantify the system performance, we have collected a new dataset where people wear luxmeter devices while working in office rooms. The luxmeters measure the amount of light (in Lux) reaching the people gaze, which we consider a proxy to their illumination level perception. Our initial results are promising: in a room with 8 LED luminaires, the energy consumption in a day may be reduced from 18585 to 6206 watts with ILS (currently needing 1560 watts for operations). While doing so, the drop in perceived lighting decreases by just 200 lux, a value considered negligible when the original illumination level is above 1200 lux, as is normally the case in offices. △ Less

Submitted 30 January, 2019; originally announced January 2019.

arXiv:1901.09867 [pdf, other]

It could be worse, it could be raining: reliable automatic meteorological forecasting

Authors: Matteo Cristani, Francesco Domenichini, Claudio Tomazzoli, Luca Viganò, Margherita Zorzi

Abstract: Meteorological forecasting provides reliable prediction about the future weather within a given interval of time. Meteorological forecasting can be viewed as a form of hybrid diagnostic reasoning and can be mapped onto an integrated conceptual framework. The automation of the forecasting process would be helpful in a number of contexts, in particular: when the amount of data is too wide to be deal… ▽ More Meteorological forecasting provides reliable prediction about the future weather within a given interval of time. Meteorological forecasting can be viewed as a form of hybrid diagnostic reasoning and can be mapped onto an integrated conceptual framework. The automation of the forecasting process would be helpful in a number of contexts, in particular: when the amount of data is too wide to be dealt with manually; to support forecasters education; when forecasting about underpopulated geographic areas is not interesting for everyday life (and then is out from human forecasters' tasks) but is central for tourism sponsorship. We present logic MeteoLOG, a framework that models the main steps of the reasoner the forecaster adopts to provide a bulletin. MeteoLOG rests on several traditions, mainly on fuzzy, temporal and probabilistic logics. On this basis, we also introduce the algorithm Tournament, that transforms a set of MeteoLOG rules into a defeasible theory, that can be implemented into an automatic reasoner. We finally propose an example that models a real world forecasting scenario. △ Less

Submitted 8 February, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

arXiv:1901.02000 [pdf, other]

Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets

Authors: Irtiza Hasan, Francesco Setti, Theodore Tsesmelis, Vasileios Belagiannis, Sikandar Amin, Alessio Del Bue, Marco Cristani, Fabio Galasso

Abstract: In this work, we explore the correlation between people trajectories and their head orientations. We argue that people trajectory and head pose forecasting can be modelled as a joint problem. Recent approaches on trajectory forecasting leverage short-term trajectories (aka tracklets) of pedestrians to predict their future paths. In addition, sociological cues, such as expected destination or pedes… ▽ More In this work, we explore the correlation between people trajectories and their head orientations. We argue that people trajectory and head pose forecasting can be modelled as a joint problem. Recent approaches on trajectory forecasting leverage short-term trajectories (aka tracklets) of pedestrians to predict their future paths. In addition, sociological cues, such as expected destination or pedestrian interaction, are often combined with tracklets. In this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between positions and head orientations (vislets) thanks to a joint unconstrained optimization of full covariance matrices during the LSTM backpropagation. We additionally exploit the head orientations as a proxy for the visual attention, when modeling social interactions. MX-LSTM predicts future pedestrians location and head pose, increasing the standard capabilities of the current approaches on long-term trajectory forecasting. Compared to the state-of-the-art, our approach shows better performances on an extensive set of public benchmarks. MX-LSTM is particularly effective when people move slowly, i.e. the most challenging scenario for all other models. The proposed approach also allows for accurate predictions on a longer time horizon. △ Less

Submitted 15 October, 2019; v1 submitted 7 January, 2019; originally announced January 2019.

Comments: Accepted at IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019. arXiv admin note: text overlap with arXiv:1805.00652

arXiv:1809.07558 [pdf, other]

RGBD2lux: Dense light intensity estimation with an RGBD sensor

Authors: Theodore Tsesmelis, Irtiza Hasan, Marco Cristani, Fabio Galasso, Alessio Del Bue

Abstract: Lighting design and modelling or industrial applications like luminaire planning and commissioning rely heavily on time consuming manual measurements or on physically coherent computational simulations. Regarding the latter,standard approaches are based on CAD modeling simulations and offline rendering, with long processing times and therefore inflexible workflows. Thus, in this paper we pro-pose… ▽ More Lighting design and modelling or industrial applications like luminaire planning and commissioning rely heavily on time consuming manual measurements or on physically coherent computational simulations. Regarding the latter,standard approaches are based on CAD modeling simulations and offline rendering, with long processing times and therefore inflexible workflows. Thus, in this paper we pro-pose a computer vision based system to measure lighting with just a single RGBD camera. The proposed method uses both depth data and images from the sensor to provide a dense measure of light intensity in the field of view of the camera. We evaluate our system on novel ground truth data and compare it to state-of-the-art commercial light-planning software. Our system provides improved performance, while being completely automated, given that the CAD model is extracted from the depth and the albedo estimated with the support of RGB images. To the best of our knowledge, this is the first automatic framework for the estimation of lighting in general indoor scenarios from RGBDinput. △ Less

Submitted 7 December, 2018; v1 submitted 20 September, 2018; originally announced September 2018.

Comments: 10 pages, 9 figures, this manuscript is accepted in WACV 2019

arXiv:1809.03656 [pdf, ps, other]

Resource-driven Substructural Defeasible Logic

Authors: Francesco Olivieri, Guido Governatori, Matteo Cristani, Nick van Beest, Silvano Colombo-Tosatto

Abstract: Linear Logic and Defeasible Logic have been adopted to formalise different features relevant to agents: consumption of resources, and reasoning with exceptions. We propose a framework to combine sub-structural features, corresponding to the consumption of resources, with defeasibility aspects, and we discuss the design choices for the framework. Linear Logic and Defeasible Logic have been adopted to formalise different features relevant to agents: consumption of resources, and reasoning with exceptions. We propose a framework to combine sub-structural features, corresponding to the consumption of resources, with defeasibility aspects, and we discuss the design choices for the framework. △ Less

Submitted 10 September, 2018; originally announced September 2018.

arXiv:1805.00652 [pdf, other]

MX-LSTM: mixing tracklets and vislets to jointly forecast trajectories and head poses

Authors: Irtiza Hasan, Francesco Setti, Theodore Tsesmelis, Alessio Del Bue, Fabio Galasso, Marco Cristani

Abstract: Recent approaches on trajectory forecasting use tracklets to predict the future positions of pedestrians exploiting Long Short Term Memory (LSTM) architectures. This paper shows that adding vislets, that is, short sequences of head pose estimations, allows to increase significantly the trajectory forecasting performance. We then propose to use vislets in a novel framework called MX-LSTM, capturing… ▽ More Recent approaches on trajectory forecasting use tracklets to predict the future positions of pedestrians exploiting Long Short Term Memory (LSTM) architectures. This paper shows that adding vislets, that is, short sequences of head pose estimations, allows to increase significantly the trajectory forecasting performance. We then propose to use vislets in a novel framework called MX-LSTM, capturing the interplay between tracklets and vislets thanks to a joint unconstrained optimization of full covariance matrices during the LSTM backpropagation. At the same time, MX-LSTM predicts the future head poses, increasing the standard capabilities of the long-term trajectory forecasting approaches. With standard head pose estimators and an attentional-based social pooling, MX-LSTM scores the new trajectory forecasting state-of-the-art in all the considered datasets (Zara01, Zara02, UCY, and TownCentre) with a dramatic margin when the pedestrians slow down, a case where most of the forecasting approaches struggle to provide an accurate solution. △ Less

Submitted 2 May, 2018; originally announced May 2018.

Comments: 10 pages, 3 figures to appear in CVPR 2018

Showing 1–50 of 64 results for author: Cristani, M