Mårten Björkman

FlowIBR: Leveraging Pre-Training for Efficient Neural Image-Based Rendering of Dynamic Scenes

Sep 11, 2023
Marcel Büsching, Josef Bengtson, David Nilsson, Mårten Björkman

We introduce a novel approach for monocular novel view synthesis of dynamic scenes. Existing techniques already show impressive rendering quality but tend to focus on optimization within a single scene without leveraging prior knowledge. This limitation has been primarily attributed to the lack of dynamic-scene datasets available for training and the diversity of scene dynamics. Our method FlowIBR circumvents these issues by integrating a neural image-based rendering method, pre-trained on a large corpus of widely available static scenes, with a per-scene optimized scene flow field. Utilizing this flow field, we bend the camera rays to counteract the scene dynamics, thereby presenting the dynamic scene as if it were static to the rendering network. The proposed method reduces per-scene optimization time by an order of magnitude while achieving results comparable to existing methods, all on a single consumer-grade GPU.
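
The core ray-bending step can be illustrated with a short sketch: sample points along a camera ray at the query time are displaced by the scene flow so that the pre-trained static renderer sees an (approximately) static scene. The scene_flow interface and the toy flow field below are illustrative assumptions, not the authors' implementation.

```python
import torch

def bend_ray_samples(points, t_query, t_source, scene_flow):
    """Warp ray sample points from the query time to a source view's time.

    points:     (N, 3) sample positions along a ray at time t_query
    t_query:    scalar query time
    t_source:   scalar time of the source image
    scene_flow: callable (points, t0, t1) -> (N, 3) displacements,
                standing in for the per-scene optimized flow field
    """
    displacement = scene_flow(points, t_query, t_source)
    return points + displacement

# Toy flow field: a constant translation proportional to the time gap.
def toy_flow(points, t0, t1):
    return (t1 - t0) * torch.tensor([0.1, 0.0, 0.0]).expand_as(points)

samples = torch.rand(64, 3)                      # samples along one ray
warped = bend_ray_samples(samples, 0.5, 0.0, toy_flow)
print(warped.shape)                              # torch.Size([64, 3])
```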

Automated Construction of Time-Space Diagrams for Traffic Analysis Using Street-View Video Sequence

Aug 11, 2023
Tanay Rastogi, Mårten Björkman

Time-space diagrams are essential tools for analyzing traffic patterns and optimizing transportation infrastructure and traffic management strategies. Traditional data collection methods for these diagrams have limitations in terms of temporal and spatial coverage. Recent advancements in camera technology have overcome these limitations and provided extensive urban data. In this study, we propose an innovative approach to constructing time-space diagrams by utilizing street-view video sequences captured by cameras mounted on moving vehicles. Using state-of-the-art detection (YOLOv5) and tracking (StrongSORT) together with photogrammetry for distance calculation, we infer vehicle trajectories from the video data and generate time-space diagrams. To evaluate the effectiveness of our proposed method, we utilized datasets from the KITTI computer vision benchmark suite. The evaluation results demonstrate that our approach can generate trajectories from video data, although there are some errors that can be mitigated by improving the performance of the detector, tracker, and distance calculation components. In conclusion, street-view video sequences captured by cameras mounted on moving vehicles, combined with state-of-the-art computer vision techniques, have immense potential for constructing comprehensive time-space diagrams. These diagrams offer valuable insights into traffic patterns and contribute to the design of transportation infrastructure and traffic management strategies.
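
Once per-vehicle trajectories (time versus longitudinal position along the road) have been extracted by the detector, tracker, and distance-estimation stages, drawing the time-space diagram itself is simple. The trajectories below are hypothetical placeholders for tracker output, not KITTI data.

```python
import matplotlib.pyplot as plt

# Hypothetical trajectories: track ID -> list of (time [s], distance [m]).
trajectories = {
    1: [(0.0, 0.0), (1.0, 8.5), (2.0, 17.2), (3.0, 25.0)],
    2: [(0.5, 0.0), (1.5, 6.0), (2.5, 13.1), (3.5, 21.4)],
}

fig, ax = plt.subplots()
for track_id, points in trajectories.items():
    times, positions = zip(*points)
    ax.plot(times, positions, marker="o", label=f"vehicle {track_id}")

ax.set_xlabel("time [s]")
ax.set_ylabel("distance along road [m]")
ax.set_title("Time-space diagram")
ax.legend()
plt.show()
```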

TD-GEM: Text-Driven Garment Editing Mapper

May 29, 2023
Reza Dadfar, Sanaz Sabzevari, Mårten Björkman, Danica Kragic

Language-based fashion image editing allows users to try out variations of desired garments through provided text prompts. Inspired by research on manipulating latent representations in StyleCLIP and HairCLIP, we focus on these latent spaces for editing fashion items of full-body human datasets. Currently, there is a gap in handling fashion image editing due to the complexity of garment shapes and textures and the diversity of human poses. In this paper, we propose an optimization-based editing scheme called Text-Driven Garment Editing Mapper (TD-GEM), aiming to edit fashion items in a disentangled way. To this end, we initially obtain a latent representation of an image through generative adversarial network inversion methods such as Encoder for Editing (e4e) or, for more accurate results, Pivotal Tuning Inversion (PTI). Contrastive Language-Image Pre-training (CLIP) is then used in an optimization loop to guide the latent representation of a fashion image toward a target attribute expressed as a text prompt. Our TD-GEM manipulates the image accurately according to the target attribute, while other parts of the image are kept untouched. In the experiments, we evaluate TD-GEM on two attributes ("color" and "sleeve length") and show that it generates realistic images compared to recent manipulation schemes.
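
A minimal sketch of the CLIP-guided latent optimization described above: the inverted latent is pushed toward the text prompt in CLIP space while an L2 term keeps it close to the original code so unrelated regions stay untouched. The generator and image_encoder callables, loss weights, and step counts are illustrative assumptions, not the TD-GEM code.

```python
import torch
import torch.nn.functional as F

def edit_latent(w_init, text_features, generator, image_encoder,
                steps=200, lr=0.05, lambda_latent=0.8):
    """Optimize a latent code toward a text prompt in CLIP space.

    w_init:        inverted latent from e4e/PTI, e.g. shape (1, L, 512)
    text_features: CLIP embedding of the target prompt, shape (1, D)
    generator:     latent -> image callable (e.g. a StyleGAN generator)
    image_encoder: image -> CLIP embedding callable
    """
    w = w_init.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        image = generator(w)
        image_features = image_encoder(image)
        # CLIP loss: move the image embedding toward the text embedding.
        clip_loss = 1.0 - F.cosine_similarity(image_features, text_features).mean()
        # Latent regularizer: stay close to the inverted code.
        latent_loss = ((w - w_init) ** 2).mean()
        loss = clip_loss + lambda_latent * latent_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return w.detach()
```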

* The first two authors contributed equally 

A Multimodal Data Set of Human Handovers with Design Implications for Human-Robot Handovers

Apr 04, 2023
Parag Khanna, Mårten Björkman, Christian Smith

Handovers are basic yet sophisticated motor tasks performed seamlessly by humans. They are among the most common activities in our daily lives and social environments. This makes mastering the art of handovers critical for a social and collaborative robot. In this work, we present an experimental study that involved human-human handovers by 13 pairs, i.e., 26 participants. We record and explore multiple features of these human-human handovers, aimed at inspiring handovers between humans and robots. With this work, we further create and publish a novel data set of 8672 handovers, bringing together human motion and the forces involved. We further analyze the effect of object weight and the role of visual sensory input in human-human handovers, as well as possible design implications for robots. As a proof of concept, the data set was used to create a human-inspired, data-driven strategy for robotic grip release in handovers, which was demonstrated to result in better robot-to-human handovers.

* The data set of human-human handovers can be found at: https://github.com/paragkhanna1/dataset 

Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models

Apr 03, 2023
Wenjie Yin, Ruibo Tu, Hang Yin, Danica Kragic, Hedvig Kjellström, Mårten Björkman

Data-driven and controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in these fields for generating diverse motions given past observations and dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which are found effective in capturing temporal correlations in motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation. We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion with respect to two baselines and show the benefits of diffusion data dropout for robust synthesis and reconstruction of high-fidelity motion close to recorded data.
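
A sketch of the diffusion-based data dropout idea: randomly selected frames of the conditioning motion are replaced by their forward-process noised versions. Tensor shapes, the dropout probability, and the noise schedule are illustrative assumptions, not the MoDiff implementation.

```python
import torch

def diffusion_data_dropout(motion, alphas_cumprod, drop_prob=0.2):
    """Corrupt a random subset of frames with the diffusion forward process.

    motion:         (B, T, D) motion sequences
    alphas_cumprod: (S,) cumulative products of the noise schedule alphas
    """
    B, T, D = motion.shape
    # Choose which frames to "drop", i.e. replace with a noised version.
    drop_mask = (torch.rand(B, T, 1) < drop_prob).float()
    # Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps
    t = torch.randint(0, alphas_cumprod.shape[0], (B,))
    a_bar = alphas_cumprod[t].view(B, 1, 1)
    noised = a_bar.sqrt() * motion + (1.0 - a_bar).sqrt() * torch.randn_like(motion)
    return drop_mask * noised + (1.0 - drop_mask) * motion

# Toy usage with a linear beta schedule.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
x = torch.randn(4, 60, 75)            # 4 sequences of 60 pose frames
print(diffusion_data_dropout(x, alphas_cumprod).shape)
```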

User Study Exploring the Role of Explanation of Failures by Robots in Human Robot Collaboration Tasks

Mar 28, 2023
Parag Khanna, Elmira Yadollahi, Mårten Björkman, Iolanda Leite, Christian Smith

Despite great advances in what robots can do, they still experience failures in human-robot collaborative tasks due to the high randomness of unstructured human environments. Moreover, a human's unfamiliarity with a robot and its abilities can cause such failures to repeat. This makes the ability to explain failures very important for a robot. In this work, we describe a user study that incorporated different robotic failures in a human-robot collaboration (HRC) task aimed at filling a shelf. We included different types of failures and repeated occurrences of such failures in a prolonged interaction between humans and robots. The failure resolution involved human intervention in the form of human-robot bidirectional handovers. Through such studies, we aim to test different explanation types and explanation progression in the interaction and to record human responses.

* Contributed to the: "The Imperfectly Relatable Robot: An interdisciplinary workshop on the role of failure in HRI", ACM/IEEE International Conference on Human-Robot Interaction HRI 2023. Video can be found at: https://sites.google.com/view/hri-failure-ws/teaser-videos 

Data-driven Grip Force Variation in Robot-Human Handovers

Mar 28, 2023
Parag Khanna, Mårten Björkman, Christian Smith

Handovers frequently occur in our social environments, making it imperative for a collaborative robotic system to master the skill of handover. In this work, we investigate the relationship between the grip force variation of a human giver and the sensed interaction force-torque in human-human handovers, utilizing a data-driven approach. A Long Short-Term Memory (LSTM) network was trained to use the interaction force-torque in a handover to predict the human grip force variation in advance. Further, we propose to utilize the trained network to produce human-like grip force variation for a robotic giver.
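
A minimal sketch of such a predictor: an LSTM maps a window of 6-axis interaction force-torque readings to a grip force estimate. Layer sizes, window length, and the single-value prediction head are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GripForcePredictor(nn.Module):
    """LSTM mapping a force-torque window to a predicted grip force."""

    def __init__(self, input_dim=6, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, ft_window):
        # ft_window: (batch, time, 6) force-torque sequence
        out, _ = self.lstm(ft_window)
        # Predict grip force from the last hidden state.
        return self.head(out[:, -1])

model = GripForcePredictor()
ft = torch.randn(8, 50, 6)             # 8 windows of 50 samples each
print(model(ft).shape)                 # torch.Size([8, 1])
```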

* Contributed to "Advances in Close Proximity Human-Robot Collaboration" Workshop in 2022 IEEE-RAS International Conference on Humanoid Robots (Humanoids 2022) 

On the Lipschitz Constant of Deep Networks and Double Descent

Feb 16, 2023
Matteo Gamba, Hossein Azizpour, Mårten Björkman

Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, falling short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors, namely loss landscape curvature and distance of parameters from initialization, which respectively control optimization dynamics around a critical point and bound model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization, and effective model complexity for networks trained in practice.
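
One common empirical proxy for the Lipschitz constant is the largest input-gradient norm over a batch of samples; the sketch below uses this proxy under our own assumptions and is not the paper's exact estimator.

```python
import torch
import torch.nn as nn

def empirical_lipschitz(model, inputs):
    """Lower-bound the Lipschitz constant via input-gradient norms."""
    inputs = inputs.clone().requires_grad_(True)
    # Differentiate the norm of the output with respect to the input.
    scalar = model(inputs).norm(dim=1).sum()
    grads, = torch.autograd.grad(scalar, inputs)
    # The largest per-sample gradient norm lower-bounds the Lipschitz
    # constant over the sampled region.
    return grads.flatten(1).norm(dim=1).max().item()

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 3))
x = torch.randn(32, 10)
print(empirical_lipschitz(net, x))
```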

Deep Double Descent via Smooth Interpolation

Sep 21, 2022
Matteo Gamba, Erik Englesson, Mårten Björkman, Hossein Azizpour

Overparameterized deep networks are known to be able to perfectly fit the training data while at the same time showing good generalization performance. A common paradigm drawn from intuition on linear regression suggests that large networks are able to interpolate even noisy data, without considerably deviating from the ground-truth signal. At present, a precise characterization of this phenomenon is missing. In this work, we present an empirical study of sharpness of the loss landscape of deep networks as we systematically control the number of model parameters and training epochs. We extend our study to neighbourhoods of the training data, as well as around cleanly- and noisily-labelled samples. Our findings show that the loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large models express a smooth and flat loss landscape, in contrast with existing intuition.
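
Input-space sharpness around a training point can be probed, for instance, as the worst-case loss increase over small random perturbations of the input; the sketch below is one such proxy under our own assumptions, not the paper's exact measure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def input_space_sharpness(model, x, y, radius=0.05, n_samples=8):
    """Worst-case loss increase over random perturbations of size `radius`."""
    base = F.cross_entropy(model(x), y)
    worst = base
    for _ in range(n_samples):
        delta = torch.randn_like(x)
        # Scale each perturbation to the chosen radius.
        delta = radius * delta / delta.flatten(1).norm(dim=1).view(-1, 1)
        worst = torch.maximum(worst, F.cross_entropy(model(x + delta), y))
    return (worst - base).item()

net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
x, y = torch.randn(16, 20), torch.randint(0, 5, (16,))
print(input_space_sharpness(net, x, y))
```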
