Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bernard Ghanem

Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Dec 24, 2024

Sihao Liu, Yibo Yang, Xiaojie Li, David A. Clifton, Bernard Ghanem

Figure 1 for Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Figure 2 for Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Figure 3 for Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Figure 4 for Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization

Abstract:Online continual learning (OCL) seeks to learn new tasks from data streams that appear only once, while retaining knowledge of previously learned tasks. Most existing methods rely on replay, focusing on enhancing memory retention through regularization or distillation. However, they often overlook the adaptability of the model, limiting the ability to learn generalizable and discriminative features incrementally from online training data. To address this, we introduce a plug-and-play module, S6MOD, which can be integrated into most existing methods and directly improve adaptability. Specifically, S6MOD introduces an extra branch after the backbone, where a mixture of discretization selectively adjusts parameters in a selective state space model, enriching selective scan patterns such that the model can adaptively select the most sensitive discretization method for current dynamics. We further design a class-conditional routing algorithm for dynamic, uncertainty-based adjustment and implement a contrastive discretization loss to optimize it. Extensive experiments combining our module with various models demonstrate that S6MOD significantly enhances model adaptability, leading to substantial performance gains and achieving the state-of-the-art results.

Via

Access Paper or Ask Questions

MatchDiffusion: Training-free Generation of Match-cuts

Nov 27, 2024

Alejandro Pardo, Fabio Pizzati, Tong Zhang, Alexander Pondaven, Philip Torr, Juan Camilo Perez, Bernard Ghanem

Figure 1 for MatchDiffusion: Training-free Generation of Match-cuts

Figure 2 for MatchDiffusion: Training-free Generation of Match-cuts

Figure 3 for MatchDiffusion: Training-free Generation of Match-cuts

Figure 4 for MatchDiffusion: Training-free Generation of Match-cuts

Abstract:Match-cuts are powerful cinematic tools that create seamless transitions between scenes, delivering strong visual and metaphorical connections. However, crafting match-cuts is a challenging, resource-intensive process requiring deliberate artistic planning. In MatchDiffusion, we present the first training-free method for match-cut generation using text-to-video diffusion models. MatchDiffusion leverages a key property of diffusion models: early denoising steps define the scene's broad structure, while later steps add details. Guided by this insight, MatchDiffusion employs "Joint Diffusion" to initialize generation for two prompts from shared noise, aligning structure and motion. It then applies "Disjoint Diffusion", allowing the videos to diverge and introduce unique details. This approach produces visually coherent videos suited for match-cuts. User studies and metrics demonstrate MatchDiffusion's effectiveness and potential to democratize match-cut creation.

* https://matchdiffusion.github.io

Via

Access Paper or Ask Questions

OASIS: Open Agent Social Interaction Simulations with One Million Agents

Nov 26, 2024

Ziyi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong(+13 more)

Figure 1 for OASIS: Open Agent Social Interaction Simulations with One Million Agents

Figure 2 for OASIS: Open Agent Social Interaction Simulations with One Million Agents

Figure 3 for OASIS: Open Agent Social Interaction Simulations with One Million Agents

Figure 4 for OASIS: Open Agent Social Interaction Simulations with One Million Agents

Abstract:There has been a growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (i.e., X, Reddit) with more realistic large language model (LLM) agents, thereby allowing for a more nuanced study of complex systems. As a result, several LLM-based ABMs have been proposed in the past year. While they hold promise, each simulator is specifically designed to study a particular scenario, making it time-consuming and resource-intensive to explore other phenomena using the same ABM. Additionally, these models simulate only a limited number of agents, whereas real-world social media platforms involve millions of users. To this end, we propose OASIS, a generalizable and scalable social media simulator. OASIS is designed based on real-world social media platforms, incorporating dynamically updated environments (i.e., dynamic social networks and post information), diverse action spaces (i.e., following, commenting), and recommendation systems (i.e., interest-based and hot-score-based). Additionally, OASIS supports large-scale user simulations, capable of modeling up to one million users. With these features, OASIS can be easily extended to different social media platforms to study large-scale group phenomena and behaviors. We replicate various social phenomena, including information spreading, group polarization, and herd effects across X and Reddit platforms. Moreover, we provide observations of social phenomena at different agent group scales. We observe that the larger agent group scale leads to more enhanced group dynamics and more diverse and helpful agents' opinions. These findings demonstrate OASIS's potential as a powerful tool for studying complex systems in digital environments.

Via

Access Paper or Ask Questions

3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

Nov 26, 2024

Jan Held, Renaud Vandeghen, Abdullah Hamdi, Adrien Deliege, Anthony Cioppa, Silvio Giancola, Andrea Vedaldi, Bernard Ghanem, Marc Van Droogenbroeck

Figure 1 for 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

Figure 2 for 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

Figure 3 for 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

Figure 4 for 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

Abstract:Recent advances in radiance field reconstruction, such as 3D Gaussian Splatting (3DGS), have achieved high-quality novel view synthesis and fast rendering by representing scenes with compositions of Gaussian primitives. However, 3D Gaussians present several limitations for scene reconstruction. Accurately capturing hard edges is challenging without significantly increasing the number of Gaussians, creating a large memory footprint. Moreover, they struggle to represent flat surfaces, as they are diffused in space. Without hand-crafted regularizers, they tend to disperse irregularly around the actual surface. To circumvent these issues, we introduce a novel method, named 3D Convex Splatting (3DCS), which leverages 3D smooth convexes as primitives for modeling geometrically-meaningful radiance fields from multi-view images. Smooth convex shapes offer greater flexibility than Gaussians, allowing for a better representation of 3D scenes with hard edges and dense volumes using fewer primitives. Powered by our efficient CUDA-based rasterizer, 3DCS achieves superior performance over 3DGS on benchmarks such as Mip-NeRF360, Tanks and Temples, and Deep Blending. Specifically, our method attains an improvement of up to 0.81 in PSNR and 0.026 in LPIPS compared to 3DGS while maintaining high rendering speeds and reducing the number of required primitives. Our results highlight the potential of 3D Convex Splatting to become the new standard for high-quality scene reconstruction and novel view synthesis. Project page: convexsplatting.github.io.

* 13 pages, 13 figures, 10 tables

Via

Access Paper or Ask Questions

OASIS: Open Agents Social Interaction Simulations on One Million Agents

Nov 21, 2024

Ziyi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong(+12 more)

Figure 1 for OASIS: Open Agents Social Interaction Simulations on One Million Agents

Figure 2 for OASIS: Open Agents Social Interaction Simulations on One Million Agents

Figure 3 for OASIS: Open Agents Social Interaction Simulations on One Million Agents

Figure 4 for OASIS: Open Agents Social Interaction Simulations on One Million Agents

Via

Access Paper or Ask Questions

Generative Timelines for Instructed Visual Assembly

Nov 19, 2024

Alejandro Pardo, Jui-Hsien Wang, Bernard Ghanem, Josef Sivic, Bryan Russell, Fabian Caba Heilbron

Abstract:The objective of this work is to manipulate visual timelines (e.g. a video) through natural language instructions, making complex timeline editing tasks accessible to non-expert or potentially even disabled users. We call this task Instructed visual assembly. This task is challenging as it requires (i) identifying relevant visual content in the input timeline as well as retrieving relevant visual content in a given input (video) collection, (ii) understanding the input natural language instruction, and (iii) performing the desired edits of the input visual timeline to produce an output timeline. To address these challenges, we propose the Timeline Assembler, a generative model trained to perform instructed visual assembly tasks. The contributions of this work are three-fold. First, we develop a large multimodal language model, which is designed to process visual content, compactly represent timelines and accurately interpret timeline editing instructions. Second, we introduce a novel method for automatically generating datasets for visual assembly tasks, enabling efficient training of our model without the need for human-labeled data. Third, we validate our approach by creating two novel datasets for image and video assembly, demonstrating that the Timeline Assembler substantially outperforms established baseline models, including the recent GPT-4o, in accurately executing complex assembly instructions across various real-world inspired scenarios.

Via

Access Paper or Ask Questions

Deep learning for action spotting in association football videos

Oct 02, 2024

Silvio Giancola, Anthony Cioppa, Bernard Ghanem, Marc Van Droogenbroeck

Figure 1 for Deep learning for action spotting in association football videos

Figure 2 for Deep learning for action spotting in association football videos

Figure 3 for Deep learning for action spotting in association football videos

Figure 4 for Deep learning for action spotting in association football videos

Abstract:The task of action spotting consists in both identifying actions and precisely localizing them in time with a single timestamp in long, untrimmed video streams. Automatically extracting those actions is crucial for many sports applications, including sports analytics to produce extended statistics on game actions, coaching to provide support to video analysts, or fan engagement to automatically overlay content in the broadcast when specific actions occur. However, before 2018, no large-scale datasets for action spotting in sports were publicly available, which impeded benchmarking action spotting methods. In response, our team built the largest dataset and the most comprehensive benchmarks for sports video understanding, under the umbrella of SoccerNet. Particularly, our dataset contains a subset specifically dedicated to action spotting, called SoccerNet Action Spotting, containing more than 550 complete broadcast games annotated with almost all types of actions that can occur in a football game. This dataset is tailored to develop methods for automatic spotting of actions of interest, including deep learning approaches, by providing a large amount of manually annotated actions. To engage with the scientific community, the SoccerNet initiative organizes yearly challenges, during which participants from all around the world compete to achieve state-of-the-art performances. Thanks to our dataset and challenges, more than 60 methods were developed or published over the past five years, improving on the first baselines and making action spotting a viable option for the sports industry. This paper traces the history of action spotting in sports, from the creation of the task back in 2018, to the role it plays today in research and the sports industry.

* 31 pages, 2 figures, 5 tables

Via

Access Paper or Ask Questions

SoccerNet 2024 Challenges Results

Sep 16, 2024

Anthony Cioppa, Silvio Giancola, Vladimir Somers, Victor Joos, Floriane Magera, Jan Held, Seyed Abolfazl Ghasemzadeh, Xin Zhou, Karolina Seweryn, Mateusz Kowalczyk(+74 more)

Figure 1 for SoccerNet 2024 Challenges Results

Figure 2 for SoccerNet 2024 Challenges Results

Figure 3 for SoccerNet 2024 Challenges Results

Figure 4 for SoccerNet 2024 Challenges Results

Abstract:The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks. (1) Ball Action Spotting, focusing on precisely localizing when and which soccer actions related to the ball occur, (2) Dense Video Captioning, focusing on describing the broadcast with natural language and anchored timestamps, (3) Multi-View Foul Recognition, a novel task focusing on analyzing multiple viewpoints of a potential foul incident to classify whether a foul occurred and assess its severity, (4) Game State Reconstruction, another novel task focusing on reconstructing the game state from broadcast videos onto a 2D top-view map of the field. Detailed information about the tasks, challenges, and leaderboards can be found at https://www.soccer-net.org, with baselines and development kits available at https://github.com/SoccerNet.

* 7 pages, 1 figure

Via

Access Paper or Ask Questions

Deep Learning at the Intersection: Certified Robustness as a Tool for 3D Vision

Aug 23, 2024

Gabriel Pérez S, Juan C. Pérez, Motasem Alfarra, Jesús Zarzar, Sara Rojas, Bernard Ghanem, Pablo Arbeláez

Figure 1 for Deep Learning at the Intersection: Certified Robustness as a Tool for 3D Vision

Figure 2 for Deep Learning at the Intersection: Certified Robustness as a Tool for 3D Vision

Figure 3 for Deep Learning at the Intersection: Certified Robustness as a Tool for 3D Vision

Figure 4 for Deep Learning at the Intersection: Certified Robustness as a Tool for 3D Vision

Abstract:This paper presents preliminary work on a novel connection between certified robustness in machine learning and the modeling of 3D objects. We highlight an intriguing link between the Maximal Certified Radius (MCR) of a classifier representing a space's occupancy and the space's Signed Distance Function (SDF). Leveraging this relationship, we propose to use the certification method of randomized smoothing (RS) to compute SDFs. Since RS' high computational cost prevents its practical usage as a way to compute SDFs, we propose an algorithm to efficiently run RS in low-dimensional applications, such as 3D space, by expressing RS' fundamental operations as Gaussian smoothing on pre-computed voxel grids. Our approach offers an innovative and practical tool to compute SDFs, validated through proof-of-concept experiments in novel view synthesis. This paper bridges two previously disparate areas of machine learning, opening new avenues for further exploration and potential cross-domain advancements.

* This paper is an accepted extended abstract to the LatinX workshop at ICCV 2023. This was uploaded a year late

Via

Access Paper or Ask Questions

TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks

Aug 20, 2024

Jinjie Mai, Wenxuan Zhu, Sara Rojas, Jesus Zarzar, Abdullah Hamdi, Guocheng Qian, Bing Li, Silvio Giancola, Bernard Ghanem

Figure 1 for TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks

Figure 2 for TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks

Figure 3 for TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks

Figure 4 for TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks

Abstract:Neural radiance fields (NeRFs) generally require many images with accurate poses for accurate novel view synthesis, which does not reflect realistic setups where views can be sparse and poses can be noisy. Previous solutions for learning NeRFs with sparse views and noisy poses only consider local geometry consistency with pairs of views. Closely following \textit{bundle adjustment} in Structure-from-Motion (SfM), we introduce TrackNeRF for more globally consistent geometry reconstruction and more accurate pose optimization. TrackNeRF introduces \textit{feature tracks}, \ie connected pixel trajectories across \textit{all} visible views that correspond to the \textit{same} 3D points. By enforcing reprojection consistency among feature tracks, TrackNeRF encourages holistic 3D consistency explicitly. Through extensive experiments, TrackNeRF sets a new benchmark in noisy and sparse view reconstruction. In particular, TrackNeRF shows significant improvements over the state-of-the-art BARF and SPARF by $\sim8$ and $\sim1$ in terms of PSNR on DTU under various sparse and noisy view setups. The code is available at \href{https://tracknerf.github.io/}.

* ECCV 2024 (supplemental pages included)

Via

Access Paper or Ask Questions