"Information": models, code, and papers

Signal Processing Based Antenna Pattern Characterization for MIMO Systems

Aug 18, 2023
Chandan Kumar Sheemar, Jorge Querol, Symeon Chatzinotas

Sophisticated antenna technologies are constantly evolving to meet the escalating data demands projected for 6G and future networks. The characterization of these emerging antenna systems poses challenges that necessitate a reevaluation of conventional techniques, which rely solely on simple measurements conducted in advanced anechoic chambers. In this study, our objective is to introduce a novel approach to antenna pattern characterization (APC) in next-generation multiple-input-multiple-output (MIMO) systems that draws on signal processing tools. In contrast to traditional methods, which struggle with multi-path scenarios and require specialized equipment for measurements, we estimate the antenna pattern by exploiting information from both line-of-sight (LoS) and non-LoS contributions. This approach enables antenna pattern characterization in complex environments without the need for anechoic chambers, resulting in substantial cost savings. Furthermore, it allows a much wider research community to independently perform APC for emerging complex 6G antenna systems. Simulation results demonstrate the efficacy of the proposed approach in accurately estimating the true antenna pattern.

MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection

Aug 18, 2023
Junkai Xu, Liang Peng, Haoran Cheng, Hao Li, Wei Qian, Ke Li, Wenxiao Wang, Deng Cai

In the field of monocular 3D detection (M3D), it is common practice to utilize scene geometric clues to enhance the detector's performance. However, many existing works adopt these clues explicitly, for example by estimating a depth map and back-projecting it into 3D space. This explicit methodology induces sparsity in 3D representations due to the increased dimensionality from 2D to 3D, and leads to substantial information loss, especially for distant and occluded objects. To alleviate this issue, we propose MonoNeRD, a novel detection framework that can infer dense 3D geometry and occupancy. Specifically, we model scenes with Signed Distance Functions (SDF), facilitating the production of dense 3D representations. We treat these representations as Neural Radiance Fields (NeRF) and then employ volume rendering to recover RGB images and depth maps. To the best of our knowledge, this work is the first to introduce volume rendering for M3D, and demonstrates the potential of implicit reconstruction for image-based 3D perception. Extensive experiments conducted on the KITTI-3D benchmark and Waymo Open Dataset demonstrate the effectiveness of MonoNeRD. Codes are available at https://github.com/cskkxjk/MonoNeRD.

* Accepted by ICCV 2023
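
The volume rendering step mentioned in the abstract can be illustrated with the standard NeRF-style compositing rule along a single ray. This is a minimal sketch, not the MonoNeRD implementation: the function name `render_depth`, the plain-list inputs, and the handling of the last sample's spacing are assumptions made for illustration, and the conversion from SDF values to densities used by the authors is not shown.

```python
import math


def render_depth(densities, sample_depths):
    """Composite per-sample densities along one ray (hypothetical helper).

    Standard NeRF-style quadrature:
      alpha_i = 1 - exp(-sigma_i * delta_i)
      T_i     = product over j < i of (1 - alpha_j)
      w_i     = T_i * alpha_i
    Returns the weights and the weighted sum of sample depths.
    """
    if len(densities) != len(sample_depths):
        raise ValueError("densities and sample_depths must have equal length")

    weights = []
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    for i, sigma in enumerate(densities):
        if i + 1 < len(sample_depths):
            delta = sample_depths[i + 1] - sample_depths[i]
        elif i > 0:
            # Assumption: the last sample reuses the previous spacing.
            delta = sample_depths[i] - sample_depths[i - 1]
        else:
            delta = 1.0  # Assumption: unit spacing for a single-sample ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha

    depth = sum(w * t for w, t in zip(weights, sample_depths))
    return weights, depth


if __name__ == "__main__":
    # A ray with one nearly opaque sample at depth 3.
    w, d = render_depth([0.0, 0.0, 50.0, 0.0], [1.0, 2.0, 3.0, 4.0])
    print(w, d)
```

The weighted depth is only a meaningful depth estimate when the weights sum to roughly one, that is, when the ray actually hits a surface.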

LibreFace: An Open-Source Toolkit for Deep Facial Expression Analysis

Aug 18, 2023
Di Chang, Yufeng Yin, Zongjian Li, Minh Tran, Mohammad Soleymani

Facial expression analysis is an important tool for human-computer interaction. In this paper, we introduce LibreFace, an open-source toolkit for facial expression analysis. This open-source toolbox offers real-time and offline analysis of facial behavior through deep learning models, including facial action unit (AU) detection, AU intensity estimation, and facial expression recognition. To accomplish this, we employ several techniques, including the utilization of a large-scale pre-trained network, feature-wise knowledge distillation, and task-specific fine-tuning. These approaches are designed to effectively and accurately analyze facial expressions by leveraging visual information, thereby facilitating the implementation of real-time interactive applications. In terms of AU intensity estimation, we achieve a Pearson Correlation Coefficient (PCC) of 0.63 on DISFA, which is 7% higher than the performance of OpenFace 2.0, while maintaining highly efficient inference that runs two times faster than OpenFace 2.0. Despite being compact, our model also demonstrates competitive performance to state-of-the-art facial expression analysis methods on AffectNet, FFHQ, and RAFDB. Our code will be released at https://github.com/ihp-lab/LibreFace

* 10 pages, 5 figures. Accepted by WACV 2024 Round 1. (Application Track)
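
The Pearson Correlation Coefficient quoted for AU intensity estimation is the covariance of predictions and labels divided by the product of their standard deviations. The sketch below only illustrates the metric. The function name `pearson_correlation` and the sample numbers are made up for this example and are not taken from LibreFace or DISFA.

```python
import math


def pearson_correlation(predicted, target):
    """Pearson correlation between two equal-length sequences (illustrative helper)."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n

    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)

    if var_p == 0.0 or var_t == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    # Made-up AU intensity predictions and labels on a 0-5 scale.
    predictions = [0.2, 1.1, 2.9, 4.2, 3.0]
    labels = [0.0, 1.0, 3.0, 5.0, 2.0]
    print(pearson_correlation(predictions, labels))
```

A value of 1 means the predictions are a perfect increasing linear function of the labels, and 0 means no linear relationship.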

Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction

Aug 17, 2023
Yuhao Yang, Jun Wu, Guangjian Zhang, Rong Xiong

Traditional geometric-registration-based estimation methods exploit the CAD model only implicitly, which makes them dependent on observation quality and vulnerable to occlusion. To address this problem, the paper proposes a bidirectional correspondence prediction network with a point-wise attention-aware mechanism. This network not only requires the model points to predict the correspondence but also explicitly models the geometric similarities between observations and the model prior. Our key insight is that the correlations between each model point and scene point provide essential information for learning point-pair matches. To further tackle the correlation noise brought by feature distribution divergence, we design a simple but effective pseudo-siamese network to improve feature homogeneity. Experimental results on the public datasets of LineMOD, YCB-Video, and Occ-LineMOD show that the proposed method achieves better performance than other state-of-the-art methods under the same evaluation criteria. Its robustness in estimating poses is greatly improved, especially in environments with severe occlusions.

The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation

Aug 17, 2023
Giacomo Zara, Alessandro Conti, Subhankar Roy, Stéphane Lathuilière, Paolo Rota, Elisa Ricci

Source-Free Video Unsupervised Domain Adaptation (SFVUDA) is the task of adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset without accessing the actual source data. Previous approaches have attempted to address SFVUDA by leveraging self-supervision (e.g., enforcing temporal consistency) derived from the target data itself. In this work we take an orthogonal approach by exploiting "web-supervision" from Large Language-Vision Models (LLVMs), driven by the rationale that LLVMs contain a rich world prior, which is surprisingly robust to domain shift. We showcase the unreasonable effectiveness of integrating LLVMs for SFVUDA by devising an intuitive and parameter-efficient method, which we name Domain Adaptation with Large Language-Vision models (DALL-V), that distills the world prior and complementary source model information into a student network tailored for the target. Despite its simplicity, DALL-V achieves significant improvement over state-of-the-art SFVUDA methods.

* Accepted at ICCV 2023, 14 pages, 7 figures, code is available at https://github.com/giaczara/dallv

SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection

Aug 17, 2023
Runmin Cong, Yuchen Guan, Jinpeng Chen, Wei Zhang, Yao Zhao, Sam Kwong

Despite significant progress in shadow detection, current methods still struggle with the adverse impact of background color, which may lead to errors when shadows are present on complex backgrounds. Drawing inspiration from the human visual system, we treat the input shadow image as a composition of a background layer and a shadow layer, and design a Style-guided Dual-layer Disentanglement Network (SDDNet) to model these layers independently. To achieve this, we devise a Feature Separation and Recombination (FSR) module that decomposes multi-level features into shadow-related and background-related components by offering specialized supervision for each component, while preserving information integrity and avoiding redundancy through the reconstruction constraint. Moreover, we propose a Shadow Style Filter (SSF) module to guide the feature disentanglement by focusing on style differentiation and uniformization. With these two modules and our overall pipeline, our model effectively minimizes the detrimental effects of background color, yielding superior performance on three public datasets with a real-time inference speed of 32 FPS.

* Accepted by ACM MM 2023

Toward Fine Contact Interactions: Learning to Control Normal Contact Force with Limited Information

May 29, 2023
Jinda Cui, Jiawei Xu, David Saldaña, Jeff Trinkle

Dexterous manipulation of objects through fine control of physical contacts is essential for many important tasks of daily living. A fundamental ability underlying fine contact control is compliant control, i.e., controlling the contact forces while moving. For robots, the most widely explored approaches heavily depend on models of manipulated objects and expensive sensors to gather contact location and force information needed for real-time control. The models are difficult to obtain, and the sensors are costly, hindering personal robots' adoption in our homes and businesses. This study performs model-free reinforcement learning of a normal contact force controller on a robotic manipulation system built with a low-cost, information-poor tactile sensor. Despite the limited sensing capability, our force controller can be combined with a motion controller to enable fine contact interactions during object manipulation. Promising results are demonstrated in non-prehensile, dexterous manipulation experiments.

MapNeRF: Incorporating Map Priors into Neural Radiance Fields for Driving View Simulation

Aug 06, 2023
Chenming Wu, Jiadai Sun, Zhelun Shen, Liangjun Zhang

Simulating camera sensors is a crucial task in autonomous driving. Although neural radiance fields are exceptional at synthesizing photorealistic views in driving simulations, they still fail to generate extrapolated views. This paper proposes to incorporate map priors into neural radiance fields to synthesize out-of-trajectory driving views with semantic road consistency. The key insight is that map information can be utilized as a prior to guide the training of the radiance fields with uncertainty. Specifically, we utilize the coarse ground surface as uncertain information to supervise the density field and warp depth with uncertainty from unknown camera poses to ensure multi-view consistency. Experimental results demonstrate that our approach can produce semantic consistency in deviated views for vehicle camera simulation. The supplementary video can be viewed at https://youtu.be/jEQWr-Rfh3A.

* Accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023

Open Information Extraction via Chunks

May 05, 2023
Kuicai Dong, Aixin Sun, Jung-Jae Kim, Xiaoli Li

Open Information Extraction (OIE) aims to extract relational tuples from open-domain sentences. Existing OIE systems split a sentence into tokens and recognize token spans as tuple relations and arguments. We instead propose Sentence as Chunk sequence (SaC) and recognize chunk spans as tuple relations and arguments. We argue that SaC has better quantitative and qualitative properties for OIE than sentence as token sequence, and evaluate four choices of chunks (i.e., CoNLL chunks, simple phrases, NP chunks, and spans from SpanOIE) against gold OIE tuples. Accordingly, we propose a simple BERT-based model for sentence chunking, and propose Chunk-OIE for tuple extraction on top of SaC. Chunk-OIE achieves state-of-the-art results on multiple OIE datasets, showing that SaC benefits the OIE task.

Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling

May 19, 2023
Shengqiong Wu, Hao Fei, Yixin Cao, Lidong Bing, Tat-Seng Chua

Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation. To combat that, we propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting. First, we represent the fine-grained semantic structures of the input image and text with the visual and textual scene graphs, which are further fused into a unified cross-modal graph (CMG). Based on CMG, we perform structure refinement with the guidance of the graph information bottleneck principle, actively denoising the less-informative features. Next, we perform topic modeling over the input image and text, incorporating latent multimodal topic features to enrich the contexts. On the benchmark MRE dataset, our system outperforms the current best model significantly. With further in-depth analyses, we reveal the great potential of our method for the MRE task. Our codes are open at https://github.com/ChocoWu/MRE-ISE.
