"Information": models, code, and papers

Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation

Dec 18, 2023
Jingxuan He, Lechao Cheng, Chaowei Fang, Zunlei Feng, Tingting Mu, Mingli Song

Compared to conventional semantic segmentation with pixel-level supervision, Weakly Supervised Semantic Segmentation (WSSS) with image-level labels poses the challenge that the model tends to focus on the most discriminative regions, resulting in a gap relative to fully supervised conditions. A typical manifestation is reduced precision at object boundaries, which degrades the accuracy of WSSS. To alleviate this issue, we propose to adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) for separate processing. For uncertain cues, we employ an activation-based masking strategy and seek to recover the local information with self-distilled knowledge. We further assume that the unmasked confident regions should be robust enough to preserve the global semantics. Building upon this, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels. Extensive experiments conducted on PASCAL VOC 2012 and MS COCO 2014 demonstrate that our proposed single-stage approach for WSSS not only outperforms state-of-the-art methods by a notable margin but also surpasses multi-stage methodologies that trade complexity for accuracy. The code can be found at https://github.com/Jessie459/feature-self-reinforcement.

* Accepted by AAAI 2024
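
The abstract does not specify how deterministic and uncertain regions are separated. A minimal sketch, assuming the partition is done by thresholding a class activation map (CAM) with two hypothetical thresholds (`FG_THRESH`, `BG_THRESH`) and that masking replaces uncertain features with a constant; the self-distillation step that recovers the masked information is not shown.

```python
# Hypothetical thresholds: the abstract does not state how the partition is computed.
FG_THRESH = 0.7   # activation at or above this -> confident foreground
BG_THRESH = 0.25  # activation at or below this -> confident background


def partition_regions(cam, fg_thresh=FG_THRESH, bg_thresh=BG_THRESH):
    """Label each cell 1 (confident fg), 0 (confident bg) or -1 (uncertain)."""
    labels = []
    for row in cam:
        out = []
        for score in row:
            if score >= fg_thresh:
                out.append(1)
            elif score <= bg_thresh:
                out.append(0)
            else:
                out.append(-1)
        labels.append(out)
    return labels


def mask_uncertain(features, labels, mask_value=0.0):
    """Replace features at uncertain positions with a constant mask value."""
    return [
        [mask_value if lab == -1 else feat for feat, lab in zip(frow, lrow)]
        for frow, lrow in zip(features, labels)
    ]


cam = [[0.9, 0.5, 0.1],
       [0.8, 0.3, 0.05]]
labels = partition_regions(cam)
print(labels)  # [[1, -1, 0], [1, -1, 0]]
```

In a real model the mask would be applied to patch tokens or feature-map cells rather than raw scalars, and the thresholds would need tuning.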

Painterly Image Harmonization by Learning from Painterly Objects

Dec 15, 2023
Li Niu, Junyan Cao, Yan Hong, Liqing Zhang

Given a composite image with a photographic object and a painterly background, painterly image harmonization aims to stylize the composite object so that it is compatible with the background. Despite the competitive performance of existing painterly harmonization works, they do not fully leverage the painterly objects in artistic paintings. In this work, we explore learning from painterly objects for painterly image harmonization. In particular, we learn a mapping from background style and object information to object style based on painterly objects in artistic paintings. With the learned mapping, we can hallucinate the target style of the composite object, which is used to harmonize the encoder feature maps to produce the harmonized image. Extensive experiments on the benchmark dataset demonstrate the effectiveness of our proposed method.

* Accepted by AAAI 2024
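
How the hallucinated object style is injected into the encoder feature maps is not stated in the abstract. One common way to impose target feature statistics is adaptive instance normalization (AdaIN); the sketch below assumes that mechanism, with a per-channel target mean and standard deviation standing in for the predicted object style.

```python
import statistics


def adain(content, style_mean, style_std, eps=1e-5):
    """Re-normalize each channel of `content` to the given target statistics.

    content: list of channels, each a flat list of activations (spatial dims flattened).
    style_mean, style_std: per-channel target statistics (here: a stand-in for
    the hallucinated object style).
    """
    out = []
    for channel, mu_t, sigma_t in zip(content, style_mean, style_std):
        mu = statistics.fmean(channel)
        sigma = statistics.pstdev(channel)
        # Whiten the channel, then re-color it with the target statistics.
        out.append([(x - mu) / (sigma + eps) * sigma_t + mu_t for x in channel])
    return out


harmonized = adain([[1.0, 2.0, 3.0], [0.0, 4.0, 8.0]], [10.0, -1.0], [2.0, 0.5])
```

Each output channel keeps the spatial pattern of the input while taking on the target mean and (up to `eps`) the target standard deviation.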

Non-Euclidean Spatial Graph Neural Network

Dec 17, 2023
Zheng Zhang, Sirui Li, Jingcheng Zhou, Junxiang Wang, Abhinav Angirekula, Allen Zhang, Liang Zhao

Spatial networks are networks whose graph topology is constrained by the space in which they are embedded. Understanding the coupled spatial-graph properties is crucial for extracting powerful representations from spatial networks; merely combining individual spatial and network representations therefore cannot reveal the underlying interaction mechanism of spatial networks. Moreover, existing spatial network representation learning methods only consider networks embedded in Euclidean space and cannot fully exploit the rich geometric information carried by irregular, non-uniform non-Euclidean spaces. To address this issue, we propose a novel generic framework to learn representations of spatial networks embedded in non-Euclidean manifold space. Specifically, a novel message-passing-based neural network is proposed to combine graph topology and spatial geometry, where spatial geometry is extracted as messages on the edges. We theoretically guarantee that the learned representations are invariant to important symmetries such as rotation and translation, while retaining sufficient power to distinguish different geometric structures. The strength of our proposed method is demonstrated through extensive experiments on both synthetic and real-world datasets.

* Accepted by SDM 2024
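
The invariance argument can be illustrated in a simplified setting. The sketch below assumes points in the Euclidean plane (the paper targets non-Euclidean manifolds) and uses only pairwise distances as edge messages, so any rotation or translation of the coordinates leaves the output unchanged. The message and update functions are hypothetical, and distances alone are far less expressive than the geometric messages the paper describes.

```python
import math


def message_passing_layer(h, pos, edges):
    """One round of message passing whose geometry enters only via pairwise distances.

    h: dict node -> float feature; pos: dict node -> (x, y) coordinates;
    edges: list of directed (src, dst) pairs.
    """
    agg = {v: 0.0 for v in h}
    for src, dst in edges:
        dist = math.dist(pos[src], pos[dst])  # geometric quantity carried on the edge
        # Hypothetical message: neighbor feature weighted by a distance kernel.
        agg[dst] += h[src] * math.exp(-dist)
    # Hypothetical update: residual sum followed by tanh.
    return {v: math.tanh(h[v] + agg[v]) for v in h}


def rigid_transform(pos, angle, shift):
    """Rotate all coordinates by `angle` and translate them by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return {v: (c * x - s * y + shift[0], s * x + c * y + shift[1])
            for v, (x, y) in pos.items()}


h = {0: 1.0, 1: 0.5, 2: -0.3}
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 2.0)}
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
out_a = message_passing_layer(h, pos, edges)
out_b = message_passing_layer(h, rigid_transform(pos, 0.7, (3.0, -1.5)), edges)
```

`out_a` and `out_b` agree up to floating-point rounding, because rotations and translations preserve distances.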

Identification of Knowledge Neurons in Protein Language Models

Dec 17, 2023
Divya Nori, Shivali Singireddy, Marina Ten Have

Neural language models have become powerful tools for learning complex representations of entities in natural language processing tasks. However, their interpretability remains a significant challenge, particularly in domains like computational biology where trust in model predictions is crucial. In this work, we aim to enhance the interpretability of protein language models, specifically the state-of-the-art ESM model, by identifying and characterizing knowledge neurons: components that express understanding of key information. After fine-tuning the ESM model for enzyme sequence classification, we compare two knowledge neuron selection methods that preserve a subset of neurons from the original model. Both methods, activation-based and integrated-gradients-based selection, consistently outperform a random baseline. In particular, these methods show a high density of knowledge neurons in the key vector prediction networks of self-attention modules. Given that key vectors specialize in understanding different features of input sequences, these knowledge neurons could capture knowledge of different enzyme sequence motifs. Future work could characterize the types of knowledge captured by each neuron.
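
A minimal sketch of activation-based neuron selection, assuming neurons are ranked by their mean absolute activation over a set of inputs and that non-selected neurons are zeroed out. The scoring rule and `keep_fraction` are assumptions; the integrated-gradients variant is not shown.

```python
def select_neurons_by_activation(activations, keep_fraction):
    """Return sorted indices of the neurons with the largest mean absolute activation.

    activations: list of samples, each a list of per-neuron activations.
    """
    n = len(activations[0])
    scores = [sum(abs(sample[i]) for sample in activations) / len(activations)
              for i in range(n)]
    k = max(1, round(keep_fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def apply_neuron_mask(layer_output, keep):
    """Zero out every neuron that was not selected."""
    keep = set(keep)
    return [x if i in keep else 0.0 for i, x in enumerate(layer_output)]


acts = [[0.1, -2.0, 0.5, 0.0],
        [0.3, 1.0, -0.7, 0.05]]
kept = select_neurons_by_activation(acts, 0.5)
print(kept)  # [1, 2]
```

Evaluating the fine-tuned model with only the `kept` neurons active, and comparing against randomly chosen subsets of the same size, mirrors the random-baseline comparison the abstract describes.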

Light-weight CNN-based VVC Inter Partitioning Acceleration

Dec 17, 2023
Yiqun Liu, Mohsen Abdoli, Thomas Guionnet, Christine Guillemot, Aline Roumy

The Versatile Video Coding (VVC) standard was finalized by the Joint Video Experts Team (JVET) in 2020. Compared to the High Efficiency Video Coding (HEVC) standard, VVC offers about 50% compression efficiency gain, in terms of Bjøntegaard Delta-Rate (BD-rate), at the cost of about 10x higher encoder complexity. In this paper, we propose a Convolutional Neural Network (CNN)-based method to speed up inter partitioning in VVC. Our method operates at the Coding Tree Unit (CTU) level by splitting each CTU into a fixed grid of 8x8 blocks. Each cell in this grid is then associated with information about the partitioning depth within that area. A lightweight network that predicts this grid is employed during rate-distortion optimization to limit the Quaternary Tree (QT)-split search and avoid partitions that are unlikely to be selected. Experiments show that the proposed method achieves acceleration ranging from 17% to 30% in the Random Access Group of Pictures 32 (RAGOP32) mode of VVC Test Model (VTM) 10, with a reasonable efficiency drop ranging from 0.37% to 1.18% in terms of BD-rate increase.

* Accepted by IVMSP
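
The grid-based pruning can be sketched as follows, assuming a 128x128 CTU (VVC's maximum CTU size), one predicted QT depth per 8x8 cell, and a hypothetical rule: a block is tested for a further QT split only if some cell it covers predicts a deeper partition than the block's current depth. The CNN that produces the grid is not shown, and the paper's exact decision rule may differ.

```python
CTU_SIZE = 128            # maximum VVC CTU size in luma samples
CELL = 8                  # grid granularity from the abstract
GRID = CTU_SIZE // CELL   # 16 x 16 cells per CTU


def should_try_qt_split(depth_grid, x, y, size, depth):
    """Return True if the predicted grid suggests testing a QT split of this block.

    depth_grid: GRID x GRID list of predicted QT depths (hypothetically from the CNN).
    (x, y, size): block position and width in luma samples inside the CTU.
    depth: current QT depth of the block.
    """
    c0, r0, n = x // CELL, y // CELL, max(1, size // CELL)
    predicted = max(depth_grid[r][c]
                    for r in range(r0, r0 + n)
                    for c in range(c0, c0 + n))
    return predicted > depth


# Example: the top-left 32x32 area is predicted at QT depth 2, the rest at depth 1.
grid = [[2 if r < 4 and c < 4 else 1 for c in range(GRID)] for r in range(GRID)]
print(should_try_qt_split(grid, 0, 0, 128, 0))  # True
print(should_try_qt_split(grid, 64, 0, 64, 1))  # False
```

Skipping the rate-distortion check for blocks where this returns `False` is where the encoder time savings would come from; a wrong prediction costs coding efficiency, which matches the speed/BD-rate trade-off reported above.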

IntraSeismic: a coordinate-based learning approach to seismic inversion

Dec 17, 2023
Juan Romero, Wolfgang Heidrich, Nick Luiken, Matteo Ravasi

Seismic imaging is the numerical process of creating a volumetric representation of subsurface geological structures from elastic waves recorded at the surface of the Earth. As such, it is widely used in the energy and construction sectors for applications ranging from oil and gas prospecting, to geothermal production and carbon capture and storage monitoring, to geotechnical assessment of infrastructure. Extracting quantitative information from seismic recordings, such as an acoustic impedance model, is, however, a highly ill-posed inverse problem due to the band-limited and noisy nature of the data. This paper introduces IntraSeismic, a novel hybrid seismic inversion method that combines coordinate-based learning with the physics of the post-stack modeling operator. Key features of IntraSeismic are i) unparalleled performance in 2D and 3D post-stack seismic inversion, ii) rapid convergence, iii) the ability to seamlessly include hard constraints (i.e., well data) and perform uncertainty quantification, and iv) potential data compression and fast randomized access to portions of the inverted model. Synthetic and field data applications of IntraSeismic are presented to validate the effectiveness of the proposed method.
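
IntraSeismic couples a coordinate-based network with the post-stack modeling operator. The sketch below shows only the physics part, using the standard convolutional model in which normal-incidence reflection coefficients computed from acoustic impedance are convolved with a wavelet. The Ricker wavelet and its parameters are assumptions, and the coordinate network that would output the impedance is not shown.

```python
import math


def ricker(freq_hz, dt, half_length):
    """Ricker wavelet sampled at k * dt for k in [-half_length, half_length]."""
    out = []
    for k in range(-half_length, half_length + 1):
        a = (math.pi * freq_hz * k * dt) ** 2
        out.append((1 - 2 * a) * math.exp(-a))
    return out


def reflectivity(impedance):
    """Normal-incidence reflection coefficients between consecutive samples."""
    return [(z1 - z0) / (z1 + z0) for z0, z1 in zip(impedance, impedance[1:])]


def post_stack_model(impedance, wavelet):
    """Convolve reflectivity with a centered wavelet ('same'-length output)."""
    r = reflectivity(impedance)
    half = len(wavelet) // 2
    data = []
    for i in range(len(r)):
        acc = 0.0
        for j, w in enumerate(wavelet):
            k = i + half - j
            if 0 <= k < len(r):
                acc += w * r[k]
        data.append(acc)
    return data


wavelet = ricker(25.0, 0.004, 10)
trace = post_stack_model([1, 1, 1, 2, 2, 2, 2], wavelet)
```

An impedance step from 1 to 2 yields a single reflection coefficient of 1/3, which appears in `trace` as a scaled copy of the wavelet centered on the interface. In an inversion, the mismatch between such modeled traces and the recorded data would drive the network's training.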

Green Operations of SWIPT Networks: The Role of End-User Devices

Dec 13, 2023
Gianluca Rizzo, Marco Ajmone Marsan, Christian Esposito, Biagio Boi

Internet of Things (IoT) devices often come with batteries of limited capacity that are not easily replaceable or rechargeable, and that significantly constrain the sensing, computing, and communication tasks they can perform. The Simultaneous Wireless Information and Power Transfer (SWIPT) paradigm addresses this issue by delivering power wirelessly to energy-harvesting IoT devices with the same signal used for information transfer. Because of their peculiarities, these networks require specific energy-efficient planning and management approaches. However, it is not yet clear what the most effective strategies are for managing a SWIPT network for energy efficiency. In this paper, we address this issue by developing an analytical model based on stochastic geometry that accounts for the statistics of user-perceived performance and base station scheduling. We formulate an optimization problem for deriving the energy-optimal configuration as a function of the main system parameters, and we propose a genetic algorithm approach to solve it. Our results enable a first-order evaluation of the most effective strategies for energy-efficient provisioning of power and communications in a SWIPT network. We show that the service capacity brought about by users leads to energy-efficient dynamic network provisioning strategies that differ radically from those of networks without wireless power transfer.

* Submitted to a journal on 7-12-2023
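
The abstract does not give the energy model or the genetic algorithm's operators. A generic real-coded genetic algorithm (truncation selection, uniform crossover, clipped Gaussian mutation) is sketched below; `toy_energy` is a hypothetical placeholder objective, not the stochastic-geometry model from the paper, and each gene stands for one normalized system parameter.

```python
import random


def genetic_minimize(cost, dim, pop_size=30, generations=60, mutation_std=0.1, seed=0):
    """Minimize `cost` over [0, 1]^dim with a basic real-coded genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 2]          # truncation selection (keeps the best)
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            # Gaussian mutation, clipped to the feasible box [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, mutation_std))) for g in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=cost)


def toy_energy(config):
    """Hypothetical placeholder for the network energy model."""
    return sum((g - 0.3) ** 2 for g in config)


best = genetic_minimize(toy_energy, dim=4)
```

Because the elite is carried over unchanged, the best cost never increases across generations; a real study would replace `toy_energy` with the analytical model and add any feasibility constraints (for example, on user-perceived performance).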

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

Dec 09, 2023
Shitian Zhao, Zhuowan Li, Yadong Lu, Alan Yuille, Yan Wang

While Multi-modal Language Models (MLMs) demonstrate impressive multimodal ability, they still struggle to provide factual and precise responses for tasks like visual question answering (VQA). In this paper, we address this challenge from the perspective of contextual information. We propose Causal Context Generation (Causal-CoG), a prompting strategy that engages contextual information to enhance precise VQA during inference. Specifically, we prompt MLMs to generate contexts, i.e., text descriptions of an image, and engage the generated contexts for question answering. Moreover, we investigate the advantage of contexts on VQA from a causality perspective, introducing causality filtering to select samples for which contextual information is helpful. To show the effectiveness of Causal-CoG, we run extensive experiments on 10 multimodal benchmarks and show consistent improvements, e.g., +6.30% on POPE, +13.69% on VizWiz and +6.43% on VQAv2 compared to direct decoding, surpassing existing methods. We hope Causal-CoG inspires explorations of context knowledge in multimodal models and serves as a plug-and-play strategy for MLM decoding.
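
A simplified sketch of the two ingredients, assuming a hypothetical prompt template and a confidence-difference rule standing in for causality filtering: the context-conditioned answer is kept only when conditioning on the generated context raises the model's confidence. The paper's actual causal-effect estimate may be computed differently.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    answer: str
    confidence: float  # model probability of `answer`, in [0, 1]


def build_context_prompt(context, question):
    """Hypothetical template that prepends a generated image description."""
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


def choose_answer(direct, with_context, min_effect=0.0):
    """Keep the context-conditioned answer only when the context helps (simplified filter)."""
    effect = with_context.confidence - direct.confidence
    return with_context.answer if effect > min_effect else direct.answer


print(choose_answer(Prediction("2", 0.4), Prediction("3", 0.7)))      # 3
print(choose_answer(Prediction("yes", 0.9), Prediction("no", 0.6)))   # yes
```

Because both decoding paths use the same model, the strategy is plug-and-play: it only changes the prompt and the final answer selection, not the model weights.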

Calibrating Wireless Ray Tracing for Digital Twinning using Local Phase Error Estimates

Dec 19, 2023
Clement Ruah, Osvaldo Simeone, Jakob Hoydis, Bashir Al-Hashimi

Embodying the principle of simulation intelligence, digital twin (DT) systems construct and maintain a high-fidelity virtual model of a physical system. This paper focuses on ray tracing (RT), which is widely seen as an enabling technology for DTs of the radio access network (RAN) segment of next-generation disaggregated wireless systems. RT makes it possible to simulate channel conditions, enabling data augmentation and prediction-based transmission. However, the effectiveness of RT hinges on adapting the electromagnetic properties assumed by the RT to actual channel conditions, a process known as calibration. The main challenge of RT calibration is that small discrepancies in the geometric model fed to the RT software degrade the accuracy of the predicted phases of the simulated propagation paths. Existing solutions to this problem either rely on the channel power profile, hence disregarding phase information, or operate on the channel responses by assuming the simulated phases to be sufficiently accurate for calibration. This paper proposes a novel channel response-based scheme that, unlike the state of the art, estimates and compensates for the phase errors in the RT-generated channel responses. The proposed approach builds on the variational expectation-maximization algorithm with a flexible choice of the prior phase-error distribution that bridges between a deterministic model with no phase errors and a stochastic model with uniform phase errors. The algorithm is computationally efficient and is shown, using the open-source differentiable RT software available within the Sionna library, to outperform existing methods in terms of the accuracy of RT predictions.
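
The abstract does not name the prior. The von Mises distribution is one family with exactly the bridging property described: concentration `kappa = 0` gives uniform phase errors, and large `kappa` concentrates the errors around zero. The sketch below assumes that prior and shows how per-path phase errors perturb an RT channel response; the variational EM updates themselves are not shown.

```python
import cmath
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        if m > 0:
            term *= (x / 2) ** 2 / (m * m)  # term_m = (x/2)^(2m) / (m!)^2
        total += term
    return total


def von_mises_pdf(eps, kappa):
    """Assumed phase-error prior: kappa = 0 is uniform, large kappa means near-zero errors."""
    return math.exp(kappa * math.cos(eps)) / (2 * math.pi * bessel_i0(kappa))


def channel_response(gains, phases, phase_errors):
    """Sum of propagation paths, each with its RT-predicted phase perturbed by an error."""
    return sum(g * cmath.exp(1j * (p + e))
               for g, p, e in zip(gains, phases, phase_errors))


h = channel_response([1.0, 0.5], [0.0, math.pi], [0.0, 0.0])
```

With no phase errors the two paths above interfere destructively to `0.5`; even a modest phase error on either path changes both the magnitude and the phase of the sum, which is why calibration that ignores phase errors can be misled.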

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

Dec 19, 2023
Fan Zhang, Shaodi You, Yu Li, Ying Fu

Monocular depth estimation has made significant progress on terrestrial images in recent years, largely due to advances in deep learning. However, it remains inadequate for underwater scenes, primarily because of data scarcity. Given the inherent challenges of light attenuation and backscattering in water, acquiring clear underwater images or precise depth information is notably difficult and costly. Consequently, learning-based approaches often rely on synthetic data or turn to unsupervised or self-supervised methods to mitigate this lack of data. Nonetheless, the performance of these methods is often limited by the domain gap and looser constraints. In this paper, we propose a novel pipeline for generating photorealistic underwater images using accurate terrestrial depth data. This approach facilitates the training of supervised models for underwater depth estimation, effectively reducing the performance disparity between terrestrial and underwater environments. Contrary to prior synthetic datasets that merely apply style transfer to terrestrial images without altering the scene content, our approach creates vibrant, non-existent underwater scenes from terrestrial depth data using Stable Diffusion. Specifically, we introduce a Depth2Underwater ControlNet, trained on specially prepared {Underwater, Depth, Text} data triplets, for this generation task. Our newly developed dataset enables terrestrial depth estimation models to achieve considerable quantitative and qualitative improvements on unseen underwater images, surpassing their terrestrial pre-trained counterparts. Moreover, the enhanced depth accuracy for underwater scenes also aids underwater image restoration techniques that rely on depth maps, further demonstrating the dataset's utility. The dataset will be available at https://github.com/zkawfanx/Atlantis.

* 10 pages
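
The light attenuation and backscattering mentioned above are often described by a simplified image formation model, `I = J * t + B * (1 - t)` with transmission `t = exp(-beta * d)`. The sketch below illustrates that model only; Atlantis itself generates images with a diffusion model rather than with this formula, and the per-channel coefficients are hypothetical.

```python
import math


def underwater_pixel(clear_rgb, depth_m, beta_rgb, backscatter_rgb):
    """Simplified per-channel model: attenuated scene radiance plus backscatter veil."""
    out = []
    for j, beta, b in zip(clear_rgb, beta_rgb, backscatter_rgb):
        t = math.exp(-beta * depth_m)  # transmission along the line of sight
        out.append(j * t + b * (1 - t))
    return tuple(out)


# Hypothetical coefficients (per metre): red is absorbed fastest in water.
BETA = (0.6, 0.12, 0.08)
VEIL = (0.05, 0.35, 0.45)

shallow = underwater_pixel((1.0, 1.0, 1.0), 5.0, BETA, VEIL)
```

A white object 5 m away already appears blue-green, and as distance grows every pixel converges to the veil color `VEIL`, which is why depth is hard to recover from such images and why paired underwater depth data is scarce.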
