Abstract: It is expected that B5G/6G networks will exploit both terahertz (THz) and millimetre wave (mmWave) frequency bands and will increase flexibility in user equipment (UE)-cell association. In this paper, we introduce a novel stochastic geometry-based framework for the analysis of the signal-to-interference-plus-noise ratio (SINR) and rate coverage in a multi-tier hybrid mmWave and THz network, where each tier has a particular base station (BS) density, transmit power, bandwidth, number of BS antennas, and cell-association bias factor. The proposed framework incorporates the effects of mmWave and THz channel characteristics, BS beamforming gain, and blockages. We investigate the downlink (DL) and uplink (UL) decoupled cell-association strategy and characterise the per-tier cell-association probability. Based on that, we analytically derive the SINR and rate coverage probabilities of a typical user for both DL and UL transmissions. The analytical results are validated via extensive Monte Carlo simulations. Numerical results demonstrate the superiority of the DL and UL decoupled cell-association strategy in terms of SINR and rate coverage over its coupled counterpart. Moreover, we observe that the advantage of the decoupled cell-association strategy becomes more pronounced with dense deployment of THz networks.
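To make the biased cell-association rule concrete, the following Python sketch Monte-Carlo-estimates the per-tier association probability of a typical user at the origin under biased maximum-received-power association. The tier parameters, the pure path-loss model, and the omission of blockage, fading, and beamforming gain are all simplifying assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-tier setup (mmWave, THz); all values are assumptions:
# density [BS/km^2], transmit power, association bias, path-loss exponent.
tiers = [
    {"density": 50,  "power": 1.0, "bias": 1.0, "alpha": 3.0},  # mmWave
    {"density": 300, "power": 0.1, "bias": 4.0, "alpha": 4.0},  # THz
]
AREA = 1.0  # km^2 window centred on the typical user at the origin

def association_probabilities(n_trials=20000):
    """Monte Carlo estimate of per-tier association probabilities under
    biased max-received-power association (path loss only, no fading)."""
    wins = np.zeros(len(tiers))
    for _ in range(n_trials):
        best_tier, best_metric = None, -np.inf
        for k, t in enumerate(tiers):
            n = rng.poisson(t["density"] * AREA)       # PPP realisation
            if n == 0:
                continue
            xy = rng.uniform(-0.5, 0.5, size=(n, 2))   # BS locations
            d = np.sqrt((xy ** 2).sum(axis=1)).min()   # nearest-BS distance
            metric = t["bias"] * t["power"] * d ** (-t["alpha"])
            if metric > best_metric:
                best_tier, best_metric = k, metric
        if best_tier is not None:
            wins[best_tier] += 1
    return wins / n_trials

print(association_probabilities())  # per-tier association probabilities
```

Under a decoupled strategy, this association step would simply run twice, once with DL parameters and once with UL parameters, so the DL and UL serving tiers may differ.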
Abstract: To facilitate the reuse of existing charts, previous research has examined how to obtain a semantic understanding of a chart by deconstructing its visual representation into reusable components, such as encodings. However, existing deconstruction approaches primarily focus on chart styles, handling only basic layouts. In this paper, we investigate how to deconstruct chart layouts, focusing on rectangle-based ones, as they cover not only 17 chart types but also advanced layouts (e.g., small multiples, nested layouts). We develop an interactive tool, Mystique, that adopts a mixed-initiative approach to extract the axes and legend and to deconstruct a chart's layout into four semantic components: mark groups, spatial relationships, data encodings, and graphical constraints. Mystique employs a wizard interface that guides chart authors through a series of steps to specify how the deconstructed components map to their own data. On 150 rectangle-based SVG charts, Mystique achieves above 85% accuracy for axis and legend extraction and 96% accuracy for layout deconstruction. In a chart reproduction study, participants could easily reuse existing charts on new datasets. We discuss the current limitations of Mystique and future research directions.
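To make the notion of layout deconstruction concrete, here is a toy Python sketch, not Mystique's actual algorithm, that groups rectangles parsed from SVG `<rect>` nodes by shared x-extent and tests whether each group forms a vertical stack; production-quality deconstruction needs far more robust geometry handling.

```python
from collections import defaultdict

rects = [  # (x, y, width, height), as might be parsed from SVG <rect> nodes
    (10, 50, 20, 40), (10, 20, 20, 30),   # one stacked bar
    (40, 60, 20, 30), (40, 30, 20, 30),   # another stacked bar
]

# Group rectangles that share the same x position and width ("mark groups").
groups = defaultdict(list)
for r in rects:
    groups[(r[0], r[2])].append(r)

# Classify the within-group spatial relationship.
for key, members in groups.items():
    members.sort(key=lambda r: r[1])      # sort by y
    stacked = all(abs(a[1] + a[3] - b[1]) < 1e-6
                  for a, b in zip(members, members[1:]))
    print(key, "stacked" if stacked else "other")
```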
Abstract: Existing Graph Convolutional Networks for human motion prediction largely adopt a one-step scheme, which outputs the prediction directly from the history input, failing to exploit human motion patterns. We observe that human motions have transitional patterns and can be split into snippets representative of each transition. Each snippet can be reconstructed from its starting and ending poses, referred to as the transitional poses. We propose a snippet-to-motion multi-stage framework that breaks motion prediction into sub-tasks that are easier to accomplish. Each sub-task integrates three modules: transitional pose prediction, snippet reconstruction, and snippet-to-motion prediction. Specifically, we propose to first predict only the transitional poses. We then use them to reconstruct the corresponding snippets, obtaining a close approximation to the true motion sequence. Finally, we refine them to produce the final prediction output. To implement the network, we propose a novel unified graph modeling, which allows for direct and effective feature propagation compared to existing approaches that rely on separate space-time modeling. Extensive experiments on the Human3.6M, CMU Mocap, and 3DPW datasets verify the effectiveness of our method, which achieves state-of-the-art performance.
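A schematic PyTorch sketch of one such sub-task stage is given below; the joint count, snippet length, and the plain linear modules are placeholders standing in for the paper's graph-based networks, so this illustrates the three-module decomposition rather than the actual model.

```python
import torch
import torch.nn as nn

J, D, T = 22, 3, 20  # joints, coordinate dim, snippet length (all assumed)

class Stage(nn.Module):
    """One sub-task stage: transitional-pose prediction ->
    snippet reconstruction -> snippet-to-motion refinement."""
    def __init__(self):
        super().__init__()
        self.pose_net    = nn.Linear(J * D, 2 * J * D)      # start/end poses
        self.snippet_net = nn.Linear(2 * J * D, T * J * D)  # fill the snippet
        self.refine_net  = nn.Linear(T * J * D, T * J * D)  # final refinement

    def forward(self, history):              # history: (B, J*D) summary
        trans   = self.pose_net(history)     # predict transitional poses
        snippet = self.snippet_net(trans)    # reconstruct the snippet
        return self.refine_net(snippet).view(-1, T, J, D)

motion = Stage()(torch.randn(4, J * D))
print(motion.shape)  # torch.Size([4, 20, 22, 3])
```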
Abstract: Human-centric visual understanding is an important desideratum for effective human-robot interaction. In order to navigate crowded public places, social robots must be able to interpret the activity of the surrounding humans. This paper addresses one key aspect of human-centric visual understanding, multi-person pose estimation. Achieving good performance on multi-person pose estimation in crowded scenes is difficult due to the challenges of occluded joints and instance separation. In order to tackle these challenges and overcome the limitations of image features in representing invisible body parts, we propose a novel prompt-based pose inference strategy called LAMP (Language Assisted Multi-person Pose estimation). By utilizing the text representations generated by a well-trained language model (CLIP), LAMP can facilitate the understanding of poses on the instance and joint levels, and learn more robust visual representations that are less susceptible to occlusion. This paper demonstrates that language-supervised training boosts the performance of single-stage multi-person pose estimation, and both instance-level and joint-level prompts are valuable for training. The code is available at https://github.com/shengnanh20/LAMP.
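The core idea of language-assisted pose inference can be sketched with the open-source CLIP package; the prompt template and the random stand-in visual features below are illustrative assumptions, not LAMP's actual design.

```python
# Requires: pip install git+https://github.com/openai/CLIP.git
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

joints = ["left shoulder", "right knee", "left ankle"]  # joint-level prompts
tokens = clip.tokenize([f"a photo of a person's {j}" for j in joints]).to(device)

with torch.no_grad():
    text_feat = model.encode_text(tokens)               # (3, 512)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Hypothetical visual features for three keypoint candidates (in practice
# pooled from a pose backbone and projected to CLIP's width).
vis_feat = torch.randn(3, text_feat.shape[-1], device=device, dtype=text_feat.dtype)
vis_feat = vis_feat / vis_feat.norm(dim=-1, keepdim=True)
print(vis_feat @ text_feat.T)  # joint-text similarity matrix
```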
Abstract: Recent progress in personalized image generation using diffusion models has been significant. However, development in the area of open-domain, non-fine-tuning personalized image generation has proceeded rather slowly. In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that requires neither test-time fine-tuning nor more than a single reference image to support personalized generation of single or multiple subjects in any domain. Firstly, we construct an automatic data labeling tool and use the LAION-Aesthetics dataset to build a large-scale dataset consisting of 76M images with their corresponding subject detection bounding boxes, segmentation masks, and text descriptions. Secondly, we design a new unified framework that combines text and image semantics by incorporating coarse location and fine-grained reference image control to maximize subject fidelity and generalization. Furthermore, we adopt an attention control mechanism to support multi-subject generation. Extensive qualitative and quantitative results demonstrate that our method outperforms other SOTA frameworks in single-, multi-, and human-subject customized image generation. Please refer to our \href{https://oppo-mente-lab.github.io/subject_diffusion/}{project page} for more details.
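One common way to realise attention control for multi-subject generation, offered here as an assumption rather than Subject-Diffusion's exact mechanism, is to mask each subject token's cross-attention so that it only influences the latent positions inside that subject's bounding box:

```python
import torch

H = W = 8                                 # latent grid size
boxes = [(0, 0, 4, 8), (4, 0, 8, 8)]      # (x0, y0, x1, y1) per subject token

attn = torch.rand(H * W, len(boxes))      # (query positions, subject tokens)
mask = torch.zeros_like(attn)
for t, (x0, y0, x1, y1) in enumerate(boxes):
    grid = torch.zeros(H, W)
    grid[y0:y1, x0:x1] = 1.0              # positions this subject may influence
    mask[:, t] = grid.flatten()

# Disallowed (position, token) pairs get zero attention after the softmax.
attn = attn.masked_fill(mask == 0, float("-inf")).softmax(dim=-1)
```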
Abstract: The paradigm of large-scale pre-training followed by downstream fine-tuning has been widely employed in various object detection algorithms. In this paper, we reveal discrepancies in data, model, and task between the pre-training and fine-tuning procedures in existing practices, which implicitly limit the detector's performance, generalization ability, and convergence speed. To this end, we propose AlignDet, a unified pre-training framework that can be adapted to various existing detectors to alleviate these discrepancies. AlignDet decouples the pre-training process into two stages, i.e., image-domain and box-domain pre-training. The image-domain pre-training optimizes the detection backbone to capture holistic visual abstraction, while the box-domain pre-training learns instance-level semantics and task-aware concepts to initialize the parts outside the backbone. By incorporating self-supervised pre-trained backbones, we can pre-train all modules of various detectors in an unsupervised paradigm. As depicted in Figure 1, extensive experiments demonstrate that AlignDet achieves significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule. For example, AlignDet improves FCOS by 5.3 mAP, RetinaNet by 2.1 mAP, Faster R-CNN by 3.3 mAP, and DETR by 2.3 mAP with fewer training epochs.
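The decoupling can be sketched schematically as a freeze-then-pre-train control flow; the toy backbone, the stand-in objectives, and the pseudo boxes below are assumptions, not AlignDet's actual losses.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 4)                        # toy box-regression head

x = torch.randn(8, 3, 32, 32)

# Stage 1: image-domain pre-training (placeholder self-supervised objective).
opt = torch.optim.SGD(backbone.parameters(), lr=1e-2)
backbone(x).pow(2).mean().backward()
opt.step(); opt.zero_grad()

# Stage 2: box-domain pre-training with the backbone frozen, so only the
# detector-specific parts outside the backbone are initialized here.
for p in backbone.parameters():
    p.requires_grad = False
opt = torch.optim.SGD(head.parameters(), lr=1e-2)
pseudo_boxes = torch.rand(8, 4)                # stand-in box targets
nn.functional.l1_loss(head(backbone(x)), pseudo_boxes).backward()
opt.step()
```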
Abstract: Medical vision-language models enable co-learning and integrating features from medical imaging and clinical text. However, these models are not easy to train and the latent representation space can be complex. Here we propose a novel way of pre-training and regularising medical vision-language models. The proposed method, named Medical vision-language pre-training with Frozen language models and Latent spAce Geometry optimization (M-FLAG), leverages a frozen language model for training stability and efficiency and introduces a novel orthogonality loss to harmonize the latent space geometry. We demonstrate the potential of the pre-trained model on three downstream tasks: medical image classification, segmentation, and object detection. Extensive experiments across five public datasets demonstrate that M-FLAG significantly outperforms existing medical vision-language pre-training approaches and reduces the number of parameters by 78\%. Notably, M-FLAG achieves outstanding performance on the segmentation task while using only 1\% of the RSNA dataset, even outperforming ImageNet pre-trained models that have been fine-tuned using 100\% of the data.
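The abstract does not give the loss's exact form, but an orthogonality regulariser of the usual kind can be sketched in PyTorch as follows; the centring step and the Frobenius-norm formulation are assumptions.

```python
import torch

def orthogonality_loss(z: torch.Tensor) -> torch.Tensor:
    """z: (batch, dim) latent features; penalise ||Z^T Z / B - I||_F^2 so
    the latent dimensions stay decorrelated."""
    z = z - z.mean(dim=0)                       # centre the batch
    gram = (z.T @ z) / z.shape[0]
    eye = torch.eye(z.shape[1], device=z.device)
    return ((gram - eye) ** 2).sum()

z = torch.randn(64, 128, requires_grad=True)
orthogonality_loss(z).backward()                # usable as a regulariser term
```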
Abstract: Methicillin-resistant Staphylococcus aureus (MRSA) is a type of bacteria resistant to certain antibiotics, making MRSA infections difficult to prevent. Over decades of effort to combat infectious diseases caused by MRSA, many studies have sought to estimate the causal effect of close contact (treatment) on MRSA infection (outcome) from observational data. In this problem, the treatment assignment mechanism plays a key role, as it determines the patterns of missing counterfactuals -- the fundamental challenge of causal effect estimation. Most existing observational studies for causal effect learning assume that the treatment is assigned individually to each unit. On many occasions, however, treatments are assigned pairwise to units that are connected in a graph, i.e., the treatments of different units are entangled. Neglecting entangled treatments can impede causal effect estimation. In this paper, we study the problem of causal effect estimation with treatments entangled in a graph. Despite a few explorations of entangled treatments, the problem remains difficult for three reasons: (1) the entanglement makes it hard to model and leverage the unknown treatment assignment mechanism; (2) hidden confounders may introduce confounding bias into causal effect estimation; and (3) the observational data is often time-varying. To tackle these challenges, we propose a novel method, NEAT, which explicitly leverages the graph structure to model the treatment assignment mechanism and mitigates confounding biases based on the treatment assignment modeling. We also extend our method to a dynamic setting to handle time-varying observational data. Experiments on both synthetic datasets and a real-world MRSA dataset validate the effectiveness of the proposed method and provide insights for future applications.
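A hedged sketch of the general recipe, not NEAT itself: embed units with one graph-convolution step, model the pairwise (entangled) treatment from node-pair embeddings, and reuse the embeddings as a proxy adjustment in the outcome model.

```python
import torch
import torch.nn as nn

N, F_IN, H = 6, 4, 8                            # units, features, hidden dim
A = torch.zeros(N, N)
A[0, 1] = A[1, 0] = A[2, 3] = A[3, 2] = 1.0     # toy contact graph
X = torch.randn(N, F_IN)                        # unit covariates

A_hat = A / A.sum(1).clamp(min=1).unsqueeze(1)  # row-normalised adjacency
Z = torch.relu(A_hat @ nn.Linear(F_IN, H)(X))   # graph-smoothed embeddings

# Treatment model: probability that a connected pair (i, j) is treated.
i, j = A.nonzero(as_tuple=True)
t_prob = torch.sigmoid(nn.Linear(2 * H, 1)(torch.cat([Z[i], Z[j]], -1)))

# Outcome model conditions on the embeddings as a proxy for hidden confounders.
y_hat = nn.Linear(H + 1, 1)(torch.cat([Z[i], t_prob], -1))
```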
Abstract: With recent advances in diffusion models, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for handling unseen testing noises. However, existing efforts mainly focus on the inherent properties of clean speech for inference, under-exploiting the varying noise information present in real-world conditions. In this paper, we propose a noise-aware speech enhancement (NASE) approach that extracts noise-specific information to guide the reverse process of the diffusion model. Specifically, we design a noise classification (NC) model to produce an acoustic embedding that serves as a noise conditioner for guiding the reverse denoising process. Meanwhile, a multi-task learning scheme is devised to jointly optimize the SE and NC tasks in order to enhance the noise specificity of the extracted conditioner. The proposed NASE is a plug-and-play module that can be generalized to any diffusion-based SE model. Experimental evidence on the VoiceBank-DEMAND dataset shows that NASE achieves significant improvements over multiple mainstream diffusion SE models, especially on unseen testing noises.
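The joint objective can be sketched minimally; the classifier architecture, the weighting factor, and the stand-in SE loss below are assumptions rather than NASE's exact formulation.

```python
import torch
import torch.nn as nn

nc_model = nn.Sequential(nn.Linear(80, 32), nn.ReLU(), nn.Linear(32, 10))
ce = nn.CrossEntropyLoss()

feats = torch.randn(16, 80)                     # pooled noisy-speech features
noise_type = torch.randint(0, 10, (16,))        # ground-truth noise classes

logits = nc_model(feats)                        # acoustic embedding / logits
cond = logits                                   # would condition the reverse
                                                # denoising process
loss_se = cond.new_zeros(())                    # stand-in diffusion SE loss
loss = loss_se + 0.5 * ce(logits, noise_type)   # joint loss, lam = 0.5 assumed
loss.backward()
```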
Abstract: Knowledge graph completion (KGC) is the task of inferring missing facts from a given knowledge graph (KG). Previous KGC methods typically represent KG entities and relations as trainable continuous embeddings and fuse the embeddings of the entity $h$ (or $t$) and relation $r$ into hidden representations of the query $(h, r, ?)$ (or $(?, r, t)$) to approximate the missing entities. To achieve this, they use either shallow linear transformations or deep convolutional modules. However, the linear transformations suffer from limited expressiveness, while the deep convolutional modules introduce unnecessary inductive bias that could potentially degrade model performance. We therefore propose a novel Transformer-based Patch Refinement Model (PatReFormer) for KGC. PatReFormer first segments each embedding into a sequence of patches and then employs cross-attention modules to allow bi-directional feature interaction between the entity and relation embeddings, leading to a better understanding of the underlying KG. We conduct experiments on four popular KGC benchmarks: WN18RR, FB15k-237, YAGO37, and DB100K. The experimental results show significant performance improvements over existing KGC methods on standard evaluation metrics, e.g., MRR and H@n. Our analysis first verifies the effectiveness of the design choices in PatReFormer. We then find that PatReFormer benefits from larger relation embedding dimensions. Finally, we demonstrate that the strength of PatReFormer, compared to other KGC models, lies in complex relation types.
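A schematic sketch of the patch-and-cross-attend idea follows; the embedding size, patch count, head count, and toy scoring function are assumptions.

```python
import torch
import torch.nn as nn

dim, n_patches = 256, 8
patch = dim // n_patches                        # 32-dim patches

e = torch.randn(4, dim)                         # entity embeddings (batch of 4)
r = torch.randn(4, dim)                         # relation embeddings

e_p = e.view(4, n_patches, patch)               # (B, P, patch) patch sequence
r_p = r.view(4, n_patches, patch)

xattn = nn.MultiheadAttention(patch, num_heads=4, batch_first=True)
e2r, _ = xattn(e_p, r_p, r_p)                   # entity patches attend to relation
r2e, _ = xattn(r_p, e_p, e_p)                   # relation patches attend to entity

score = (e2r * r2e).flatten(1).sum(-1)          # toy plausibility score per query
```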