Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tao Yu

Fast Segment Anything

Jun 21, 2023

Xu Zhao, Wenchao Ding, Yongqi An, Yinglong Du, Tao Yu, Min Li, Ming Tang, Jinqiao Wang

Abstract:The recently proposed segment anything model (SAM) has made a significant influence in many computer vision tasks. It is becoming a foundation step for many high-level tasks, like image segmentation, image caption, and image editing. However, its huge computation costs prevent it from wider applications in industry scenarios. The computation mainly comes from the Transformer architecture at high-resolution inputs. In this paper, we propose a speed-up alternative method for this fundamental task with comparable performance. By reformulating the task as segments-generation and prompting, we find that a regular CNN detector with an instance segmentation branch can also accomplish this task well. Specifically, we convert this task to the well-studied instance segmentation task and directly train the existing instance segmentation method using only 1/50 of the SA-1B dataset published by SAM authors. With our method, we achieve a comparable performance with the SAM method at 50 times higher run-time speed. We give sufficient experimental results to demonstrate its effectiveness. The codes and demos will be released at https://github.com/CASIA-IVA-Lab/FastSAM.

* Technical Report. The code is released at https://github.com/CASIA-IVA-Lab/FastSAM

Via

Access Paper or Ask Questions

BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning

Jun 15, 2023

Junru Chen, Yang Yang, Tao Yu, Yingying Fan, Xiaolong Mo, Carl Yang

Figure 1 for BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning

Figure 2 for BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning

Figure 3 for BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning

Figure 4 for BrainNet: Epileptic Wave Detection from SEEG with Hierarchical Graph Diffusion Learning

Abstract:Epilepsy is one of the most serious neurological diseases, affecting 1-2% of the world's population. The diagnosis of epilepsy depends heavily on the recognition of epileptic waves, i.e., disordered electrical brainwave activity in the patient's brain. Existing works have begun to employ machine learning models to detect epileptic waves via cortical electroencephalogram (EEG). However, the recently developed stereoelectrocorticography (SEEG) method provides information in stereo that is more precise than conventional EEG, and has been broadly applied in clinical practice. Therefore, we propose the first data-driven study to detect epileptic waves in a real-world SEEG dataset. While offering new opportunities, SEEG also poses several challenges. In clinical practice, epileptic wave activities are considered to propagate between different regions in the brain. These propagation paths, also known as the epileptogenic network, are deemed to be a key factor in the context of epilepsy surgery. However, the question of how to extract an exact epileptogenic network for each patient remains an open problem in the field of neuroscience. To address these challenges, we propose a novel model (BrainNet) that jointly learns the dynamic diffusion graphs and models the brain wave diffusion patterns. In addition, our model effectively aids in resisting label imbalance and severe noise by employing several self-supervised learning tasks and a hierarchical framework. By experimenting with the extensive real SEEG dataset obtained from multiple patients, we find that BrainNet outperforms several latest state-of-the-art baselines derived from time-series analysis.

Via

Access Paper or Ask Questions

Coneheads: Hierarchy Aware Attention

Jun 01, 2023

Albert Tseng, Tao Yu, Toni J. B. Liu, Christopher De Sa

Figure 1 for Coneheads: Hierarchy Aware Attention

Figure 2 for Coneheads: Hierarchy Aware Attention

Figure 3 for Coneheads: Hierarchy Aware Attention

Figure 4 for Coneheads: Hierarchy Aware Attention

Abstract:Attention networks such as transformers have achieved state-of-the-art performance in many domains. These networks rely heavily on the dot product attention operator, which computes the similarity between two points by taking their inner product. However, the inner product does not explicitly model the complex structural properties of real world datasets, such as hierarchies between data points. To remedy this, we introduce cone attention, a drop-in replacement for dot product attention based on hyperbolic entailment cones. Cone attention associates two points by the depth of their lowest common ancestor in a hierarchy defined by hyperbolic cones, which intuitively measures the divergence of two points and gives a hierarchy aware similarity score. We test cone attention on a wide variety of models and tasks and show that it improves task-level performance over dot product attention and other baselines, and is able to match dot-product attention with significantly fewer parameters. Our results suggest that cone attention is an effective way to capture hierarchical relationships when calculating attention.

Via

Access Paper or Ask Questions

Shadow Cones: Unveiling Partial Orders in Hyperbolic Space

May 24, 2023

Tao Yu, Toni J. B. Liu, Albert Tseng, Christopher De Sa

Abstract:Hyperbolic space has been shown to produce superior low-dimensional embeddings of hierarchical structures that are unattainable in Euclidean space. Building upon this, the entailment cone formulation of Ganea et al. uses geodesically convex cones to embed partial orderings in hyperbolic space. However, these entailment cones lack intuitive interpretations due to their definitions via complex concepts such as tangent vectors and the exponential map in Riemannian space. In this paper, we present shadow cones, an innovative framework that provides a physically intuitive interpretation for defining partial orders on general manifolds. This is achieved through the use of metaphoric light sources and object shadows, inspired by the sun-earth-moon relationship. Shadow cones consist of two primary classes: umbral and penumbral cones. Our results indicate that shadow cones offer robust representation and generalization capabilities across a variety of datasets, such as WordNet and ConceptNet, thereby outperforming the top-performing entailment cones. Our findings indicate that shadow cones offer an innovative, general approach to geometrically encode partial orders, enabling better representation and analysis of datasets with hierarchical structures.

Via

Access Paper or Ask Questions

Generating Data for Symbolic Language with Large Language Models

May 23, 2023

Jiacheng Ye, Chengzu Li, Lingpeng Kong, Tao Yu

Figure 1 for Generating Data for Symbolic Language with Large Language Models

Figure 2 for Generating Data for Symbolic Language with Large Language Models

Figure 3 for Generating Data for Symbolic Language with Large Language Models

Figure 4 for Generating Data for Symbolic Language with Large Language Models

Abstract:While large language models (LLMs) bring not only performance but also complexity, recent work has started to turn LLMs into data generators rather than task inferencers, where another affordable task model is trained for efficient deployment and inference. However, such an approach has primarily been applied to natural language tasks and has not yet been explored for symbolic language tasks with complex structured outputs (e.g., semantic parsing and code generation). In this paper, we propose SymGen which utilizes LLMs for generating various annotation-expensive symbolic language data. SymGen consists of an informative prompt to steer generation and an agreement-based verifier to improve data correctness. We conduct extensive experiments on six symbolic language tasks across various settings. Compared with the LLMs, we demonstrate the 1\%-sized task model can achieve comparable or better performance, largely cutting inference and deployment costs. We also show that generated data with only a few human demonstrations can be as effective as over 10 times the amount of human-annotated data when training the task model, saving a considerable amount of annotation effort. SymGen sheds new light on data generation for complex tasks, and we release the code at \href{https://github.com/HKUNLP/SymGen}{https://github.com/HKUNLP/SymGen}.

Via

Access Paper or Ask Questions

Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

May 04, 2023

Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu

Abstract:Deep learning has been recently introduced for efficient acoustic howling suppression (AHS). However, the recurrent nature of howling creates a mismatch between offline training and streaming inference, limiting the quality of enhanced speech. To address this limitation, we propose a hybrid method that combines a Kalman filter with a self-attentive recurrent neural network (SARNN) to leverage their respective advantages for robust AHS. During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via teacher-forced training strategy are used to train the deep neural network (DNN). During streaming inference, the DNN's parameters are fixed while its output serves as a reference signal for updating the Kalman filter. Evaluation in both offline and streaming inference scenarios using simulated and real-recorded data shows that the proposed method efficiently suppresses howling and consistently outperforms baselines.

* submitted to INTERSPEECH 2023

Via

Access Paper or Ask Questions

StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video

May 01, 2023

Lizhen Wang, Xiaochen Zhao, Jingxiang Sun, Yuxiang Zhang, Hongwen Zhang, Tao Yu, Yebin Liu

Abstract:Face reenactment methods attempt to restore and re-animate portrait videos as realistically as possible. Existing methods face a dilemma in quality versus controllability: 2D GAN-based methods achieve higher image quality but suffer in fine-grained control of facial attributes compared with 3D counterparts. In this work, we propose StyleAvatar, a real-time photo-realistic portrait avatar reconstruction method using StyleGAN-based networks, which can generate high-fidelity portrait avatars with faithful expression control. We expand the capabilities of StyleGAN by introducing a compositional representation and a sliding window augmentation method, which enable faster convergence and improve translation generalization. Specifically, we divide the portrait scenes into three parts for adaptive adjustments: facial region, non-facial foreground region, and the background. Besides, our network leverages the best of UNet, StyleGAN and time coding for video learning, which enables high-quality video generation. Furthermore, a sliding window augmentation method together with a pre-training strategy are proposed to improve translation generalization and training performance, respectively. The proposed network can converge within two hours while ensuring high image quality and a forward rendering time of only 20 milliseconds. Furthermore, we propose a real-time live system, which further pushes research into applications. Results and experiments demonstrate the superiority of our method in terms of image quality, full portrait video generation, and real-time re-animation compared to existing facial reenactment methods. Training and inference code for this paper are at https://github.com/LizhenWangT/StyleAvatar.

* 8 pages, 5 figures, SIGGRAPH 2023 Conference Proceedings

Via

Access Paper or Ask Questions

Super-NeRF: View-consistent Detail Generation for NeRF super-resolution

Apr 26, 2023

Yuqi Han, Tao Yu, Xiaohang Yu, Yuwang Wang, Qionghai Dai

Figure 1 for Super-NeRF: View-consistent Detail Generation for NeRF super-resolution

Figure 2 for Super-NeRF: View-consistent Detail Generation for NeRF super-resolution

Figure 3 for Super-NeRF: View-consistent Detail Generation for NeRF super-resolution

Figure 4 for Super-NeRF: View-consistent Detail Generation for NeRF super-resolution

Abstract:The neural radiance field (NeRF) achieved remarkable success in modeling 3D scenes and synthesizing high-fidelity novel views. However, existing NeRF-based methods focus more on the make full use of the image resolution to generate novel views, but less considering the generation of details under the limited input resolution. In analogy to the extensive usage of image super-resolution, NeRF super-resolution is an effective way to generate the high-resolution implicit representation of 3D scenes and holds great potential applications. Up to now, such an important topic is still under-explored. In this paper, we propose a NeRF super-resolution method, named Super-NeRF, to generate high-resolution NeRF from only low-resolution inputs. Given multi-view low-resolution images, Super-NeRF constructs a consistency-controlling super-resolution module to generate view-consistent high-resolution details for NeRF. Specifically, an optimizable latent code is introduced for each low-resolution input image to control the 2D super-resolution images to converge to the view-consistent output. The latent codes of each low-resolution image are optimized synergistically with the target Super-NeRF representation to fully utilize the view consistency constraint inherent in NeRF construction. We verify the effectiveness of Super-NeRF on synthetic, real-world, and AI-generated NeRF datasets. Super-NeRF achieves state-of-the-art NeRF super-resolution performance on high-resolution detail generation and cross-view consistency.

Via

Access Paper or Ask Questions

Learning Visibility Field for Detailed 3D Human Reconstruction and Relighting

Apr 24, 2023

Ruichen Zheng, Peng Li, Haoqian Wang, Tao Yu

Abstract:Detailed 3D reconstruction and photo-realistic relighting of digital humans are essential for various applications. To this end, we propose a novel sparse-view 3d human reconstruction framework that closely incorporates the occupancy field and albedo field with an additional visibility field--it not only resolves occlusion ambiguity in multiview feature aggregation, but can also be used to evaluate light attenuation for self-shadowed relighting. To enhance its training viability and efficiency, we discretize visibility onto a fixed set of sample directions and supply it with coupled geometric 3D depth feature and local 2D image feature. We further propose a novel rendering-inspired loss, namely TransferLoss, to implicitly enforce the alignment between visibility and occupancy field, enabling end-to-end joint training. Results and extensive experiments demonstrate the effectiveness of the proposed method, as it surpasses state-of-the-art in terms of reconstruction accuracy while achieving comparably accurate relighting to ray-traced ground truth.

* 8 pages, 10 figures, published to CVPR2023

Via

Access Paper or Ask Questions

The Seven Worlds and Experiences of the Wireless Metaverse: Challenges and Opportunities

Apr 20, 2023

Omar Hashash, Christina Chaccour, Walid Saad, Tao Yu, Kei Sakaguchi, Merouane Debbah

Figure 1 for The Seven Worlds and Experiences of the Wireless Metaverse: Challenges and Opportunities

Figure 2 for The Seven Worlds and Experiences of the Wireless Metaverse: Challenges and Opportunities

Figure 3 for The Seven Worlds and Experiences of the Wireless Metaverse: Challenges and Opportunities

Abstract:The wireless metaverse will create diverse user experiences at the intersection of the physical, digital, and virtual worlds. These experiences will enable novel interactions between the constituents (e.g., extended reality (XR) users and avatars) of the three worlds. However, remarkably, to date, there is no holistic vision that identifies the full set of metaverse worlds, constituents, and experiences, and the implications of their associated interactions on next-generation communication and computing systems. In this paper, we present a holistic vision of a limitless, wireless metaverse that distills the metaverse into an intersection of seven worlds and experiences that include the: i) physical, digital, and virtual worlds, along with the ii) cyber, extended, live, and parallel experiences. We then articulate how these experiences bring forth interactions between diverse metaverse constituents, namely, a) humans and avatars and b) connected intelligence systems and their digital twins (DTs). Then, we explore the wireless, computing, and artificial intelligence (AI) challenges that must be addressed to establish metaverse-ready networks that support these experiences and interactions. We particularly highlight the need for end-to-end synchronization of DTs, and the role of human-level AI and reasoning abilities for cognitive avatars. Moreover, we articulate a sequel of open questions that should ignite the quest for the future metaverse. We conclude with a set of recommendations to deploy the limitless metaverse over future wireless systems.

Via

Access Paper or Ask Questions