Chen Feng

ActFormer: Scalable Collaborative Perception via Active Queries

Mar 08, 2024

Suozhi Huang, Juexiao Zhang, Yiming Li, Chen Feng

Abstract:Collaborative perception leverages rich visual observations from multiple robots to extend a single robot's perception ability beyond its field of view. Many prior works receive messages broadcast from all collaborators, leading to a scalability challenge when dealing with a large number of robots and sensors. In this work, we aim to address scalable camera-based collaborative perception with a Transformer-based architecture. Our key idea is to enable a single robot to intelligently discern the relevance of the collaborators and their associated cameras according to a learned spatial prior. This proactive understanding of the visual features' relevance does not require the transmission of the features themselves, enhancing both communication and computation efficiency. Specifically, we present ActFormer, a Transformer that learns bird's eye view (BEV) representations by using predefined BEV queries to interact with multi-robot multi-camera inputs. Each BEV query can actively select relevant cameras for information aggregation based on pose information, instead of interacting with all cameras indiscriminately. Experiments on the V2X-Sim dataset demonstrate that ActFormer improves the detection performance from 29.89% to 45.15% in terms of AP@0.7 with about 50% fewer queries, showcasing the effectiveness of ActFormer in multi-agent collaborative 3D object detection.

* Accepted to ICRA 2024
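
The idea of a BEV query picking cameras from pose information alone can be illustrated with a purely geometric stand-in. The sketch below is not ActFormer's learned selection: it assumes a hypothetical `Camera` record (2D position, yaw, horizontal field of view, maximum range) and keeps only the cameras whose view cone contains the query's BEV location, so no image features need to be exchanged to make the choice.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    # Hypothetical pose record; field names are illustrative assumptions.
    name: str
    x: float
    y: float
    yaw: float        # heading in radians, measured from the +x axis
    fov: float        # full horizontal field of view in radians
    max_range: float  # farthest distance the camera is assumed to cover


def sees(cam: Camera, qx: float, qy: float) -> bool:
    """Return True if the BEV point (qx, qy) lies inside the camera's view cone."""
    dx, dy = qx - cam.x, qy - cam.y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True
    if dist > cam.max_range:
        return False
    bearing = math.atan2(dy, dx)
    # Wrap the angular difference into [-pi, pi) before comparing to the half FOV.
    diff = (bearing - cam.yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= cam.fov / 2


def select_cameras(cameras, qx, qy):
    """Names of the cameras that a query at (qx, qy) would attend to."""
    return [c.name for c in cameras if sees(c, qx, qy)]


cameras = [
    Camera("front", 0.0, 0.0, 0.0, math.pi / 2, 50.0),
    Camera("rear", 0.0, 0.0, math.pi, math.pi / 2, 50.0),
    Camera("far", 100.0, 0.0, math.pi, math.pi / 2, 20.0),
]
print(select_cameras(cameras, 10.0, 0.0))
```

A learned spatial prior, as described in the abstract, would replace the hard cone test with a predicted relevance score per query and camera.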

Star-Searcher: A Complete and Efficient Aerial System for Autonomous Target Search in Complex Unknown Environments

Feb 26, 2024

Yiming Luo, Zixuan Zhuang, Neng Pan, Chen Feng, Shaojie Shen, Fei Gao, Hui Cheng, Boyu Zhou

Abstract:This paper tackles the challenge of autonomous target search using unmanned aerial vehicles (UAVs) in complex unknown environments. To fill the gap in systematic approaches for this task, we introduce Star-Searcher, an aerial system featuring specialized sensor suites, mapping, and planning modules to optimize searching. Path planning challenges due to increased inspection requirements are addressed through a hierarchical planner with a visibility-based viewpoint clustering method. This simplifies planning by breaking it into global and local sub-problems, ensuring efficient global and local path coverage in real-time. Furthermore, our global path planning employs a history-aware mechanism to reduce motion inconsistency from frequent map changes, significantly enhancing search efficiency. We conduct comparisons with state-of-the-art methods in both simulation and the real world, demonstrating shorter flight paths, reduced time, and higher target search completeness. Our approach will be open-sourced for community benefit at https://github.com/SYSU-STAR/STAR-Searcher.

* Submitted to IEEE RA-L. Code: https://github.com/SYSU-STAR/STAR-Searcher. Video: https://www.youtube.com/watch?v=08ll_oo_DtU

Networked Multiagent Reinforcement Learning for Peer-to-Peer Energy Trading

Jan 27, 2024

Chen Feng, Andrew L. Liu

Abstract:Utilizing distributed renewable and energy storage resources in local distribution networks via peer-to-peer (P2P) energy trading has long been touted as a solution to improve energy systems' resilience and sustainability. Consumers and prosumers (those who have energy generation resources), however, do not have the expertise to engage in repeated P2P trading, and the zero-marginal costs of renewables present challenges in determining fair market prices. To address these issues, we propose multi-agent reinforcement learning (MARL) frameworks to help automate consumers' bidding and management of their solar PV and energy storage resources, under a specific P2P clearing mechanism that utilizes the so-called supply-demand ratio. In addition, we show how the MARL frameworks can integrate physical network constraints to realize voltage control, hence ensuring physical feasibility of the P2P energy trading and paving the way for real-world implementations.
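
To make the notion of a supply-demand ratio concrete, here is a minimal sketch of a ratio-driven price. The linear rule below is an illustrative assumption, not the clearing mechanism used in the paper: the price equals a hypothetical utility retail rate when there is no local supply, falls linearly as supply rises, and settles at a hypothetical feed-in rate once supply meets or exceeds demand.

```python
def supply_demand_ratio(total_supply: float, total_demand: float) -> float:
    """Ratio of offered energy to requested energy in one trading round."""
    if total_demand <= 0:
        return float("inf")  # no demand: treat the market as oversupplied
    return total_supply / total_demand


def clearing_price(sdr: float, feed_in: float, retail: float) -> float:
    """Illustrative linear price rule (an assumption, not the paper's formula)."""
    if sdr >= 1.0:
        return feed_in
    return retail - sdr * (retail - feed_in)


sdr = supply_demand_ratio(total_supply=5.0, total_demand=10.0)
print(sdr, clearing_price(sdr, feed_in=0.05, retail=0.15))
```

Any rule of this shape keeps the local price between the two utility rates, which is what gives both buyers and sellers a reason to trade locally.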

Learning When to See for Long-term Traffic Data Collection on Power-constrained Devices

Jan 25, 2024

Ruixuan Zhang, Wenyu Han, Zilin Bian, Kaan Ozbay, Chen Feng

Abstract:Collecting traffic data is crucial for transportation systems and urban planning, and is often more desirable through easy-to-deploy but power-constrained devices, due to the unavailability or high cost of power and network infrastructure. The limited power means an inevitable trade-off between data collection duration and accuracy/resolution. We introduce a novel learning-based framework that strategically decides observation timings for battery-powered devices and reconstructs the full data stream from sparsely sampled observations, resulting in minimal performance loss and a significantly prolonged system lifetime. Our framework comprises a predictor, a controller, and an estimator. The predictor utilizes historical data to forecast future trends within a fixed time horizon. The controller uses the forecasts to determine the next optimal timing for data collection. Finally, the estimator reconstructs the complete data profile from the sampled observations. We evaluate the performance of the proposed method on PeMS data using an RNN (Recurrent Neural Network) predictor and estimator and a DRQN (Deep Recurrent Q-Network) controller, and compare it against a baseline that uses a Kalman filter and uniform sampling. The results indicate that our method outperforms the baseline, primarily due to the inclusion of more representative data points in the profile, resulting in an overall 10% improvement in estimation accuracy. Source code will be publicly available.

* Accepted by IEEE 26th International Conference on Intelligent Transportation Systems
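
The sample-then-reconstruct setting can be sketched with the simplest possible components. The code below is a stand-in under stated assumptions, not the paper's method or its exact baseline: it samples a series at a fixed interval (uniform sampling) and rebuilds the full profile by linear interpolation, where the paper's baseline uses a Kalman filter and its method uses learned components. All function names are hypothetical.

```python
def uniform_sample(series, step):
    """Observe every `step`-th point, always including the last one."""
    idx = list(range(0, len(series), step))
    if idx[-1] != len(series) - 1:
        idx.append(len(series) - 1)
    return idx, [series[i] for i in idx]


def reconstruct(idx, vals, n):
    """Fill in all n points by linear interpolation between observations.

    Assumes at least two observations with strictly increasing indices.
    """
    out = [0.0] * n
    for (i0, v0), (i1, v1) in zip(zip(idx, vals), zip(idx[1:], vals[1:])):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            out[i] = v0 + t * (v1 - v0)
    return out


def mean_abs_error(truth, estimate):
    return sum(abs(a - b) for a, b in zip(truth, estimate)) / len(truth)


flow = [0.0, 10.0, 0.0, 10.0, 0.0]  # toy series that uniform sampling handles badly
idx, vals = uniform_sample(flow, step=2)
print(mean_abs_error(flow, reconstruct(idx, vals, len(flow))))
```

The toy series shows why observation timing matters: sampling every second point misses every peak, which is the kind of loss an adaptive controller is meant to avoid.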

Full-reference Video Quality Assessment for User Generated Content Transcoding

Dec 19, 2023

Zihao Qi, Chen Feng, Duolikun Danier, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull

Abstract:Unlike video coding for professional content, the delivery pipeline of User Generated Content (UGC) involves transcoding where unpristine reference content needs to be compressed repeatedly. In this work, we observe that existing full-/no-reference quality metrics fail to accurately predict the perceptual quality difference between transcoded UGC and the corresponding unpristine references. Therefore, they are unsuited to guiding rate-distortion optimisation during transcoding. In this context, we propose a bespoke full-reference deep video quality metric for UGC transcoding. The proposed method features a transcoding-specific weakly supervised training strategy employing a quality ranking-based Siamese structure. The proposed method is evaluated on the YouTube-UGC VP9 subset and the LIVE-Wild database, demonstrating state-of-the-art performance compared to existing VQA methods.

* 5 pages, 4 figures
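
A ranking-based Siamese setup can be summarised in a few lines: one shared scoring function is applied to both items of a pair, and the loss penalises pairs whose scores are ordered against the label. The sketch below uses a generic margin ranking loss in plain Python; it is an assumption for illustration, not the training objective reported in the paper, and `score_fn` is a hypothetical placeholder for the quality network.

```python
def margin_ranking_loss(score_a, score_b, a_is_better, margin=1.0):
    """Zero when the better item outscores the other by at least `margin`."""
    y = 1.0 if a_is_better else -1.0
    return max(0.0, margin - y * (score_a - score_b))


def siamese_loss(score_fn, pairs, margin=1.0):
    """Average ranking loss; the same score_fn (shared weights) scores both items."""
    losses = [
        margin_ranking_loss(score_fn(a), score_fn(b), a_is_better, margin)
        for a, b, a_is_better in pairs
    ]
    return sum(losses) / len(losses)


# Toy example: items are scalars and the "network" just doubles them.
pairs = [(1.0, 0.0, True), (0.0, 1.0, True)]
print(siamese_loss(lambda x: 2.0 * x, pairs))
```

Training on ordered pairs needs only relative labels (which version looks better), which is what makes weak supervision practical when absolute quality scores are scarce.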

RankDVQA-mini: Knowledge Distillation-Driven Deep Video Quality Assessment

Dec 14, 2023

Chen Feng, Duolikun Danier, Haoran Wang, Fan Zhang, David Bull

Abstract:Deep learning-based video quality assessment (deep VQA) has demonstrated significant potential in surpassing conventional metrics, with promising improvements in terms of correlation with human perception. However, the practical deployment of such deep VQA models is often limited due to their high computational complexity and large memory requirements. To address this issue, we aim to significantly reduce the model size and runtime of one of the state-of-the-art deep VQA methods, RankDVQA, by employing a two-phase workflow that integrates pruning-driven model compression with multi-level knowledge distillation. The resulting lightweight quality metric, RankDVQA-mini, requires less than 10% of the model parameters compared to its full version (14% in terms of FLOPs), while still retaining a quality prediction performance that is superior to most existing deep VQA methods. The source code of RankDVQA-mini has been released at https://chenfeng-bristol.github.io/RankDVQA-mini/ for public evaluation.

* 5 pages and 2 figures

BVI-Artefact: An Artefact Detection Benchmark Dataset for Streamed Videos

Dec 14, 2023

Chen Feng, Duolikun Danier, Fan Zhang, David Bull

Abstract:Professionally generated content (PGC) streamed online can contain visual artefacts that degrade the quality of user experience. These artefacts arise from different stages of the streaming pipeline, including acquisition, post-production, compression, and transmission. To better guide streaming experience enhancement, it is important to detect specific artefacts at the user end in the absence of a pristine reference. In this work, we address the lack of a comprehensive benchmark for artefact detection within streamed PGC, via the creation and validation of a large database, BVI-Artefact. Considering the ten most relevant artefact types encountered in video streaming, we collected and generated 480 video sequences, each containing various artefacts with associated binary artefact labels. Based on this new database, existing artefact detection methods are benchmarked, with results showing the challenging nature of this task and indicating the need for more reliable artefact detection methods. To facilitate further research in this area, we have made BVI-Artefact publicly available at https://chenfeng-bristol.github.io/BVI-Artefact/

* 5 pages and 3 figures

SO-NeRF: Active View Planning for NeRF using Surrogate Objectives

Dec 06, 2023

Keifer Lee, Shubham Gupta, Sunglyoung Kim, Bhargav Makwana, Chao Chen, Chen Feng

Abstract:Despite the great success of Neural Radiance Fields (NeRF), its data-gathering process remains vague with only a general rule of thumb of sampling as densely as possible. The lack of understanding of what actually constitutes good views for NeRF makes it difficult to actively plan a sequence of views that yields the maximal reconstruction quality. We propose Surrogate Objectives for Active Radiance Fields (SOAR), which is a set of interpretable functions that evaluate the goodness of views using geometric and photometric visual cues - surface coverage, geometric complexity, textural complexity, and ray diversity. Moreover, by learning to infer the SOAR scores from a deep network, SOARNet, we are able to effectively select views in mere seconds instead of hours, without the need for prior visits to all the candidate views or training any radiance field during such planning. Our experiments show SOARNet outperforms the baselines with ~80x speed-up while achieving better or comparable reconstruction qualities. We finally show that SOAR is model-agnostic, thus it generalizes across fully neural-implicit to fully explicit approaches.

* 13 pages
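
Planning views against a surrogate objective can be illustrated with one of the cues named in the abstract, surface coverage. The sketch below is a generic greedy selection under stated assumptions, not SOAR's scoring functions or SOARNet: it assumes a hypothetical mapping from each candidate view to the set of surface patch ids it observes, and repeatedly picks the view that adds the most patches not yet covered.

```python
def greedy_views(visible, budget):
    """Pick up to `budget` views, each maximising newly covered surface patches.

    `visible` maps a view name to the set of patch ids it sees (hypothetical input).
    """
    covered, chosen = set(), []
    for _ in range(budget):
        candidates = [v for v in visible if v not in chosen]
        best = max(candidates, key=lambda v: len(visible[v] - covered), default=None)
        if best is None or not (visible[best] - covered):
            break  # nothing left that would add coverage
        chosen.append(best)
        covered |= visible[best]
    return chosen, covered


visible = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}, "d": {5}}
print(greedy_views(visible, budget=2))
```

In this toy version the visibility sets are given up front; the appeal of predicting scores with a network, as the abstract describes, is that candidate views need not be visited first.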

Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames

Nov 28, 2023

Chao Chen, Mingzhi Zhu, Ankush Pratap Singh, Yu Yan, Felix Juefei Xu, Chen Feng

Abstract:We propose scene summarization as a new video-based scene understanding task. It aims to summarize a long video walkthrough of a scene into a small set of frames that are spatially diverse in the scene, which has many important applications, such as in surveillance, real estate, and robotics. It stems from video summarization but focuses on long and continuous videos from moving cameras, instead of user-edited fragmented video clips that are more commonly studied in existing video summarization works. Our solution to this task is a two-stage self-supervised pipeline named SceneSum. Its first stage uses clustering to segment the video sequence. Our key idea is to integrate visual place recognition (VPR) into this clustering process to promote spatial diversity. Its second stage needs to select a representative keyframe from each cluster as the summary while respecting resource constraints such as memory and disk space limits. Additionally, if the ground truth image trajectory is available, our method can be easily augmented with a supervised loss to enhance the clustering and keyframe selection. Extensive experiments on both real-world and simulated datasets show our method outperforms common video summarization baselines by 50%.
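
The two-stage shape of such a pipeline (cluster the frames, then keep one representative per cluster) can be sketched with plain k-means. This is a simplified stand-in, not SceneSum: it assumes each frame already has a hypothetical low-dimensional descriptor, such as an estimated 2D position, and it uses neither VPR features nor self-supervised training.

```python
def dist2(p, q):
    """Squared Euclidean distance between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic, evenly spaced initial centres."""
    n = len(points)
    centers = [points[(i * n) // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers


def keyframes(points, k):
    """Stage 2: the frame closest to each cluster centre becomes its keyframe."""
    labels, centers = kmeans(points, k)
    picks = []
    for c in range(k):
        ids = [i for i, lab in enumerate(labels) if lab == c]
        if ids:
            picks.append(min(ids, key=lambda i: dist2(points[i], centers[c])))
    return sorted(picks)


# Six frames from a walkthrough that lingers in two distinct places.
frames = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.0), (10.0, 10.0), (10.2, 10.0), (10.1, 10.0)]
print(keyframes(frames, k=2))
```

Choosing `k` is where a resource constraint such as a memory or disk budget would enter: the summary size is exactly the number of clusters.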

AutoTrans: A Complete Planning and Control Framework for Autonomous UAV Payload Transportation

Oct 23, 2023

Haojia Li, Haokun Wang, Chen Feng, Fei Gao, Boyu Zhou, Shaojie Shen

Abstract:The robotics community is increasingly interested in autonomous aerial transportation. Unmanned aerial vehicles with suspended payloads have advantages over other systems, including mechanical simplicity and agility, but pose great challenges in planning and control. To realize fully autonomous aerial transportation, this paper presents a systematic solution to address these difficulties. First, we present a real-time planning method that generates smooth trajectories considering the time-varying shape and non-linear dynamics of the system, ensuring whole-body safety and dynamic feasibility. Additionally, an adaptive NMPC with a hierarchical disturbance compensation strategy is designed to overcome unknown external perturbations and inaccurate model parameters. Extensive experiments show that our method is capable of generating high-quality trajectories online, even in highly constrained environments, and tracking aggressive flight trajectories accurately, even under significant uncertainty. We plan to release our code to benefit the community.

* Accepted by IEEE Robotics and Automation Letters
