In this paper, we propose an alternating optimization method to address a time-optimal trajectory generation problem. Different from the existing solutions, our approach introduces a new formulation that minimizes the overall trajectory running time while maintaining the polynomial smoothness constraints and incorporating hard limits on motion derivatives to ensure feasibility. To address this problem, an alternating peak-optimization method is developed, which splits the optimization process into two sub-optimizations: the first sub-optimization optimizes polynomial coefficients for smoothness, and the second sub-optimization adjusts the time allocated to each trajectory segment. These are alternated until a feasible minimum-time solution is found. We offer a comprehensive set of simulations and experiments to showcase the superior performance of our approach in comparison to existing methods. A collection of demonstration videos with real drone flying experiments can be accessed at https://www.youtube.com/playlist?list=PLQGtPFK17zUYkwFT-fr0a8E49R8Uq712l .
Colonoscopy reconstruction is pivotal for diagnosing colorectal cancer. However, accurate long-sequence colonoscopy reconstruction faces three major challenges: (1) dissimilarity among segments of the colon due to its meandering and convoluted shape; (2) co-existence of simple and intricately folded geometry structures; (3) sparse viewpoints due to constrained camera trajectories. To tackle these challenges, we introduce a new reconstruction framework based on neural radiance field (NeRF), named ColonNeRF, which leverages neural rendering for novel view synthesis of long-sequence colonoscopy. Specifically, to reconstruct the entire colon in a piecewise manner, our ColonNeRF introduces a region division and integration module, effectively reducing shape dissimilarity and ensuring geometric consistency in each segment. To learn both the simple and complex geometry in a unified framework, our ColonNeRF incorporates a multi-level fusion module that progressively models the colon regions from easy to hard. Additionally, to overcome the challenges from sparse views, we devise a DensiNet module for densifying camera poses under the guidance of semantic consistency. We conduct extensive experiments on both synthetic and real-world datasets to evaluate our ColonNeRF. Quantitatively, our ColonNeRF outperforms existing methods on two benchmarks over four evaluation metrics. Notably, our LPIPS-ALEX scores exhibit a substantial increase of about 67%-85% on the SimCol-to-3D dataset. Qualitatively, our reconstruction visualizations show much clearer textures and more accurate geometric details. These sufficiently demonstrate our superior performance over the state-of-the-art methods.
With the recent development of generative models, Text-to-3D generations have also seen significant growth. Nonetheless, achieving precise control over 3D generation continues to be an arduous task, as using text to control often leads to missing objects and imprecise locations. Contemporary strategies for enhancing controllability in 3D generation often entail the introduction of additional parameters, such as customized diffusion models. This often induces hardness in adapting to different diffusion models or creating distinct objects. In this paper, we present LucidDreaming as an effective pipeline capable of fine-grained control over 3D generation. It requires only minimal input of 3D bounding boxes, which can be deduced from a simple text prompt using a Large Language Model. Specifically, we propose clipped ray sampling to separately render and optimize objects with user specifications. We also introduce object-centric density blob bias, fostering the separation of generated objects. With individual rendering and optimizing of objects, our method excels not only in controlled content generation from scratch but also within the pre-trained NeRF scenes. In such scenarios, existing generative approaches often disrupt the integrity of the original scene, and current editing methods struggle to synthesize new content in empty spaces. We show that our method exhibits remarkable adaptability across a spectrum of mainstream Score Distillation Sampling-based 3D generation frameworks, and achieves superior alignment of 3D content when compared to baseline approaches. We also provide a dataset of prompts with 3D bounding boxes, benchmarking 3D spatial controllability.
In medical image analysis, Federated Learning (FL) stands out as a key technology that enables privacy-preserved, decentralized data processing, crucial for handling sensitive medical data. Currently, most FL models employ random initialization, which has been proven effective in various instances. However, given the unique challenges posed by non-IID (independently and identically distributed) data in FL, we propose a novel perspective: exploring the impact of using the foundation model with enormous pre-trained knowledge, such as the Segment Anything Model (SAM), as an instructive teacher for FL model initialization in medical image segmentation task. This work for the first time attempts to utilize the foundation model as an instructive teacher for initialization in FL, assessing its impact on the performance of FL models, especially in non-IID data scenarios. Our empirical evaluation on chest x-ray lung segmentation showcases that FL with foundation model instructed initialization not only achieves faster convergence but also improves performance in complex data contexts. These findings offer a new perspective for model initialization in FL.
ChatGPT and other Generative Artificial Intelligence (GAI) models tend to inherit and even amplify prevailing societal biases as they are trained on large amounts of existing data. Given the increasing usage of ChatGPT and other GAI by students, faculty members, and staff in higher education institutions (HEIs), there is an urgent need to examine the ethical issues involved such as its potential biases. In this scoping review, we clarify the ways in which biases related to GAI in higher education settings have been discussed in recent academic publications and identify what type of potential biases are commonly reported in this body of literature. We searched for academic articles written in English, Chinese, and Japanese across four main databases concerned with GAI usage in higher education and bias. Our findings show that while there is an awareness of potential biases around large language models (LLMs) and GAI, the majority of articles touch on ``bias'' at a relatively superficial level. Few identify what types of bias may occur under what circumstances. Neither do they discuss the possible implications for the higher education, staff, faculty members, or students. There is a notable lack of empirical work at this point, and we call for higher education researchers and AI experts to conduct more research in this area.
This scoping review explores the ethical challenges of using ChatGPT in education, focusing particularly on issues related to higher education. By reviewing recent academic articles written in English, Chinese, and Japanese, we aimed to provide a comprehensive overview of relevant research while identifying gaps for future considerations. Drawing on Arksey and O'Malley's (2005) five-stage scoping review framework, we identified research questions, search terms, and conducted article search from four databases in the target three languages. Each article was reviewed by at least two researchers identifying the main ethical issues of utilizing AI in education, particularly higher education. Our analysis of ethical issues followed the framework developed by DeepMind (Weiginger et al., 2021) to identify six main areas of ethical concern in Language Models. The majority of papers were concerned with misinformation harms (n=25) and/or human-computer interaction related harms (n=24). Given the rapid deployment of Generative Artificial Intelligence (GAI), it is imperative for educators to conduct more empirical studies to develop sound ethical policies for the use of GAI.
For intelligent quadcopter UAVs, a robust and reliable autonomous planning system is crucial. Most current trajectory planning methods for UAVs are suitable for static environments but struggle to handle dynamic obstacles, which can pose challenges and even dangers to flight. To address this issue, this paper proposes a vision-based planning system that combines tracking and trajectory prediction of dynamic obstacles to achieve efficient and reliable autonomous flight. We use a lightweight object detection algorithm to identify dynamic obstacles and then use Kalman Filtering to track and estimate their motion states. During the planning phase, we not only consider static obstacles but also account for the potential movements of dynamic obstacles. For trajectory generation, we use a B-spline-based trajectory search algorithm, which is further optimized with various constraints to enhance safety and alignment with the UAV's motion characteristics. We conduct experiments in both simulation and real-world environments, and the results indicate that our approach can successfully detect and avoid obstacles in dynamic environments in real-time, offering greater reliability compared to existing approaches. Furthermore, with the advancements in Natural Language Processing (NLP) technology demonstrating exceptional zero-shot generalization capabilities, more user-friendly human-machine interactions have become feasible, and this study also explores the integration of autonomous planning systems with Large Language Models (LLMs).
The utilization of integrated sensing and communication (ISAC) technology has the potential to enhance the communication performance of road side units (RSUs) through the active sensing of target vehicles. Furthermore, installing a simultaneous transmitting and reflecting surface (STARS) on the target vehicle can provide an extra boost to the reflection of the echo signal, thereby improving the communication quality for in-vehicle users. However, the design of this target-mounted STARS system exhibits significant challenges, such as limited information sharing and distributed STARS control. In this paper, we propose an end-to-end multi-agent deep reinforcement learning (MADRL) framework to tackle the challenges of joint sensing and communication optimization in the considered target-mounted STARS assisted vehicle networks. By deploying agents on both RSU and vehicle, the MADRL framework enables RSU and vehicle to perform beam prediction and STARS pre-configuration using their respective local information. To ensure efficient and stable learning for continuous decision-making, we employ the multi-agent soft actor critic (MASAC) algorithm and the multi-agent proximal policy optimization (MAPPO) algorithm on the proposed MADRL framework. Extensive experimental results confirm the effectiveness of our proposed MADRL framework in improving both sensing and communication performance through the utilization of target-mounted STARS. Finally, we conduct a comparative analysis and comparison of the two proposed algorithms under various environmental conditions.
Text-to-3D generation, which aims to synthesize vivid 3D objects from text prompts, has attracted much attention from the computer vision community. While several existing works have achieved impressive results for this task, they mainly rely on a time-consuming optimization paradigm. Specifically, these methods optimize a neural field from scratch for each text prompt, taking approximately one hour or more to generate one object. This heavy and repetitive training cost impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of our Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up the training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that can dynamically adjust its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The project page is at https://ming1993li.github.io/Instant3DProj.
Thorax disease analysis in large-scale, multi-centre, and multi-scanner settings is often limited by strict privacy policies. Federated learning (FL) offers a potential solution, while traditional parameter-based FL can be limited by issues such as high communication costs, data leakage, and heterogeneity. Distillation-based FL can improve efficiency, but it relies on a proxy dataset, which is often impractical in clinical practice. To address these challenges, we introduce a data-free distillation-based FL approach FedKDF. In FedKDF, the server employs a lightweight generator to aggregate knowledge from different clients without requiring access to their private data or a proxy dataset. FedKDF combines the predictors from clients into a single, unified predictor, which is further optimized using the learned knowledge in the lightweight generator. Our empirical experiments demonstrate that FedKDF offers a robust solution for efficient, privacy-preserving federated thorax disease analysis.