Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qiang Fu

Tencent Inc

Playable Game Generation

Dec 01, 2024

Mingyu Yang, Junyou Li, Zhongbin Fang, Sheng Chen, Yangbin Yu, Qiang Fu, Wei Yang, Deheng Ye

Abstract:In recent years, Artificial Intelligence Generated Content (AIGC) has advanced from text-to-image generation to text-to-video and multimodal video synthesis. However, generating playable games presents significant challenges due to the stringent requirements for real-time interaction, high visual quality, and accurate simulation of game mechanics. Existing approaches often fall short, either lacking real-time capabilities or failing to accurately simulate interactive mechanics. To tackle the playability issue, we propose a novel method called \emph{PlayGen}, which encompasses game data generation, an autoregressive DiT-based diffusion model, and a comprehensive playability-based evaluation framework. Validated on well-known 2D and 3D games, PlayGen achieves real-time interaction, ensures sufficient visual quality, and provides accurate interactive mechanics simulation. Notably, these results are sustained even after over 1000 frames of gameplay on an NVIDIA RTX 2060 GPU. Our code is publicly available: https://github.com/GreatX3/Playable-Game-Generation. Our playable demo generated by AI is: http://124.156.151.207.

Via

Access Paper or Ask Questions

Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

Nov 27, 2024

Tianyuan Zhang, Lu Wang, Xinwei Zhang, Yitong Zhang, Boyi Jia, Siyuan Liang, Shengshan Hu, Qiang Fu, Aishan Liu, Xianglong Liu

Figure 1 for Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

Figure 2 for Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

Figure 3 for Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

Figure 4 for Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

Abstract:Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by enhancing reasoning capabilities. However, these models remain highly vulnerable to adversarial attacks. While existing research has primarily focused on general VLM attacks, the development of attacks tailored to the safety-critical AD context has been largely overlooked. In this paper, we take the first step toward designing adversarial attacks specifically targeting VLMs in AD, exposing the substantial risks these attacks pose within this critical domain. We identify two unique challenges for effective adversarial attacks on AD VLMs: the variability of textual instructions and the time-series nature of visual scenarios. To this end, we propose ADvLM, the first visual adversarial attack framework specifically designed for VLMs in AD. Our framework introduces Semantic-Invariant Induction, which uses a large language model to create a diverse prompt library of textual instructions with consistent semantic content, guided by semantic entropy. Building on this, we introduce Scenario-Associated Enhancement, an approach where attention mechanisms select key frames and perspectives within driving scenarios to optimize adversarial perturbations that generalize across the entire scenario. Extensive experiments on several AD VLMs over multiple benchmarks show that ADvLM achieves state-of-the-art attack effectiveness. Moreover, real-world attack studies further validate its applicability and potential in practice.

Via

Access Paper or Ask Questions

Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning

Oct 21, 2024

Hanlin Yang, Jian Yao, Weiming Liu, Qing Wang, Hanmin Qin, Hansheng Kong, Kirk Tang, Jiechao Xiong, Chao Yu, Kai Li(+6 more)

Figure 1 for Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning

Figure 2 for Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning

Figure 3 for Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning

Figure 4 for Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning

Abstract:Recovering a spectrum of diverse policies from a set of expert trajectories is an important research topic in imitation learning. After determining a latent style for a trajectory, previous diverse policies recovering methods usually employ a vanilla behavioral cloning learning objective conditioned on the latent style, treating each state-action pair in the trajectory with equal importance. Based on an observation that in many scenarios, behavioral styles are often highly relevant with only a subset of state-action pairs, this paper presents a new principled method in diverse polices recovery. In particular, after inferring or assigning a latent style for a trajectory, we enhance the vanilla behavioral cloning by incorporating a weighting mechanism based on pointwise mutual information. This additional weighting reflects the significance of each state-action pair's contribution to learning the style, thus allowing our method to focus on state-action pairs most representative of that style. We provide theoretical justifications for our new objective, and extensive empirical evaluations confirm the effectiveness of our method in recovering diverse policies from expert data.

* 18 pages, 6 figures

Via

Access Paper or Ask Questions

HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes

Sep 30, 2024

Changfeng Feng, Zhenyuan Chen, Renke Kou, Guangwei Gao, Chunping Wang, Xiang Li, Xiangbo Shu, Yimian Dai, Qiang Fu, Jian Yang

Figure 1 for HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes

Figure 2 for HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes

Figure 3 for HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes

Figure 4 for HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes

Abstract:Drone-based object detection in adverse weather conditions is crucial for enhancing drones' environmental perception, yet it remains largely unexplored due to the lack of relevant benchmarks. To bridge this gap, we introduce HazyDet, a large-scale dataset tailored for drone-based object detection in hazy scenes. It encompasses 383,000 real-world instances, collected from both naturally hazy environments and normal scenes with synthetically imposed haze effects to simulate adverse weather conditions. By observing the significant variations in object scale and clarity under different depth and haze conditions, we designed a Depth Conditioned Detector (DeCoDet) to incorporate this prior knowledge. DeCoDet features a Multi-scale Depth-aware Detection Head that seamlessly integrates depth perception, with the resulting depth cues harnessed by a dynamic Depth Condition Kernel module. Furthermore, we propose a Scale Invariant Refurbishment Loss to facilitate the learning of robust depth cues from pseudo-labels. Extensive evaluations on the HazyDet dataset demonstrate the flexibility and effectiveness of our method, yielding significant performance improvements. Our dataset and toolkit are available at https://github.com/GrokCV/HazyDet.

Via

Access Paper or Ask Questions

Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

Aug 20, 2024

Yun Qu, Boyuan Wang, Jianzhun Shao, Yuhang Jiang, Chen Chen, Zhenbin Ye, Lin Liu, Junfeng Yang, Lin Lai, Hongyang Qin(+8 more)

Figure 1 for Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

Figure 2 for Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

Figure 3 for Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

Figure 4 for Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

Abstract:The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short in their simplicity and lack of realism. To address this gap, we propose Hokoff, a comprehensive set of pre-collected datasets that covers both offline RL and offline MARL, accompanied by a robust framework, to facilitate further research. This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game known for its intricate nature, closely resembling real-life situations. Utilizing this framework, we benchmark a variety of offline RL and offline MARL algorithms. We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game. We reveal the incompetency of current offline RL approaches in handling task complexity, generalization and multi-task learning.

Via

Access Paper or Ask Questions

Latent Space Imaging

Jul 09, 2024

Matheus Souza, Yidan Zheng, Kaizhang Kang, Yogeshwar Nath Mishra, Qiang Fu, Wolfgang Heidrich

Abstract:Digital imaging systems have classically been based on brute-force measuring and processing of pixels organized on regular grids. The human visual system, on the other hand, performs a massive data reduction from the number of photo-receptors to the optic nerve, essentially encoding the image information into a low bandwidth latent space representation suitable for processing by the human brain. In this work, we propose to follow a similar approach for the development of artificial vision systems. Latent Space Imaging is a new paradigm that, through a combination of optics and software, directly encodes the image information into the semantically rich latent space of a generative model, thus substantially reducing bandwidth and memory requirements during the capture process. We demonstrate this new principle through an initial hardware prototype based on the single pixel camera. By designing an amplitude modulation scheme that encodes into the latent space of a generative model, we achieve compression ratios from 1:100 to 1:1,000 during the imaging process, illustrating the potential of latent space imaging for highly efficient imaging hardware, to enable future applications in high speed imaging, or task-specific cameras with substantially reduced hardware complexity.

Via

Access Paper or Ask Questions

Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models

Jul 04, 2024

Yuyan Chen, Qiang Fu, Yichen Yuan, Zhihao Wen, Ge Fan, Dayiheng Liu, Dongmei Zhang, Zhixu Li, Yanghua Xiao

Abstract:Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks, including question answering and dialogue systems. However, a major drawback of LLMs is the issue of hallucination, where they generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences. In this paper, we propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers. RelD is trained on the constructed RelQA, a bilingual question-answering dialogue dataset along with answers generated by LLMs and a comprehensive set of metrics. Our experimental results demonstrate that the proposed RelD successfully detects hallucination in the answers generated by diverse LLMs. Moreover, it performs well in distinguishing hallucination in LLMs' generated answers from both in-distribution and out-of-distribution datasets. Additionally, we also conduct a thorough analysis of the types of hallucinations that occur and present valuable insights. This research significantly contributes to the detection of reliable answers generated by LLMs and holds noteworthy implications for mitigating hallucination in the future work.

* Accepted to CIKM 2023 (Long Paper)

Via

Access Paper or Ask Questions

Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version)

Jun 15, 2024

Pingchuan Ma, Rui Ding, Qiang Fu, Jiaru Zhang, Shuai Wang, Shi Han, Dongmei Zhang

Figure 1 for Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version)

Figure 2 for Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version)

Figure 3 for Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version)

Figure 4 for Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version)

Abstract:Differentiable causal discovery has made significant advancements in the learning of directed acyclic graphs. However, its application to real-world datasets remains restricted due to the ubiquity of latent confounders and the requirement to learn maximal ancestral graphs (MAGs). To date, existing differentiable MAG learning algorithms have been limited to small datasets and failed to scale to larger ones (e.g., with more than 50 variables). The key insight in this paper is that the causal skeleton, which is the undirected version of the causal graph, has potential for improving accuracy and reducing the search space of the optimization procedure, thereby enhancing the performance of differentiable causal discovery. Therefore, we seek to address a two-fold challenge to harness the potential of the causal skeleton for differentiable causal discovery in the presence of latent confounders: (1) scalable and accurate estimation of skeleton and (2) universal integration of skeleton estimation with differentiable causal discovery. To this end, we propose SPOT (Skeleton Posterior-guided OpTimization), a two-phase framework that harnesses skeleton posterior for differentiable causal discovery in the presence of latent confounders. On the contrary to a ``point-estimation'', SPOT seeks to estimate the posterior distribution of skeletons given the dataset. It first formulates the posterior inference as an instance of amortized inference problem and concretizes it with a supervised causal learning (SCL)-enabled solution to estimate the skeleton posterior. To incorporate the skeleton posterior with differentiable causal discovery, SPOT then features a skeleton posterior-guided stochastic optimization procedure to guide the optimization of MAGs. [abridged due to length limit]

Via

Access Paper or Ask Questions

End-to-End Hybrid Refractive-Diffractive Lens Design with Differentiable Ray-Wave Model

Jun 02, 2024

Xinge Yang, Matheus Souza, Kunyi Wang, Praneeth Chakravarthula, Qiang Fu, Wolfgang Heidrich

$Figure 1 for End-to-End Hybrid Refractive-Diffractive Lens Design with Differentiable Ray-Wave Model$

$Figure 2 for End-to-End Hybrid Refractive-Diffractive Lens Design with Differentiable Ray-Wave Model$

$Figure 3 for End-to-End Hybrid Refractive-Diffractive Lens Design with Differentiable Ray-Wave Model$

$Figure 4 for End-to-End Hybrid Refractive-Diffractive Lens Design with Differentiable Ray-Wave Model$

Abstract:Hybrid refractive-diffractive lenses combine the light efficiency of refractive lenses with the information encoding power of diffractive optical elements (DOE), showing great potential as the next generation of imaging systems. However, accurately simulating such hybrid designs is generally difficult, and in particular, there are no existing differentiable image formation models for hybrid lenses with sufficient accuracy. In this work, we propose a new hybrid ray-tracing and wave-propagation (ray-wave) model for accurate simulation of both optical aberrations and diffractive phase modulation, where the DOE is placed between the last refractive surface and the image sensor, i.e. away from the Fourier plane that is often used as a DOE position. The proposed ray-wave model is fully differentiable, enabling gradient back-propagation for end-to-end co-design of refractive-diffractive lens optimization and the image reconstruction network. We validate the accuracy of the proposed model by comparing the simulated point spread functions (PSFs) with theoretical results, as well as simulation experiments that show our model to be more accurate than solutions implemented in commercial software packages like Zemax. We demonstrate the effectiveness of the proposed model through real-world experiments and show significant improvements in both aberration correction and extended depth-of-field (EDoF) imaging. We believe the proposed model will motivate further investigation into a wide range of applications in computational imaging, computational photography, and advanced optical design. Code will be released upon publication.

Via

Access Paper or Ask Questions

Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent

Apr 22, 2024

Hang Xu, Kai Li, Bingyun Liu, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

Abstract:Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. It decomposes the total regret into counterfactual regrets, utilizing local regret minimization algorithms, such as Regret Matching (RM) or RM+, to minimize them. Recent research establishes a connection between Online Mirror Descent (OMD) and RM+, paving the way for an optimistic variant PRM+ and its extension PCFR+. However, PCFR+ assigns uniform weights for each iteration when determining regrets, leading to substantial regrets when facing dominated actions. This work explores minimizing weighted counterfactual regret with optimistic OMD, resulting in a novel CFR variant PDCFR+. It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating negative effects of dominated actions and consistently leveraging predictions to accelerate convergence. Theoretical analyses prove that PDCFR+ converges to a Nash equilibrium, particularly under distinct weighting schemes for regrets and average strategies. Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games. The code is available at https://github.com/rpSebastian/PDCFRPlus.

* Accepted to 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

Via

Access Paper or Ask Questions