Bingnan Li

DLLG: Dynamic Logit-Level Gating of LLM Experts

Jun 03, 2026

Bingnan Li, Zhaoyang Zhang, Xiaoze Liu, Yantao Shen, Shuli Jiang, Shuo Yang, Wei Xia, Zhuowen Tu, Stefano Soatto

Abstract: Leveraging multiple specialized LLMs can combine complementary strengths, but existing approaches trade adaptability for stability: routing commits prematurely, heuristic ensembling depends on fragile proxies, and parameter merging introduces interference. We propose DLLG (Dynamic Logit-Level Gating), a dynamic logit-level ensembling framework that learns token-level expert fusion from sparse response-level supervision. A lightweight gating module predicts step-wise fusion weights, linking trajectory-level correctness to generation without token-level labels or expert retraining. Across diverse reasoning and code benchmarks, DLLG consistently outperforms strong routing, heuristic ensembling, and parameter-merging baselines across model scales, highlighting learned logit-level fusion as a robust and scalable paradigm for integrating specialized experts.
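As a rough illustration of what step-wise logit-level fusion can look like, the sketch below combines the next-token logits of several experts using weights derived from gate scores. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the gating architecture or the fusion rule, so the convex combination, the `fuse_logits` helper, and the plain array of gate scores are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def fuse_logits(expert_logits, gate_scores):
    """Fuse per-expert next-token logits for a single decoding step.

    expert_logits: array of shape (K, V), one row of vocabulary logits per expert.
    gate_scores:   array of shape (K,), unnormalized scores. In a learned system
                   these would come from a gating module; here they are simply
                   passed in (assumption).
    Returns the fused logits of shape (V,) and the fusion weights of shape (K,).
    """
    weights = softmax(gate_scores)      # non-negative, sums to 1
    fused = weights @ expert_logits     # weighted sum over experts (assumed rule)
    return fused, weights

# Example: two experts, a three-token vocabulary, equal gate scores.
logits = np.array([[2.0, 0.0, -1.0],
                   [0.0, 3.0, 1.0]])
fused, weights = fuse_logits(logits, np.array([0.0, 0.0]))
next_token_probs = softmax(fused)       # distribution to sample the next token from
```

Because the weights are recomputed at every decoding step, a different expert can dominate at different positions in the same response, which is the contrast the abstract draws with routing that commits to one expert up front.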

CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models

Mar 16, 2026

Xiaojun Shan, Haoyu Shen, Yucheng Mao, Xiang Zhang, Abhay Anand, Bingnan Li, Haiyang Xu, Zhuowen Tu

Abstract: We present CyCLeGen, a unified vision-language foundation model capable of both image understanding and image generation within a single autoregressive framework. Unlike existing vision models that depend on separate modules for perception and synthesis, CyCLeGen adopts a fully integrated architecture that enforces cycle-consistent learning through image->layout->image and layout->image->layout generation loops. This unified formulation introduces two key advantages: introspection, enabling the model to reason about its own generations, and data efficiency, allowing self-improvement via synthetic supervision under a reinforcement learning objective guided by cycle consistency. Extensive experiments show that CyCLeGen achieves significant gains across diverse image understanding and generation benchmarks, highlighting the potential of unified vision-language foundation models.

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Aug 01, 2025

Guanning Zeng, Xiang Zhang, Zirui Wang, Haiyang Xu, Zeyuan Chen, Bingnan Li, Zhuowen Tu

Abstract: We propose YOLO-Count, a differentiable open-vocabulary object counting model that both tackles general counting challenges and enables precise quantity control for text-to-image (T2I) generation. A core contribution is the 'cardinality' map, a novel regression target that accounts for variations in object size and spatial distribution. Leveraging representation alignment and a hybrid strong-weak supervision scheme, YOLO-Count bridges the gap between open-vocabulary counting and T2I generation control. Its fully differentiable architecture facilitates gradient-based optimization, enabling accurate object count estimation and fine-grained guidance for generative models. Extensive experiments demonstrate that YOLO-Count achieves state-of-the-art counting accuracy while providing robust and effective quantity control for T2I systems.

* ICCV 2025

Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts

Nov 06, 2024

Zhitong Gao, Bingnan Li, Mathieu Salzmann, Xuming He

Abstract: In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety and generalize to new domains. However, existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts, leading to poor out-of-distribution (OOD) detection or domain generalization performance. In this work, we aim to equip the model to generalize effectively to covariate-shift regions while precisely identifying semantic-shift regions. To achieve this, we design a novel generative augmentation method to produce coherent images that incorporate both anomaly (or novel) objects and various covariate shifts at both image and object levels. Furthermore, we introduce a training strategy that recalibrates uncertainty specifically for semantic shifts and enhances the feature extractor to align features associated with domain shifts. We validate the effectiveness of our method across benchmarks featuring both semantic and domain shifts. Our method achieves state-of-the-art performance across all benchmarks for both OOD detection and domain generalization. Code is available at https://github.com/gaozhitong/MultiShiftSeg.

* Published in NeurIPS 2024

Gradient-Map-Guided Adaptive Domain Generalization for Cross Modality MRI Segmentation

Nov 16, 2023

Bingnan Li, Zhitong Gao, Xuming He

Abstract: Cross-modal MRI segmentation is of great value for computer-aided medical diagnosis, enabling flexible data acquisition and model generalization. However, most existing methods have difficulty handling local variations in domain shift and typically require a significant amount of training data, which hinders their use in practice. To address these problems, we propose a novel adaptive domain generalization framework, which integrates a learning-free cross-domain representation based on image gradient maps and a class prior-informed test-time adaptation strategy for mitigating local domain shift. We validate our approach on two multi-modal MRI datasets with six cross-modal segmentation tasks. Across all task settings, our method consistently outperforms competing approaches and shows stable performance even with limited training data.

* 9 pages, Machine Learning for Health (ML4H) 2023
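To make the idea of a learning-free representation based on image gradient maps more concrete, here is a minimal sketch that computes a normalized gradient-magnitude map for a single 2D slice. It rests on assumptions: the abstract does not define the gradient operator or any normalization, so the finite-difference gradient, the max-normalization, and the `gradient_map` name are illustrative choices rather than the authors' method.

```python
import numpy as np

def gradient_map(image, eps=1e-8):
    """Normalized gradient-magnitude map of a 2D image (illustrative only).

    image: 2D array of intensities, e.g. one MRI slice.
    eps:   small constant that avoids division by zero on flat images.
    """
    image = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(image)         # finite differences along rows, columns
    magnitude = np.hypot(gx, gy)        # per-pixel edge strength
    return magnitude / (magnitude.max() + eps)

# The same ramp under two different intensity scalings and offsets.
slice_a = np.arange(16, dtype=np.float64).reshape(4, 4)
slice_b = 2.0 * slice_a + 5.0
map_a = gradient_map(slice_a)
map_b = gradient_map(slice_b)           # nearly identical to map_a
```

The design point this illustrates is that a map built from intensity differences discards global offsets and, after normalization, positive global scaling, which is one plausible reason such a representation could transfer across MRI modalities without any training.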
