Kun Zhou

A Locality-based Neural Solver for Optical Motion Capture

Sep 04, 2023
Xiaoyu Pan, Bowen Zheng, Xinwei Jiang, Guanglong Xu, Xianli Gu, Jingxiang Li, Qilong Kou, He Wang, Tianjia Shao, Kun Zhou, Xiaogang Jin

We present a novel locality-based learning method for cleaning and solving optical motion capture data. Given noisy marker data, we propose a new heterogeneous graph neural network that treats markers and joints as different types of nodes and uses graph convolution operations to extract the local features of markers and joints and transform them into clean motions. To deal with anomalous markers (e.g., occluded or carrying large tracking errors), the key insight is that a marker's motion shows strong correlations with the motions of its immediate neighboring markers but less so with other markers, a.k.a. locality, which enables us to efficiently fill in missing markers (e.g., due to occlusion). Additionally, we identify marker outliers caused by tracking errors by investigating their acceleration profiles. We further propose a training regime based on representation learning and data augmentation that trains the model on masked data; the masking schemes mimic the occluded and noisy markers often observed in real data. Finally, we show that our method achieves high accuracy on multiple metrics across various datasets. Extensive comparisons show that our method outperforms state-of-the-art methods, reducing the position error of occluded markers by approximately 20%, which leads to a further 30% error reduction in the reconstructed joint rotations and positions. The code and data for this paper are available at https://github.com/non-void/LocalMoCap.
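
The exact network architecture is not reproduced here, but the abstract's core idea, separate message passing for marker and joint nodes over local neighborhoods, can be sketched in a few lines. Below is a minimal, illustrative heterogeneous graph-convolution layer in PyTorch; the class name, feature sizes, and dense adjacency inputs are assumptions for the sketch, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): one heterogeneous graph-convolution layer
# with two node types, "marker" and "joint", using dense, row-normalized adjacencies.
import torch
import torch.nn as nn


class HeteroMarkerJointConv(nn.Module):
    """Aggregates marker->marker, marker->joint and joint->joint neighborhoods
    with separate linear maps, mirroring the locality idea in the abstract."""

    def __init__(self, marker_dim: int, joint_dim: int, out_dim: int):
        super().__init__()
        self.m2m = nn.Linear(marker_dim, out_dim)   # marker -> marker messages
        self.m2j = nn.Linear(marker_dim, out_dim)   # marker -> joint messages
        self.j2j = nn.Linear(joint_dim, out_dim)    # joint -> joint messages
        self.act = nn.ReLU()

    def forward(self, x_m, x_j, adj_mm, adj_jm, adj_jj):
        # x_m: (M, marker_dim), x_j: (J, joint_dim)
        # adj_mm: (M, M) marker adjacency, adj_jm: (J, M) marker-to-joint assignment,
        # adj_jj: (J, J) skeleton adjacency.
        h_m = self.act(adj_mm @ self.m2m(x_m))
        h_j = self.act(adj_jm @ self.m2j(x_m) + adj_jj @ self.j2j(x_j))
        return h_m, h_j


# Toy usage with random data: 56 markers, 24 joints, per-frame 3D positions as features.
m, j = 56, 24
conv = HeteroMarkerJointConv(marker_dim=3, joint_dim=3, out_dim=64)
x_m, x_j = torch.randn(m, 3), torch.randn(j, 3)
adj_mm = torch.softmax(torch.randn(m, m), dim=-1)
adj_jm = torch.softmax(torch.randn(j, m), dim=-1)
adj_jj = torch.softmax(torch.randn(j, j), dim=-1)
h_m, h_j = conv(x_m, x_j, adj_mm, adj_jm, adj_jj)  # (56, 64), (24, 64)
```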

* Siggraph Asia 2023 Conference Paper 

Learning Photometric Feature Transform for Free-form Object Scan

Aug 07, 2023
Xiang Feng, Kaizhang Kang, Fan Pei, Huakeng Ding, Jinjiang You, Ping Tan, Kun Zhou, Hongzhi Wu

We propose a novel framework to automatically learn to aggregate and transform photometric measurements from multiple unstructured views into spatially distinctive and view-invariant low-level features, which are fed to a multi-view stereo method to enhance 3D reconstruction. The illumination conditions during acquisition and the feature transform are jointly trained on a large amount of synthetic data. We further build a system to reconstruct the geometry and anisotropic reflectance of a variety of challenging objects from hand-held scans. The effectiveness of the system is demonstrated with a lightweight prototype, consisting of a camera and an array of LEDs, as well as an off-the-shelf tablet. Our results are validated against reconstructions from a professional 3D scanner and photographs, and compare favorably with state-of-the-art techniques.
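
As a rough illustration of the idea of transforming photometric measurements into low-level matching features, the sketch below encodes each pixel's responses under a set of illumination conditions with a small MLP; in the paper the illumination patterns and the transform are learned jointly, and view invariance is a training objective rather than an architectural property. All names and dimensions here are assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): per-pixel photometric responses
# under `num_lights` illumination conditions are mapped to a compact feature per view,
# which would then be fed to a multi-view stereo matcher.
import torch
import torch.nn as nn


class PhotometricFeatureTransform(nn.Module):
    def __init__(self, num_lights: int = 32, feat_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_lights * 3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, measurements):
        # measurements: (num_views, num_pixels, num_lights * 3) RGB responses per light.
        # One feature map per view; invariance across views comes from training.
        return self.encoder(measurements)          # (V, P, feat_dim)


features = PhotometricFeatureTransform()(torch.rand(5, 1024, 32 * 3))  # (5, 1024, 16)
```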

Alleviating the Long-Tail Problem in Conversational Recommender Systems

Jul 21, 2023
Zhipeng Zhao, Kun Zhou, Xiaolei Wang, Wayne Xin Zhao, Fan Pan, Zhao Cao, Ji-Rong Wen

Conversational recommender systems (CRSs) aim to provide recommendation services via natural language conversations. To develop an effective CRS, high-quality CRS datasets are crucial. However, existing CRS datasets suffer from the long-tail issue, i.e., a large proportion of items are rarely (or even never) mentioned in the conversations; such items are called long-tail items. As a result, CRSs trained on these datasets tend to recommend frequent items, which largely reduces the diversity of the recommended items and makes it easier for users to get bored. To address this issue, this paper presents LOT-CRS, a novel framework that focuses on simulating and utilizing a balanced CRS dataset (i.e., covering all the items evenly) to improve the LOng-Tail recommendation performance of CRSs. In our approach, we design two pre-training tasks to enhance the understanding of simulated conversations for long-tail items, and adopt retrieval-augmented fine-tuning with a label smoothness strategy to further improve the recommendation of long-tail items. Extensive experiments on two public CRS datasets demonstrate the effectiveness and extensibility of our approach, especially on long-tail recommendation.
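
The retrieval-augmented fine-tuning pipeline itself is not shown here, but the label-smoothness idea can be illustrated with a standard label-smoothed recommendation loss, which spreads a small amount of probability mass over the whole item catalogue so long-tail items are never entirely unsupervised. This is a generic sketch, not the LOT-CRS code.

```python
# Generic label-smoothed cross-entropy over item logits (illustrative only).
import torch
import torch.nn.functional as F


def smoothed_recommendation_loss(item_logits, target_items, epsilon=0.1):
    """Cross-entropy with label smoothing: (1 - epsilon) of the mass goes to the
    target item, epsilon is spread uniformly over the catalogue."""
    log_probs = F.log_softmax(item_logits, dim=-1)
    nll = -log_probs.gather(-1, target_items.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - epsilon) * nll + epsilon * uniform).mean()


# Toy usage: batch of 8 dialogues, catalogue of 5000 items.
loss = smoothed_recommendation_loss(torch.randn(8, 5000), torch.randint(0, 5000, (8,)))
```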

* work in progress 

Adaptive Local Basis Functions for Shape Completion

Jul 17, 2023
Hui Ying, Tianjia Shao, He Wang, Yin Yang, Kun Zhou

In this paper, we focus on the task of 3D shape completion from partial point clouds using deep implicit functions. Existing methods use either voxelized basis functions or basis functions from a certain family (e.g., Gaussians), which leads to high computational costs or limited shape expressivity. In contrast, our method employs adaptive local basis functions, which are learned end-to-end and not restricted to certain forms. Based on these basis functions, a local-to-local shape completion framework is presented. Our algorithm learns a sparse parameterization with a small number of basis functions while preserving local geometric details during completion. Quantitative and qualitative experiments demonstrate that our method outperforms state-of-the-art methods in shape completion, detail preservation, generalization to unseen geometries, and computational cost. Code and data are available at https://github.com/yinghdb/Adaptive-Local-Basis-Functions.
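
A minimal sketch of the general formulation, an implicit field blended from learned local basis functions anchored at learnable centers, is given below; the network sizes, blending weights, and parameterization are assumptions for illustration and do not mirror the released code.

```python
# Minimal sketch (assumed formulation, not the authors' implementation): an implicit
# field built from a small set of local basis functions, each an MLP conditioned on its
# own latent code and evaluated in the local frame of its center.
import torch
import torch.nn as nn


class AdaptiveLocalBasisField(nn.Module):
    def __init__(self, num_basis: int = 32, latent_dim: int = 16):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_basis, 3) * 0.5)
        self.latents = nn.Parameter(torch.randn(num_basis, latent_dim) * 0.01)
        self.log_scales = nn.Parameter(torch.zeros(num_basis))
        self.basis_mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        # x: (N, 3) query points -> (N,) occupancy logits.
        offsets = x[:, None, :] - self.centers[None, :, :]            # (N, K, 3)
        latents = self.latents[None].expand(x.size(0), -1, -1)        # (N, K, latent_dim)
        values = self.basis_mlp(torch.cat([offsets, latents], -1))    # (N, K, 1)
        weights = torch.softmax(-offsets.norm(dim=-1) * self.log_scales.exp(), dim=-1)
        return (weights * values.squeeze(-1)).sum(dim=-1)             # blend local bases


occ = AdaptiveLocalBasisField()(torch.rand(2048, 3))  # (2048,)
```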

* In SIGGRAPH 2023 

JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving

Jun 19, 2023
Wayne Xin Zhao, Kun Zhou, Beichen Zhang, Zheng Gong, Zhipeng Chen, Yuanhang Zhou, Ji-Rong Wen, Jing Sha, Shijin Wang, Cong Liu, Guoping Hu

Although pre-trained language models (PLMs) have recently advanced research progress in mathematical reasoning, they are not specially designed as capable multi-task solvers, suffering from high costs for multi-task deployment (e.g., one model copy per task) and inferior performance on complex mathematical problems in practical applications. To address these issues, in this paper we propose JiuZhang 2.0, a unified Chinese PLM specifically designed for multi-task mathematical problem solving. Our idea is to maintain a moderate-sized model and employ cross-task knowledge sharing to improve the model capacity in a multi-task setting. Specifically, we construct a Mixture-of-Experts (MoE) architecture for modeling mathematical text, so as to capture the common mathematical knowledge across tasks. To optimize the MoE architecture, we design multi-task continual pre-training and multi-task fine-tuning strategies for multi-task adaptation. These training strategies can effectively decompose the knowledge from the task data and establish cross-task sharing via expert networks. To further improve the general capacity of solving different complex tasks, we leverage large language models (LLMs) as complementary models to iteratively refine the solutions generated by our PLM via in-context learning. Extensive experiments demonstrate the effectiveness of our model.
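
To make the cross-task sharing mechanism concrete, the sketch below shows a generic Mixture-of-Experts feed-forward block with top-1 token routing; the number of experts, routing scheme, and dimensions are placeholders and do not correspond to the actual JiuZhang 2.0 configuration.

```python
# Generic MoE feed-forward block with top-1 routing (illustrative only).
import torch
import torch.nn as nn


class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int = 256, d_hidden: int = 512, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (tokens, d_model). Each token is sent to its highest-scoring expert,
        # scaled by its gate probability.
        gate = torch.softmax(self.router(x), dim=-1)           # (T, E)
        expert_idx = gate.argmax(dim=-1)                       # (T,)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = gate[mask, e, None] * expert(x[mask])
        return out


y = MoEFeedForward()(torch.randn(10, 256))  # (10, 256)
```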

* Accepted by KDD 2023 ADS track, the 2.0 version of JiuZhang (arxiv:2206.06315v1) 

From NeRFLiX to NeRFLiX++: A General NeRF-Agnostic Restorer Paradigm

Jun 10, 2023
Kun Zhou, Wenbo Li, Nianjuan Jiang, Xiaoguang Han, Jiangbo Lu

Neural radiance fields (NeRF) have shown great success in novel view synthesis. However, recovering high-quality details from real-world scenes remains challenging for existing NeRF-based approaches, due to potentially imperfect calibration information and scene representation inaccuracy. Even with high-quality training frames, the synthetic novel views produced by NeRF models still suffer from notable rendering artifacts, such as noise and blur. To address this, we propose NeRFLiX, a general NeRF-agnostic restorer paradigm that learns a degradation-driven inter-viewpoint mixer. Specifically, we design a NeRF-style degradation modeling approach and construct large-scale training data, enabling deep neural networks to effectively remove NeRF-native rendering artifacts. Moreover, beyond degradation removal, we propose an inter-viewpoint aggregation framework that fuses highly related, high-quality training images, pushing the performance of cutting-edge NeRF models to entirely new levels and producing highly photo-realistic synthetic views. Based on this paradigm, we further present NeRFLiX++, with a stronger two-stage NeRF degradation simulator and a faster inter-viewpoint mixer, achieving superior performance with significantly improved computational efficiency. Notably, NeRFLiX++ is capable of restoring photo-realistic ultra-high-resolution outputs from noisy low-resolution NeRF-rendered views. Extensive experiments demonstrate the excellent restoration ability of NeRFLiX++ on various novel view synthesis benchmarks.
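
The degradation-driven training data construction can be illustrated with a toy simulator that corrupts a clean view with blur and noise to form (degraded, clean) pairs for a restorer; NeRFLiX and NeRFLiX++ use far more elaborate, NeRF-specific and partly learned degradations, so the sketch below is only an assumption-laden stand-in.

```python
# Toy NeRF-style degradation simulator (illustrative assumption, not the paper's model):
# box blur followed by additive Gaussian noise on a clean training view.
import torch
import torch.nn.functional as F


def simulate_nerf_degradation(img, blur_kernel=5, noise_sigma=0.03):
    # img: (3, H, W) in [0, 1].
    kernel = torch.ones(3, 1, blur_kernel, blur_kernel) / (blur_kernel ** 2)
    blurred = F.conv2d(img[None], kernel, padding=blur_kernel // 2, groups=3)[0]
    noisy = blurred + noise_sigma * torch.randn_like(blurred)
    return noisy.clamp(0.0, 1.0)


# (degraded, clean) pair for training a restorer on synthetic data.
clean = torch.rand(3, 128, 128)
degraded = simulate_nerf_degradation(clean)  # (3, 128, 128)
```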

* 15 pages, 15 figures. arXiv admin note: text overlap with arXiv:2303.06919 

Improving Conversational Recommendation Systems via Counterfactual Data Simulation

Jun 05, 2023
Xiaolei Wang, Kun Zhou, Xinyu Tang, Wayne Xin Zhao, Fan Pan, Zhao Cao, Ji-Rong Wen

Conversational recommender systems (CRSs) aim to provide recommendation services via natural language conversations. Although a number of approaches have been proposed for developing capable CRSs, they typically rely on sufficient training data. Since it is difficult to annotate recommendation-oriented dialogue datasets, existing CRS approaches often suffer from insufficient training due to the scarcity of training data. To address this issue, in this paper we propose CFCRS, a CounterFactual data simulation approach for CRSs, to alleviate the issue of data scarcity. Our approach is developed on the framework of counterfactual data augmentation, which gradually incorporates rewriting of the user preference from a real dialogue without interfering with the entire conversation flow. To develop our approach, we characterize the user preference and organize the conversation flow by the entities involved in the dialogue, and design a multi-stage recommendation dialogue simulator based on a conversation flow language model. Under the guidance of the learned user preference and dialogue schema, the flow language model can produce reasonable, coherent conversation flows, which can then be realized into complete dialogues. Based on the simulator, we perform interventions on the representations of the entities interacted with by target users, and design an adversarial training method with a curriculum schedule that can gradually optimize the data augmentation strategy. Extensive experiments show that our approach can consistently boost the performance of several competitive CRSs and outperforms other data augmentation methods, especially when training data is limited. Our code is publicly available at https://github.com/RUCAIBox/CFCRS.
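
A rough sketch of the intervention-with-curriculum idea follows: entity embeddings that summarize a user's preference are perturbed within a budget that grows over training. In CFCRS the perturbation is learned adversarially and the edited flow is realized into a full dialogue; the function names and the random perturbation here are purely illustrative.

```python
# Illustrative curriculum-scheduled intervention on entity embeddings (hypothetical names).
import torch


def curriculum_budget(step: int, total_steps: int, max_eps: float = 0.5) -> float:
    # Linear curriculum: start with tiny edits, end with the full perturbation budget.
    return max_eps * min(1.0, step / max(1, total_steps))


def intervene_on_entities(entity_emb: torch.Tensor, eps: float) -> torch.Tensor:
    # entity_emb: (num_entities, dim). Add a bounded random direction to each embedding,
    # producing a counterfactual user-preference representation.
    direction = torch.nn.functional.normalize(torch.randn_like(entity_emb), dim=-1)
    return entity_emb + eps * direction


emb = torch.randn(6, 128)
for step in range(0, 1001, 250):
    counterfactual = intervene_on_entities(emb, curriculum_budget(step, 1000))
```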

* Accepted by KDD 2023. Code: https://github.com/RUCAIBox/CFCRS 

Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning

Jun 04, 2023
Beichen Zhang, Kun Zhou, Xilin Wei, Wayne Xin Zhao, Jing Sha, Shijin Wang, Ji-Rong Wen

Chain-of-thought (CoT) prompting and tool augmentation have been validated in recent work as effective practices for improving the ability of large language models (LLMs) to perform step-by-step reasoning on complex math-related tasks. However, most existing math reasoning datasets may not be able to fully evaluate and analyze the ability of LLMs to manipulate tools and perform reasoning, as they may require only very few tool invocations or lack annotations for evaluating intermediate reasoning steps. To address this issue, we construct CARP, a new Chinese dataset consisting of 4,886 computation-intensive algebra problems with formulated annotations on intermediate steps. On CARP, we test four LLMs with CoT prompting and find that they are all prone to making mistakes at the early steps of the solution, leading to wrong answers. Based on this finding, we propose DELI, a new approach that deliberates over the reasoning steps with tool interfaces. In DELI, we first initialize a step-by-step solution based on retrieved exemplars, then iterate two deliberation procedures that check and refine the intermediate steps of the generated solution, from the perspectives of tool manipulation and natural language reasoning, until a converged solution is obtained or the maximum number of turns is reached. Experimental results on CARP and six other datasets show that DELI mostly outperforms competitive baselines and can further boost the performance of existing CoT methods. Our data and code are available at https://github.com/RUCAIBox/CARP.
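
The deliberation loop can be sketched as plain control flow: draft a solution from retrieved exemplars, then repeatedly pass it through a tool-based check and a natural-language check until it stops changing or a turn limit is hit. Every function below is a hypothetical stub standing in for LLM and tool calls; only the loop structure follows the abstract.

```python
# Illustrative deliberation loop with hypothetical stubs (not the DELI release).
from typing import Callable, List

MAX_TURNS = 4


def deliberate(problem: str,
               draft: Callable[[str], List[str]],
               check_with_tools: Callable[[str, List[str]], List[str]],
               check_with_reasoning: Callable[[str, List[str]], List[str]]) -> List[str]:
    solution = draft(problem)                    # initialize from retrieved exemplars
    for _ in range(MAX_TURNS):
        refined = check_with_reasoning(problem, check_with_tools(problem, solution))
        if refined == solution:                  # converged: both checks left it unchanged
            break
        solution = refined
    return solution


# Toy usage with trivial stubs standing in for an LLM and a calculator tool.
steps = deliberate("1 + 1 = ?",
                   draft=lambda p: ["compute 1 + 1", "answer: 2"],
                   check_with_tools=lambda p, s: s,
                   check_with_reasoning=lambda p, s: s)
```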

* 17 pages, work in progress 

BEDRF: Bidirectional Edge Diffraction Response Function for Interactive Sound Propagation

Jun 03, 2023
Chunxiao Cao, Zili An, Zhong Ren, Dinesh Manocha, Kun Zhou

We introduce the bidirectional edge diffraction response function (BEDRF), a new approach to modeling wave diffraction around edges with path tracing. The diffraction part of the wave is expressed as an integral over path space, and the wave-edge interaction is expressed using only localized information around points on the edge, similar to a bidirectional scattering distribution function (BSDF) in visual rendering. For an infinite single wedge, our model generates the same result as the analytic solution. Our approach can be easily integrated into interactive geometric sound propagation algorithms that use path tracing to compute specular and diffuse reflections. The resulting propagation algorithm can approximate complex wave propagation phenomena involving high-order diffraction, and is able to handle dynamic, deformable objects as well as moving sources and listeners. We highlight the performance of our approach in different scenarios for generating smooth auralizations.
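
To show where a BSDF-like edge response sits in a path tracer, the sketch below Monte Carlo integrates a stub edge response over sampled incoming and outgoing directions at a single wedge. The stub bedrf function and the sampling domain are invented for illustration; the actual BEDRF model is defined in the paper.

```python
# Illustrative stub only: a BSDF-like edge response weighted into a Monte Carlo
# path-space integral at one wedge, mirroring how a BSDF weights surface scattering.
import random


def bedrf(wedge_angle, theta_in, theta_out):
    # Stand-in for the bidirectional edge diffraction response function: returns a
    # scalar weight for scattering from `theta_in` to `theta_out` at the edge.
    return 1.0 / (1.0 + abs(theta_in - theta_out)) * (wedge_angle / 3.14159)


def trace_diffraction_paths(num_paths=1000, wedge_angle=1.9):
    # Monte Carlo estimate of diffracted energy through a single wedge: average the
    # sampled edge response and multiply by the sampling-domain area.
    total = 0.0
    for _ in range(num_paths):
        theta_in = random.uniform(0.0, wedge_angle)
        theta_out = random.uniform(0.0, wedge_angle)
        total += bedrf(wedge_angle, theta_in, theta_out) * wedge_angle ** 2
    return total / num_paths


energy = trace_diffraction_paths()
```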
