Qi Li

Empirical Analysis for Unsupervised Universal Dependency Parse Tree Aggregation

Apr 03, 2024

Adithya Kulkarni, Oliver Eulenstein, Qi Li

Abstract:Dependency parsing is an essential task in NLP, and the quality of dependency parsers is crucial for many downstream tasks. Parsers' quality often varies depending on the domain and the language involved. Therefore, it is essential to combat the issue of varying quality to achieve stable performance. In various NLP tasks, aggregation methods are used for post-processing aggregation and have been shown to combat the issue of varying quality. However, aggregation methods for post-processing aggregation have not been sufficiently studied in dependency parsing tasks. In an extensive empirical study, we compare different unsupervised post-processing aggregation methods to identify the most suitable dependency tree structure aggregation method.

VideoBadminton: A Video Dataset for Badminton Action Recognition

Mar 19, 2024

Qi Li, Tzu-Chen Chiu, Hsiang-Wei Huang, Min-Te Sun, Wei-Shinn Ku

Abstract:In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications, particularly in sports analysis, where the precise decomposition of activities and the distinction of subtly different actions are crucial. Existing datasets like UCF101, HMDB51, and Kinetics have offered a diverse range of video data for various scenarios. However, there's an increasing need for fine-grained video datasets that capture detailed categorizations and nuances within broader action categories. In this paper, we introduce the VideoBadminton dataset derived from high-quality badminton footage. Through an exhaustive evaluation of leading methodologies on this dataset, this study aims to advance the field of action recognition, particularly in badminton sports. The introduction of VideoBadminton could not only serve for badminton action recognition but also provide a dataset for recognizing fine-grained actions. The insights gained from these evaluations are expected to catalyze further research in action comprehension, especially within sports contexts.

Has Approximate Machine Unlearning been evaluated properly? From Auditing to Side Effects

Mar 19, 2024

Cheng-Long Wang, Qi Li, Zihang Xiang, Di Wang

Abstract:The growing concerns surrounding data privacy and security have underscored the critical necessity for machine unlearning--aimed at fully removing data lineage from machine learning models. MLaaS providers expect this to be their ultimate safeguard for regulatory compliance. Despite its critical importance, the pace at which privacy communities have been developing and implementing strong methods to verify the effectiveness of machine unlearning has been disappointingly slow, with this vital area often receiving insufficient focus. This paper seeks to address this shortfall by introducing well-defined and effective metrics for black-box unlearning auditing tasks. We transform the auditing challenge into a question of non-membership inference and develop efficient metrics for auditing. By relying exclusively on the original and unlearned models--eliminating the need to train additional shadow models--our approach simplifies the evaluation of unlearning at the individual data point level. Utilizing these metrics, we conduct an in-depth analysis of current approximate machine unlearning algorithms, identifying three key directions where these approaches fall short: utility, resilience, and equity. Our aim is that this work will greatly improve our understanding of approximate machine unlearning methods, taking a significant stride towards converting the theoretical right to data erasure into an auditable reality.

Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed

Mar 17, 2024

Jinzhu Yan, Haotian Xu, Zhuotao Liu, Qi Li, Ke Xu, Mingwei Xu, Jianping Wu

Abstract:The emerging programmable networks sparked significant research on Intelligent Network Data Plane (INDP), which achieves learning-based traffic analysis at line-speed. Prior art in INDP focuses on deploying tree/forest models on the data plane. We observe a fundamental limitation in tree-based INDP approaches: although it is possible to represent even larger tree/forest tables on the data plane, the flow features that are computable on the data plane are fundamentally limited by hardware constraints. In this paper, we present BoS to push the boundaries of INDP by enabling Neural Network (NN) driven traffic analysis at line-speed. Many types of NNs (such as Recurrent Neural Networks (RNNs) and transformers) that are designed to work with sequential data have advantages over tree-based models, because they can take raw network data as input without complex feature computations on the fly. However, the challenge is significant: the recurrent computation scheme used in RNN inference is fundamentally different from the match-action paradigm used on the network data plane. BoS addresses this challenge by (i) designing a novel data plane friendly RNN architecture that can execute unlimited RNN time steps with limited data plane stages, effectively achieving line-speed RNN inference; and (ii) complementing the on-switch RNN model with an off-switch transformer-based traffic analysis module to further boost the overall performance. We implement a prototype of BoS using a P4 programmable switch as our data plane, and extensively evaluate it over multiple traffic analysis tasks. The results show that BoS outperforms the state of the art in both analysis accuracy and scalability.

* 12 pages body, 22 pages total, 14 figures, accepted by the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI'24)

Pencil: Private and Extensible Collaborative Learning without the Non-Colluding Assumption

Mar 17, 2024

Xuanqi Liu, Zhuotao Liu, Qi Li, Ke Xu, Mingwei Xu

Abstract:The escalating focus on data privacy poses significant challenges for collaborative neural network training, where data ownership and model training/deployment responsibilities reside with distinct entities. Our community has made substantial contributions to addressing this challenge, proposing various approaches such as federated learning (FL) and privacy-preserving machine learning based on cryptographic constructs like homomorphic encryption (HE) and secure multiparty computation (MPC). However, FL completely overlooks model privacy, and HE has limited extensibility (confined to only one data provider). While the state-of-the-art MPC frameworks provide reasonable throughput and simultaneously ensure model/data privacy, they rely on a critical non-colluding assumption on the computing servers, and relaxing this assumption is still an open problem. In this paper, we present Pencil, the first private training framework for collaborative learning that simultaneously offers data privacy, model privacy, and extensibility to multiple data providers, without relying on the non-colluding assumption. Our fundamental design principle is to construct the n-party collaborative training protocol based on an efficient two-party protocol, while ensuring that switching to different data providers during model training introduces no extra cost. We introduce several novel cryptographic protocols to realize this design principle and conduct a rigorous security and privacy analysis. Our comprehensive evaluations of Pencil demonstrate that (i) models trained in plaintext and models trained privately using Pencil exhibit nearly identical test accuracies; (ii) the training overhead of Pencil is greatly reduced: Pencil achieves 10 ~ 260x higher throughput and 2 orders of magnitude less communication than prior art; (iii) Pencil is resilient against both existing and adaptive (white-box) attacks.

* Proceedings of the Network and Distributed System Security Symposium (NDSS) 2024

Deep Contrastive Multi-view Clustering under Semantic Feature Guidance

Mar 09, 2024

Siwen Liu, Jinyan Liu, Hanning Yuan, Qi Li, Jing Geng, Ziqiang Yuan, Huaxu Han

Abstract:Contrastive learning has achieved promising performance in the field of multi-view clustering recently. However, positive and negative sample construction mechanisms that ignore semantic consistency lead to false negative pairs, which limits further improvement of existing algorithms. To solve this problem, we propose a multi-view clustering framework named Deep Contrastive Multi-view Clustering under Semantic feature guidance (DCMCS) to alleviate the influence of false negative pairs. Specifically, view-specific features are first extracted from raw features and fused to obtain fusion view features according to view importance. To mitigate the interference of view-private information, specific view and fusion view semantic features are learned by cluster-level contrastive learning and concatenated to measure the semantic similarity of instances. By minimizing instance-level contrastive loss weighted by semantic similarity, DCMCS adaptively weakens contrastive learning between false negative pairs. Experimental results on several public datasets demonstrate that the proposed framework outperforms the state-of-the-art methods.

AceMap: Knowledge Discovery through Academic Graph

Mar 05, 2024

Xinbing Wang, Luoyi Fu, Xiaoying Gan, Ying Wen, Guanjie Zheng, Jiaxin Ding, Liyao Xiang, Nanyang Ye, Meng Jin, Shiyu Liang (+16 more)

Abstract:The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publications. The representation of heterogeneous graphs and the effective measurement, analysis, and mining of such graphs pose significant challenges. To address these challenges, we present AceMap, an academic system designed for knowledge discovery through academic graph. We present advanced database construction techniques to build the comprehensive AceMap database with large-scale academic publications that contain rich visual, textual, and numerical information. AceMap also employs innovative visualization, quantification, and analysis methods to explore associations and logical relationships among academic entities. AceMap introduces large-scale academic network visualization techniques centered on nebular graphs, providing a comprehensive view of academic networks from multiple perspectives. In addition, AceMap proposes a unified metric based on structural entropy to quantitatively measure the knowledge content of different academic entities. Moreover, AceMap provides advanced analysis capabilities, including tracing the evolution of academic ideas through citation relationships and concept co-occurrence, and generating concise summaries informed by this evolutionary process. In addition, AceMap uses machine reading methods to generate potential new ideas at the intersection of different fields. Exploring the integration of large language models and knowledge graphs is a promising direction for future research in idea evolution. Please visit https://www.acemap.info for further exploration.

* Technical Report for AceMap (https://www.acemap.info)

Defending Against Data Reconstruction Attacks in Federated Learning: An Information Theory Approach

Mar 02, 2024

Qi Tan, Qi Li, Yi Zhao, Zhuotao Liu, Xiaobing Guo, Ke Xu

Abstract:Federated Learning (FL) trains a black-box and high-dimensional model among different clients by exchanging parameters instead of direct data sharing, which mitigates the privacy leak incurred by machine learning. However, FL still suffers from membership inference attacks (MIA) or data reconstruction attacks (DRA). In particular, an attacker can extract the information from local datasets by constructing DRA, which cannot be effectively throttled by existing techniques, e.g., Differential Privacy (DP). In this paper, we aim to ensure a strong privacy guarantee for FL under DRA. We prove that reconstruction errors under DRA are constrained by the information acquired by an attacker, which means that constraining the transmitted information can effectively throttle DRA. To quantify the information leakage incurred by FL, we establish a channel model, which depends on the upper bound of joint mutual information between the local dataset and multiple transmitted parameters. Moreover, the channel model indicates that the transmitted information can be constrained through data space operation, which can improve training efficiency and the model accuracy under constrained information. According to the channel model, we propose algorithms to constrain the information transmitted in a single round of local training. With a limited number of training rounds, the algorithms ensure that the total amount of transmitted information is limited. Furthermore, our channel model can be applied to various privacy-enhancing techniques (such as DP) to enhance privacy guarantees against DRA. Extensive experiments with real-world datasets validate the effectiveness of our methods.

* Accepted by USENIX Security '24

Re-Examine Distantly Supervised NER: A New Benchmark and a Simple Approach

Feb 26, 2024

Yuepei Li, Kang Zhou, Qiao Qiao, Qing Wang, Qi Li

Abstract:This paper delves into Named Entity Recognition (NER) under the framework of Distant Supervision (DS-NER), where the main challenge lies in the compromised quality of labels due to inherent errors such as false positives, false negatives, and positive type errors. We critically assess the efficacy of current DS-NER methodologies using a real-world benchmark dataset named QTL, revealing that their performance often does not meet expectations. To tackle the prevalent issue of label noise, we introduce a simple yet effective approach, Curriculum-based Positive-Unlabeled Learning (CuPUL), which strategically starts on "easy" and cleaner samples during the training process to enhance model resilience to noisy samples. Our empirical results highlight the capability of CuPUL to significantly reduce the impact of noisy labels and outperform existing methods. The QTL dataset and our code are available on GitHub.

Low Complexity Turbo SIC-MMSE Detection for Orthogonal Time Frequency Space Modulation

Jan 19, 2024

Qi Li, Jinhong Yuan, Min Qiu, Shuangyang Li, Yixuan Xie

Abstract:Recently, orthogonal time frequency space (OTFS) modulation has garnered considerable attention due to its robustness against doubly-selective wireless channels. In this paper, we propose a low-complexity iterative successive interference cancellation based minimum mean squared error (SIC-MMSE) detection algorithm for zero-padded OTFS (ZP-OTFS) modulation. In the proposed algorithm, signals are detected based on layers processed by multiple SIC-MMSE linear filters for each sub-channel, with interference on the targeted signal layer being successively canceled either by hard or soft information. To reduce the complexity of computing individual layer filter coefficients, we also propose a novel filter coefficients recycling approach in place of generating the exact form of MMSE filter weights. Moreover, we design a joint detection and decoding algorithm for ZP-OTFS to enhance error performance. Compared to the conventional SIC-MMSE detection, our proposed algorithms outperform other linear detectors, e.g., maximal ratio combining (MRC), for ZP-OTFS with up to 3 dB gain while maintaining comparable computation complexity.

* 15 pages, 12 figures, accepted by IEEE Transactions on Communications
