Dilara Yesilbas

Routing in Sparsely-gated Language Models responds to Context

Sep 21, 2024

Stefan Arnold, Marian Fietta, Dilara Yesilbas

Abstract: Language Models (LMs) have recently begun to incorporate mixture-of-experts layers, each consisting of a router and a collection of experts, to scale up their parameter count under a fixed computational budget. Building on previous efforts indicating that token-expert assignments are predominantly influenced by token identities and positions, we trace the routing decisions for similarity-annotated text pairs to evaluate the context sensitivity of learned token-expert assignments. We observe that routing in encoder layers mainly depends on (semantic) associations, but contextual cues provide an additional layer of refinement. Conversely, routing in decoder layers is more variable and markedly less sensitive to context.
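
As a rough illustration of the router described in the abstract above, the following sketch routes a token's hidden state to its top-k experts and compares the assignments of two aligned sequences. The weight matrix `ROUTER_WEIGHTS`, the function names, and the agreement measure are hypothetical and chosen for illustration; the abstract does not specify the models or metrics used in the paper.

```python
import math

# Hypothetical router: one weight row per expert (3 experts, 2-d hidden states).
ROUTER_WEIGHTS = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, -1.0],
]

def route(hidden, router_weights, k=1):
    """Return the indices of the top-k experts and the softmax gate probabilities."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k], probs

def routing_agreement(states_a, states_b, router_weights):
    """Fraction of aligned positions whose top-1 expert is identical.

    This is an assumed, simplified measure, not the paper's metric.
    """
    same = sum(
        route(a, router_weights)[0] == route(b, router_weights)[0]
        for a, b in zip(states_a, states_b)
    )
    return same / len(states_a)
```

With these toy weights, `route([2.0, 0.5], ROUTER_WEIGHTS)` selects expert `0`. A low agreement between two contexts that contain the same word would suggest that routing responds to context rather than to token identity alone.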

Documentation Practices of Artificial Intelligence

Jun 26, 2024

Stefan Arnold, Dilara Yesilbas, Rene Gröbner, Dominik Riedelbauch, Maik Horn, Sven Weinzierl

Abstract: Artificial Intelligence (AI) faces persistent challenges in terms of transparency and accountability, which call for rigorous documentation. Through a literature review on documentation practices, we provide an overview of prevailing trends, persistent issues, and the multifaceted interplay of factors influencing the documentation. Our examination of key characteristics, such as scope, target audiences, support for multimodality, and level of automation, highlights a dynamic evolution in documentation practices, underscored by a shift towards more holistic, engaging, and automated documentation.

Driving Context into Text-to-Text Privatization

Jun 02, 2023

Stefan Arnold, Dilara Yesilbas, Sven Weinzierl

Abstract: Metric Differential Privacy enables text-to-text privatization by adding calibrated noise to the vector of a word derived from an embedding space and projecting this noisy vector back to a discrete vocabulary using a nearest neighbor search. Since words are substituted without context, this mechanism is expected to fall short at finding substitutes for words with ambiguous meanings, such as 'bank'. To account for these ambiguous words, we leverage a sense embedding and incorporate a sense disambiguation step prior to noise injection. We complement our modification to the privatization mechanism with an estimation of privacy and utility. For word sense disambiguation on the Words in Context dataset, we demonstrate a substantial increase in classification accuracy of 6.05%.
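
The baseline mechanism described in the abstract above (perturb a word vector, then project back to the vocabulary) can be sketched as follows. This is a minimal sketch under stated assumptions: the toy table `EMBEDDINGS` is hypothetical, and the noise is drawn with density proportional to exp(-epsilon * ||z||), a common choice for metric differential privacy that the abstract does not confirm. The sense disambiguation step proposed in the paper is not shown.

```python
import math
import random

# Toy 2-d embedding table (hypothetical values, for illustration only).
EMBEDDINGS = {
    "bank": (0.9, 0.1),
    "river": (0.8, 0.3),
    "money": (0.1, 0.9),
    "loan": (0.2, 0.8),
}

def sample_noise(dim, epsilon, rng):
    """Draw noise with density proportional to exp(-epsilon * ||z||).

    Direction: uniform on the unit sphere (normalized Gaussian draw).
    Magnitude: Gamma distribution with shape `dim` and scale 1 / epsilon.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]

def privatize(word, epsilon, rng):
    """Add noise to the word's vector and return the nearest vocabulary word."""
    vector = EMBEDDINGS[word]
    noise = sample_noise(len(vector), epsilon, rng)
    noisy = [v + n for v, n in zip(vector, noise)]
    return min(EMBEDDINGS, key=lambda w: math.dist(EMBEDDINGS[w], noisy))
```

A call such as `privatize("bank", 5.0, random.Random(0))` returns some word from the toy vocabulary. Smaller values of `epsilon` add more noise, so the substitute is more likely to differ from the input word.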

Guiding Text-to-Text Privatization by Syntax

Jun 02, 2023

Stefan Arnold, Dilara Yesilbas, Sven Weinzierl

Abstract: Metric Differential Privacy is a generalization of differential privacy tailored to address the unique challenges of text-to-text privatization. By adding noise to the representation of words in the geometric space of embeddings, words are replaced with words located in the proximity of the noisy representation. Since embeddings are trained based on word co-occurrences, this mechanism ensures that substitutions stem from a common semantic context. Without considering the grammatical category of words, however, this mechanism cannot guarantee that substitutions play similar syntactic roles. We analyze the capability of text-to-text privatization to preserve the grammatical category of words after substitution and find that surrogate texts consist almost exclusively of nouns. Because the mechanism lacks the capability to produce surrogate texts that correlate with the structure of the sensitive texts, we extend our analysis by transforming the privatization step into a candidate selection problem in which substitutions are directed to words with matching grammatical properties. We demonstrate a substantial improvement in the performance of downstream tasks by up to 4.66% while retaining comparable privacy guarantees.
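
The candidate selection idea in the abstract above might look roughly like the sketch below, which restricts the nearest neighbor search to words that share a required part-of-speech tag. The vocabulary `VOCAB`, its tags, and the function `nearest` are hypothetical; the paper's actual selection procedure and tag set are not given in the abstract.

```python
import math

# Hypothetical vocabulary: word -> (embedding, part-of-speech tag).
VOCAB = {
    "run": ((0.50, 0.50), "VERB"),
    "walk": ((0.70, 0.60), "VERB"),
    "race": ((0.52, 0.50), "NOUN"),
    "road": ((0.60, 0.40), "NOUN"),
}

def nearest(noisy_vector, required_tag=None):
    """Return the closest word, optionally restricted to one grammatical category."""
    candidates = [
        word
        for word, (_, tag) in VOCAB.items()
        if required_tag is None or tag == required_tag
    ]
    return min(candidates, key=lambda w: math.dist(VOCAB[w][0], noisy_vector))
```

For the noisy vector `(0.53, 0.50)`, the unconstrained search returns the noun `"race"`, whereas `nearest((0.53, 0.50), required_tag="VERB")` returns `"run"`, keeping the substitute in the same grammatical category as a verb input.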

Demystifying the Effects of Non-Independence in Federated Learning

Mar 20, 2021

Stefan Arnold, Dilara Yesilbas

Abstract: Federated Learning (FL) enables statistical models to be built on user-generated data without compromising data security and user privacy. For this reason, FL is well suited for on-device learning from mobile devices where data is abundant and highly privatized. Constrained by the temporal availability of mobile devices, only a subset of devices is accessible to participate in the iterative protocol consisting of training and aggregation. In this study, we take a step toward better understanding the effect of non-independent data distributions arising from block-cyclic sampling. By conducting extensive experiments on visual classification, we measure the effects of block-cyclic sampling (both standalone and in combination with non-balanced block distributions). Specifically, we measure the alterations induced by block-cyclic sampling from the perspective of accuracy, fairness, and convergence rate. Experimental results indicate robustness to cycling over a two-block structure, e.g., due to time zones. In contrast, drawing data samples dependently from a multi-block structure significantly degrades the performance and rate of convergence by up to 26%. Moreover, we find that this performance degeneration is further aggravated by unbalanced block distributions to a point that can no longer be adequately compensated by higher communication and more frequent synchronization.
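
One way to picture the block-cyclic sampling discussed in the abstract above is the schedule sketched below: clients are grouped into blocks (for example, by time zone), and each training round samples participants only from the block that is currently active. The function name and all parameters are hypothetical, and the paper's experimental setup may differ.

```python
import random

def block_cyclic_schedule(blocks, num_rounds, rounds_per_block, clients_per_round, rng):
    """Return, per round, the clients sampled from the currently active block.

    The active block advances every `rounds_per_block` rounds and wraps
    around, so client availability is cyclic rather than independent.
    """
    schedule = []
    for round_idx in range(num_rounds):
        active = blocks[(round_idx // rounds_per_block) % len(blocks)]
        count = min(clients_per_round, len(active))
        schedule.append(rng.sample(active, count))
    return schedule
```

For two blocks and `rounds_per_block=2`, rounds 0 and 1 draw only from the first block and rounds 2 and 3 only from the second. Blocks of unequal size give a simple stand-in for the unbalanced block distributions mentioned in the abstract.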

* 8 pages, 7 figures
