Novelty, akin to gene mutation in evolution, opens possibilities for scholarly advancement. Although peer review remains the gold standard for evaluating novelty in scholarly communication and resource allocation, the vast volume of submissions necessitates an automated measure of scholarly novelty. Adopting a perspective that views novelty as the atypical combination of existing knowledge, we introduce an information-theoretic measure of novelty in scholarly publications. This measure quantifies the degree of 'surprise' perceived by a language model that represents the word distribution of scholarly discourse. The proposed measure is accompanied by face and construct validity evidence; the former demonstrates correspondence to scientific common sense, and the latter is endorsed through alignment with novelty evaluations from a select panel of domain experts. Additionally, characterized by its interpretability, fine granularity, and accessibility, this measure addresses gaps prevalent in existing methods. We believe this measure holds great potential to benefit editors, stakeholders, and policymakers, and it provides a reliable lens for examining the relationship between novelty and academic dynamics such as creativity, interdisciplinarity, and scientific advances.
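As an illustration of the 'surprise' idea, the following minimal sketch computes the mean surprisal (in bits per word) of a text under a language model. The add-alpha smoothed unigram model, the toy corpus, and the function names `train_unigram` and `mean_surprisal` are assumptions made for this example only; they stand in for the language model described in the abstract and are not the authors' implementation.

```python
import math
from collections import Counter

def train_unigram(corpus_tokens, alpha=1.0):
    """Return a function giving add-alpha smoothed unigram probabilities.

    A unigram model is a deliberately simple stand-in (an assumption of
    this sketch) for a language model of scholarly discourse.
    """
    counts = Counter(corpus_tokens)
    vocab_size = len(counts) + 1  # +1 reserves one slot for unseen words
    total = sum(counts.values())

    def prob(word):
        return (counts.get(word, 0) + alpha) / (total + alpha * vocab_size)

    return prob

def mean_surprisal(tokens, prob):
    """Average surprisal in bits per token, i.e. the mean of -log2 p(w)."""
    return sum(-math.log2(prob(w)) for w in tokens) / len(tokens)

# Toy reference corpus (hypothetical data).
corpus = "the model predicts the next word in the sentence".split()
prob = train_unigram(corpus)

typical = "the model predicts the word".split()
atypical = "the model predicts quantum soup".split()

# Text containing words the model has not seen scores a higher surprisal.
print(mean_surprisal(typical, prob))
print(mean_surprisal(atypical, prob))
```

Under this sketch, a higher mean surprisal means the text is less expected by the model, which is the sense in which surprisal can serve as a proxy for novelty.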
Authorship identification has proven unsettlingly effective in inferring the identity of the author of an unsigned document, even when sensitive personal information has been carefully omitted. In the digital era, individuals leave a lasting digital footprint through their written content, whether it is posted on social media, stored on their employer's computers, or located elsewhere. When individuals need to communicate publicly yet wish to remain anonymous, there is little available to protect them from unwanted authorship identification. This unprecedented threat to privacy is evident in scenarios such as whistle-blowing. Proposed defenses against authorship identification attacks primarily aim to obfuscate an author's writing style, making it unlinkable to the author's pre-existing writing while preserving the original meaning and grammatical integrity. This work offers a comprehensive review of advances in this research area over the past two decades and beyond. It emphasizes the methodological frameworks of modification-based and generation-based strategies devised to evade authorship identification attacks, highlighting joint efforts from the differential privacy community. Limitations of current research are discussed, with a spotlight on open challenges and potential research avenues.
Lu Xun and Zhou Zuoren stand as two of the most influential writers in modern Chinese literature. Beyond their familial ties as brothers, they were also intimate collaborators during the nascent stages of their writing careers. This research employs quantitative methods to revisit three disputed essays pseudonymously published by the brothers in 1912. Our stylometric analysis uses an interpretable authorship attribution model to investigate the essays' authorship and examine the brothers' respective writing styles. Our findings suggest that 'Looking at the Country of China' was authored by Lu Xun. Moreover, 'People of Yue, Forget Not Your Ancestors' Instructions' seems to be either predominantly authored or extensively revised by Lu Xun given its notable stylistic similarities to 'Looking at the Land of Yue,' a piece Zhou Zuoren recognized as his own, but edited by Lu Xun. The third essay, 'Where Has the Character of the Republic Gone?,' exhibits a 'diluted', mixed writing style, suggesting thorough collaboration. We offer visual representations of essay features to facilitate a nuanced and intuitive understanding. We have uncovered evidence suggesting Lu Xun's covert engagement with social issues during his purported 'silent era' and provided insights into the brothers' formative intellectual trajectories.
Authorship identification ascertains the authorship of texts whose origins remain undisclosed. The reliability of authorship identification techniques has been attributed to how well they capture and represent authorial style. Although modern authorship identification methods have evolved significantly over the years and have proven effective in distinguishing authorial styles, the generalization of stylistic features across domains has not been systematically reviewed. This work addresses the challenge of enhancing the generalization of stylistic representations in authorship identification, particularly when there are discrepancies between training and testing samples. A comprehensive review of empirical studies was conducted, focusing on various stylistic features and their effectiveness in representing an author's style. The influence of factors such as topic, genre, and register on writing style was also explored, along with strategies to mitigate their impact. While some stylistic features, like character n-grams and function words, have proven to be robust and discriminative, others, such as content words, can introduce biases and hinder cross-domain generalization. Representations learned using deep learning models, especially those incorporating character n-grams and syntactic information, show promise in enhancing representation generalization. The findings underscore the importance of selecting appropriate stylistic features for authorship identification, especially in cross-domain scenarios. Recognizing the strengths and weaknesses of various linguistic features paves the way for more accurate authorship identification in diverse contexts.
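The two feature families singled out above as robust, character n-grams and function words, can be extracted in a few lines. The sketch below is illustrative only: the ten-word `FUNCTION_WORDS` list, the whitespace tokenizer, and the function names are assumptions of this example, and published studies typically use much longer function-word lists and more careful tokenization.

```python
from collections import Counter

# Short illustrative list (an assumption of this sketch).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "is", "was")

def char_ngram_freqs(text, n=3):
    """Relative frequencies of overlapping character n-grams in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}

def function_word_freqs(text):
    """Relative frequency of each word in FUNCTION_WORDS, in list order."""
    tokens = [t.strip(".,;:!?\"'()") for t in text.lower().split()]
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total if total else 0.0 for w in FUNCTION_WORDS]

sample = "The cat sat in the hat."
print(function_word_freqs(sample))   # one value per function word
print(len(char_ngram_freqs(sample))) # number of distinct character trigrams
```

Both feature vectors depend mostly on how something is written rather than on what it is about, which is one intuition for why such features transfer across topics better than content words do.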
Maintaining anonymity while communicating using natural language remains a challenge. Standard authorship attribution techniques that analyze candidate authors' writing styles achieve uncomfortably high accuracy even when the number of candidate authors is high. Adversarial stylometry defends against authorship attribution with the goal of preventing unwanted deanonymization. This paper reproduces and replicates experiments in a seminal study of defenses against authorship attribution (Brennan et al., 2012). We are able to successfully reproduce and replicate the original results, although we conclude that the effectiveness of the defenses studied is overstated due to a lack of a control group in the original study. In our replication, we find new evidence suggesting that an entirely automatic method, round-trip translation, merits re-examination as it appears to reduce the effectiveness of established authorship attribution methods.
Drones in many applications need the ability to fly fully or partially autonomously to accomplish their mission. To enable such flights, the drone must first be able to locate itself continuously; navigation commands are then generated and passed on to the controller unit of the drone. In this paper, we propose a localization scheme for drones called iDROP (Robust Localization for Indoor Navigation of Drones with Optimized Beacon Placement) that is specifically devised for GPS-denied environments (e.g., indoor spaces). Instead of GPS signals, iDROP relies on speaker-generated ultrasonic acoustic signals to enable a drone to estimate its location. In general, localization error is due to two factors: the ranging error and the error induced by the relative geometry between the transmitters and the receiver. iDROP mitigates these two types of errors and provides a high-precision three-dimensional localization scheme for drones. iDROP employs a waveform that is robust against multi-path fading. Moreover, by placing beacons in optimal locations, it reduces the localization error induced by the relative geometry between the transmitters and the receiver.
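To make the role of ranging concrete, the following sketch estimates a three-dimensional position from beacon positions and measured ranges using linearized least squares. This is a generic textbook multilateration method chosen for illustration, not the iDROP algorithm; the beacon coordinates, the function names `solve_linear` and `multilaterate`, and the noise-free ranges are all assumptions of this example.

```python
import math

def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate beacon geometry")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def multilaterate(beacons, ranges):
    """Least-squares 3D position from at least 4 non-coplanar beacons."""
    p0, r0 = beacons[0], ranges[0]
    sq0 = sum(c * c for c in p0)
    A, b = [], []
    # Subtracting the first range equation from the others removes |x|^2:
    # 2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - r_i^2 + r_0^2
    for p, r in zip(beacons[1:], ranges[1:]):
        A.append([2.0 * (p[k] - p0[k]) for k in range(3)])
        b.append(sum(c * c for c in p) - sq0 - r * r + r0 * r0)
    # Normal equations: (A^T A) x = A^T b
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(3)]
    return solve_linear(AtA, Atb)

# Hypothetical beacon layout in metres and a noise-free test position.
beacons = [(0, 0, 0), (5, 0, 0), (0, 5, 0), (0, 0, 3), (5, 5, 3)]
true_pos = (1.0, 2.0, 1.5)
ranges = [math.dist(true_pos, p) for p in beacons]
print(multilaterate(beacons, ranges))
```

With noisy ranges the same equations still apply, but the estimate inherits the ranging error, amplified or attenuated by the beacon geometry; these are the two error sources the abstract names.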
For many applications, drones are required to operate entirely or partially autonomously. To fly completely or partially on their own, drones need access to location services to get navigation commands. While using the Global Positioning System (GPS) is an obvious choice, GPS is not always available, can be spoofed or jammed, and is highly error-prone in indoor and underground environments. Beacon-based ranging is one of the popular methods for localization, especially in indoor environments. In general, localization error in this class of methods is due to two factors: the ranging error and the error induced by the relative geometry between the beacons and the target object to be localized. This paper proposes OPTILOD (Optimal Beacon Placement for High-Accuracy Indoor Localization of Drones), an optimization algorithm for the optimal placement of beacons deployed in three-dimensional indoor environments. OPTILOD leverages advances in Evolutionary Algorithms to compute the minimum number of beacons and their optimal placement to minimize the localization error. These problems belong to the Mixed Integer Programming (MIP) class and are both considered NP-Hard. Despite that, OPTILOD can provide multiple optimal beacon configurations that simultaneously minimize the localization error and the number of deployed beacons, and it does so in a time-efficient manner.
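One common way to quantify the geometry-induced error is a dilution-of-precision metric. The sketch below computes the position dilution of precision (PDOP) for range measurements; the abstract does not state which metric OPTILOD optimizes, so the choice of PDOP, the function name `pdop`, and the beacon layouts are assumptions made for illustration.

```python
import math

def pdop(beacons, receiver):
    """Position dilution of precision for range measurements to `beacons`.

    Assumes no receiver clock bias, so only the three position unknowns
    are estimated. Lower values indicate more favourable geometry.
    """
    # Each row of H is the unit vector from the receiver to one beacon.
    H = []
    for p in beacons:
        d = [p[k] - receiver[k] for k in range(3)]
        norm = math.sqrt(sum(c * c for c in d))
        H.append([c / norm for c in d])
    # M = H^T H is a symmetric 3x3 matrix.
    M = [[sum(h[i] * h[j] for h in H) for j in range(3)] for i in range(3)]
    det = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
           - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
           + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    if abs(det) < 1e-12:
        return math.inf  # geometry carries no information along some axis
    # For a 3x3 matrix, trace(M^-1) = (sum of principal 2x2 minors) / det.
    minors = (M[1][1] * M[2][2] - M[1][2] * M[2][1]
              + M[0][0] * M[2][2] - M[0][2] * M[2][0]
              + M[0][0] * M[1][1] - M[0][1] * M[1][0])
    return math.sqrt(minors / det)

rx = (2.5, 2.5, 1.0)  # hypothetical drone position in metres
spread = [(0, 0, 3), (5, 0, 3), (5, 5, 3), (0, 5, 3)]
clustered = [(0, 0, 3), (0.5, 0, 3), (0, 0.5, 3), (0.5, 0.5, 3)]

# Clustered beacons yield a larger (worse) PDOP than well-spread ones.
print(pdop(spread, rx), pdop(clustered, rx))
```

A placement optimizer of the kind described above could use such a metric, evaluated over the positions the drone is expected to visit, as part of its objective; that usage is a suggestion of this sketch rather than a description of OPTILOD.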
In many scenarios, unmanned aerial vehicles (UAVs), aka drones, need to have the capability of autonomous flying to carry out their mission successfully. In order to allow these autonomous flights, drones need to know their location constantly. Then, based on the current position and the final destination, navigation commands will be generated and drones will be guided to their destination. Localization can be easily carried out in outdoor environments using GPS signals and drone inertial measurement units (IMUs). However, such an approach is not feasible in indoor environments or GPS-denied areas. In this paper, we propose a localization scheme for drones called PILOT (High-Precision Indoor Localization for Autonomous Drones) that is specifically designed for indoor environments. PILOT relies on ultrasonic acoustic signals to estimate the target drone's location. In order to have a precise final estimation of the drone's location, PILOT deploys a three-stage localization scheme. The first two stages provide robustness against the multi-path fading effect of indoor environments and mitigate the ranging error. Then, in the third stage, PILOT deploys a simple yet effective technique to reduce the localization error induced by the relative geometry between transmitters and receivers and significantly reduces the height estimation error. The performance of PILOT was assessed under different scenarios and the results indicate that PILOT achieves centimeter-level accuracy for three-dimensional localization of drones.
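For readers unfamiliar with acoustic ranging, the sketch below shows a basic time-of-flight estimate: the received signal is cross-correlated with the known transmitted waveform, and the delay at the correlation peak is converted to distance. It is a generic illustration, not PILOT's three-stage scheme. It assumes the recording starts at the instant of transmission (synchronized transmitter and receiver), uses a common linear approximation for the speed of sound in air, and all names and signal values are hypothetical.

```python
def estimate_delay(received, template):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(template) + 1):
        score = sum(received[lag + i] * t for i, t in enumerate(template))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 + 0.606 * temp_c

def tof_range(received, template, sample_rate_hz, temp_c=20.0):
    """Range in metres, assuming the recording starts at transmission time."""
    delay_s = estimate_delay(received, template) / sample_rate_hz
    return speed_of_sound(temp_c) * delay_s

# Hypothetical short waveform arriving 40 samples after transmission.
template = [1, -1, 1, 1, -1]
received = [0] * 40 + template + [0] * 10
print(tof_range(received, template, sample_rate_hz=48000))
```

In a real room, reflections produce additional correlation peaks, so picking the wrong peak turns directly into ranging error; this is the multi-path problem the first two stages of the scheme are said to address.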
Navigating in environments where the GPS signal is unavailable, weak, purposefully blocked, or spoofed has become crucial for a wide range of applications. A prime example is autonomous navigation for drones in indoor environments: to fly fully or partially autonomously, drones demand accurate and frequent updates of their locations. This paper proposes a Robust Acoustic Indoor Localization (RAIL) scheme for drones designed explicitly for GPS-denied environments. Instead of depending on GPS, RAIL leverages ultrasonic acoustic signals to achieve precise localization using a novel hybrid Frequency Hopping Code Division Multiple Access (FH-CDMA) technique. Contrary to previous approaches, RAIL is able to both overcome the multi-path fading effect and provide precise signal separation in the receiver. Comprehensive simulations and experiments using a prototype implementation demonstrate that RAIL provides high-accuracy three-dimensional localization with an average error of less than 1.5 cm.
There has been rapid growth in the deployment of Unmanned Aerial Vehicles (UAVs) in applications ranging from vital safety-of-life uses, such as surveillance and reconnaissance at nuclear power plants, to entertainment and hobby uses. While popular, drones can pose serious security threats, whether unintentional or intentional. Thus, there is an urgent need for real-time, accurate detection and classification of drones. In this article, we survey drone detection approaches and present their advantages and limitations. We analyze detection techniques that employ radars, acoustic and optical sensors, and emitted radio frequency (RF) signals. We compare their performance, accuracy, and cost, concluding that combining multiple sensing modalities might be the path forward.