Abstract:Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, we introduce SEACrowd, a collaborative initiative that consolidates a comprehensive resource hub that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in SEA.
Abstract:Overfitting in RL has become one of the main obstacles to applications in reinforcement learning(RL). Existing methods do not provide explicit semantic constrain for the feature extractor, hindering the agent from learning a unified cross-domain representation and resulting in performance degradation on unseen domains. Besides, abundant data from multiple domains are needed. To address these issues, in this work, we propose prompt-based visual alignment (PVA), a robust framework to mitigate the detrimental domain bias in the image for zero-shot policy transfer. Inspired that Visual-Language Model (VLM) can serve as a bridge to connect both text space and image space, we leverage the semantic information contained in a text sequence as an explicit constraint to train a visual aligner. Thus, the visual aligner can map images from multiple domains to a unified domain and achieve good generalization performance. To better depict semantic information, prompt tuning is applied to learn a sequence of learnable tokens. With explicit constraints of semantic information, PVA can learn unified cross-domain representation under limited access to cross-domain data and achieves great zero-shot generalization ability in unseen domains. We verify PVA on a vision-based autonomous driving task with CARLA simulator. Experiments show that the agent generalizes well on unseen domains under limited access to multi-domain data.
Abstract:Electric vertical-takeoff and landing (eVTOL) aircraft, recognized for their maneuverability and flexibility, offer a promising alternative to our transportation system. However, the operational effectiveness of these aircraft faces many challenges, such as the delicate balance between energy and time efficiency, stemming from unpredictable environmental factors, including wind fields. Mathematical modeling-based approaches have been adopted to plan aircraft flight path in urban wind fields with the goal to save energy and time costs. While effective, they are limited in adapting to dynamic and complex environments. To optimize energy and time efficiency in eVTOL's flight through dynamic wind fields, we introduce a novel path planning method leveraging deep reinforcement learning. We assess our method with extensive experiments, comparing it to Dijkstra's algorithm -- the theoretically optimal approach for determining shortest paths in a weighted graph, where weights represent either energy or time cost. The results show that our method achieves a graceful balance between energy and time efficiency, closely resembling the theoretically optimal values for both objectives.
Abstract:Events describe the state changes of entities. In a document, multiple events are connected by various relations (e.g., Coreference, Temporal, Causal, and Subevent). Therefore, obtaining the connections between events through Event-Event Relation Extraction (ERE) is critical to understand natural language. There are two main problems in the current ERE works: a. Only embeddings of the event triggers are used for event feature representation, ignoring event arguments (e.g., time, place, person, etc.) and their structure within the event. b. The interconnection between relations (e.g., temporal and causal relations usually interact with each other ) is ignored. To solve the above problems, this paper proposes a jointly multiple ERE framework called GraphERE based on Graph-enhanced Event Embeddings. First, we enrich the event embeddings with event argument and structure features by using static AMR graphs and IE graphs; Then, to jointly extract multiple event relations, we use Node Transformer and construct Task-specific Dynamic Event Graphs for each type of relation. Finally, we used a multi-task learning strategy to train the whole framework. Experimental results on the latest MAVEN-ERE dataset validate that GraphERE significantly outperforms existing methods. Further analyses indicate the effectiveness of the graph-enhanced event embeddings and the joint extraction strategy.
Abstract:Events are essential components of speech and texts, describing the changes in the state of entities. The event extraction task aims to identify and classify events and find their participants according to event schemas. Manually predefined event schemas have limited coverage and are hard to migrate across domains. Therefore, the researchers propose Liberal Event Extraction (LEE), which aims to extract events and discover event schemas simultaneously. However, existing LEE models rely heavily on external language knowledge bases and require the manual development of numerous rules for noise removal and knowledge alignment, which is complex and laborious. To this end, we propose a Prompt-based Graph Model for Liberal Event Extraction (PGLEE). Specifically, we use a prompt-based model to obtain candidate triggers and arguments, and then build heterogeneous event graphs to encode the structures within and between events. Experimental results prove that our approach achieves excellent performance with or without predefined event schemas, while the automatically detected event schemas are proven high quality.
Abstract:Large Language Models (LLMs) have shown prominent performance in various downstream tasks in which prompt engineering plays a pivotal role in optimizing LLMs' performance. This paper, not as an overview of current prompt engineering methods, aims to highlight the limitation of designing prompts while holding an anthropomorphic assumption that expects LLMs to think like humans. From our review of 35 representative studies, we demonstrate that a goal-oriented prompt formulation, which guides LLMs to follow established human logical thinking, significantly improves the performance of LLMs. Furthermore, We introduce a novel taxonomy that categorizes goal-oriented prompting methods into five interconnected stages and we demonstrate the broad applicability of our framework by summarizing ten applicable tasks. With four future directions proposed, we hope to further emphasize and promote goal-oriented prompt engineering.
Abstract:In code search, the Generation-Augmented Retrieval (GAR) framework, which generates exemplar code snippets to augment queries, has emerged as a promising strategy to address the principal challenge of modality misalignment between code snippets and natural language queries, particularly with the demonstrated code generation capabilities of Large Language Models (LLMs). Nevertheless, our preliminary investigations indicate that the improvements conferred by such an LLM-augmented framework are somewhat constrained. This limitation could potentially be ascribed to the fact that the generated codes, albeit functionally accurate, frequently display a pronounced stylistic deviation from the ground truth code in the codebase. In this paper, we extend the foundational GAR framework and propose a simple yet effective method that additionally Rewrites the Code (ReCo) within the codebase for style normalization. Experimental results demonstrate that ReCo significantly boosts retrieval accuracy across sparse (up to 35.7%), zero-shot dense (up to 27.6%), and fine-tuned dense (up to 23.6%) retrieval settings in diverse search scenarios. To further elucidate the advantages of ReCo and stimulate research in code style normalization, we introduce Code Style Similarity, the first metric tailored to quantify stylistic similarities in code. Notably, our empirical findings reveal the inadequacy of existing metrics in capturing stylistic nuances.
Abstract:A near-field integrated sensing, positioning, and communication (ISPAC) framework is proposed, where a base station (BS) simultaneously serves multiple communication users and carries out target sensing and positioning. A novel double-array structure is proposed to enable the near-field ISPAC at the BS. Specifically, a small-scale assisting transceiver (AT) is attached to the large-scale main transceiver (MT) to empower the communication system with the ability of sensing and positioning. Based on the proposed framework, the joint angle and distance Cram\'er-Rao bound (CRB) is first derived. Then, the CRB is minimized subject to the minimum communication rate requirement in both downlink and uplink ISPAC scenarios: 1) For downlink ISPAC, a downlink target positioning algorithm is proposed and a penalty dual decomposition (PDD)-based double-loop algorithm is developed to tackle the non-convex optimization problem. 2) For uplink ISPAC, an uplink target positioning algorithm is proposed and an efficient alternating optimization algorithm is conceived to solve the non-convex CRB minimization problem with coupled user communication and target probing design. Both proposed optimization algorithms can converge to a stationary point of the CRB minimization problem. Numerical results show that: 1) The proposed ISPAC system can locate the target in both angle and distance domains merely relying on single BS and limited bandwidths; and 2) the positioning performance achieved by the hybrid-analog-and-digital ISPAC approaches that achieved by fully digital ISPAC when the communication rate requirement is not stringent.
Abstract:Recently, contrastive learning has become a key component in fine-tuning code search models for software development efficiency and effectiveness. It pulls together positive code snippets while pushing negative samples away given search queries. Among contrastive learning, InfoNCE is the most widely used loss function due to its better performance. However, the following problems in negative samples of InfoNCE may deteriorate its representation learning: 1) The existence of false negative samples in large code corpora due to duplications. 2). The failure to explicitly differentiate between the potential relevance of negative samples. As an example, a bubble sorting algorithm example is less ``negative'' than a file saving function for the quick sorting algorithm query. In this paper, we tackle the above problems by proposing a simple yet effective Soft-InfoNCE loss that inserts weight terms into InfoNCE. In our proposed loss function, we apply three methods to estimate the weights of negative pairs and show that the vanilla InfoNCE loss is a special case of Soft-InfoNCE. Theoretically, we analyze the effects of Soft-InfoNCE on controlling the distribution of learnt code representations and on deducing a more precise mutual information estimation. We furthermore discuss the superiority of proposed loss functions with other design alternatives. Extensive experiments demonstrate the effectiveness of Soft-InfoNCE and weights estimation methods under state-of-the-art code search models on a large-scale public dataset consisting of six programming languages. Source code is available at \url{https://github.com/Alex-HaochenLi/Soft-InfoNCE}.
Abstract:Reconfigurable intelligent surface (RIS) is a promising candidate technology of the upcoming Sixth Generation (6G) communication system for its ability to provide unprecedented spectral and energy efficiency increment through passive beamforming. However, it is challenging to obtain instantaneous channel state information (I-CSI) for RIS, which obliges us to use statistical channel state information (S-CSI) to achieve passive beamforming. In this paper, RIS-aided multiple-input single-output (MISO) multi-user downlink communication system with correlated channels is investigated. Then, we formulate the problem of joint beamforming design at the AP and RIS to maximize the sum ergodic spectral efficiency (ESE) of all users to improve the network capacity. Since it is too hard to compute sum ESE, an ESE approximation is adopted to reformulate the problem into a more tractable form. Then, we present two joint beamforming algorithms, namely the singular value decomposition-gradient descent (SVD-GD) algorithm and the fractional programming-gradient descent (FP-GD) algorithm. Simulation results show the effectiveness of our proposed algorithms and validate that 2-bits quantizer is enough for RIS phase shifts implementation.