Abstract:Artificial intelligence is poised to transform teaching and coaching practices,yet its optimal role alongside human expertise remains unclear.This study investigates human and AI collaboration in fitness education through a one year qualitative case study with a Pilates instructor.The researcher participated in the instructor classes and conducted biweekly semi structured interviews to explore how generative AI could be integrated into class planning and instruction.
Abstract:Sybil attacks pose a significant security threat to blockchain ecosystems, particularly in token airdrop events. This paper proposes a novel sybil address identification method based on subgraph feature extraction lightGBM. The method first constructs a two-layer deep transaction subgraph for each address, then extracts key event operation features according to the lifecycle of sybil addresses, including the time of first transaction, first gas acquisition, participation in airdrop activities, and last transaction. These temporal features effectively capture the consistency of sybil address behavior operations. Additionally, the method extracts amount and network structure features, comprehensively describing address behavior patterns and network topology through feature propagation and fusion. Experiments conducted on a dataset containing 193,701 addresses (including 23,240 sybil addresses) show that this method outperforms existing approaches in terms of precision, recall, F1 score, and AUC, with all metrics exceeding 0.9. The methods and results of this study can be further applied to broader blockchain security areas such as transaction manipulation identification and token liquidity risk assessment, contributing to the construction of a more secure and fair blockchain ecosystem.
Abstract:Typical dynamic ST data includes trajectory data (representing individual-level mobility) and traffic state data (representing population-level mobility). Traditional studies often treat trajectory and traffic state data as distinct, independent modalities, each tailored to specific tasks within a single modality. However, real-world applications, such as navigation apps, require joint analysis of trajectory and traffic state data. Treating these data types as two separate domains can lead to suboptimal model performance. Although recent advances in ST data pre-training and ST foundation models aim to develop universal models for ST data analysis, most existing models are "multi-task, solo-data modality" (MTSM), meaning they can handle multiple tasks within either trajectory data or traffic state data, but not both simultaneously. To address this gap, this paper introduces BIGCity, the first multi-task, multi-data modality (MTMD) model for ST data analysis. The model targets two key challenges in designing an MTMD ST model: (1) unifying the representations of different ST data modalities, and (2) unifying heterogeneous ST analysis tasks. To overcome the first challenge, BIGCity introduces a novel ST-unit that represents both trajectories and traffic states in a unified format. Additionally, for the second challenge, BIGCity adopts a tunable large model with ST task-oriented prompt, enabling it to perform a range of heterogeneous tasks without the need for fine-tuning. Extensive experiments on real-world datasets demonstrate that BIGCity achieves state-of-the-art performance across 8 tasks, outperforming 18 baselines. To the best of our knowledge, BIGCity is the first model capable of handling both trajectories and traffic states for diverse heterogeneous tasks. Our code are available at https://github.com/bigscity/BIGCity
Abstract:This study aims to investigate whether GPT-4 can effectively grade assignments for design university students and provide useful feedback. In design education, assignments do not have a single correct answer and often involve solving an open-ended design problem. This subjective nature of design projects often leads to grading problems,as grades can vary between different raters,for instance instructor from engineering background or architecture background. This study employs an iterative research approach in developing a Custom GPT with the aim of achieving more reliable results and testing whether it can provide design students with constructive feedback. The findings include: First,through several rounds of iterations the inter-reliability between GPT and human raters reached a level that is generally accepted by educators. This indicates that by providing accurate prompts to GPT,and continuously iterating to build a Custom GPT, it can be used to effectively grade students' design assignments, serving as a reliable complement to human raters. Second, the intra-reliability of GPT's scoring at different times is between 0.65 and 0.78. This indicates that, with adequate instructions, a Custom GPT gives consistent results which is a precondition for grading students. As consistency and comparability are the two main rules to ensure the reliability of educational assessment, this study has looked at whether a Custom GPT can be developed that adheres to these two rules. We finish the paper by testing whether Custom GPT can provide students with useful feedback and reflecting on how educators can develop and iterate a Custom GPT to serve as a complementary rater.
Abstract:ChatGPT has been used in several educational contexts,including learning, teaching and research. It also has potential to conduct the systematic literature review (SLR). However, there are limited empirical studies on how to use ChatGPT in conducting a SLR. Based on a SLR published,this study used ChatGPT to conduct a SLR of the same 33 papers in a design-based approach, to see what the differences are by comparing the reviews' results,and to answer: To what extent can ChatGPT conduct SLR? What strategies can human researchers utilize to structure prompts for ChatGPT that enhance the reliability and validity of a SLR? This study found that ChatGPT could conduct a SLR. It needs detailed and accurate prompts to analyze the literature. It also has limitations. Guiding principles are summarized from this study for researchers to follow when they need to conduct SLRs using ChatGPT.
Abstract:Point cloud sequence-based 3D action recognition has achieved impressive performance and efficiency. However, existing point cloud sequence modeling methods cannot adequately balance the precision of limb micro-movements with the integrity of posture macro-structure, leading to the loss of crucial information cues in action inference. To overcome this limitation, we introduce D-Hyperpoint, a novel data type generated through a D-Hyperpoint Embedding module. D-Hyperpoint encapsulates both regional-momentary motion and global-static posture, effectively summarizing the unit human action at each moment. In addition, we present a D-Hyperpoint KANsMixer module, which is recursively applied to nested groupings of D-Hyperpoints to learn the action discrimination information and creatively integrates Kolmogorov-Arnold Networks (KAN) to enhance spatio-temporal interaction within D-Hyperpoints. Finally, we propose KAN-HyperpointNet, a spatio-temporal decoupled network architecture for 3D action recognition. Extensive experiments on two public datasets: MSR Action3D and NTU-RGB+D 60, demonstrate the state-of-the-art performance of our method.
Abstract:Large language model (LLM) agents have demonstrated impressive capability in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. However, developing the prompting techniques that make LLM agents able to effectively use external tools and knowledge is a heuristic and laborious task. Here, we introduce AvaTaR, a novel and automatic framework that optimizes an LLM agent to effectively use the provided tools and improve its performance on a given task/domain. During optimization, we design a comparator module to iteratively provide insightful and holistic prompts to the LLM agent via reasoning between positive and negative examples sampled from training data. We demonstrate AvaTaR on four complex multimodal retrieval datasets featuring textual, visual, and relational information. We find AvaTaR consistently outperforms state-of-the-art approaches across all four challenging tasks and exhibits strong generalization ability when applied to novel cases, achieving an average relative improvement of 14% on the Hit@1 metric. Code and dataset are available at https://github.com/zou-group/avatar.
Abstract:Agents based on large language models have shown great potential in accelerating scientific discovery by leveraging their rich background knowledge and reasoning capabilities. Here, we develop BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. We demonstrate our agent on the problem of designing genetic perturbation experiments, where the aim is to find a small subset out of many possible genes that, when perturbed, result in a specific phenotype (e.g., cell growth). Utilizing its biological knowledge, BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model or explicitly design an acquisition function. Moreover, BioDiscoveryAgent achieves an average of 18% improvement in detecting desired phenotypes across five datasets, compared to existing Bayesian optimization baselines specifically trained for this task. Our evaluation includes one dataset that is unpublished, ensuring it is not part of the language model's training data. Additionally, BioDiscoveryAgent predicts gene combinations to perturb twice as accurately as a random baseline, a task so far not explored in the context of closed-loop experiment design. The agent also has access to tools for searching the biomedical literature, executing code to analyze biological datasets, and prompting another agent to critically evaluate its predictions. Overall, BioDiscoveryAgent is interpretable at every stage, representing an accessible new paradigm in the computational design of biological experiments with the potential to augment scientists' capabilities.
Abstract:Answering real-world user queries, such as product search, often requires accurate retrieval of information from semi-structured knowledge bases or databases that involve blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, previous works have mostly studied textual and relational retrieval tasks as separate topics. To address the gap, we develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Relational Knowledge Bases. We design a novel pipeline to synthesize natural and realistic user queries that integrate diverse relational information and complex textual properties, as well as their ground-truth answers. Moreover, we rigorously conduct human evaluation to validate the quality of our benchmark, which covers a variety of practical applications, including product recommendations, academic paper searches, and precision medicine inquiries. Our benchmark serves as a comprehensive testbed for evaluating the performance of retrieval systems, with an emphasis on retrieval approaches driven by large language models (LLMs). Our experiments suggest that the STARK datasets present significant challenges to the current retrieval and LLM systems, indicating the demand for building more capable retrieval systems that can handle both textual and relational aspects.
Abstract:Informationization is a prevailing trend in today's world. The increasing demand for information in decision-making processes poses significant challenges for investigation activities, particularly in terms of effectively allocating limited resources to plan investigation programs. This paper addresses the investigation path planning problem by formulating it as a multi-traveling salesman problem (MTSP). Our objective is to minimize costs, and to achieve this, we propose a chaotic artificial fish swarm algorithm based on multiple population differential evolution (DE-CAFSA). To overcome the limitations of the artificial fish swarm algorithm, such as low optimization accuracy and the inability to consider global and local information, we incorporate adaptive field of view and step size adjustments, replace random behavior with the 2-opt operation, and introduce chaos theory and sub-optimal solutions to enhance optimization accuracy and search performance. Additionally, we integrate the differential evolution algorithm to create a hybrid algorithm that leverages the complementary advantages of both approaches. Experimental results demonstrate that DE-CAFSA outperforms other algorithms on various public datasets of different sizes, as well as showcasing excellent performance on the examples proposed in this study.