Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhi Chen

Spring

Phononic materials with effectively scale-separated hierarchical features using interpretable machine learning

Aug 15, 2024

Mary V. Bastawrous, Zhi Chen, Alexander C. Ogren, Chiara Daraio, Cynthia Rudin, L. Catherine Brinson

Abstract:Manipulating the dispersive characteristics of vibrational waves is beneficial for many applications, e.g., high-precision instruments. architected hierarchical phononic materials have sparked promise tunability of elastodynamic waves and vibrations over multiple frequency ranges. In this article, hierarchical unit-cells are obtained, where features at each length scale result in a band gap within a targeted frequency range. Our novel approach, the ``hierarchical unit-cell template method,'' is an interpretable machine-learning approach that uncovers global unit-cell shape/topology patterns corresponding to predefined band-gap objectives. A scale-separation effect is observed where the coarse-scale band-gap objective is mostly unaffected by the fine-scale features despite the closeness of their length scales, thus enabling an efficient hierarchical algorithm. Moreover, the hierarchical patterns revealed are not predefined or self-similar hierarchies as common in current hierarchical phononic materials. Thus, our approach offers a flexible and efficient method for the exploration of new regions in the hierarchical design space, extracting minimal effective patterns for inverse design in applications targeting multiple frequency ranges.

Via

Access Paper or Ask Questions

On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey

Aug 09, 2024

Jingcai Guo, Zhijie Rao, Zhi Chen, Song Guo, Jingren Zhou, Dacheng Tao

Abstract:Zero-shot image recognition (ZSIR) aims at empowering models to recognize and reason in unseen domains via learning generalized knowledge from limited data in the seen domain. The gist for ZSIR is to execute element-wise representation and reasoning from the input visual space to the target semantic space, which is a bottom-up modeling paradigm inspired by the process by which humans observe the world, i.e., capturing new concepts by learning and combining the basic components or shared characteristics. In recent years, element-wise learning techniques have seen significant progress in ZSIR as well as widespread application. However, to the best of our knowledge, there remains a lack of a systematic overview of this topic. To enrich the literature and provide a sound basis for its future development, this paper presents a broad review of recent advances in element-wise ZSIR. Concretely, we first attempt to integrate the three basic ZSIR tasks of object recognition, compositional recognition, and foundation model-based open-world recognition into a unified element-wise perspective and provide a detailed taxonomy and analysis of the main research approaches. Then, we collect and summarize some key information and benchmarks, such as detailed technical implementations and common datasets. Finally, we sketch out the wide range of its related applications, discuss vital challenges, and suggest potential future directions.

* 24 pages, 7 figures

Via

Access Paper or Ask Questions

FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning

Aug 06, 2024

Zhi Chen, Zecheng Zhao, Yadan Luo, Zi Huang

Figure 1 for FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning

Figure 2 for FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning

Figure 3 for FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning

Figure 4 for FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning

Abstract:Conventional Text-guided single-image editing approaches require a two-step process, including fine-tuning the target text embedding for over 1K iterations and the generative model for another 1.5K iterations. Although it ensures that the resulting image closely aligns with both the input image and the target text, this process often requires 7 minutes per image, posing a challenge for practical application due to its time-intensive nature. To address this bottleneck, we introduce FastEdit, a fast text-guided single-image editing method with semantic-aware diffusion fine-tuning, dramatically accelerating the editing process to only 17 seconds. FastEdit streamlines the generative model's fine-tuning phase, reducing it from 1.5K to a mere 50 iterations. For diffusion fine-tuning, we adopt certain time step values based on the semantic discrepancy between the input image and target text. Furthermore, FastEdit circumvents the initial fine-tuning step by utilizing an image-to-image model that conditions on the feature space, rather than the text embedding space. It can effectively align the target text prompt and input image within the same feature space and save substantial processing time. Additionally, we apply the parameter-efficient fine-tuning technique LoRA to U-net. With LoRA, FastEdit minimizes the model's trainable parameters to only 0.37\% of the original size. At the same time, we can achieve comparable editing outcomes with significantly reduced computational overhead. We conduct extensive experiments to validate the editing performance of our approach and show promising editing capabilities, including content addition, style transfer, background replacement, and posture manipulation, etc.

* Technical Report

Via

Access Paper or Ask Questions

Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline

Aug 06, 2024

Tianqi Wei, Zhi Chen, Zi Huang, Xin Yu

Abstract:Existing plant disease classification models have achieved remarkable performance in recognizing in-laboratory diseased images. However, their performance often significantly degrades in classifying in-the-wild images. Furthermore, we observed that in-the-wild plant images may exhibit similar appearances across various diseases (i.e., small inter-class discrepancy) while the same diseases may look quite different (i.e., large intra-class variance). Motivated by this observation, we propose an in-the-wild multimodal plant disease recognition dataset that contains the largest number of disease classes but also text-based descriptions for each disease. Particularly, the newly provided text descriptions are introduced to provide rich information in textual modality and facilitate in-the-wild disease classification with small inter-class discrepancy and large intra-class variance issues. Therefore, our proposed dataset can be regarded as an ideal testbed for evaluating disease recognition methods in the real world. In addition, we further present a strong yet versatile baseline that models text descriptions and visual data through multiple prototypes for a given class. By fusing the contributions of multimodal prototypes in classification, our baseline can effectively address the small inter-class discrepancy and large intra-class variance issues. Remarkably, our baseline model can not only classify diseases but also recognize diseases in few-shot or training-free scenarios. Extensive benchmarking results demonstrate that our proposed in-the-wild multimodal dataset sets many new challenges to the plant disease recognition task and there is a large space to improve for future works.

Via

Access Paper or Ask Questions

Flexible Beam Coverage Optimization for Movable-Antenna Array

Aug 01, 2024

Dong Wang, Weidong Mei, Boyu Ning, Zhi Chen

Figure 1 for Flexible Beam Coverage Optimization for Movable-Antenna Array

Figure 2 for Flexible Beam Coverage Optimization for Movable-Antenna Array

Figure 3 for Flexible Beam Coverage Optimization for Movable-Antenna Array

Figure 4 for Flexible Beam Coverage Optimization for Movable-Antenna Array

Abstract:Fluid antennas (FAs) and movable antennas (MAs) have attracted increasing attention in wireless communications recently. As compared to the conventional fixed-position antennas (FPAs), their geometry can be dynamically reconfigured, such that more flexible beamforming can be achieved for signal coverage and/or interference nulling. In this paper, we investigate the use of MAs to achieve uniform coverage for multiple regions with arbitrary number and width in the spatial domain. In particular, we aim to jointly optimize the MAs weights and positions within a linear array to maximize the minimum beam gain over the desired spatial regions. However, the resulting problem is non-convex and difficult to be optimally solved. To tackle this difficulty, we propose an alternating optimization (AO) algorithm to obtain a high-quality suboptimal solution, where the MAs weights and positions are alternately optimized by applying successive convex approximation (SCA) technique. Numerical results show that our proposed MAbased beam coverage scheme can achieve much better performance than conventional FPAs.

Via

Access Paper or Ask Questions

Movable-Antenna Position Optimization for Physical-Layer Security via Discrete Sampling

Aug 01, 2024

Weidong Mei, Xin Wei, Yijie Liu, Boyu Ning, Zhi Chen

Figure 1 for Movable-Antenna Position Optimization for Physical-Layer Security via Discrete Sampling

Figure 2 for Movable-Antenna Position Optimization for Physical-Layer Security via Discrete Sampling

Figure 3 for Movable-Antenna Position Optimization for Physical-Layer Security via Discrete Sampling

Figure 4 for Movable-Antenna Position Optimization for Physical-Layer Security via Discrete Sampling

Abstract:Fluid antennas (FAs) and mobile antennas (MAs) are innovative technologies in wireless communications that are able to proactively improve channel conditions by dynamically adjusting the transmit/receive antenna positions within a given spatial region. In this paper, we investigate an MA-enhanced multiple-input single-output (MISO) secure communication system, aiming to maximize the secrecy rate by jointly optimizing the positions of multiple MAs. Instead of continuously searching for the optimal MA positions as in prior works, we propose to discretize the transmit region into multiple sampling points, thereby converting the continuous antenna position optimization into a discrete sampling point selection problem. However, this point selection problem is combinatory and thus difficult to be optimally solved. To tackle this challenge, we ingeniously transform this combinatory problem into a recursive path selection problem in graph theory and propose a partial enumeration algorithm to obtain its optimal solution without the need for high-complexity exhaustive search. To further reduce the complexity, a linear-time sequential update algorithm is also proposed to obtain a high-quality suboptimal solution. Numerical results show that our proposed algorithms yield much higher secrecy rates as compared to the conventional FPA and other baseline schemes.

* This paper is accepted by IEEE Globecom 2024. arXiv admin note: substantial text overlap with arXiv:2403.16886

Via

Access Paper or Ask Questions

FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making

Jul 10, 2024

Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Rong Liu, Zhenyu Cui, Denghui Zhang(+6 more)

Figure 1 for FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making

Figure 2 for FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making

Figure 3 for FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making

Figure 4 for FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making

Abstract:Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly utilized in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and manage risks. Although LLMs have been used to develop agent systems that surpass human teams and yield impressive investment returns, opportunities to enhance multi-sourced information synthesis and optimize decision-making outcomes through timely experience refinement remain unexplored. Here, we introduce the FinCon, an LLM-based multi-agent framework with CONceptual verbal reinforcement tailored for diverse FINancial tasks. Inspired by effective real-world investment firm organizational structures, FinCon utilizes a manager-analyst communication hierarchy. This structure allows for synchronized cross-functional agent collaboration towards unified goals through natural language interactions and equips each agent with greater memory capacity than humans. Additionally, a risk-control component in FinCon enhances decision quality by episodically initiating a self-critiquing mechanism to update systematic investment beliefs. The conceptualized beliefs serve as verbal reinforcement for the future agent's behavior and can be selectively propagated to the appropriate node that requires knowledge updates. This feature significantly improves performance while reducing unnecessary peer-to-peer communication costs. Moreover, FinCon demonstrates strong generalization capabilities in various financial tasks, including single stock trading and portfolio management.

* LLM Applications, LLM Agents, Financial Technology, Quantitative Finance, Algorithmic Trading, Cognitive Science

Via

Access Paper or Ask Questions

CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications

Jul 02, 2024

Yupeng Cao, Zhiyuan Yao, Zhi Chen, Zhiyang Deng

Abstract:The integration of Large Language Models (LLMs) into financial analysis has garnered significant attention in the NLP community. This paper presents our solution to IJCAI-2024 FinLLM challenge, investigating the capabilities of LLMs within three critical areas of financial tasks: financial classification, financial text summarization, and single stock trading. We adopted Llama3-8B and Mistral-7B as base models, fine-tuning them through Parameter Efficient Fine-Tuning (PEFT) and Low-Rank Adaptation (LoRA) approaches. To enhance model performance, we combine datasets from task 1 and task 2 for data fusion. Our approach aims to tackle these diverse tasks in a comprehensive and integrated manner, showcasing LLMs' capacity to address diverse and complex financial tasks with improved accuracy and decision-making capabilities.

Via

Access Paper or Ask Questions

Joint Beamforming and Antenna Position Optimization for Movable Antenna-Assisted Spectrum Sharing

Jun 28, 2024

Xin Wei, Weidong Mei, Dong Wang, Boyu Ning, Zhi Chen

Figure 1 for Joint Beamforming and Antenna Position Optimization for Movable Antenna-Assisted Spectrum Sharing

Figure 2 for Joint Beamforming and Antenna Position Optimization for Movable Antenna-Assisted Spectrum Sharing

Abstract:Fluid antennas (FAs) and movable antennas (MAs) have drawn increasing attention in wireless communications recently due to their ability to create favorable channel conditions via local antenna movement within a confined region. In this letter, we advance their application for cognitive radio to facilitate efficient spectrum sharing between primary and secondary communication systems. In particular, we aim to jointly optimize the transmit beamforming and MA positions at a secondary transmitter (ST) to maximize the received signal power at a secondary receiver (SR) subject to the constraints on its imposed co-channel interference power with multiple primary receivers (PRs). However, such an optimization problem is difficult to be optimally solved due to the highly nonlinear functions of the received signal/interference power at the SR/all PRs in terms of the MA positions. To drive useful insights, we first perform theoretical analyses to unveil MAs' capability to achieve maximum-ratio transmission with the SR and effective interference mitigation for all PRs at the same time. To solve the MA position optimization problem, we propose an alternating optimization (AO) algorithm to obtain a high-quality suboptimal solution. Numerical results demonstrate that our proposed algorithms can significantly outperform the conventional fixed-position antennas (FPAs) and other baseline schemes.

Via

Access Paper or Ask Questions

DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Jun 21, 2024

Jia Syuen Lim, Zhuoxiao Chen, Mahsa Baktashmotlagh, Zhi Chen, Xin Yu, Zi Huang, Yadan Luo

Figure 1 for DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Figure 2 for DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Figure 3 for DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Figure 4 for DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Abstract:Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity. In this work, we investigate using vision-language models (VLMs) to enhance object detection via a self-supervised prompt learning strategy. Our initial findings indicate that manually crafted text queries often result in undetected objects, primarily because detection confidence diminishes when the query words exhibit semantic overlap. To address this, we propose a Dispersing Prompt Expansion (DiPEx) approach. DiPEx progressively learns to expand a set of distinct, non-overlapping hyperspherical prompts to enhance recall rates, thereby improving performance in downstream tasks such as out-of-distribution OD. Specifically, DiPEx initiates the process by self-training generic parent prompts and selecting the one with the highest semantic uncertainty for further expansion. The resulting child prompts are expected to inherit semantics from their parent prompts while capturing more fine-grained semantics. We apply dispersion losses to ensure high inter-class discrepancy among child prompts while preserving semantic consistency between parent-child prompt pairs. To prevent excessive growth of the prompt sets, we utilize the maximum angular coverage (MAC) of the semantic space as a criterion for early termination. We demonstrate the effectiveness of DiPEx through extensive class-agnostic OD and OOD-OD experiments on MS-COCO and LVIS, surpassing other prompting methods by up to 20.1% in AR and achieving a 21.3% AP improvement over SAM. The code is available at https://github.com/jason-lim26/DiPEx.

* 19 pages

Via

Access Paper or Ask Questions