Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liang Zhao

School of Computer Science, Shenyang Aerospace University

HiReview: Hierarchical Taxonomy-Driven Automatic Literature Review Generation

Oct 02, 2024

Yuntong Hu, Zhuofeng Li, Zheng Zhang, Chen Ling, Raasikh Kanjiani, Boxin Zhao, Liang Zhao

Abstract:In this work, we present HiReview, a novel framework for hierarchical taxonomy-driven automatic literature review generation. With the exponential growth of academic documents, manual literature reviews have become increasingly labor-intensive and time-consuming, while traditional summarization models struggle to generate comprehensive document reviews effectively. Large language models (LLMs), with their powerful text processing capabilities, offer a potential solution; however, research on incorporating LLMs for automatic document generation remains limited. To address key challenges in large-scale automatic literature review generation (LRG), we propose a two-stage taxonomy-then-generation approach that combines graph-based hierarchical clustering with retrieval-augmented LLMs. First, we retrieve the most relevant sub-community within the citation network, then generate a hierarchical taxonomy tree by clustering papers based on both textual content and citation relationships. In the second stage, an LLM generates coherent and contextually accurate summaries for clusters or topics at each hierarchical level, ensuring comprehensive coverage and logical organization of the literature. Extensive experiments demonstrate that HiReview significantly outperforms state-of-the-art methods, achieving superior hierarchical organization, content relevance, and factual accuracy in automatic literature review generation tasks.

Via

Access Paper or Ask Questions

Few-Shot Class-Incremental Learning with Non-IID Decentralized Data

Sep 18, 2024

Cuiwei Liu, Siang Xu, Huaijun Qiu, Jing Zhang, Zhi Liu, Liang Zhao

Figure 1 for Few-Shot Class-Incremental Learning with Non-IID Decentralized Data

Figure 2 for Few-Shot Class-Incremental Learning with Non-IID Decentralized Data

Figure 3 for Few-Shot Class-Incremental Learning with Non-IID Decentralized Data

Figure 4 for Few-Shot Class-Incremental Learning with Non-IID Decentralized Data

Abstract:Few-shot class-incremental learning is crucial for developing scalable and adaptive intelligent systems, as it enables models to acquire new classes with minimal annotated data while safeguarding the previously accumulated knowledge. Nonetheless, existing methods deal with continuous data streams in a centralized manner, limiting their applicability in scenarios that prioritize data privacy and security. To this end, this paper introduces federated few-shot class-incremental learning, a decentralized machine learning paradigm tailored to progressively learn new classes from scarce data distributed across multiple clients. In this learning paradigm, clients locally update their models with new classes while preserving data privacy, and then transmit the model updates to a central server where they are aggregated globally. However, this paradigm faces several issues, such as difficulties in few-shot learning, catastrophic forgetting, and data heterogeneity. To address these challenges, we present a synthetic data-driven framework that leverages replay buffer data to maintain existing knowledge and facilitate the acquisition of new knowledge. Within this framework, a noise-aware generative replay module is developed to fine-tune local models with a balance of new and replay data, while generating synthetic data of new classes to further expand the replay buffer for future tasks. Furthermore, a class-specific weighted aggregation strategy is designed to tackle data heterogeneity by adaptively aggregating class-specific parameters based on local models performance on synthetic data. This enables effective global model optimization without direct access to client data. Comprehensive experiments across three widely-used datasets underscore the effectiveness and preeminence of the introduced framework.

Via

Access Paper or Ask Questions

A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility

Sep 06, 2024

Muniba Batool, Naveed Ahmed Azam, Jianshen Zhu, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

Figure 1 for A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility

Figure 2 for A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility

Figure 3 for A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility

Figure 4 for A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility

Abstract:Aqueous solubility (AS) is a key physiochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR) and mixed integer linear programming (MILP). Selected descriptors based on a forward stepwise procedure enabled the simplest regression model, MLR, to achieve significantly good prediction accuracy compared to the existing approaches, achieving the accuracy in the range [0.7191, 0.9377] for 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms in a reasonable time range [6, 1204] seconds. These findings indicate a strong correlation between the simple graph-theoretic descriptors and the AS of compounds, potentially leading to a deeper understanding of their AS without relying on widely used complicated chemical descriptors and complex machine learning models that are computationally expensive, and therefore difficult to use for inference. An implementation of the proposed approach is available at https://github.com/ku-dml/mol-infer/tree/master/AqSol.

Via

Access Paper or Ask Questions

LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

Sep 04, 2024

Xiaoyuan Zhang, Liang Zhao, Yingying Yu, Xi Lin, Zhenkun Wang, Han Zhao, Qingfu Zhang

Figure 1 for LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

Figure 2 for LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

Figure 3 for LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

Figure 4 for LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

Abstract:Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, etc. Instead of reducing multiple objective functions into a scalar objective, MOPs aim to optimize for the so-called Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultaneously, over models with millions of parameters. Existing benchmark libraries for MOPs mainly focus on evolutionary algorithms, most of which are zeroth-order methods that do not effectively utilize higher-order information from objectives and cannot scale to large-scale models with millions of parameters. In light of the above gap, this paper introduces LibMOON, the first multiobjective optimization library that supports state-of-the-art gradient-based methods, provides a fair benchmark, and is open-sourced for the community.

Via

Access Paper or Ask Questions

4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Sep 03, 2024

Daosong Hu, Ruomeng Wang, Liang Zhao, Mingyue Cui, Song Ding, Kai Huang

Figure 1 for 4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Figure 2 for 4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Figure 3 for 4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Figure 4 for 4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Abstract:The three-dimensional vascular model reconstructed from CT images is widely used in medical diagnosis. At different phases, the beating of the heart can cause deformation of vessels, resulting in different vascular imaging states and false positive diagnostic results. The 4D model can simulate a complete cardiac cycle. Due to the dose limitation of contrast agent injection in patients, it is valuable to synthesize a 4D coronary artery trees through finite phases imaging. In this paper, we propose a method for generating a 4D coronary artery trees, which maps the systole to the diastole through deformation field prediction, interpolates on the timeline, and the motion trajectory of points are obtained. Specifically, the centerline is used to represent vessels and to infer deformation fields using cube-based sorting and neural networks. Adjacent vessel points are aggregated and interpolated based on the deformation field of the centerline point to obtain displacement vectors of different phases. Finally, the proposed method is validated through experiments to achieve the registration of non-rigid vascular points and the generation of 4D coronary trees.

Via

Access Paper or Ask Questions

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Sep 03, 2024

Haoran Wei, Chenglong Liu, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng(+2 more)

Figure 1 for General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Figure 2 for General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Figure 3 for General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Figure 4 for General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Abstract:Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with an excellent model, namely GOT, to promote the arrival of OCR-2.0. The GOT, with 580M parameters, is a unified, elegant, and end-to-end model, consisting of a high-compression encoder and a long-contexts decoder. As an OCR-2.0 model, GOT can handle all the above "characters" under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) via an easy prompt. Besides, the model enjoys interactive OCR features, i.e., region-level recognition guided by coordinates or colors. Furthermore, we also adapt dynamic resolution and multi-page OCR technologies to GOT for better practicality. In experiments, we provide sufficient results to prove the superiority of our model.

Via

Access Paper or Ask Questions

Representation Learning of Geometric Trees

Aug 16, 2024

Zheng Zhang, Allen Zhang, Ruth Nelson, Giorgio Ascoli, Liang Zhao

Abstract:Geometric trees are characterized by their tree-structured layout and spatially constrained nodes and edges, which significantly impacts their topological attributes. This inherent hierarchical structure plays a crucial role in domains such as neuron morphology and river geomorphology, but traditional graph representation methods often overlook these specific characteristics of tree structures. To address this, we introduce a new representation learning framework tailored for geometric trees. It first features a unique message passing neural network, which is both provably geometrical structure-recoverable and rotation-translation invariant. To address the data label scarcity issue, our approach also includes two innovative training targets that reflect the hierarchical ordering and geometric structure of these geometric trees. This enables fully self-supervised learning without explicit labels. We validate our method's effectiveness on eight real-world datasets, demonstrating its capability to represent geometric trees.

Via

Access Paper or Ask Questions

Cycle-Configuration: A Novel Graph-theoretic Descriptor Set for Molecular Inference

Aug 09, 2024

Bowen Song, Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

Abstract:In this paper, we propose a novel family of descriptors of chemical graphs, named cycle-configuration (CC), that can be used in the standard "two-layered (2L) model" of mol-infer, a molecular inference framework based on mixed integer linear programming (MILP) and machine learning (ML). Proposed descriptors capture the notion of ortho/meta/para patterns that appear in aromatic rings, which has been impossible in the framework so far. Computational experiments show that, when the new descriptors are supplied, we can construct prediction functions of similar or better performance for all of the 27 tested chemical properties. We also provide an MILP formulation that asks for a chemical graph with desired properties under the 2L model with CC descriptors (2L+CC model). We show that a chemical graph with up to 50 non-hydrogen vertices can be inferred in a practical time.

Via

Access Paper or Ask Questions

Contrastive Learning for Image Complexity Representation

Aug 06, 2024

Shipeng Liu, Liang Zhao, Dengfeng Chen, Zhanping Song

Figure 1 for Contrastive Learning for Image Complexity Representation

Figure 2 for Contrastive Learning for Image Complexity Representation

Figure 3 for Contrastive Learning for Image Complexity Representation

Figure 4 for Contrastive Learning for Image Complexity Representation

Abstract:Quantifying and evaluating image complexity can be instrumental in enhancing the performance of various computer vision tasks. Supervised learning can effectively learn image complexity features from well-annotated datasets. However, creating such datasets requires expensive manual annotation costs. The models may learn human subjective biases from it. In this work, we introduce the MoCo v2 framework. We utilize contrastive learning to represent image complexity, named CLIC (Contrastive Learning for Image Complexity). We find that there are complexity differences between different local regions of an image, and propose Random Crop and Mix (RCM), which can produce positive samples consisting of multi-scale local crops. RCM can also expand the train set and increase data diversity without introducing additional data. We conduct extensive experiments with CLIC, comparing it with both unsupervised and supervised methods. The results demonstrate that the performance of CLIC is comparable to that of state-of-the-art supervised methods. In addition, we establish the pipelines that can apply CLIC to computer vision tasks to effectively improve their performance.

Via

Access Paper or Ask Questions

CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction

Jul 23, 2024

Liang Zhao, Qing Guo, Xiaoguang Li, Song Wang

Figure 1 for CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction

Figure 2 for CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction

Figure 3 for CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction

Figure 4 for CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction

Abstract:Image inpainting aims to fill missing pixels in damaged images and has achieved significant progress with cut-edging learning techniques. Nevertheless, state-of-the-art inpainting methods are mainly designed for nature images and cannot correctly recover text within scene text images, and training existing models on the scene text images cannot fix the issues. In this work, we identify the visual-text inpainting task to achieve high-quality scene text image restoration and text completion: Given a scene text image with unknown missing regions and the corresponding text with unknown missing characters, we aim to complete the missing information in both images and text by leveraging their complementary information. Intuitively, the input text, even if damaged, contains language priors of the contents within the images and can guide the image inpainting. Meanwhile, the scene text image includes the appearance cues of the characters that could benefit text recovery. To this end, we design the cross-modal predictive interaction (CLII) model containing two branches, i.e., ImgBranch and TxtBranch, for scene text inpainting and text completion, respectively while leveraging their complementary effectively. Moreover, we propose to embed our model into the SOTA scene text spotting method and significantly enhance its robustness against missing pixels, which demonstrates the practicality of the newly developed task. To validate the effectiveness of our method, we construct three real datasets based on existing text-related datasets, containing 1838 images and covering three scenarios with curved, incidental, and styled texts, and conduct extensive experiments to show that our method outperforms baselines significantly.

Via

Access Paper or Ask Questions