In the rapidly evolving area of image synthesis, a serious challenge is the presence of complex artifacts that compromise perceptual realism of synthetic images. To alleviate artifacts and improve quality of synthetic images, we fine-tune Vision-Language Model (VLM) as artifact classifier to automatically identify and classify a wide range of artifacts and provide supervision for further optimizing generative models. Specifically, we develop a comprehensive artifact taxonomy and construct a dataset of synthetic images with artifact annotations for fine-tuning VLM, named SynArtifact-1K. The fine-tuned VLM exhibits superior ability of identifying artifacts and outperforms the baseline by 25.66%. To our knowledge, this is the first time such end-to-end artifact classification task and solution have been proposed. Finally, we leverage the output of VLM as feedback to refine the generative model for alleviating artifacts. Visualization results and user study demonstrate that the quality of images synthesized by the refined diffusion model has been obviously improved.
Since the traffic administration at road intersections determines the capacity bottleneck of modern transportation systems, intelligent cooperative coordination for connected autonomous vehicles (CAVs) has shown to be an effective solution. In this paper, we try to formulate a Bi-Level CAV intersection coordination framework, where coordinators from High and Low levels are tightly coupled. In the High-Level coordinator where vehicles from multiple roads are involved, we take various metrics including throughput, safety, fairness and comfort into consideration. Motivated by the time consuming space-time resource allocation framework in [1], we try to give a low complexity solution by transforming the complicated original problem into a sequential linear programming one. Based on the "feasible tunnels" (FT) generated from the High-Level coordinator, we then propose a rapid gradient-based trajectory optimization strategy in the Low-Level planner, to effectively avoid collisions beyond High-level considerations, such as the pedestrian or bicycles. Simulation results and laboratory experiments show that our proposed method outperforms existing strategies. Moreover, the most impressive advantage is that the proposed strategy can plan vehicle trajectory in milliseconds, which is promising in realworld deployments. A detailed description include the coordination framework and experiment demo could be found at the supplement materials, or online at https://youtu.be/MuhjhKfNIOg.
Federated learning (FL) has emerged as a promising master/slave learning paradigm to alleviate systemic privacy risks and communication costs incurred by cloud-centric machine learning methods. However, it is very challenging to resist the single point of failure of the master aggregator and attacks from malicious participants while guaranteeing model convergence speed and accuracy. Recently, blockchain has been brought into FL systems transforming the paradigm to a decentralized manner thus further improve the system security and learning reliability. Unfortunately, the traditional consensus mechanism and architecture of blockchain systems can hardly handle the large-scale FL task due to the huge resource consumption, limited transaction throughput, and high communication complexity. To address these issues, this paper proposes a two-layer blockchaindriven FL framework, called as ChainsFL, which is composed of multiple subchain networks (subchain layer) and a direct acyclic graph (DAG)-based mainchain (mainchain layer). In ChainsFL, the subchain layer limits the scale of each shard for a small range of information exchange, and the mainchain layer allows each shard to share and validate the learning model in parallel and asynchronously to improve the efficiency of cross-shard validation. Furthermore, the FL procedure is customized to deeply integrate with blockchain technology, and the modified DAG consensus mechanism is proposed to mitigate the distortion caused by abnormal models. In order to provide a proof-ofconcept implementation and evaluation, multiple subchains base on Hyperledger Fabric are deployed as the subchain layer, and the self-developed DAG-based mainchain is deployed as the mainchain layer. The experimental results show that ChainsFL provides acceptable and sometimes better training efficiency and stronger robustness compared with the typical existing FL systems.
Due to the distributed characteristics of Federated Learning (FL), the vulnerability of global model and coordination of devices are the main obstacle. As a promising solution of decentralization, scalability and security, leveraging blockchain in FL has attracted much attention in recent years. However, the traditional consensus mechanisms designed for blockchain like Proof of Work (PoW) would cause extreme resource consumption, which reduces the efficiency of FL greatly, especially when the participating devices are wireless and resource-limited. In order to address device asynchrony and anomaly detection in FL while avoiding the extra resource consumption caused by blockchain, this paper introduces a framework for empowering FL using Direct Acyclic Graph (DAG)-based blockchain systematically (DAG-FL). Accordingly, DAG-FL is first introduced from a three-layer architecture in details, and then two algorithms DAG-FL Controlling and DAG-FL Updating are designed running on different nodes to elaborate the operation of DAG-FL consensus mechanism. After that, a Poisson process model is formulated to discuss that how to set deployment parameters to maintain DAG-FL stably in different federated learning tasks. The extensive simulations and experiments show that DAG-FL can achieve better performance in terms of training efficiency and model accuracy compared with the typical existing on-device federated learning systems as the benchmarks.
Machine Comprehension (MC) is a challenging task in Natural Language Processing field, which aims to guide the machine to comprehend a passage and answer the given question. Many existing approaches on MC task are suffering the inefficiency in some bottlenecks, such as insufficient lexical understanding, complex question-passage interaction, incorrect answer extraction and so on. In this paper, we address these problems from the viewpoint of how humans deal with reading tests in a scientific way. Specifically, we first propose a novel lexical gating mechanism to dynamically combine the words and characters representations. We then guide the machines to read in an interactive way with attention mechanism and memory network. Finally we add a checking layer to refine the answer for insurance. The extensive experiments on two popular datasets SQuAD and TriviaQA show that our method exceeds considerable performance than most state-of-the-art solutions at the time of submission.
Machine comprehension(MC) style question answering is a representative problem in natural language processing. Previous methods rarely spend time on the improvement of encoding layer, especially the embedding of syntactic information and name entity of the words, which are very crucial to the quality of encoding. Moreover, existing attention methods represent each query word as a vector or use a single vector to represent the whole query sentence, neither of them can handle the proper weight of the key words in query sentence. In this paper, we introduce a novel neural network architecture called Multi-layer Embedding with Memory Network(MEMEN) for machine reading task. In the encoding layer, we employ classic skip-gram model to the syntactic and semantic information of the words to train a new kind of embedding layer. We also propose a memory network of full-orientation matching of the query and passage to catch more pivotal information. Experiments show that our model has competitive results both from the perspectives of precision and efficiency in Stanford Question Answering Dataset(SQuAD) among all published results and achieves the state-of-the-art results on TriviaQA dataset.
Collaborative filtering is an effective recommendation approach in which the preference of a user on an item is predicted based on the preferences of other users with similar interests. A big challenge in using collaborative filtering methods is the data sparsity problem which often arises because each user typically only rates very few items and hence the rating matrix is extremely sparse. In this paper, we address this problem by considering multiple collaborative filtering tasks in different domains simultaneously and exploiting the relationships between domains. We refer to it as a multi-domain collaborative filtering (MCF) problem. To solve the MCF problem, we propose a probabilistic framework which uses probabilistic matrix factorization to model the rating problem in each domain and allows the knowledge to be adaptively transferred across different domains by automatically learning the correlation between domains. We also introduce the link function for different domains to correct their biases. Experiments conducted on several real-world applications demonstrate the effectiveness of our methods when compared with some representative methods.