Abstract:In this work, we present a novel method for extensive multi-scale generative terrain modeling. At the core of our model is a cascade of superresolution diffusion models that can be combined to produce consistent images across multiple resolutions. Pairing this concept with a tiled generation method yields a scalable system that can generate thousands of square kilometers of realistic Earth surfaces at high resolution. We evaluate our method on a dataset collected from Bing Maps and show that it outperforms super-resolution baselines on the extreme super-resolution task of 1024x zoom. We also demonstrate its ability to create diverse and coherent scenes via an interactive gigapixel-scale generated map. Finally, we demonstrate how our system can be extended to enable novel content creation applications including controllable world generation and 3D scene generation.
Abstract:Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the fruitful aspect of local proximity, where similar units exhibit similar outcomes, is often overlooked. In this study, we propose Proximity-aware Counterfactual Regression (PCR) to exploit proximity for representation balancing within the HTE estimation context. Specifically, we introduce a local proximity preservation regularizer based on optimal transport to depict the local proximity in discrepancy calculation. Furthermore, to overcome the curse of dimensionality that renders the estimation of discrepancy ineffective, exacerbated by limited data availability for HTE estimation, we develop an informative subspace projector, which trades off minimal distance precision for improved sample complexity. Extensive experiments demonstrate that PCR accurately matches units across different treatment groups, effectively mitigates treatment selection bias, and significantly outperforms competitors. Code is available at https://anonymous.4open.science/status/ncr-B697.
Abstract:We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. While generative 3D models now exist, they do not yet match the quality of their counterparts in image and video domains. We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution and thus sidestep the problem of the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high quality Gaussian Splat models, which are object centric and effective. Our method is category agnostic and can be easily incorporated into existing 3D workflows. We evaluate our proposed SuperGaussian on a variety of 3D inputs, which are diverse both in terms of complexity and representation (e.g., Gaussian Splats or NeRFs), and demonstrate that our simple method significantly improves the fidelity of the final 3D models. Check our project website for details: supergaussian.github.io
Abstract:Future wireless networks will integrate sensing, learning and communication to provide new services beyond communication and to become more resilient. Sensors at the network infrastructure, sensors on the user equipment, and the sensing capability of the communication signal itself provide a new source of data that connects the physical and radio frequency environments. A wireless network that harnesses all these sensing data can not only enable additional sensing services, but also become more resilient to channel-dependent effects like blockage and better support adaptation in dynamic environments as networks reconfigure. In this paper, we provide a vision for integrated sensing and communication (ISAC) networks and an overview of how signal processing, optimization and machine learning techniques can be leveraged to make them a reality in the context of 6G. We also include some examples of the performance of several of these strategies when evaluated using a simulation framework based on a combination of ray tracing measurements and mathematical models that mix the digital and physical worlds.
Abstract:Can computers perceive the physical properties of objects solely through vision? Research in cognitive science and vision science has shown that humans excel at identifying materials and estimating their physical properties based purely on visual appearance. In this paper, we present a novel approach for dense prediction of the physical properties of objects using a collection of images. Inspired by how humans reason about physics through vision, we leverage large language models to propose candidate materials for each object. We then construct a language-embedded point cloud and estimate the physical properties of each 3D point using a zero-shot kernel regression approach. Our method is accurate, annotation-free, and applicable to any object in the open world. Experiments demonstrate the effectiveness of the proposed approach in various physical property reasoning tasks, such as estimating the mass of common objects, as well as other properties like friction and hardness.
Abstract:This paper proposes a scalable distributed policy gradient method and proves its convergence to near-optimal solution in multi-agent linear quadratic networked systems. The agents engage within a specified network under local communication constraints, implying that each agent can only exchange information with a limited number of neighboring agents. On the underlying graph of the network, each agent implements its control input depending on its nearby neighbors' states in the linear quadratic control setting. We show that it is possible to approximate the exact gradient only using local information. Compared with the centralized optimal controller, the performance gap decreases to zero exponentially as the communication and control ranges increase. We also demonstrate how increasing the communication range enhances system stability in the gradient descent process, thereby elucidating a critical trade-off. The simulation results verify our theoretical findings.
Abstract:Accurate localization and perception are pivotal for enhancing the safety and reliability of vehicles. However, current localization methods suffer from reduced accuracy when the line-of-sight (LOS) path is obstructed, or a combination of reflections and scatterings is present. In this paper, we present an integrated localization and sensing method that delivers superior performance in complex environments while being computationally efficient. Our method uniformly leverages various types of multipath components (MPCs) through the lens of random finite sets (RFSs), encompassing reflections, scatterings, and their combinations. This advancement eliminates the need for the multipath identification step and streamlines the filtering process by removing the necessity for distinct filters for different multipath types, a requirement that was critical in previous research. The simulation results demonstrate the superior performance of our method in both robustness and effectiveness, particularly in complex environments where the LOS MPC is obscured and in situations involving clutter and missed detection of MPC measurements.
Abstract:Communication in multi-agent reinforcement learning (MARL) has been proven to effectively promote cooperation among agents recently. Since communication in real-world scenarios is vulnerable to noises and adversarial attacks, it is crucial to develop robust communicative MARL technique. However, existing research in this domain has predominantly focused on passive defense strategies, where agents receive all messages equally, making it hard to balance performance and robustness. We propose an active defense strategy, where agents automatically reduce the impact of potentially harmful messages on the final decision. There are two challenges to implement this strategy, that are defining unreliable messages and adjusting the unreliable messages' impact on the final decision properly. To address them, we design an Active Defense Multi-Agent Communication framework (ADMAC), which estimates the reliability of received messages and adjusts their impact on the final decision accordingly with the help of a decomposable decision structure. The superiority of ADMAC over existing methods is validated by experiments in three communication-critical tasks under four types of attacks.
Abstract:Diffusion models have been demonstrated as powerful deep learning tools for image generation in CT reconstruction and restoration. Recently, diffusion posterior sampling, where a score-based diffusion prior is combined with a likelihood model, has been used to produce high quality CT images given low-quality measurements. This technique is attractive since it permits a one-time, unsupervised training of a CT prior; which can then be incorporated with an arbitrary data model. However, current methods only rely on a linear model of x-ray CT physics to reconstruct or restore images. While it is common to linearize the transmission tomography reconstruction problem, this is an approximation to the true and inherently nonlinear forward model. We propose a new method that solves the inverse problem of nonlinear CT image reconstruction via diffusion posterior sampling. We implement a traditional unconditional diffusion model by training a prior score function estimator, and apply Bayes rule to combine this prior with a measurement likelihood score function derived from the nonlinear physical model to arrive at a posterior score function that can be used to sample the reverse-time diffusion process. This plug-and-play method allows incorporation of a diffusion-based prior with generalized nonlinear CT image reconstruction into multiple CT system designs with different forward models, without the need for any additional training. We develop the algorithm that performs this reconstruction, including an ordered-subsets variant for accelerated processing and demonstrate the technique in both fully sampled low dose data and sparse-view geometries using a single unsupervised training of the prior.
Abstract:Integrated sensing and communication (ISAC) has been proposed as a promising paradigm in the future wireless networks, where the spectral and hardware resources are shared to provide a considerable performance gain. It is essential to understand how sensing and communication (S\&C) influences each other to guide the practical algorithm and system design in ISAC. In this paper, we investigate the performance tradeoff between S\&C in a single-input single-output (SISO) ISAC system with finite blocklength. In particular, we present the system model and the ISAC scheme, after which the rate-error tradeoff is introduced as the performance metric. Then we derive the achievability and converse bounds for the rate-error tradeoff, determining the boundary of the joint S\&C performance. Furthermore, we develop the asymptotic analysis at large blocklength regime, where the performance tradeoff between S\&C is proved to vanish as the blocklength tends to infinity. Finally, our theoretical analysis is consolidated by simulation results.