In many environmental applications, recurrent neural networks (RNNs) are often used to model physical variables with long temporal dependencies. However, due to mini-batch training, temporal relationships between training segments within the batch (intra-batch) as well as between batches (inter-batch) are not considered, which can lead to limited performance. Stateful RNNs aim to address this issue by passing hidden states between batches. Since Stateful RNNs ignore intra-batch temporal dependency, there exists a trade-off between training stability and capturing temporal dependency. In this paper, we provide a quantitative comparison of different Stateful RNN modeling strategies, and propose two strategies to enforce both intra- and inter-batch temporal dependency. First, we extend Stateful RNNs by defining a batch as a temporally ordered set of training segments, which enables intra-batch sharing of temporal information. While this approach significantly improves the performance, it leads to much larger training times due to highly sequential training. To address this issue, we further propose a new strategy which augments a training segment with an initial value of the target variable from the timestep right before the starting of the training segment. In other words, we provide an initial value of the target variable as additional input so that the network can focus on learning changes relative to that initial value. By using this strategy, samples can be passed in any order (mini-batch training) which significantly reduces the training time while maintaining the performance. In demonstrating our approach in hydrological modeling, we observe that the most significant gains in predictive accuracy occur when these methods are applied to state variables whose values change more slowly, such as soil water and snowpack, rather than continuously moving flux variables such as streamflow.
Semantic textual similarity (STS) in the clinical domain helps improve diagnostic efficiency and produce concise texts for downstream data mining tasks. However, given the high degree of domain knowledge involved in clinic text, it remains challenging for general language models to infer implicit medical relationships behind clinical sentences and output similarities correctly. In this paper, we present a graph-augmented cyclic learning framework for similarity estimation in the clinical domain. The framework can be conveniently implemented on a state-of-art backbone language model, and improve its performance by leveraging domain knowledge through co-training with an auxiliary graph convolution network (GCN) based network. We report the success of introducing domain knowledge in GCN and the co-training framework by improving the Bio-clinical BERT baseline by 16.3% and 27.9%, respectively.
This paper proposes a new data-driven method for predicting water temperature in stream networks with reservoirs. The water flows released from reservoirs greatly affect the water temperature of downstream river segments. However, the information of released water flow is often not available for many reservoirs, which makes it difficult for data-driven models to capture the impact to downstream river segments. In this paper, we first build a state-aware graph model to represent the interactions amongst streams and reservoirs, and then propose a parallel learning structure to extract the reservoir release information and use it to improve the prediction. In particular, for reservoirs with no available release information, we mimic the water managers' release decision process through a pseudo-prospective learning method, which infers the release information from anticipated water temperature dynamics. For reservoirs with the release information, we leverage a physics-based model to simulate the water release temperature and transfer such information to guide the learning process for other reservoirs. The evaluation for the Delaware River Basin shows that the proposed method brings over 10\% accuracy improvement over existing data-driven models for stream temperature prediction when the release data is not available for any reservoirs. The performance is further improved after we incorporate the release data and physical simulations for a subset of reservoirs.
Accurate prediction of water temperature in streams is critical for monitoring and understanding biogeochemical and ecological processes in streams. Stream temperature is affected by weather patterns (such as solar radiation) and water flowing through the stream network. Additionally, stream temperature can be substantially affected by water releases from man-made reservoirs to downstream segments. In this paper, we propose a heterogeneous recurrent graph model to represent these interacting processes that underlie stream-reservoir networks and improve the prediction of water temperature in all river segments within a network. Because reservoir release data may be unavailable for certain reservoirs, we further develop a data assimilation mechanism to adjust the deep learning model states to correct for the prediction bias caused by reservoir releases. A well-trained temporal modeling component is needed in order to use adjusted states to improve future predictions. Hence, we also introduce a simulation-based pre-training strategy to enhance the model training. Our evaluation for the Delaware River Basin has demonstrated the superiority of our proposed method over multiple existing methods. We have extensively studied the effect of the data assimilation mechanism under different scenarios. Moreover, we show that the proposed method using the pre-training strategy can still produce good predictions even with limited training data.
Machine Learning is being extensively used in hydrology, especially streamflow prediction of basins/watersheds. Basin characteristics are essential for modeling the rainfall-runoff response of these watersheds and therefore data-driven methods must take into account this ancillary characteristics data. However there are several limitations, namely uncertainty in the measured characteristics, partially missing characteristics for some of the basins or unknown characteristics that may not be present in the known measured set. In this paper we present an inverse model that uses a knowledge-guided self-supervised learning algorithm to infer basin characteristics using the meteorological drivers and streamflow response data. We evaluate our model on the the CAMELS dataset and the results validate its ability to reduce measurement uncertainty, impute missing characteristics, and identify unknown characteristics.
Direct numerical simulation (DNS) of turbulent flows is computationally expensive and cannot be applied to flows with large Reynolds numbers. Large eddy simulation (LES) is an alternative that is computationally less demanding, but is unable to capture all of the scales of turbulent transport accurately. Our goal in this work is to build a new data-driven methodology based on super-resolution techniques to reconstruct DNS data from LES predictions. We leverage the underlying physical relationships to regularize the relationships amongst different physical variables. We also introduce a hierarchical generative process and a reverse degradation process to fully explore the correspondence between DNS and LES data. We demonstrate the effectiveness of our method through a single-snapshot experiment and a cross-time experiment. The results confirm that our method can better reconstruct high-resolution DNS data over space and over time in terms of pixel-wise reconstruction error and structural similarity. Visual comparisons show that our method performs much better in capturing fine-level flow dynamics.
In many applications, finding adequate labeled data to train predictive models is a major challenge. In this work, we propose methods to use group-level binary labels as weak supervision to train instance-level binary classification models. Aggregate labels are common in several domains where annotating on a group-level might be cheaper or might be the only way to provide annotated data without infringing on privacy. We model group-level labels as Class Conditional Noisy (CCN) labels for individual instances and use the noisy labels to regularize predictions of the model trained on the strongly-labeled instances. Our experiments on real-world application of land cover mapping shows the utility of the proposed method in leveraging group-level labels, both in the presence and absence of class imbalance.
Collecting large annotated datasets in Remote Sensing is often expensive and thus can become a major obstacle for training advanced machine learning models. Common techniques of addressing this issue, based on the underlying idea of pre-training the Deep Neural Networks (DNN) on freely available large datasets, cannot be used for Remote Sensing due to the unavailability of such large-scale labeled datasets and the heterogeneity of data sources caused by the varying spatial and spectral resolution of different sensors. Self-supervised learning is an alternative approach that learns feature representation from unlabeled images without using any human annotations. In this paper, we introduce a new method for land cover mapping by using a clustering based pretext task for self-supervised learning. We demonstrate the effectiveness of the method on two societally relevant applications from the aspect of segmentation performance, discriminative feature representation learning and the underlying cluster structure. We also show the effectiveness of the active sampling using the clusters obtained from our method in improving the mapping accuracy given a limited budget of annotating.
Mapping and monitoring crops is a key step towards sustainable intensification of agriculture and addressing global food security. A dataset like ImageNet that revolutionized computer vision applications can accelerate development of novel crop mapping techniques. Currently, the United States Department of Agriculture (USDA) annually releases the Cropland Data Layer (CDL) which contains crop labels at 30m resolution for the entire United States of America. While CDL is state of the art and is widely used for a number of agricultural applications, it has a number of limitations (e.g., pixelated errors, labels carried over from previous errors and absence of input imagery along with class labels). In this work, we create a new semantic segmentation benchmark dataset, which we call CalCROP21, for the diverse crops in the Central Valley region of California at 10m spatial resolution using a Google Earth Engine based robust image processing pipeline and a novel attention based spatio-temporal semantic segmentation algorithm STATT. STATT uses re-sampled (interpolated) CDL labels for training, but is able to generate a better prediction than CDL by leveraging spatial and temporal patterns in Sentinel2 multi-spectral image series to effectively capture phenologic differences amongst crops and uses attention to reduce the impact of clouds and other atmospheric disturbances. We also present a comprehensive evaluation to show that STATT has significantly better results when compared to the resampled CDL labels. We have released the dataset and the processing pipeline code for generating the benchmark dataset.
The availability of massive earth observing satellite data provide huge opportunities for land use and land cover mapping. However, such mapping effort is challenging due to the existence of various land cover classes, noisy data, and the lack of proper labels. Also, each land cover class typically has its own unique temporal pattern and can be identified only during certain periods. In this article, we introduce a novel architecture that incorporates the UNet structure with Bidirectional LSTM and Attention mechanism to jointly exploit the spatial and temporal nature of satellite data and to better identify the unique temporal patterns of each land cover. We evaluate this method for mapping crops in multiple regions over the world. We compare our method with other state-of-the-art methods both quantitatively and qualitatively on two real-world datasets which involve multiple land cover classes. We also visualise the attention weights to study its effectiveness in mitigating noise and identifying discriminative time period.