Recent advances achieved by deep learning models rely on the independent and identically distributed assumption, hindering their applications in real-world scenarios with domain shifts. To address the above issues, cross-domain learning aims at extracting domain-invariant knowledge to reduce the domain shift between training and testing data. However, in visual cross-domain learning, traditional methods concentrate solely on the image modality, neglecting the use of the text modality to alleviate the domain shift. In this work, we propose Large Language models as Visual cross-dOmain learners (LLaVO). LLaVO uses vision-language models to convert images into detailed textual descriptions. A large language model is then finetuned on textual descriptions of the source/target domain generated by a designed instruction template. Extensive experimental results on various cross-domain tasks under the domain generalization and unsupervised domain adaptation settings have demonstrated the effectiveness of the proposed method.
Limited transferability hinders the performance of deep learning models when applied to new application scenarios. Recently, Unsupervised Domain Adaptation (UDA) has achieved significant progress in addressing this issue via learning domain-invariant features. However, the performance of existing UDA methods is constrained by the large domain shift and limited target domain data. To alleviate these issues, we propose DomAin-guided Conditional Diffusion Model (DACDM) to generate high-fidelity and diversity samples for the target domain. In the proposed DACDM, by introducing class information, the labels of generated samples can be controlled, and a domain classifier is further introduced in DACDM to guide the generated samples for the target domain. The generated samples help existing UDA methods transfer from the source domain to the target domain more easily, thus improving the transfer performance. Extensive experiments on various benchmarks demonstrate that DACDM brings a large improvement to the performance of existing UDA methods.
Limited transferability hinders the performance of deep learning models when applied to new application scenarios. Recently, unsupervised domain adaptation (UDA) has achieved significant progress in addressing this issue via learning domain-invariant features. However, large domain shifts and the sample scarcity in the target domain make existing UDA methods achieve suboptimal performance. To alleviate these issues, we propose a plug-and-play Diffusion-based Target Sampler (DTS) to generate high fidelity and diversity pseudo target samples. By introducing class-conditional information, the labels of the generated target samples can be controlled. The generated samples can well simulate the data distribution of the target domain and help existing UDA methods transfer from the source domain to the target domain more easily, thus improving the transfer performance. Extensive experiments on various benchmarks demonstrate that the performance of existing UDA methods can be greatly improved through the proposed DTS method.
The reverberation time is one of the most important parameters used to characterize the acoustic property of an enclosure. In real-world scenarios, it is much more convenient to estimate the reverberation time blindly from recorded speech compared to the traditional acoustic measurement techniques using professional measurement instruments. However, the recorded speech is often corrupted by noise, which has a detrimental effect on the estimation accuracy of the reverberation time. To address this issue, this paper proposes a two-stage blind reverberation time estimation method based on noise-aware time-frequency masking. This proposed method has a good ability to distinguish the reverberation tails from the noise, thus improving the estimation accuracy of reverberation time in noisy scenarios. The simulated and real-world acoustic experimental results show that the proposed method significantly outperforms other methods in challenging scenarios.
Density Ratio Estimation has attracted attention from machine learning community due to its ability of comparing the underlying distributions of two datasets. However, in some applications, we want to compare distributions of \emph{latent} random variables that can be only inferred from observations. In this paper, we study the problem of estimating the ratio between two posterior probability density functions of a latent variable. Particularly, we assume the posterior ratio function can be well-approximated by a parametric model, which is then estimated using observed datasets and synthetic prior samples. We prove consistency of our estimator and the asymptotic normality of the estimated parameters as the number of prior samples tending to infinity. Finally, we validate our theories using numerical experiments and demonstrate the usefulness of the proposed method through some real-world applications.