Abstract:Remote heart rate measurement is an increasingly concerned research field, usually using remote photoplethysmography (rPPG) to collect heart rate information through video data collection. However, in certain specific scenarios (such as low light conditions, intense lighting, and non-line-of-sight situations), traditional imaging methods fail to capture image information effectively, that may lead to difficulty or inability in measuring heart rate. To address these limitations, this study proposes using ghost imaging as a substitute for traditional imaging in the aforementioned scenarios. The mean absolute error between experimental measurements and reference true values is 4.24 bpm.Additionally, the bucket signals obtained by the ghost imaging system can be directly processed using digital signal processing techniques, thereby enhancing personal privacy protection.
Abstract:We study the stochastic multi-armed bandit problem in the $P$-pass streaming model. In this problem, the $n$ arms are present in a stream and at most $m<n$ arms and their statistics can be stored in the memory. We give a complete characterization of the optimal regret in terms of $m, n$ and $P$. Specifically, we design an algorithm with $\tilde O\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ regret and complement it with an $\tilde \Omega\left((n-m)^{1+\frac{2^{P}-2}{2^{P+1}-1}} n^{\frac{2-2^{P+1}}{2^{P+1}-1}} T^{\frac{2^P}{2^{P+1}-1}}\right)$ lower bound when the number of rounds $T$ is sufficiently large. Our results are tight up to a logarithmic factor in $n$ and $P$.
Abstract:The immense popularity of racket sports has fueled substantial demand in tactical analysis with broadcast videos. However, existing manual methods require laborious annotation, and recent attempts leveraging video perception models are limited to low-level annotations like ball trajectories, overlooking tactics that necessitate an understanding of stroke techniques. State-of-the-art action segmentation models also struggle with technique recognition due to frequent occlusions and motion-induced blurring in racket sports videos. To address these challenges, We propose ViSTec, a Video-based Sports Technique recognition model inspired by human cognition that synergizes sparse visual data with rich contextual insights. Our approach integrates a graph to explicitly model strategic knowledge in stroke sequences and enhance technique recognition with contextual inductive bias. A two-stage action perception model is jointly trained to align with the contextual knowledge in the graph. Experiments demonstrate that our method outperforms existing models by a significant margin. Case studies with experts from the Chinese national table tennis team validate our model's capacity to automate analysis for technical actions and tactical strategies. More details are available at: https://ViSTec2024.github.io/.
Abstract:Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in \mathbb{N}^K$, an instance of $\mathbf{m}$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf{m}$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(\epsilon,0.05)$-PAC algorithm of $\mathbf{m}$-BAI is $\Theta\left(\frac{1}{\epsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf{m}$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.
Abstract:Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing temporal difference learning, a fundamental reinforcement learning algorithm, we implement an one-step policy improvement approach that biases the system towards recommendations with higher long-term user engagement metrics. This optimizes value over long horizons while maintaining compatibility with the auction framework. Our approach is grounded in dynamic programming ideas which show that our method provably improves upon the existing auction-based base policy. Through an online A/B test conducted on an auction-based recommender system which handles billions of impressions and users daily, we empirically establish that our proposed method outperforms the current production system in terms of long-term user engagement metrics.
Abstract:Due to the complex of mixed spectral point spread function within memory effect range, it is unreliable and slow to use speckle correlation technology for non-invasive imaging through scattering medium under broadband illumination. The contrast of the speckles will drastically drop as the light source's spectrum width increases. Here, we propose a method for producing the optical transfer function with several speckle frames within memory effect range to image under broadband illumination. The method can be applied to image amplitude and color objects under white LED illumination. Compared to other approaches of imaging under broadband illumination, such as deep learning and modified phase retrieval, our method can provide more stable results with faster convergence speed, which can be applied in high speed scattering imaging under natural light illumination.
Abstract:The rise of deep neural networks provides an important driver in optimizing recommender systems. However, the success of recommender systems lies in delicate architecture fabrication, and thus calls for Neural Architecture Search (NAS) to further improve its modeling. We propose NASRec, a paradigm that trains a single supernet and efficiently produces abundant models/sub-architectures by weight sharing. To overcome the data multi-modality and architecture heterogeneity challenges in recommendation domain, NASRec establishes a large supernet (i.e., search space) to search the full architectures, with the supernet incorporating versatile operator choices and dense connectivity minimizing human prior for flexibility. The scale and heterogeneity in NASRec impose challenges in search, such as training inefficiency, operator-imbalance, and degraded rank correlation. We tackle these challenges by proposing single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning. Our results on three Click-Through Rates (CTR) prediction benchmarks show that NASRec can outperform both manually designed models and existing NAS methods, achieving state-of-the-art performance.
Abstract:The problem of bandit with graph feedback generalizes both the multi-armed bandit (MAB) problem and the learning with expert advice problem by encoding in a directed graph how the loss vector can be observed in each round of the game. The mini-max regret is closely related to the structure of the feedback graph and their connection is far from being fully understood. We propose a new algorithmic framework for the problem based on a partition of the feedback graph. Our analysis reveals the interplay between various parts of the graph by decomposing the regret to the sum of the regret caused by small parts and the regret caused by their interaction. As a result, our algorithm can be viewed as an interpolation and generalization of the optimal algorithms for MAB and learning with expert advice. Our framework unifies previous algorithms for both strongly observable graphs and weakly observable graphs, resulting in improved and optimal regret bounds on a wide range of graph families including graphs of bounded degree and strongly observable graphs with a few corrupted arms.
Abstract:Benefit from the promising features of second-order correlation, ghost imaging (GI) has received extensive attentions in recent years. Simultaneously, GI is affected by the poor trade-off between sampling rate and imaging quality. The traditional image reconstruction method in GI is to accumulate the action result of each speckle and the corresponding bucket signal. We found that the image reconstruction process of GI is very similar to the Recurrent Neural Network (RNN), which is one of the deep learning algorithm. In this paper, we proposed a novel method that effectively implements GI on the RNN architecture, called GI-RNN. The state of each layer in RNN is determined by the output of the previous layer and the input of this layer, and the output of the network is the sum of all previous states. Therefore, we take the speckle of each illumination and the corresponding bucket signal as the input of each layer, and the output of the network is the sum of all previous speckle and bucket signal, which is the image of the target. The testing results show that the proposed method can achieve image reconstruction at a very low sampling rate (0.38$\%$). Moreover, we compare GI-RNN with traditional GI algorithm and compressed sensing algorithm. The results of different targets show that GI-RNN is 6.61 dB higher than compressed sensing algorithm and 12.58 dB higher than traditional GI algorithm on average. In our view, the proposed method makes an important step to applications of GI.
Abstract:Multi-party dialogue machine reading comprehension (MRC) raises an even more challenging understanding goal on dialogue with more than two involved speakers, compared with the traditional plain passage style MRC. To accurately perform the question-answering (QA) task according to such multi-party dialogue, models have to handle fundamentally different discourse relationships from common non-dialogue plain text, where discourse relations are supposed to connect two far apart utterances in a linguistics-motivated way.To further explore the role of such unusual discourse structure on the correlated QA task in terms of MRC, we propose the first multi-task model for jointly performing QA and discourse parsing (DP) on the multi-party dialogue MRC task. Our proposed model is evaluated on the latest benchmark Molweni, whose results indicate that training with complementary tasks indeed benefits not only QA task, but also DP task itself. We further find that the joint model is distinctly stronger when handling longer dialogues which again verifies the necessity of DP in the related MRC.