Neural Radiance Fields (NeRFs) learn to represent a 3D scene from just a set of registered images. Increasing sizes of a scene demands more complex functions, typically represented by neural networks, to capture all details. Training and inference then involves querying the neural network millions of times per image, which becomes impractically slow. Since such complex functions can be replaced by multiple simpler functions to improve speed, we show that a hierarchy of Voronoi diagrams is a suitable choice to partition the scene. By equipping each Voronoi cell with its own NeRF, our approach is able to quickly learn a scene representation. We propose an intuitive partitioning of the space that increases quality gains during training by distributing information evenly among the networks and avoids artifacts through a top-down adaptive refinement. Our framework is agnostic to the underlying NeRF method and easy to implement, which allows it to be applied to various NeRF variants for improved learning and rendering speeds.
Probabilistic constellation shaping has been used in long-haul optically amplified coherent systems for its capability to approach the Shannon limit and realize fine rate granularity. The availability of high-bandwidth optical-electronic components and the previously mentioned advantages have invigorated researchers to explore probabilistic shaping (PS) in intensity-modulation and direct-detection (IM/DD) systems. This article presents an extensive comparison of uniform 8-ary pulse amplitude modulation (PAM) with PS PAM-8 using cap and cup Maxwell-Boltzmann (MB) distributions as well as MB distributions of different Gaussian orders. We report that in the presence of linear equalization, PS-PAM-8 outperforms uniform PAM-8 in terms of bit error ratio, achievable information rate and operational net bit rate indicating that cap-shaped PS-PAM-8 shows high tolerance against nonlinearities. In this paper, we have focused our investigations on O-band electro-absorption modulated laser unamplified IM/DD systems, which are operated close to the zero dispersion wavelength.
Deep Learning (DL)-based channel state information (CSI) feedback is a promising technique for the transmitter to accurately acquire the CSI of massive multiple-input multiple-output (MIMO) systems. As a critical concern about DL-based physical layer application, although most existing CSI feedback methods have shown advantages in dedicated channel environments and scenarios, the generalization and adaptivity of these methods remain challenging. Therefore, we propose a Hybrid Complex-Valued Lightweight framework, namely HybridCVLNet, with a hybrid structure, task, and codeword and correspondent domain adaptation scheme to overcome the data drift and dataset bias for CSI feedback. The experiment verifies the validity of the proposed lightweight HybridCVLNet regularization to the regression process. It achieves stable generalizability and performance gain over the SOTA feedback schemes in an intra-domain multi-category setting. In addition, its hybrid domain adaptation scheme is more efficient and superior to the online direct-finetune method under the unseen cross-domain dataset.
We develop a novel credit assignment algorithm for information processing with spiking neurons without requiring feedback synapses. Specifically, we propose an event-driven generalization of the forward-forward and the predictive forward-forward learning processes for a spiking neural system that iteratively processes sensory input over a stimulus window. As a result, the recurrent circuit computes the membrane potential of each neuron in each layer as a function of local bottom-up, top-down, and lateral signals, facilitating a dynamic, layer-wise parallel form of neural computation. Unlike spiking neural coding, which relies on feedback synapses to adjust neural electrical activity, our model operates purely online and forward in time, offering a promising way to learn distributed representations of sensory data patterns with temporal spike signals. Notably, our experimental results on several pattern datasets demonstrate that the even-driven forward-forward (ED-FF) framework works well for training a dynamic recurrent spiking system capable of both classification and reconstruction.
Existing conversational models are handled by a database(DB) and API based systems. However, very often users' questions require information that cannot be handled by such systems. Nonetheless, answers to these questions are available in the form of customer reviews and FAQs. DSTC-11 proposes a three stage pipeline consisting of knowledge seeking turn detection, knowledge selection and response generation to create a conversational model grounded on this subjective knowledge. In this paper, we focus on improving the knowledge selection module to enhance the overall system performance. In particular, we propose entity retrieval methods which result in an accurate and faster knowledge search. Our proposed Named Entity Recognition (NER) based entity retrieval method results in 7X faster search compared to the baseline model. Additionally, we also explore a potential keyword extraction method which can improve the accuracy of knowledge selection. Preliminary results show a 4 \% improvement in exact match score on knowledge selection task. The code is available https://github.com/raja-kumar/knowledge-grounded-TODS
Over the past years, the ever-growing trend on data storage demand, more specifically for "cold" data (i.e. rarely accessed), has motivated research for alternative systems of data storage. Because of its biochemical characteristics, synthetic DNA molecules are considered as potential candidates for a new storage paradigm. Because of this trend, several coding solutions have been proposed over the past years for the storage of digital information into DNA. Despite being a promising solution, DNA storage faces two major obstacles: the large cost of synthesis and the noise introduced during sequencing. Additionally, this noise increases when biochemically defined coding constraints are not respected: avoiding homopolymers and patterns, as well as balancing the GC content. This paper describes a novel entropy coder which can be embedded to any block-based image-coding schema and aims to robustify the decoded results. Our proposed solution introduces variability in the generated quaternary streams, reduces the amount of homopolymers and repeated patterns to reduce the probability of errors occurring. In this paper, we integrate the proposed entropy coder into four existing JPEG-inspired DNA coders. We then evaluate the quality -- in terms of biochemical constraints -- of the encoded data for all the different methods.
We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch. We first outline the challenges in the SVOL task and build the Sketch-Video Attention Network (SVANet) with the following design principles: (i) to consider temporal information of video and bridge the domain gap between sketch and video; (ii) to accurately identify and localize multiple objects simultaneously; (iii) to handle various styles of sketches; (iv) to be classification-free. In particular, SVANet is equipped with a Cross-modal Transformer that models the interaction between learnable object tokens, query sketch, and video through attention operations, and learns upon a per-frame set macthing strategy that enables frame-wise prediction while utilizing global video context. We evaluate SVANet on a newly curated SVOL dataset. By design, SVANet successfully learns the mapping between the query sketch and video objects, achieving state-of-the-art results on the SVOL benchmark. We further confirm the effectiveness of SVANet via extensive ablation studies and visualizations. Lastly, we demonstrate its zero-shot capability on unseen datasets and novel categories, suggesting its high scalability in real-world applications.
Existing audio analysis methods generally first transform the audio stream to spectrogram, and then feed it into CNN for further analysis. A standard CNN recognizes specific visual patterns over feature map, then pools for high-level representation, which overlooks the positional information of recognized patterns. However, unlike natural image, the semantic of an audio spectrogram is sensitive to positional change, as its vertical and horizontal axes indicate the frequency and temporal information of the audio, instead of naive rectangular coordinates. Thus, the insensitivity of CNN to positional change plays a negative role on audio spectrogram encoding. To address this issue, this paper proposes a new self-supervised learning mechanism, which enhances the audio representation by first generating adversarial samples (\textit{i.e.}, negative samples), then driving CNN to distinguish the embeddings of negative pairs in the latent space. Extensive experiments show that the proposed approach achieves best or competitive results on 9 downstream datasets compared with previous methods, which verifies its effectiveness on audio representation learning.
Human affective behavior analysis plays a vital role in human-computer interaction (HCI) systems. In this paper, we introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW). We propose a single-stage trained AU detection framework. Specifically, in order to effectively extract facial local region features related to AU detection, we use a local region perception module to effectively extract features of different AUs. Meanwhile, we use a graph neural network-based relational learning module to capture the relationship between AUs. In addition, considering the role of the overall feature of the target face on AU detection, we also use the feature fusion module to fuse the feature information extracted by the backbone network and the AU feature information extracted by the relationship learning module. We also adopted some sampling methods, data augmentation techniques and post-processing strategies to further improve the performance of the model.
Today, people use email services such as Gmail, Outlook, AOL Mail, etc. to communicate with each other as quickly as possible to send information and official letters. Spam or junk mail is a major challenge to this type of communication, usually sent by botnets with the aim of advertising, harming and stealing information in bulk to different people. Receiving unwanted spam emails on a daily basis fills up the inbox folder. Therefore, spam detection is a fundamental challenge, so far many works have been done to detect spam using clustering and text categorisation methods. In this article, the author has used the spaCy natural language processing library and 3 machine learning (ML) algorithms Naive Bayes (NB), Decision Tree C45 and Multilayer Perceptron (MLP) in the Python programming language to detect spam emails collected from the Gmail service. Observations show the accuracy rate (96%) of the Multilayer Perceptron (MLP) algorithm in spam detection.