Abstract:Smoke segmentation is critical for wildfire management and industrial safety applications. Traditional visible-light-based methods face limitations due to insufficient spectral information, particularly struggling with cloud interference and semi-transparent smoke regions. To address these challenges, we introduce hyperspectral imaging for smoke segmentation and present the first hyperspectral smoke segmentation dataset (HSSDataset) with carefully annotated samples collected from over 18,000 frames across 20 real-world scenarios using a Many-to-One annotations protocol. However, different spectral bands exhibit varying discriminative capabilities across spatial regions, necessitating adaptive band weighting strategies. We decompose this into three technical challenges: spectral interaction contamination, limited spectral pattern modeling, and complex weighting router problems. We propose a mixture of prototypes (MoP) network with: (1) Band split for spectral isolation, (2) Prototype-based spectral representation for diverse patterns, and (3) Dual-level router for adaptive spatial-aware band weighting. We further construct a multispectral dataset (MSSDataset) with RGB-infrared images. Extensive experiments validate superior performance across both hyperspectral and multispectral modalities, establishing a new paradigm for spectral-based smoke segmentation.
Abstract:This paper introduces MixTex, an end-to-end LaTeX OCR model designed for low-bias multilingual recognition, along with its novel data collection method. In applying Transformer architectures to LaTeX text recognition, we identified specific bias issues, such as the frequent misinterpretation of $e-t$ as $e^{-t}$. We attribute this bias to the characteristics of the arXiv dataset commonly used for training. To mitigate this bias, we propose an innovative data augmentation method. This approach introduces controlled noise into the recognition targets by blending genuine text with pseudo-text and incorporating a small proportion of disruptive characters. We further suggest that this method has broader applicability to various disambiguation recognition tasks, including the accurate identification of erroneous notes in musical performances. MixTex's architecture leverages the Swin Transformer as its encoder and RoBERTa as its decoder. Our experimental results demonstrate that this approach significantly reduces bias in recognition tasks. Notably, when processing clear and unambiguous images, the model adheres strictly to the image rather than over-relying on contextual cues for token prediction.
Abstract:In LaTeX text recognition using Transformer-based architectures, this paper identifies certain "bias" issues. For instance, $e-t$ is frequently misrecognized as $e^{-t}$. This bias stems from the inherent characteristics of the dataset. To mitigate this bias, we propose a LaTeX printed text recognition model trained on a mixed dataset of pseudo-formulas and pseudo-text. The model employs a Swin Transformer as the encoder and a RoBERTa model as the decoder. Experimental results demonstrate that this approach reduces "bias", enhancing the accuracy and robustness of text recognition. For clear images, the model strictly adheres to the image content; for blurred images, it integrates both image and contextual information to produce reasonable recognition results.
Abstract:In the process of projecting the surface of a three-dimensional object onto a two-dimensional surface, due to the perspective distortion, the image on the surface of the object will have different degrees of distortion according to the level of the surface curvature. This paper presents an imprecise method for flattening this type of distortion on the surface of a regularly curved body. The main idea of this method is to roughly estimate the gridded surface subdivision that can be used to describe the surface of the three-dimensional object through the contour curve of the two-dimensional image of the object. Then, take each grid block with different sizes and shapes inversely transformed into a rectangle with exactly the same shape and size. Finally, each of the same rectangles is splicing and recombining in turn to obtain a roughly flat rectangle. This paper will introduce and show the specific process and results of using this method to solve the problem of bending page flattening, then demonstrate the feasibility and limitations of this method.