In the rapidly evolving landscape of retail, assortment planning plays a crucial role in determining the success of a business. With the rise of sponsored products and their increasing prominence in online marketplaces, retailers face new challenges in effectively managing their product assortment in the presence of sponsored products. Remarkably, previous research in assortment planning largely overlooks the existence of sponsored products and their potential impact on overall recommendation effectiveness. Instead, they commonly make the simplifying assumption that all products are either organic or non-sponsored. This research gap underscores the necessity for a more thorough investigation of the assortment planning challenge when sponsored products are in play. We formulate the assortment planning problem in the presence of sponsored products as a combinatorial optimization task. The ultimate objective is to compute an assortment plan that optimizes expected revenue while considering the specific requirements of placing sponsored products strategically.
In this paper, we study a fundamental problem in submodular optimization, which is called sequential submodular maximization. Specifically, we aim to select and rank a group of $k$ items from a ground set $V$ such that the weighted summation of $k$ (possibly non-monotone) submodular functions $f_1, \cdots ,f_k: 2^V \rightarrow \mathbb{R}^+$ is maximized, here each function $f_j$ takes the first $j$ items from this sequence as input. The existing research on sequential submodular maximization has predominantly concentrated on the monotone setting, assuming that the submodular functions are non-decreasing. However, in various real-world scenarios, like diversity-aware recommendation systems, adding items to an existing set might negatively impact the overall utility. In response, this paper pioneers the examination of the aforementioned problem with non-monotone submodular functions and offers effective solutions for both flexible and fixed length constraints, as well as a special case with identical utility functions. The empirical evaluations further validate the effectiveness of our proposed algorithms in the domain of video recommendations. The results of this research have implications in various fields, including recommendation systems and assortment optimization, where the ordering of items significantly impacts the overall value obtained.
Tables are prevalent in real-world databases, requiring significant time and effort for humans to analyze and manipulate. The advancements in large language models (LLMs) have made it possible to interact with tables using natural language input, bringing this capability closer to reality. In this paper, we present TableGPT, a unified fine-tuned framework that enables LLMs to understand and operate on tables using external functional commands. It introduces the capability to seamlessly interact with tables, enabling a wide range of functionalities such as question answering, data manipulation (e.g., insert, delete, query, and modify operations), data visualization, analysis report generation, and automated prediction. TableGPT aims to provide convenience and accessibility to users by empowering them to effortlessly leverage tabular data. At the core of TableGPT lies the novel concept of global tabular representations, which empowers LLMs to gain a comprehensive understanding of the entire table beyond meta-information. By jointly training LLMs on both table and text modalities, TableGPT achieves a deep understanding of tabular data and the ability to perform complex operations on tables through chain-of-command instructions. Importantly, TableGPT offers the advantage of being a self-contained system rather than relying on external API interfaces. Moreover, it supports efficient data process flow, query rejection (when appropriate) and private deployment, enabling faster domain data fine-tuning and ensuring data privacy, which enhances the framework's adaptability to specific use cases.
Machine learning algorithms play an important role in a variety of important decision-making processes, including targeted advertisement displays, home loan approvals, and criminal behavior predictions. Given the far-reaching impact of these algorithms, it is crucial that they operate fairly, free from bias or prejudice towards certain groups in the population. Ensuring impartiality in these algorithms is essential for promoting equality and avoiding discrimination. To this end we introduce a unified framework for randomized subset selection that incorporates group fairness constraints. Our problem involves a global utility function and a set of group utility functions for each group, here a group refers to a group of individuals (e.g., people) sharing the same attributes (e.g., gender). Our aim is to generate a distribution across feasible subsets, specifying the selection probability of each feasible set, to maximize the global utility function while meeting a predetermined quota for each group utility function in expectation. Note that there may not necessarily be any direct connections between the global utility function and each group utility function. We demonstrate that this framework unifies and generalizes many significant applications in machine learning and operations research. Our algorithmic results either improves the best known result or provide the first approximation algorithms for new applications.
Submodular function optimization has numerous applications in machine learning and data analysis, including data summarization which aims to identify a concise and diverse set of data points from a large dataset. It is important to implement fairness-aware algorithms when dealing with data items that may contain sensitive attributes like race or gender, to prevent biases that could lead to unequal representation of different groups. With this in mind, we investigate the problem of maximizing a monotone submodular function while meeting group fairness constraints. Unlike previous studies in this area, we allow for randomized solutions, with the objective being to calculate a distribution over feasible sets such that the expected number of items selected from each group is subject to constraints in the form of upper and lower thresholds, ensuring that the representation of each group remains balanced in the long term. Here a set is considered feasible if its size does not exceed a constant value of $b$. Our research includes the development of a series of approximation algorithms for this problem.
Maximizing a submodular function has a wide range of applications in machine learning and data mining. One such application is data summarization whose goal is to select a small set of representative and diverse data items from a large dataset. However, data items might have sensitive attributes such as race or gender, in this setting, it is important to design \emph{fairness-aware} algorithms to mitigate potential algorithmic bias that may cause over- or under- representation of particular groups. Motivated by that, we propose and study the classic non-monotone submodular maximization problem subject to novel group fairness constraints. Our goal is to select a set of items that maximizes a non-monotone submodular function, while ensuring that the number of selected items from each group is proportionate to its size, to the extent specified by the decision maker. We develop the first constant-factor approximation algorithms for this problem. We also extend the basic model to incorporate an additional global size constraint on the total number of selected items.
Polyp segmentation is a crucial step towards computer-aided diagnosis of colorectal cancer. However, most of the polyp segmentation methods require pixel-wise annotated datasets. Annotated datasets are tedious and time-consuming to produce, especially for physicians who must dedicate their time to their patients. We tackle this issue by proposing a novel framework that can be trained using only weakly annotated images along with exploiting unlabeled images. To this end, we propose three ideas to address this problem, more specifically our contributions are: 1) a novel sparse foreground loss that suppresses false positives and improves weakly-supervised training, 2) a batch-wise weighted consistency loss utilizing predicted segmentation maps from identical networks trained using different initialization during semi-supervised training, 3) a deformable transformer encoder neck for feature enhancement by fusing information across levels and flexible spatial locations. Extensive experimental results demonstrate the merits of our ideas on five challenging datasets outperforming some state-of-the-art fully supervised models. Also, our framework can be utilized to fine-tune models trained on natural image segmentation datasets drastically improving their performance for polyp segmentation and impressively demonstrating superior performance to fully supervised fine-tuning.
In this paper, we study the adaptive submodular cover problem under the worst-case setting. This problem generalizes many previously studied problems, namely, the pool-based active learning and the stochastic submodular set cover. The input of our problem is a set of items (e.g., medical tests) and each item has a random state (e.g., the outcome of a medical test), whose realization is initially unknown. One must select an item at a fixed cost in order to observe its realization. There is an utility function which is defined over items and their states. Our goal is to sequentially select a group of items to achieve a ``goal value'' while minimizing the maximum cost across realizations (a.k.a. worst-case cost). To facilitate our study, we introduce a broad class of stochastic functions, called \emph{worst-case submodular function}. Assume the utility function is worst-case submodular, we develop a tight $(\log (Q/\eta)+1)$-approximation policy, where $Q$ is the ``goal value'' and $\eta$ is the minimum gap between $Q$ and any attainable utility value $\hat{Q}<Q$. We also study a worst-case maximum-coverage problem, whose goal is to select a group of items to maximize its worst-case utility subject to a budget constraint. This is a flipped problem of the minimum-cost-cover problem, and to solve this problem, we develop a $(1-1/e)/2$-approximation solution.
The non-terrestrial networks (NTNs) are recognized as a key component to provide cost-effective and high-capacity ubiquitous connectivity in the future wireless communications. In this paper, we investigate the secure transmission in a terahertz (THz)-empowered reconfigurable intelligent surface (RIS)-assisted NTN (T-RANTN), which is composed of a low-Earth orbit satellite transmitter, an RIS-installed high-altitude platform (HAP) and two unmanned aerial vehicle (UAV) receivers, only one of which is trustworthy. An approximate ergodic secrecy rate (ESR) expression is derived when the atmosphere turbulence and pointing error due to the characteristics of THz as well as the phase errors resulting from finite precision of RIS and imperfect channel estimation are taken into account simultaneously. Furthermore, according to the statistical and perfect channel state information of the untrustworthy receiver, we optimize the phase shifts of RIS to maximize the lower bound of secrecy rate (SR) and instantaneous SR, respectively, by using semidefinite relaxation method. Simulation results show that both the approximate expression for the ESR and the optimization algorithms are serviceable, and even when the jitter standard variance of the trustworthy receiver is greater than that of the untrustworthy one, a positive SR can still be guaranteed.