With recent advances in conversational recommendation, a recommender system can actively and dynamically elicit user preferences through conversational interactions. To do so, the system periodically queries users' preferences on attributes and collects their feedback. However, most existing conversational recommender systems only allow users to give absolute feedback on attributes. In practice, absolute feedback is of limited use, as users tend to give biased feedback when expressing their preferences. Users are often more inclined to express comparative preferences, since user preferences are inherently relative. To let users provide comparative preferences during conversational interactions, we propose a novel comparison-based conversational recommender system. Relative feedback, though more practical, is not easy to incorporate, because its scale is mismatched with that of users' absolute preferences. To collect and interpret relative feedback effectively in an interactive manner, we further propose a new bandit algorithm, which we call RelativeConUCB. Experiments on both synthetic and real-world datasets validate the advantage of the proposed method over existing bandit algorithms for conversational recommender systems.
With the development of electronic technology and improvements in production technology, industrial robots are showing their advantages in social services and industrial production. However, due to long-term mechanical wear and structural deformation, their absolute positioning accuracy is low, which greatly hinders the development of the manufacturing industry. Calibrating the kinematic parameters of the robot is an effective way to address this problem. However, the main measuring equipment, such as laser trackers and coordinate measuring machines, is expensive and requires trained personnel to operate. Additionally, many environmental factors introduce measurement noise during the measurement process, which affects the calibration accuracy of the robot. Based on these observations, we make the following contributions: a) developing a robot calibration method based on a plane constraint to simplify the measurement steps; b) employing the Square-root Cubature Kalman Filter (SCKF) algorithm to reduce the influence of measurement noise; c) proposing a novel algorithm for identifying kinematic parameters based on the SCKF algorithm and the Levenberg-Marquardt (LM) algorithm to achieve high calibration accuracy; d) adopting a dial indicator as the measuring equipment to cut costs. Extensive experiments verify the effectiveness of the proposed calibration algorithm and experimental platform.
Industrial robots play a vital role in automated production and have been widely used in industrial activities such as handling and welding. However, an uncalibrated robot, subject to machining and assembly tolerances, suffers from low absolute positioning accuracy, which cannot satisfy the requirements of high-precision manufacturing. To address this issue, we propose a novel calibration method based on an unscented Kalman filter and a variable step-size Levenberg-Marquardt algorithm. This work has three main ideas: a) proposing a novel variable step-size Levenberg-Marquardt algorithm to address the issue of local optima in the Levenberg-Marquardt algorithm; b) employing an unscented Kalman filter to reduce the influence of measurement noise; and c) developing a novel calibration method that combines an unscented Kalman filter with a variable step-size Levenberg-Marquardt algorithm. Furthermore, we conduct extensive experiments on an ABB IRB 120 industrial robot. The experimental results show that the proposed method achieves much higher calibration accuracy than several state-of-the-art calibration methods. Hence, this work is an important milestone in the field of robot calibration.
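To make the role of the Levenberg-Marquardt step concrete, the following is a minimal sketch of a textbook damped least-squares loop applied to a toy calibration problem. It is not the variable step-size variant proposed in the work, and it omits the Kalman filtering stage. The planar two-link arm, the parameter vector (two link lengths and a joint-2 offset), the noise-free measurements, and all function names are assumptions made for illustration only.

```python
import numpy as np

def forward(params, q):
    """End-effector (x, y) of a hypothetical planar 2-link arm.

    params = (l1, l2, d): link lengths and a constant offset on joint 2.
    q has shape (N, 2) and holds the commanded joint angles.
    """
    l1, l2, d = params
    a = q[:, 0]
    b = q[:, 0] + q[:, 1] + d
    return np.column_stack([l1 * np.cos(a) + l2 * np.cos(b),
                            l1 * np.sin(a) + l2 * np.sin(b)])

def jacobian(params, q):
    """Analytic Jacobian of the stacked (x0, y0, x1, y1, ...) positions."""
    l1, l2, d = params
    a = q[:, 0]
    b = q[:, 0] + q[:, 1] + d
    J = np.empty((q.shape[0], 2, 3))
    J[:, 0, 0], J[:, 1, 0] = np.cos(a), np.sin(a)
    J[:, 0, 1], J[:, 1, 1] = np.cos(b), np.sin(b)
    J[:, 0, 2], J[:, 1, 2] = -l2 * np.sin(b), l2 * np.cos(b)
    return J.reshape(-1, 3)

def levenberg_marquardt(q, measured, x0, lam=1e-3, n_iter=50):
    """Textbook LM: solve (J^T J + lam * I) step = -J^T r, adapt lam by 10x."""
    x = np.asarray(x0, dtype=float)
    cost = np.sum((forward(x, q) - measured) ** 2)
    for _ in range(n_iter):
        r = (forward(x, q) - measured).ravel()
        J = jacobian(x, q)
        step = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        new_cost = np.sum((forward(x + step, q) - measured) ** 2)
        if new_cost < cost:
            # Accept the step and trust the Gauss-Newton model more.
            x, cost, lam = x + step, new_cost, lam / 10.0
        else:
            # Reject the step and fall back towards gradient descent.
            lam *= 10.0
    return x

# Synthetic, noise-free data generated from assumed "true" parameters.
rng = np.random.default_rng(0)
q = rng.uniform(-np.pi / 2, np.pi / 2, size=(30, 2))
true_params = np.array([0.30, 0.25, 0.02])
measured = forward(true_params, q)

# Start from nominal (slightly wrong) parameters and identify the true ones.
estimate = levenberg_marquardt(q, measured, x0=[0.29, 0.26, 0.0])
```

With real measurements the residuals would contain noise, which is the part the abstract assigns to the Kalman filter.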
This paper proposes a novel meta-heuristic algorithm, the Egret Swarm Optimization Algorithm (ESOA), inspired by the hunting behavior of two egret species (the Great Egret and the Snowy Egret). ESOA consists of three primary components: the Sit-And-Wait Strategy, the Aggressive Strategy, and the Discriminant Conditions. The performance of ESOA on 36 benchmark functions and 2 engineering problems is compared with that of Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Differential Evolution (DE), Grey Wolf Optimizer (GWO), and Harris Hawks Optimization (HHO). The results demonstrate the superior effectiveness and robustness of ESOA. The source code used in this work can be retrieved from https://github.com/Knightsll/Egret_Swarm_Optimization_Algorithm; https://ww2.mathworks.cn/matlabcentral/fileexchange/115595-egret-swarm-optimization-algorithm-esoa.
Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works for this problem focus on the stochastic environment where the item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint especially designed for PBM. We prove the proposed algorithm simultaneously achieves $O(\log{T})$ regret in the stochastic environment and $O(m\sqrt{nT})$ regret in the adversarial environment, where $T$ is the number of rounds, $n$ is the number of items and $m$ is the number of positions. We also provide a lower bound of order $\Omega(m\sqrt{nT})$ for adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. The experiments show that our algorithm could simultaneously learn in both stochastic and adversarial environments and is competitive compared to existing methods that are designed for a single environment.
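As a small illustration of the position-based model itself (not of the proposed FTRL algorithm), the sketch below assumes the usual PBM factorisation in which the click probability of an item at a position is the product of the position's examination probability and the item's attractiveness. The numerical values and function names are hypothetical.

```python
import numpy as np

def pbm_click_probabilities(ranked_items, attractiveness, examination):
    """Click probability at each position: examination[k] * attractiveness[item at k]."""
    ranked_items = np.asarray(ranked_items)
    exam = np.asarray(examination, dtype=float)[:len(ranked_items)]
    return exam * np.asarray(attractiveness, dtype=float)[ranked_items]

def simulate_clicks(ranked_items, attractiveness, examination, rng):
    """Draw one round of independent Bernoulli clicks under the PBM."""
    p = pbm_click_probabilities(ranked_items, attractiveness, examination)
    return rng.random(p.shape) < p

# Hypothetical instance with n = 4 items and m = 3 positions.
attractiveness = [0.9, 0.5, 0.2, 0.1]
examination = [1.0, 0.6, 0.3]

chosen = [2, 0, 1]   # list shown by the learner
best = [0, 1, 2]     # most attractive items in the most examined positions

expected_chosen = pbm_click_probabilities(chosen, attractiveness, examination).sum()
expected_best = pbm_click_probabilities(best, attractiveness, examination).sum()
per_round_regret = expected_best - expected_chosen

clicks = simulate_clicks(chosen, attractiveness, examination, np.random.default_rng(0))
```

In a stochastic environment the attractiveness vector stays fixed across rounds, whereas in an adversarial environment it may change arbitrarily from round to round.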
To better exploit search logs and model users' behavior patterns, numerous click models have been proposed to extract users' implicit interaction feedback. Most traditional click models are based on the probabilistic graphical model (PGM) framework, which requires manually designed dependencies and may oversimplify user behaviors. Recently, methods based on neural networks have been proposed to improve the prediction accuracy of user behaviors by enhancing expressive ability and allowing flexible dependencies. However, they still suffer from the data sparsity and cold-start problems. In this paper, we propose a novel graph-enhanced click model (GraphCM) for web search. First, we regard each query or document as a vertex and propose novel homogeneous graph construction methods for queries and documents respectively, so as to fully exploit both intra-session and inter-session information for the sparsity and cold-start problems. Second, following the examination hypothesis, we separately model the attractiveness estimator and the examination predictor to output attractiveness scores and examination probabilities, where graph neural networks and neighbor interaction techniques are applied to extract the auxiliary information encoded in the pre-constructed homogeneous graphs. Finally, we apply combination functions to integrate examination probabilities and attractiveness scores into click predictions. Extensive experiments conducted on three real-world session datasets show that GraphCM not only outperforms the state-of-the-art models, but also achieves superior performance in addressing the data sparsity and cold-start problems.
The problem of online learning with graph feedback has been extensively studied in the literature due to its generality and potential to model various learning tasks. Existing works mainly study the adversarial and stochastic feedback separately. If the prior knowledge of the feedback mechanism is unavailable or wrong, such specially designed algorithms could suffer great loss. To avoid this problem, \citet{erez2021towards} try to optimize for both environments. However, they assume the feedback graphs are undirected and each vertex has a self-loop, which compromises the generality of the framework and may not be satisfied in applications. With a general feedback graph, the observation of an arm may not be available when this arm is pulled, which makes the exploration more expensive and the algorithms more challenging to perform optimally in both environments. In this work, we overcome this difficulty by a new trade-off mechanism with a carefully-designed proportion for exploration and exploitation. We prove the proposed algorithm simultaneously achieves $\mathrm{poly} \log T$ regret in the stochastic setting and minimax-optimal regret of $\tilde{O}(T^{2/3})$ in the adversarial setting where $T$ is the horizon and $\tilde{O}$ hides parameters independent of $T$ as well as logarithmic terms. To our knowledge, this is the first best-of-both-worlds result for general feedback graphs.
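The difficulty with general feedback graphs can be seen in a toy example. The sketch below assumes a directed graph stored as a dictionary from each arm to the set of arms it reveals, and uses a standard importance-weighted loss estimator from the graph-feedback literature; it is not the trade-off mechanism proposed in the work, and all names and numbers are hypothetical.

```python
def observed_arms(feedback_graph, pulled_arm):
    """Arms whose losses are revealed when `pulled_arm` is played."""
    return set(feedback_graph[pulled_arm])

def observation_probabilities(feedback_graph, play_probs):
    """P(loss of arm i is observed) = sum of play_probs[j] over edges j -> i."""
    obs = [0.0] * len(play_probs)
    for j, out_neighbours in feedback_graph.items():
        for i in out_neighbours:
            obs[i] += play_probs[j]
    return obs

def loss_estimate(loss, arm, pulled_arm, feedback_graph, play_probs):
    """Importance-weighted estimate of the loss of `arm` (0 if it was not observed)."""
    if arm not in feedback_graph[pulled_arm]:
        return 0.0
    return loss / observation_probabilities(feedback_graph, play_probs)[arm]

# Hypothetical directed graph: arm 0 reveals arms 1 and 2 but NOT itself,
# arm 1 has only a self-loop, and arm 2 reveals arm 0.
graph = {0: {1, 2}, 1: {1}, 2: {0}}
play_probs = [0.5, 0.3, 0.2]

seen = observed_arms(graph, pulled_arm=0)
obs_probs = observation_probabilities(graph, play_probs)
```

Here the loss of arm 0 is only ever observed by pulling arm 2, so learning about arm 0 requires spending rounds on a different arm, which is what makes exploration more expensive without self-loops.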
The problem of two-sided matching markets has a wide range of real-world applications and has been extensively studied in the literature. A line of recent work has focused on the setting where the preferences of participants on one side of the market are unknown \emph{a priori} and are learned by iteratively interacting with participants on the other side. All these works are based on explore-then-commit (ETC) and upper confidence bound (UCB) algorithms, two common strategies in multi-armed bandits (MAB). Thompson sampling (TS) is another popular approach, which has attracted considerable attention due to its easier implementation and better empirical performance. In many problems, even when UCB and ETC-type algorithms have already been analyzed, researchers still study TS for its benefits. However, the convergence analysis of TS is much more challenging and remains open in many problem settings. In this paper, we provide the first regret analysis for TS in the new setting of iterative matching markets. Extensive experiments demonstrate the practical advantages of the TS-type algorithm over the ETC and UCB-type baselines.
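For readers unfamiliar with Thompson sampling, the following is a minimal sketch for a single agent facing a Bernoulli multi-armed bandit with Beta(1, 1) priors. It deliberately ignores the matching-market structure (multiple agents, conflicts, and stable matchings) studied in the work; the arm means, horizon, and function name are assumptions chosen for illustration.

```python
import numpy as np

def thompson_sampling(true_means, horizon, rng):
    """Beta-Bernoulli Thompson sampling; returns the number of pulls per arm."""
    k = len(true_means)
    successes = np.ones(k)   # Beta posterior parameter alpha (uniform prior)
    failures = np.ones(k)    # Beta posterior parameter beta (uniform prior)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # Sample one plausible mean per arm from its posterior, then act greedily.
        samples = rng.beta(successes, failures)
        arm = int(np.argmax(samples))
        reward = float(rng.random() < true_means[arm])
        # Conjugate update of the pulled arm's posterior.
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000, rng=np.random.default_rng(0))
```

The only algorithm-specific state is the pair of posterior counts per arm, which is one reason TS is often described as easy to implement.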
Over the past decades, industrial manipulators have played a vital role in various fields, such as aircraft manufacturing and automobile manufacturing. However, an industrial manipulator without calibration suffers from low absolute positioning accuracy, which severely restricts its application in high-precision intelligent manufacturing. Recent manipulator calibration methods have been developed to address this issue, but they frequently encounter long-tail convergence and low calibration accuracy. To address this thorny issue, this work proposes a novel manipulator calibration method that combines an extended Kalman filter with a Quadratic Interpolated Beetle Antennae Search algorithm. This paper has three main ideas: a) proposing a new Quadratic Interpolated Beetle Antennae Search algorithm to deal with the issues of local optima and low convergence rate in the Beetle Antennae Search algorithm; b) adopting an extended Kalman filter algorithm to suppress non-Gaussian noise; and c) developing a new manipulator calibration method that combines an extended Kalman filter with a Quadratic Interpolated Beetle Antennae Search algorithm to calibrate a manipulator. Extensive experimental results on an ABB IRB120 industrial manipulator demonstrate that the proposed method achieves much higher calibration accuracy than several state-of-the-art calibration methods.
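The baseline that the work builds on can be sketched briefly. The code below is a minimal version of the plain Beetle Antennae Search heuristic, without the quadratic interpolation proposed in the work and without the Kalman filtering stage. The sphere objective, the step-size decay, and the rule tying antenna length to step size are assumptions chosen for illustration.

```python
import numpy as np

def beetle_antennae_search(f, x0, step=1.0, decay=0.95, n_iter=200, seed=0):
    """Plain BAS for minimisation; returns the best point found and its value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for _ in range(n_iter):
        # Random unit vector: the direction along which the two antennae point.
        b = rng.standard_normal(x.size)
        b /= np.linalg.norm(b)
        antenna = 2.0 * step  # assumed sensing length, tied to the step size
        f_right = f(x + antenna * b)
        f_left = f(x - antenna * b)
        # Move away from the antenna that senses the larger objective value.
        x = x - step * b * np.sign(f_right - f_left)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x.copy(), fx
        # Shrink the step so the search settles down over time.
        step *= decay
    return best_x, best_f

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return float(np.sum(x ** 2))

best_x, best_f = beetle_antennae_search(sphere, [3.0, -2.0])
```

Because each iteration uses a single random direction and a shrinking step, plain BAS can stall in local optima or converge slowly, which are the issues the abstract attributes to it.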