Contrastive learning has gained widespread adoption for retrieval tasks due to its minimal requirement for manual annotations. However, popular contrastive frameworks typically learn from binary relevance, making them ineffective at incorporating direct fine-grained rankings. In this paper, we curate a large-scale dataset featuring detailed relevance scores for each query-document pair to facilitate future research and evaluation. Subsequently, we propose Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking (GCL), which is designed to learn from fine-grained rankings beyond binary relevance scores. Our results show that GCL achieves a 94.5% increase in NDCG@10 for in-domain and 26.3 to 48.8% increases for cold-start evaluations, all relative to the CLIP baseline and involving ground truth rankings.
Active communication between robots and humans is essential for effective human-robot interaction. To accomplish this objective, Cloud Robotics (CR) was introduced to make robots enhance their capabilities. It enables robots to perform extensive computations in the cloud by sharing their outcomes. Outcomes include maps, images, processing power, data, activities, and other robot resources. But due to the colossal growth of data and traffic, CR suffers from serious latency issues. Therefore, it is unlikely to scale a large number of robots particularly in human-robot interaction scenarios, where responsiveness is paramount. Furthermore, other issues related to security such as privacy breaches and ransomware attacks can increase. To address these problems, in this paper, we have envisioned the next generation of social robotic architectures based on Fog Robotics (FR) that inherits the strengths of Fog Computing to augment the future social robotic systems. These new architectures can escalate the dexterity of robots by shoving the data closer to the robot. Additionally, they can ensure that human-robot interaction is more responsive by resolving the problems of CR. Moreover, experimental results are further discussed by considering a scenario of FR and latency as a primary factor comparing to CR models.