This report summarises my thesis progress so far. The work aims to show how the perspectives of two search engines, Bing and Google, differ on several selected controversial topics. We distinguish the viewpoints of Bing \& Google by using the sentiment as well as the ranking of the documents that the two search engines return for the same queries, which mainly concern controversial topics. The methods we used and the experimental results are presented below.
Students are increasingly using online materials to learn new subjects or to supplement their learning process in educational institutions. Issues regarding gender bias have been raised in the context of formal education and some measures have been proposed to mitigate them. In our previous work, we investigated the perceived gender bias in YouTube using manual annotations to detect the narrators' perceived gender in educational videos. In this work, our goal is to evaluate the perceived gender bias in online education by exploiting automated annotations. The automated pipeline has already been proposed in a recent paper; thus, in this paper we only share our empirical results and important findings. Our results show that educational videos are biased towards the male gender and that STEM-related videos are more biased than their NON-STEM counterparts.
Students are increasingly using online materials to learn new subjects or to supplement their learning process in educational institutions. Issues regarding gender bias have been raised in the context of formal education and some measures have been proposed to mitigate them. However, possible gender bias and stereotypes in online educational materials, which may appear in different forms, are yet to be investigated in the context of search bias in a widely-used search platform. As a first step towards measuring possible gender bias in online platforms, we have investigated YouTube educational videos in terms of the perceived gender of their narrators. We adopted bias measures for ranked search results to evaluate educational videos returned by YouTube in response to queries related to STEM (Science, Technology, Engineering, and Mathematics) and NON-STEM fields of education. For this, we propose an automated pipeline that annotates the narrators' perceived gender in YouTube videos in order to analyse perceived gender bias in online education.
This work first presents our attempts to establish an automated model, using state-of-the-art approaches, for analysing bias in the search results of Bing and Google. Second, we aim to analyse YouTube video search results in terms of perceived gender bias, i.e. the narrator's gender from the viewer's perspective. Experimental results indicate that the current class-wise F1-scores of our best model are not sufficient to establish an automated model for bias analysis. Thus, to evaluate YouTube video search results in terms of perceived gender bias, we use manual annotations.
Search bias analysis has been receiving more attention in recent years, since search results could affect [...] In this work, we aim to establish an automated model for evaluating ideological bias in online news articles. The dataset is composed of news articles in search results as well as newspaper articles. The results of the current automated model show that its performance is not sufficient for annotating documents automatically, and therefore not sufficient for computing bias in search results.
In this work, we aim to investigate the impact of location (different countries) on bias in search results. For this, we use the search results of Google and Bing in the UK and US locations. The query set is composed of controversial queries obtained from ProCon.org that have specific ideological leanings, either conservative or liberal. In previous work, researchers analysed search results in terms of stance and ideological bias with rank- and relevance-based measures. This study follows a similar evaluation procedure; however, using a subset of the controversial queries, we examine the effect of location on the existence of bias as well as on the magnitude of the bias difference between Bing and Google. Our preliminary results show that location might affect the retrieval performance of the search engines as well as the bias in the search results returned by Bing and Google for the controversial queries.
Students are increasingly using online materials to learn new subjects or to supplement their learning process in educational institutions. Issues regarding gender bias have been raised in the context of formal education and some measures have been proposed to mitigate them. However, possible gender bias and stereotypes in online educational materials, which may appear in different forms, are yet to be investigated in the context of search bias in a widely-used search platform. As a first step towards measuring possible gender bias in online platforms, we have investigated YouTube educational videos in terms of the perceived gender of their narrators. We adopted bias measures for ranked search results to evaluate educational videos returned by YouTube in response to queries related to STEM (Science, Technology, Engineering, and Mathematics) and NON-STEM fields of education. Gender is a research area in its own right in the social sciences and is beyond the scope of this work. In this respect, to annotate the perceived gender of the narrator of an instructional video we used only a crude classification of gender into Male and Female. Then, to analyse perceived gender bias, we utilised bias measures inspired by search platforms and further incorporated rank information into our analysis. Our preliminary results demonstrate that there is a significant bias towards the male gender in the returned YouTube educational videos, and that the degree of bias varies between STEM and NON-STEM queries. Finally, there is strong evidence that rank information might affect the results.
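To make the role of rank information concrete, the following is a minimal sketch of how a rank-aware bias score could be computed from Male/Female annotations of a result list. The abstract above does not define its measures, so everything here is an assumption made for illustration: the scoring (+1 for Male, -1 for Female), the logarithmic rank discount, and the function names `mean_bias` and `rank_weighted_bias` are hypothetical and are not the measures used in the study.

```python
import math

# Assumed scoring for illustration only: +1 for Male, -1 for Female.
SCORE = {"Male": 1.0, "Female": -1.0}


def mean_bias(labels):
    """Rank-agnostic bias: average score over the result list.

    Positive values indicate more Male narrators, negative more Female.
    """
    return sum(SCORE[label] for label in labels) / len(labels)


def rank_weighted_bias(labels):
    """Rank-aware bias: results near the top of the list weigh more.

    Uses a 1 / log2(rank + 1) discount (an assumed choice), normalised
    by the sum of the weights so the score stays within [-1, 1].
    """
    weights = [1.0 / math.log2(rank + 1) for rank in range(1, len(labels) + 1)]
    total = sum(w * SCORE[label] for w, label in zip(weights, labels))
    return total / sum(weights)


# Hypothetical annotations for one query, ordered by rank.
ranking = ["Male", "Male", "Female", "Male", "Female"]
reversed_ranking = list(reversed(ranking))

print(mean_bias(ranking), mean_bias(reversed_ranking))
print(rank_weighted_bias(ranking), rank_weighted_bias(reversed_ranking))
```

Both orderings contain the same videos, so the rank-agnostic score is identical for them, whereas the rank-weighted score is higher when the Male-narrated videos sit at the top. This is one way in which incorporating rank information can change the outcome of a bias analysis.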
Search engines play an essential role in our daily lives. They are also crucial in the enterprise domain for accessing documents from various information sources. Since traditional search systems index documents mainly by the frequency of the words occurring in them, they barely support natural language search and offer keyword search instead. Keyword-based search is unlikely to be sufficient for enterprise data, which is growing extremely fast. Thus, enterprise search is becoming increasingly critical in the corporate domain. In this report, we present an overview of the state-of-the-art technologies in the literature for three main purposes: i) to increase the retrieval performance of a search engine, ii) to deploy a search platform to a cloud environment, and iii) to select the best terms for expanding queries in order to achieve even higher retrieval performance and to provide good query suggestions for a better user experience.
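The limitation of frequency-based indexing can be illustrated with a minimal sketch of a term-frequency inverted index. This is a toy example under stated assumptions: the documents, the whitespace tokeniser, and the function names `build_index` and `search` are hypothetical, and production search platforms use considerably more elaborate analysis and scoring.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately naive tokeniser)."""
    return text.lower().split()


def build_index(docs):
    """Map each term to the documents containing it and its frequency there."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index


def search(index, query):
    """Score documents by summing the frequencies of the matching query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    return [doc_id for doc_id, _ in scores.most_common()]


# Hypothetical enterprise documents.
docs = {
    "d1": "reset password for the email account",
    "d2": "travel expense policy and expense forms",
}
index = build_index(docs)

keyword_hits = search(index, "expense policy")
paraphrase_hits = search(index, "how do i get reimbursed")
print(keyword_hits, paraphrase_hits)
```

The keyword query matches the expense document because it shares literal terms with it, while the natural language paraphrase shares no terms with any document and returns nothing. Query expansion and query suggestion are two ways of narrowing this vocabulary gap.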
Creating alternative queries, also known as query suggestion, has been shown to improve users' search experience. With such suggestions, users can satisfy their information needs more quickly and accurately. In many scenarios, these suggestions can be generated from click-through logs by establishing a bipartite graph of the clicked query-document pairs. Most existing methods focus on click-existing queries, i.e. queries that have click information in the search logs, and suggest related queries using the co-clicked documents. In this paper, we propose a simple yet effective query suggestion method specifically for click-absent queries that ensures semantic consistency without utilising any additional resources. Our experimental results show that the proposed technique generates comparatively good suggestions for click-absent queries on a real bilingual enterprise search log.
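The co-click baseline described above can be sketched as follows. This is only an illustration of the generic bipartite-graph idea, not the method proposed in the paper; the log entries and the function names `build_click_graph` and `suggest` are hypothetical.

```python
from collections import defaultdict


def build_click_graph(click_log):
    """Build both sides of the bipartite query-document click graph."""
    query_to_docs = defaultdict(set)
    doc_to_queries = defaultdict(set)
    for query, doc in click_log:
        query_to_docs[query].add(doc)
        doc_to_queries[doc].add(query)
    return query_to_docs, doc_to_queries


def suggest(query, query_to_docs, doc_to_queries, k=3):
    """Rank other queries by the number of documents co-clicked with `query`."""
    counts = defaultdict(int)
    for doc in query_to_docs.get(query, ()):
        for other in doc_to_queries[doc]:
            if other != query:
                counts[other] += 1
    # Sort by co-click count (descending), then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [q for q, _ in ranked[:k]]


# Hypothetical click-through log of (query, clicked document) pairs.
click_log = [
    ("vpn setup", "doc_vpn_guide"),
    ("vpn setup", "doc_remote_access"),
    ("remote access", "doc_remote_access"),
    ("remote access", "doc_vpn_guide"),
    ("install vpn client", "doc_vpn_guide"),
    ("holiday policy", "doc_hr_leave"),
]
q2d, d2q = build_click_graph(click_log)

suggestions = suggest("vpn setup", q2d, d2q)     # click-existing query
no_click = suggest("vpn not working", q2d, d2q)  # click-absent query
print(suggestions, no_click)
```

For the click-existing query, related queries are found through shared clicked documents. The click-absent query has no edges in the graph, so this baseline returns an empty list, which is the gap that a method for click-absent queries has to fill.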
Search engines can be considered a gateway to the Web, and they also decide what we see for a given search query. Since many people are exposed to information through search engines, it is fair to expect that search engines should be neutral; i.e. the returned results must cover all the elements or aspects of the search topic, and they should be impartial, with results returned based on relevance. However, search engine results are based on many features and sophisticated algorithms in which search neutrality is not necessarily the focal point. In this work, we performed an empirical study on two popular search engines and analysed the search engine result pages for controversial topics such as abortion, medical marijuana, and gay marriage. Our analysis uses the sentiment in search results to identify their viewpoint as conservative or liberal. We also propose three sentiment-based metrics to show the existence of bias as well as to compare the viewpoints of the two search engines. Extensive experiments performed on controversial topics show that both search engines are biased and, moreover, that they have the same kind of bias towards a given controversial topic.