The number of biomedical image analysis challenges organized per year is steadily increasing. These international competitions have the purpose of benchmarking algorithms on common data sets, typically to identify the best method for a given problem. Recent research, however, revealed that common practice related to challenge reporting does not allow for adequate interpretation and reproducibility of results. To address the discrepancy between the impact of challenges and the quality (control), the Biomedical I mage Analysis ChallengeS (BIAS) initiative developed a set of recommendations for the reporting of challenges. The BIAS statement aims to improve the transparency of the reporting of a biomedical image analysis challenge regardless of field of application, image modality or task category assessed. This article describes how the BIAS statement was developed and presents a checklist which authors of biomedical image analysis challenges are encouraged to include in their submission when giving a paper on a challenge into review. The purpose of the checklist is to standardize and facilitate the review process and raise interpretability and reproducibility of challenge results by making relevant information explicit.
Graph neural networks (GNNs), consisting of a cascade of layers applying a graph convolution followed by a pointwise nonlinearity, have become a powerful architecture to process signals supported on graphs. Graph convolutions (and thus, GNNs), rely heavily on knowledge of the graph for operation. However, in many practical cases the GSO is not known and needs to be estimated, or might change from training time to testing time. In this paper, we are set to study the effect that a change in the underlying graph topology that supports the signal has on the output of a GNN. We prove that graph convolutions with integral Lipschitz filters lead to GNNs whose output change is bounded by the size of the relative change in the topology. Furthermore, we leverage this result to show that the main reason for the success of GNNs is that they are stable architectures capable of discriminating features on high eigenvalues, which is a feat that cannot be achieved by linear graph filters (which are either stable or discriminative, but cannot be both). Finally, we comment on the use of this result to train GNNs with increased stability and run experiments on movie recommendation systems.
Forestry is a major industry in many parts of the world. It relies on forest inventory, which consists of measuring tree attributes. We propose to use 3D mapping, based on the iterative closest point algorithm, to automatically measure tree diameters in forests from mobile robot observations. While previous studies showed the potential for such technology, they lacked a rigorous analysis of diameter estimation methods in challenging forest environments. Here, we validated multiple diameter estimation methods, including two novel ones, in a new varied dataset of four different forest sites, 11 trajectories, totaling 1458 tree observations and 1.4 hectares. We provide recommendations for the deployment of mobile robots in a forestry context. We conclude that our mapping method is usable in the context of automated forest inventory, with our best method yielding a root mean square error of 3.45 cm for our whole dataset, and 2.04 cm in ideal conditions consisting of mature forest with well spaced trees.
Tagging news articles or blog posts with relevant tags from a collection of predefined ones is coined as document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and search. In this work, we propose a novel yet simple approach called DocTag2Vec to accomplish this task. We substantially extend Word2Vec and Doc2Vec---two popular models for learning distributed representation of words and documents. In DocTag2Vec, we simultaneously learn the representation of words, documents, and tags in a joint vector space during training, and employ the simple $k$-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec directly deals with raw text instead of provided feature vector, and in addition, enjoys advantages like the learning of tag representation, and the ability of handling newly created tags. To demonstrate the effectiveness of our approach, we conduct experiments on several datasets and show promising results against state-of-the-art methods.
This article describes a data driven method for deriving the relationship between personality and media preferences. A qunatifiable representation of such a relationship can be leveraged for use in recommendation systems and ameliorate the "cold start" problem. Here, the data is comprised of an original collection of 1,316 Okcupid dating profiles. Of these profiles, 800 are labeled with one of 16 possible Myers-Briggs Type Indicators (MBTI). A personality specific topic model describing a person's favorite books, movies, shows, music, and food was generated using latent Dirichlet allocation (LDA). There were several significant findings, for example, intuitive thinking types preferred sci-fi/fantasy entertainment, extraversion correlated positively with upbeat dance music, and jazz, folk, and international cuisine correlated positively with those characterized by openness to experience. Many other correlations confirmed previous findings describing the relationship among personality, writing style, and personal preferences. (For complete word/personality type assocations see the Appendix).
One topic that is likely to attract an increasing amount of attention within the Knowledge-base systems research community is the coordination of information provided by multiple experts. We envision a situation in which several experts independently encode information as belief networks. A potential user must then coordinate the conclusions and recommendations of these networks to derive some sort of consensus. One approach to such a consensus is the fusion of the contributed networks into a single, consensus model prior to the consideration of any case-specific data (specific observations, test results). This approach requires two types of combination procedures, one for probabilities, and one for graphs. Since the combination of probabilities is relatively well understood, the key barriers to this approach lie in the realm of graph theory. This paper provides formal definitions of some of the operations necessary to effect the necessary graphical combinations, and provides complexity analyses of these procedures. The paper's key result is that most of these operations are NP-hard, and its primary message is that the derivation of ?good? consensus networks must be done heuristically.
Target-guided response generation enables dialogue systems to smoothly transition a conversation from a dialogue context toward a target sentence. Such control is useful for designing dialogue systems that direct a conversation toward specific goals, such as creating non-obtrusive recommendations or introducing new topics in the conversation. In this paper, we introduce a new technique for target-guided response generation, which first finds a bridging path of commonsense knowledge concepts between the source and the target, and then uses the identified bridging path to generate transition responses. Additionally, we propose techniques to re-purpose existing dialogue datasets for target-guided generation. Experiments reveal that the proposed techniques outperform various baselines on this task. Finally, we observe that the existing automated metrics for this task correlate poorly with human judgement ratings. We propose a novel evaluation metric that we demonstrate is more reliable for target-guided response evaluation. Our work generally enables dialogue system designers to exercise more control over the conversations that their systems produce.
False assumptions about sex and gender are deeply embedded in the medical system, including that they are binary, static, and concordant. Machine learning researchers must understand the nature of these assumptions in order to avoid perpetuating them. In this perspectives piece, we identify three common mistakes that researchers make when dealing with sex/gender data: "sex confusion", the failure to identity what sex in a dataset does or doesn't mean; "sex obsession", the belief that sex, specifically sex assigned at birth, is the relevant variable for most applications; and "sex/gender slippage", the conflation of sex and gender even in contexts where only one or the other is known. We then discuss how these pitfalls show up in machine learning studies based on electronic health record data, which is commonly used for everything from retrospective analysis of patient outcomes to the development of algorithms to predict risk and administer care. Finally, we offer a series of recommendations about how machine learning researchers can produce both research and algorithms that more carefully engage with questions of sex/gender, better serving all patients, including transgender people.
The Living Labs for Academic Search (LiLAS) lab aims to strengthen the concept of user-centric living labs for academic search. The methodological gap between real-world and lab-based evaluation should be bridged by allowing lab participants to evaluate their retrieval approaches in two real-world academic search systems from life sciences and social sciences. This overview paper outlines the two academic search systems LIVIVO and GESIS Search, and their corresponding tasks within LiLAS, which are ad-hoc retrieval and dataset recommendation. The lab is based on a new evaluation infrastructure named STELLA that allows participants to submit results corresponding to their experimental systems in the form of pre-computed runs and Docker containers that can be integrated into production systems and generate experimental results in real-time. Both submission types are interleaved with the results provided by the productive systems allowing for a seamless presentation and evaluation. The evaluation of results and a meta-analysis of the different tasks and submission types complement this overview.