Thoughtfully designing services and rigorously testing software to support personal information management (PIM) requires understanding the relevant collections, but relatively little is known about what people keep in their file collections, especially personal collections. Complementing recent work on the structure of 348 file collections, we examine those collections' contents, how much content is duplicated, and how collections used for personal matters differ from those used for study and work. Though all collections contain many images, some intuitively common file types are surprisingly scarce. Personal collections contain more audio than others, knowledge workers' collections contain more text documents but far fewer folders, and IT collections exhibit unusual traits. Collection duplication is correlated to collections' structural traits, but surprisingly, not to collection age. We discuss our findings in light of prior works and provide implications for various kinds of information research.
Here we describe two approaches to improve group information management (GIM) and draw on the results of prior works to implement them in software prototypes. The first aids browsing and retrieving from large and unfamiliar collections like shared drives by dynamically reducing and re-organising them. The second supports the transfer and re-use of collections (e.g. to/by successors, descendants, or curators) by integrating novel sorting and annotation features. The prototypes' source code is shared online and screenshots are presented in the accompanying poster.
Controlled topical vocabularies (CVs) are built into information systems to aid browsing and retrieval of items that may be unfamiliar, but it is unclear how this feature should be integrated with standard keyword searching. Few systems or scholarly prototypes have attempted this, and none have used the most widely used CV, the Library of Congress Subject Headings (LCSH), which organizes monograph collections in academic libraries throughout the world. This paper describes a working prototype of a Web application that concurrently allows topic exploration using an outline tree view of the LCSH hierarchy and natural language keyword searching of a real-world Science and Engineering bibliographic collection. Pilot testing shows the system is functional, and work to fit the complex LCSH structure into a usable hierarchy is ongoing. This study contributes to knowledge of the practical design decisions required when developing linked interactions between topical hierarchy browsing and natural language searching, which promise to facilitate information discovery and exploration.
Despite much discussion in HCI research about how individual differences likely determine computer users' personal information management (PIM) practices, the extent of the influence of several important factors remains unclear, including users' personalities, spatial abilities, and the different software used to manage their collections. We therefore analyse data from prior CHI work to explore (1) associations of people's file collections with personality and spatial ability, and (2) differences between collections managed with different operating systems and file managers. We find no notable associations between users' attributes and their collections, and minimal predictive power, but do find considerable and surprising differences across operating systems. We discuss these findings and how they can inform future research.
The widespread use of artificial intelligence (AI) in many domains has revealed numerous ethical issues from data and design to deployment. In response, countless broad principles and guidelines for ethical AI have been published, and following those, specific approaches have been proposed for how to encourage ethical outcomes of AI. Meanwhile, library and information services too are seeing an increase in the use of AI-powered and machine learning-powered information systems, but no practical guidance currently exists for libraries to plan for, evaluate, or audit the ethics of intended or deployed AI. We therefore report on several promising approaches for promoting ethical AI that can be adapted from other contexts to AI-powered information services and in different stages of the software lifecycle.
Computer users spend time every day interacting with digital files and folders, including downloading, moving, naming, navigating to, searching for, sharing, and deleting them. Such file management has been the focus of many studies across various fields, but has not been explicitly acknowledged nor made the focus of dedicated review. In this article we present the first dedicated review of this topic and its research, synthesizing more than 230 publications from various research domains to establish what is known and what remains to be investigated, particularly by examining the common motivations, methods, and findings evinced by the previously furcate body of work. We find three typical research motivations in the literature reviewed: understanding how and why users store, organize, retrieve, and share files and folders, understanding factors that determine their behavior, and attempting to improve the user experience through novel interfaces and information services. Relevant conceptual frameworks and approaches to designing and testing systems are described, and open research challenges and the significance for other research areas are discussed. We conclude that file management is a ubiquitous, challenging, and relatively unsupported activity that invites and has received attention from several disciplines and has broad importance for topics across information science.
AI language models trained on Web data generate prose that reflects human knowledge and public sentiments, but can also contain novel insights and predictions. We asked the world's best language model, GPT-3, fifteen difficult questions about the nature, value, and future of library and information science (LIS), topics that receive perennial attention from LIS scholars. We present highlights from its 45 different responses, which range from platitudes and caricatures to interesting perspectives and worrisome visions of the future, thus providing an LIS-tailored demonstration of the current performance of AI language models. We also reflect on the viability of using AI to forecast or generate research ideas in this way today. Finally, we have shared the full response log online for readers to consider and evaluate for themselves.
Improving file management interfaces and optimising system performance requires current data about users' digital collections and particularly about the file size distributions of such collections. However, prior works have examined only the sizes of system files and users' work files in varied contexts, and there has been no such study since 2013; it therefore remains unclear how today's file sizes are distributed, particularly personal files, and further if distributions differ among the major operating systems or common occupations. Here we examine such differences among 49 million files in 348 user collections. We find that the average file size has grown more than ten-fold since the mid-2000s, though most files are still under 8 MB, and that there are demographic and technological influences in the size distributions. We discuss the implications for user interfaces, system optimisation, and PIM research.