Abstract: The short lifespan of traditional data storage media, coupled with an exponential increase in storage demand, has made long-term archival a fundamental problem in the data storage industry and beyond. Consequently, researchers are looking for innovative storage media that can hold data over long time periods at very low cost. DNA molecules, with their high density, long lifespan, and low energy needs, have emerged as a viable alternative medium for digital data archival. However, current DNA data storage technologies face challenges with respect to cost and reliability. Thus, coding rate and error robustness are critical to scaling DNA storage and making it technologically and economically achievable. Moreover, the DNA molecules that encode different files are often located in the same oligo pool. Without random access solutions at the oligo level, decoding a specific file from these mixed pools is impractical, as all oligos must first be sequenced and decoded before a target file can be retrieved, which greatly increases the read cost. This paper introduces a solution for efficiently encoding and storing images in DNA molecules that aims to reduce the read cost necessary to retrieve a reduced-resolution version of an image. This image storage system is based on the progressive decoding functionality of the JPEG2000 codec but can be adapted to any conventional progressive codec. Each resolution layer is encoded into a set of oligos using the JPEG DNA VM codec, a DNA-based coder that aims to retrieve a file with high reliability. Depending on the desired resolution to be read, the set of oligos, as well as the portion of the oligos to be sequenced and decoded, is adjusted accordingly. These oligos are selected at sequencing time with the help of the adaptive sampling method provided by Nanopore sequencers, making this a PCR-free random access solution.
Abstract: The exponential increase in storage demand and the short lifespan of data storage devices have made long-term archival and preservation critical bottlenecks in data storage. To meet this demand, researchers are now investigating novel forms of data storage media. The high density, long lifespan, and low energy needs of synthetic DNA make it a promising candidate for long-term data archival. However, current DNA data storage technologies face challenges with respect to cost (writing data to DNA is expensive) and reliability (reading and writing data is error-prone). Thus, data compression and error correction are crucial to scaling DNA storage. Additionally, the DNA molecules encoding several files are very often stored in the same place, called an oligo pool. For this reason, without random access solutions, it is impractical to decode a specific file from the pool, because all the oligos from all the files must first be sequenced, which greatly increases the read cost. This paper introduces PIC-DNA, a novel JPEG2000-based progressive image coder adapted to DNA data storage. This coder directly includes a random access process in its coding system, allowing for the retrieval of a specific image from a pool of oligos encoding several images. The progressive decoder can dynamically adapt the read cost according to the user's cost and quality constraints at decoding time. Both the random access and the progressive decoding greatly reduce the read cost compared with existing image coders adapted to DNA.

Abstract: Knowledge graphs (KGs) have attracted significant attention in recent years, particularly in the area of the Semantic Web, and are gaining popularity in other application domains such as data mining and search engines. Simultaneously, there has been enormous progress in the development of different types of heterogeneous hardware, impacting the way KGs are processed. The aim of this paper is to provide a systematic literature review of knowledge graph hardware acceleration. For this, we present a classification of the primary areas in knowledge graph technology that harness different hardware units to accelerate certain knowledge graph functionalities. We then extensively describe the respective works, focusing on how KG-related schemes harness modern hardware accelerators. Based on our review, we identify various research gaps and future exploratory directions that are anticipated to be of significant value for both academics and industry practitioners.