Research

Motivation

In the field of life sciences, data-intensive workloads often demand high-performance computing solutions to process large datasets efficiently. GPUs have emerged as powerful tools in this domain due to their parallel processing capabilities, offering significant advantages in both speed and energy efficiency compared to traditional CPUs. However, many life science applications remain optimized for CPUs, missing out on the potential benefits of GPU acceleration.

[Figure: from tumor sample and donor via next-generation sequencing and high-performance computing to drug development and personalized medicine for future patients, turning years into days]

This research has broad implications for the field of life sciences, demonstrating that GPU-accelerated computing can significantly enhance both performance and sustainability. By increasing the processing speed for large datasets while reducing energy consumption, this work encourages wider adoption of GPU technology in life science research, benefiting both scientists and the environment.

Efficient Data Structures on GPUs

My research focuses on optimizing life science applications by developing data structures and algorithms specifically tailored for GPU architectures. By designing these components from the ground up with GPUs in mind, the aim is to exploit their full performance potential.

Hierarchical Interleaved Bloom Filter (HIBF) [WIP]

One part of this work is the implementation of Hierarchical Interleaved Bloom Filters on GPUs. Bloom filters are essential for membership testing, a common operation in genomics. The hierarchical and interleaved design enhances memory management and vectorized performance for queries on multiple sets, particularly beneficial for GPUs.

Building upon existing research by Mehringer et al., who developed HIBFs for CPUs, and by Jünger et al., who implemented efficient Bloom filters on GPUs, this work aims to combine these results into an optimized implementation suitable for adoption in life science applications.

Example

To illustrate why this is important in the context of life science applications, let us look at an example.

Assume we have found a strand of DNA/RNA in a sewage sample and want to know whether this strand belongs to something dangerous like a parasite, a bacterium, or a virus. Because we don't have access to 120 GB of GPU memory, we have to split the default reference database of MetaCache in order to use it. MetaCache provides the possibility to partition the database, query against each partition, and merge the results. But this requires each partition to be loaded onto the GPU to be queried, and data transfer is often a bottleneck in hybrid computation.

We could partition the database based on the taxonomy and only query the relevant partitions. Since we don't want to load every partition onto the GPU, we can use an approximate membership query to only look at the partitions that might contain the strand. Because we have multiple partitions, this is exactly the use case for the HIBF: by building the HIBF at the same time we build the databases, we create a data structure with a small memory footprint that can tell us in which partitions something could be contained, or definitely is not. Now we only need to look into the potential partitions, reducing the runtime and allowing us to conduct more tests.
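To make the control flow concrete, here is a minimal C++ sketch. The function maybe_containing_partitions is a hypothetical stand-in for the HIBF query and returns dummy values; the real interface will differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the HIBF: given a query strand, return the
// indices of database partitions that *may* contain it. Every partition
// not returned definitely does not contain the strand.
std::vector<std::size_t> maybe_containing_partitions(std::string const & strand)
{
    (void)strand;
    return {2, 7}; // dummy result: only partitions 2 and 7 need a GPU query
}

int main()
{
    std::string const strand = "ACGTACGT"; // query from the sewage sample

    // Instead of transferring every partition to the GPU, we only load
    // and query the candidates, cutting data transfer and runtime.
    for (std::size_t partition : maybe_containing_partitions(strand))
    {
        // load partition onto the GPU and run the MetaCache query here
        (void)partition;
    }
}
```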

How it works

The HIBF is based on Dadi et al.'s Interleaved Bloom Filter (IBF). The IBF optimizes the access pattern across multiple Bloom filters by using a bit-wise, array-of-structs-like layout.

[Figure: bits of several Bloom filters arranged into a single Interleaved Bloom Filter]

Let's assume we have five different sets of genomes: bacteria, plants, birds, spiders, and fish.


Conceptually, we create a Bloom filter of equal length for each set, with every bit initially set to 0. We can choose the length arbitrarily, but it will affect how many false positives we get later on; on the other hand, larger Bloom filters mean a bigger memory footprint. Given the number of elements and the desired false-positive probability, we can calculate the optimal number of bits to use. Thomas Hurst created a nice page showing how the different parameters affect the Bloom filter. This calculation also gives us the number of hash functions to use. In short, a hash function takes an input and turns it into a number: the same input always yields the same number, and different hash functions can yield different numbers.
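The sizing formulas behind such calculators are standard: for n elements and a desired false-positive probability p, the optimal size is m = -n * ln(p) / (ln 2)^2 bits and the optimal number of hash functions is k = (m / n) * ln 2. A small self-contained example (the concrete n and p are just illustrative):

```cpp
#include <cmath>
#include <iostream>

int main()
{
    double const n = 1'000'000; // expected number of elements
    double const p = 0.01;      // desired false-positive probability

    double const ln2 = std::log(2.0);
    double const m = std::ceil(-n * std::log(p) / (ln2 * ln2)); // size in bits
    double const k = std::round(m / n * ln2);                   // hash functions

    std::cout << "bits: " << m << " (~" << m / n << " bits per element)\n";
    std::cout << "hash functions: " << k << '\n'; // ~9.59 bits/element, k = 7
}
```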
The next thing to do is to take a genome (let's call it g_i) from one of our sets. We put that genome into the first hash function (called h_0) and get a number (h_0(g_i)). Now we calculate this number modulo the length of the Bloom filter in bits (b). The result is the index (h_0(g_i) % b = idx_{0,i}) of the bit in the Bloom filter that we now set to 1.
This is repeated for each hash function, for each genome in the set, and for each set. In the end, we have five different Bloom filters, each with some bits set to 1.
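A minimal CPU-side sketch of this insertion and the matching lookup. For brevity it simulates k hash functions by mixing a seed into a single std::hash, which a real implementation would not do:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct bloom_filter
{
    std::vector<bool> bits;   // b bits, all initially 0
    std::size_t num_hashes;   // k hash functions

    bloom_filter(std::size_t b, std::size_t k) : bits(b, false), num_hashes(k) {}

    // Illustrative only: derive "different" hash functions from one seed.
    static std::size_t hash(std::string const & genome, std::size_t seed)
    {
        return std::hash<std::string>{}(genome + std::to_string(seed));
    }

    void insert(std::string const & genome)
    {
        for (std::size_t h = 0; h < num_hashes; ++h)
            bits[hash(genome, h) % bits.size()] = true; // idx_{h,i} = h(g_i) % b
    }

    bool maybe_contains(std::string const & genome) const
    {
        for (std::size_t h = 0; h < num_hashes; ++h)
            if (!bits[hash(genome, h) % bits.size()])
                return false; // definitely not in the set
        return true;          // possibly in the set (false positives happen)
    }
};
```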

[Figure: the five resulting Bloom filters, each with some bits set to 1]

To interleave these Bloom filters, we take the first bit of the first Bloom filter, then the first bit of the second, and so on. Because the number of Bloom filters does not always evenly divide the number of bits in the data type used to store them, we need to add padding until the started data type is full. This makes the data structure easier to use on vectorized hardware like GPUs. We then do the same for the remaining bits of the Bloom filters.

[Figure: the same bits rearranged into the interleaved layout]
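A sketch of this rearrangement, assuming an 8-bit storage word for readability (real implementations typically use 64-bit words), with toy bit values:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleave several Bloom filters of equal length into one bit vector.
// Row i holds bit i of every filter; each row is padded to a whole
// uint8_t so that every row starts at a word boundary.
std::vector<std::uint8_t> interleave(std::vector<std::vector<bool>> const & filters)
{
    std::size_t const num_filters = filters.size();      // 5 in the example
    std::size_t const b = filters[0].size();             // bits per filter
    std::size_t const row_bytes = (num_filters + 7) / 8; // pad to full bytes

    std::vector<std::uint8_t> interleaved(b * row_bytes, 0u);
    for (std::size_t i = 0; i < b; ++i)                  // bit position in a filter
        for (std::size_t f = 0; f < num_filters; ++f)    // filter index
            if (filters[f][i])
                interleaved[i * row_bytes + f / 8] |= std::uint8_t{1} << (f % 8);
    return interleaved;
}

int main()
{
    // Five toy Bloom filters of 4 bits each.
    std::vector<std::vector<bool>> filters = {{true, true, false, true},
                                              {false, false, true, true},
                                              {false, true, true, true},
                                              {true, false, false, false},
                                              {true, true, true, false}};
    auto const ibf = interleave(filters); // 4 rows of 1 byte, 3 padding bits each
}
```

A query then fetches one contiguous, word-aligned row per hash value and ANDs these rows together, answering the membership question for all five sets in a single pass.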