|
GPU-Accelerated Cuckoo Filter
|
A high-performance CUDA implementation of the Cuckoo Filter data structure, developed as part of the thesis "Design and Evaluation of a GPU-Accelerated Cuckoo Filter".
This library provides a GPU-accelerated Cuckoo Filter implementation optimized for high-throughput batch operations. Cuckoo Filters are space-efficient probabilistic data structures that support insertion, lookup, and deletion operations with a configurable false positive rate.
Benchmarks at 80% load factor on an NVIDIA GH200 (H100 HBM3, 3.4 TB/s) with 16-bit fingerprints and equivalent space allocation for the Blocked Bloom Filter.
The GPU Cuckoo Filter is compared against:
| Comparison | Insert | Query | Delete | FPR |
|---|---|---|---|---|
| GPU vs CPU Cuckoo | 360× faster | 973× faster | N/A | 0.041% vs 0.005% |
| Cuckoo vs TCF | 6× faster | 42× faster | 100× faster | 0.041% vs 0.305% |
| Cuckoo vs GQF | 585× faster | 6× faster | 273× faster | 0.041% vs 0.001% |
| Cuckoo vs Bloom | 0.6× (slower) | 1.4× faster | N/A | 0.041% vs 1.336% |
| Comparison | Insert | Query | Delete | FPR |
|---|---|---|---|---|
| GPU vs CPU Cuckoo | 583× faster | 1504× faster | N/A | 0.039% vs 0.004% |
| Cuckoo vs TCF | 1.9× faster | 11.3× faster | 35.3× faster | 0.039% vs 0.394% |
| Cuckoo vs GQF | 9.6× faster | 2.6× faster | 3.8× faster | 0.039% vs 0.001% |
| Cuckoo vs Bloom | 0.7× (slower) | 1.0× (equal) | N/A | 0.039% vs 4.083% |
[!NOTE] A much more comprehensive evaluation, including additional systems and analyses, is presented in the accompanying thesis.
Benchmarks and tests are built by default. To disable them:
The Config template accepts the following parameters:
| Parameter | Description | Default |
|---|---|---|
T | Key type | - |
bitsPerTag | Fingerprint size in bits (8, 16, 32) | - |
maxEvictions | Maximum eviction attempts before failure | 500 |
blockSize | CUDA block size | 256 |
bucketSize | Slots per bucket (must be power of 2) | 16 |
AltBucketPolicy | Alternate bucket calculation policy | XorAltBucketPolicy |
evictionPolicy | Eviction strategy (DFS or BFS) | BFS |
WordType | Atomic type (uint32_t or uint64_t) | uint64_t |
For workloads that exceed single GPU capacity: