GPU-Accelerated Cuckoo Filter
cuckoogpu::FilterMultiGPU< Config > Class Template Reference

A multi-GPU implementation of the Cuckoo Filter. More...

Classes

struct  Partitioner
 Functor for partitioning keys across GPUs. More...
 

Public Types

using T = typename Config::KeyType
 

Public Member Functions

 FilterMultiGPU (size_t numGPUs, size_t capacity, float memFactor=defaultMemoryFactor)
 Constructs a new FilterMultiGPU with default transfer plan.
 
 FilterMultiGPU (size_t numGPUs, size_t capacity, const char *transferPlanPath, float memFactor=defaultMemoryFactor)
 Constructs a new FilterMultiGPU with custom transfer plan.
 
 ~FilterMultiGPU ()
 Destroys the FilterMultiGPU.
 
 FilterMultiGPU (const FilterMultiGPU &)=delete
 
FilterMultiGPU & operator= (const FilterMultiGPU &)=delete
 
size_t insertMany (const T *h_keys, size_t n, bool *h_output=nullptr)
 Inserts a batch of keys into the distributed filter.
 
void containsMany (const T *h_keys, size_t n, bool *h_output)
 Checks for the presence of multiple keys in the filter.
 
size_t deleteMany (const T *h_keys, size_t n, bool *h_output=nullptr)
 Deletes multiple keys from the filter.
 
float loadFactor () const
 Calculates the global load factor.
 
template<typename Func >
void parallelForGPUs (Func func) const
 Executes a function in parallel across all GPUs.
 
void synchronizeAllGPUs ()
 Synchronizes all GPU streams used by this filter.
 
size_t totalOccupiedSlots () const
 Returns the total number of occupied slots across all GPUs.
 
void clear ()
 Clears all filters on all GPUs.
 
size_t totalCapacity () const
 Returns the total capacity of the distributed filter.
 
size_t sizeInBytes () const
 Returns the combined size in bytes reported by the per-GPU filters.
 
size_t insertMany (const thrust::host_vector< T > &h_keys, thrust::host_vector< bool > &h_output)
 Inserts keys from a Thrust host vector.
 
size_t insertMany (const thrust::host_vector< T > &h_keys, thrust::host_vector< uint8_t > &h_output)
 Inserts keys from a Thrust host vector (uint8_t output).
 
size_t insertMany (const thrust::host_vector< T > &h_keys)
 Inserts keys from a Thrust host vector without outputting results.
 
void containsMany (const thrust::host_vector< T > &h_keys, thrust::host_vector< bool > &h_output)
 Checks for existence of keys in a Thrust host vector.
 
void containsMany (const thrust::host_vector< T > &h_keys, thrust::host_vector< uint8_t > &h_output)
 Checks for existence of keys in a Thrust host vector (uint8_t output).
 
size_t deleteMany (const thrust::host_vector< T > &h_keys, thrust::host_vector< bool > &h_output)
 Deletes keys in a Thrust host vector.
 
size_t deleteMany (const thrust::host_vector< T > &h_keys, thrust::host_vector< uint8_t > &h_output)
 Deletes keys in a Thrust host vector (uint8_t output).
 
size_t deleteMany (const thrust::host_vector< T > &h_keys)
 Deletes keys in a Thrust host vector without outputting results.
 

Static Public Attributes

static constexpr float defaultMemoryFactor = 0.8f
 Default fraction of free GPU memory to use for buffers (after filter allocation)
 

Detailed Description

template<typename Config>
class cuckoogpu::FilterMultiGPU< Config >

A multi-GPU implementation of the Cuckoo Filter.

This class partitions keys across multiple GPUs using the gossip library for efficient multi-GPU communication. It handles data distribution using gossip's multisplit and all-to-all primitives, and aggregates results.

Template Parameters
Config  The configuration structure for the Cuckoo Filter.

Definition at line 37 of file CuckooFilterMultiGPU.cuh.
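
As a rough, host-only illustration of what partitioning keys across GPUs means, the sketch below buckets keys by a hash of the key modulo the device count. The names `ownerOf` and `partitionKeys` and the use of `std::hash` are assumptions made for illustration; the actual `Partitioner` functor and gossip's multisplit run on the devices and may use a different hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: maps a key to the index of the GPU that owns it.
inline std::size_t ownerOf(std::uint64_t key, std::size_t numGPUs) {
    assert(numGPUs > 0);
    return std::hash<std::uint64_t>{}(key) % numGPUs;
}

// Splits keys into one bucket per GPU; each key lands in exactly one bucket.
inline std::vector<std::vector<std::uint64_t>> partitionKeys(
    const std::vector<std::uint64_t>& keys, std::size_t numGPUs) {
    std::vector<std::vector<std::uint64_t>> buckets(numGPUs);
    for (std::uint64_t key : keys) {
        buckets[ownerOf(key, numGPUs)].push_back(key);
    }
    return buckets;
}
```

Because the same key always maps to the same bucket, a later lookup or delete for that key is routed to the GPU that holds it.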

Member Typedef Documentation

◆ T

template<typename Config >
using cuckoogpu::FilterMultiGPU< Config >::T = typename Config::KeyType

Definition at line 39 of file CuckooFilterMultiGPU.cuh.

Constructor & Destructor Documentation

◆ FilterMultiGPU() [1/3]

template<typename Config >
cuckoogpu::FilterMultiGPU< Config >::FilterMultiGPU ( size_t  numGPUs,
size_t  capacity,
float  memFactor = defaultMemoryFactor 
)
inline

Constructs a new FilterMultiGPU with default transfer plan.

Initializes gossip context, multisplit, all-to-all primitives, and Filter instances on each available GPU.

Parameters
numGPUs  Number of GPUs to use.
capacity  Total capacity of the distributed filter.
memFactor  Fraction of free GPU memory to use for buffers.

Definition at line 324 of file CuckooFilterMultiGPU.cuh.

    : numGPUs(numGPUs),
      capacityPerGPU(static_cast<size_t>(SDIV(capacity, numGPUs) * 1.02)),
      memoryFactor(memFactor),
      gossipContext(numGPUs),
      multisplit(gossipContext),
      all2all(gossipContext, gossip::all2all::default_plan(numGPUs)),
      all2allResults(gossipContext, gossip::all2all::default_plan(numGPUs)),
      srcBuffers(numGPUs, nullptr),
      dstBuffers(numGPUs, nullptr),
      bufferCapacities(numGPUs, 0),
      resultSrcBuffers(numGPUs, nullptr),
      resultDstBuffers(numGPUs, nullptr),
      totalBufferCapacity(0) {
    assert(numGPUs > 0 && "Number of GPUs must be at least 1");

    filters.resize(numGPUs);

    for (size_t i = 0; i < numGPUs; ++i) {
        CUDA_CALL(cudaSetDevice(gossipContext.get_device_id(i)));
        Filter<Config>* filter;
        CUDA_CALL(cudaMallocManaged(&filter, sizeof(Filter<Config>)));
        new (filter) Filter<Config>(capacityPerGPU);
        filters[i] = filter;
    }
    gossipContext.sync_hard();

    allocateBuffers();
}
#define SDIV(x, y)
Integer division with rounding up (ceiling).
Definition helpers.cuh:198
#define CUDA_CALL(err)
Macro for checking CUDA errors.
Definition helpers.cuh:204
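
The per-GPU capacity in the initializer list is a ceiling division followed by 2% headroom. A minimal host-side sketch of that arithmetic follows, assuming `SDIV(x, y)` expands to the usual `(x + y - 1) / y` (its definition in helpers.cuh is not shown here). `ceilDiv` and `perGpuCapacity` are illustrative names, not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// Assumed equivalent of SDIV: integer division rounded up.
constexpr std::size_t ceilDiv(std::size_t x, std::size_t y) {
    return (x + y - 1) / y;
}

// Mirrors the initializer: ceil(capacity / numGPUs) * 1.02, truncated.
inline std::size_t perGpuCapacity(std::size_t capacity, std::size_t numGPUs) {
    assert(numGPUs > 0 && "Number of GPUs must be at least 1");
    return static_cast<std::size_t>(ceilDiv(capacity, numGPUs) * 1.02);
}
```

For example, `perGpuCapacity(1000, 3)` takes the ceiling 334 and scales it to 340.68, which truncates to 340, the capacity passed to each per-GPU filter.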

◆ FilterMultiGPU() [2/3]

template<typename Config >
cuckoogpu::FilterMultiGPU< Config >::FilterMultiGPU ( size_t  numGPUs,
size_t  capacity,
const char *  transferPlanPath,
float  memFactor = defaultMemoryFactor 
)
inline

Constructs a new FilterMultiGPU with custom transfer plan.

Initializes gossip context, multisplit, all-to-all primitives with provided transfer plan loaded from file, and Filter instances on each available GPU.

Parameters
numGPUs  Number of GPUs to use.
capacity  Total capacity of the distributed filter.
transferPlanPath  Path to gossip transfer plan file for optimized topology-aware transfers.
memFactor  Fraction of free GPU memory to use for buffers.

Definition at line 366 of file CuckooFilterMultiGPU.cuh.

    : numGPUs(numGPUs),
      capacityPerGPU(static_cast<size_t>(SDIV(capacity, numGPUs) * 1.02)),
      memoryFactor(memFactor),
      gossipContext(numGPUs),
      multisplit(gossipContext),
      all2all(
          gossipContext,
          [&]() {
              auto plan = parse_plan(transferPlanPath);
              if (plan.num_gpus() == 0) {
                  return gossip::all2all::default_plan(numGPUs);
              }
              return plan;
          }()
      ),
      all2allResults(
          gossipContext,
          [&]() {
              auto plan = parse_plan(transferPlanPath);
              if (plan.num_gpus() == 0) {
                  return gossip::all2all::default_plan(numGPUs);
              }
              return plan;
          }()
      ),
      srcBuffers(numGPUs, nullptr),
      dstBuffers(numGPUs, nullptr),
      bufferCapacities(numGPUs, 0),
      resultSrcBuffers(numGPUs, nullptr),
      resultDstBuffers(numGPUs, nullptr),
      totalBufferCapacity(0) {
    assert(numGPUs > 0 && "Number of GPUs must be at least 1");

    filters.resize(numGPUs);

    for (size_t i = 0; i < numGPUs; ++i) {
        CUDA_CALL(cudaSetDevice(gossipContext.get_device_id(i)));
        Filter<Config>* filter;
        CUDA_CALL(cudaMallocManaged(&filter, sizeof(Filter<Config>)));
        new (filter) Filter<Config>(capacityPerGPU);
        filters[i] = filter;
    }
    gossipContext.sync_hard();

    allocateBuffers();
}
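
Both all-to-all members parse the same file and fall back to the default plan when parsing yields a plan with no GPUs. The sketch below isolates that fallback in one helper, so a caller could parse once and reuse the result. `Plan`, `defaultPlan`, and `planOrDefault` are hypothetical stand-ins, not gossip types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a gossip transfer plan.
struct Plan {
    std::size_t gpus = 0;
    bool isDefault = false;
    std::size_t num_gpus() const { return gpus; }
};

// Stand-in for gossip::all2all::default_plan(numGPUs).
inline Plan defaultPlan(std::size_t numGPUs) {
    Plan p;
    p.gpus = numGPUs;
    p.isDefault = true;
    return p;
}

// Returns the parsed plan, or the default plan if parsing produced an empty one.
inline Plan planOrDefault(const Plan& parsed, std::size_t numGPUs) {
    assert(numGPUs > 0);
    return parsed.num_gpus() == 0 ? defaultPlan(numGPUs) : parsed;
}
```

An empty `Plan` here plays the role of a file that could not be parsed; any non-empty plan is passed through unchanged.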

◆ ~FilterMultiGPU()

template<typename Config >
cuckoogpu::FilterMultiGPU< Config >::~FilterMultiGPU ( )
inline

Destroys the FilterMultiGPU.

Cleans up filter instances and pre-allocated buffers.

Definition at line 424 of file CuckooFilterMultiGPU.cuh.

{
    freeBuffers();
    for (size_t i = 0; i < numGPUs; ++i) {
        CUDA_CALL(cudaSetDevice(gossipContext.get_device_id(i)));
        filters[i]->~Filter<Config>();
        CUDA_CALL(cudaFree(filters[i]));
    }
}
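
The constructors build each `Filter` with placement new inside memory obtained from `cudaMallocManaged`, so the destructor has to call `~Filter` explicitly before `cudaFree`. Below is a host-only sketch of the same lifecycle, with `std::malloc` and `std::free` standing in for the CUDA allocation calls and a hypothetical `Tracked` type standing in for `Filter<Config>`.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for Filter<Config>; counts live instances.
struct Tracked {
    static int live;
    explicit Tracked(int) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Allocate raw storage, then construct in place (as the constructors do).
inline Tracked* createTracked(int capacity) {
    void* raw = std::malloc(sizeof(Tracked));
    assert(raw != nullptr);
    return new (raw) Tracked(capacity);
}

// Run the destructor explicitly, then release the raw storage.
inline void destroyTracked(Tracked* t) {
    t->~Tracked();
    std::free(t);
}
```

Calling `delete` on such a pointer would be wrong, since the storage did not come from `new`.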

◆ FilterMultiGPU() [3/3]

template<typename Config >
cuckoogpu::FilterMultiGPU< Config >::FilterMultiGPU ( const FilterMultiGPU< Config > &  )
delete

Member Function Documentation

◆ clear()

template<typename Config >
void cuckoogpu::FilterMultiGPU< Config >::clear ( )
inline

Clears all filters on all GPUs.

Definition at line 577 of file CuckooFilterMultiGPU.cuh.

{
    parallelForGPUs([&](size_t i) { filters[i]->clear(); });
}

◆ containsMany() [1/3]

template<typename Config >
void cuckoogpu::FilterMultiGPU< Config >::containsMany ( const T *  h_keys,
size_t  n,
bool *  h_output 
)
inline

Checks for the presence of multiple keys in the filter.

Parameters
h_keys  Pointer to host memory containing keys to check.
n  Number of keys to check.
h_output  Pointer to host memory to store results (true if present, false otherwise).

Definition at line 477 of file CuckooFilterMultiGPU.cuh.

{
    executeOperation<false, true>(
        h_keys,
        n,
        h_output,
        [](Filter<Config>* filter,
           const T* keys,
           bool* results,
           size_t count,
           cudaStream_t stream) { filter->containsMany(keys, count, results, stream); }
    );
}

◆ containsMany() [2/3]

template<typename Config >
void cuckoogpu::FilterMultiGPU< Config >::containsMany ( const thrust::host_vector< T > &  h_keys,
thrust::host_vector< bool > &  h_output 
)
inline

Checks for existence of keys in a Thrust host vector.

Parameters
h_keys  Vector of keys to check.
h_output  Vector to store results (bool). Resized if necessary.

Definition at line 646 of file CuckooFilterMultiGPU.cuh.

{
    h_output.resize(h_keys.size());
    containsMany(
        thrust::raw_pointer_cast(h_keys.data()),
        h_keys.size(),
        thrust::raw_pointer_cast(h_output.data())
    );
}

◆ containsMany() [3/3]

template<typename Config >
void cuckoogpu::FilterMultiGPU< Config >::containsMany ( const thrust::host_vector< T > &  h_keys,
thrust::host_vector< uint8_t > &  h_output 
)
inline

Checks for existence of keys in a Thrust host vector (uint8_t output).

Parameters
h_keys  Vector of keys to check.
h_output  Vector to store results (uint8_t). Resized if necessary.

Definition at line 661 of file CuckooFilterMultiGPU.cuh.

{
    h_output.resize(h_keys.size());
    containsMany(
        thrust::raw_pointer_cast(h_keys.data()),
        h_keys.size(),
        reinterpret_cast<bool*>(thrust::raw_pointer_cast(h_output.data()))
    );
}

◆ deleteMany() [1/4]

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::deleteMany ( const T *  h_keys,
size_t  n,
bool *  h_output = nullptr 
)
inline

Deletes multiple keys from the filter.

Parameters
h_keys  Pointer to host memory containing keys to delete.
n  Number of keys to delete.
h_output  Optional pointer to host memory to store results (true if found and deleted).
Returns
The total number of occupied slots across all GPUs after deletion.

Definition at line 497 of file CuckooFilterMultiGPU.cuh.

{
    if (h_output) {
        return executeOperation<true, true>(
            h_keys,
            n,
            h_output,
            [](Filter<Config>* filter,
               const T* keys,
               bool* results,
               size_t count,
               cudaStream_t stream) { filter->deleteMany(keys, count, results, stream); }
        );
    } else {
        return executeOperation<true, false>(
            h_keys,
            n,
            nullptr,
            [](Filter<Config>* filter,
               const T* keys,
               bool* /*unused results*/,
               size_t count,
               cudaStream_t stream) { filter->deleteMany(keys, count, nullptr, stream); }
        );
    }
}

◆ deleteMany() [2/4]

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::deleteMany ( const thrust::host_vector< T > &  h_keys)
inline

Deletes keys in a Thrust host vector without outputting results.

Parameters
h_keys  Vector of keys to delete.
Returns
size_t Total number of occupied slots.

Definition at line 706 of file CuckooFilterMultiGPU.cuh.

{
    return deleteMany(thrust::raw_pointer_cast(h_keys.data()), h_keys.size(), nullptr);
}

◆ deleteMany() [3/4]

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::deleteMany ( const thrust::host_vector< T > &  h_keys,
thrust::host_vector< bool > &  h_output 
)
inline

Deletes keys in a Thrust host vector.

Parameters
h_keys  Vector of keys to delete.
h_output  Vector to store results (bool). Resized if necessary.
Returns
size_t Total number of occupied slots.

Definition at line 676 of file CuckooFilterMultiGPU.cuh.

{
    h_output.resize(h_keys.size());
    return deleteMany(
        thrust::raw_pointer_cast(h_keys.data()),
        h_keys.size(),
        thrust::raw_pointer_cast(h_output.data())
    );
}

◆ deleteMany() [4/4]

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::deleteMany ( const thrust::host_vector< T > &  h_keys,
thrust::host_vector< uint8_t > &  h_output 
)
inline

Deletes keys in a Thrust host vector (uint8_t output).

Parameters
h_keys  Vector of keys to delete.
h_output  Vector to store results (uint8_t). Resized if necessary.
Returns
size_t Total number of occupied slots.

Definition at line 692 of file CuckooFilterMultiGPU.cuh.

{
    h_output.resize(h_keys.size());
    return deleteMany(
        thrust::raw_pointer_cast(h_keys.data()),
        h_keys.size(),
        reinterpret_cast<bool*>(thrust::raw_pointer_cast(h_output.data()))
    );
}

◆ insertMany() [1/4]

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::insertMany ( const T *  h_keys,
size_t  n,
bool *  h_output = nullptr 
)
inline

Inserts a batch of keys into the distributed filter.

Uses gossip primitives for efficient multi-GPU data distribution.

Parameters
h_keys  Pointer to host memory containing keys to insert.
n  Number of keys to insert.
h_output  Optional pointer to host memory to store results (true if successfully inserted).
Returns
The total number of occupied slots across all GPUs after insertion.

Definition at line 445 of file CuckooFilterMultiGPU.cuh.

{
    if (h_output) {
        return executeOperation<true, true>(
            h_keys,
            n,
            h_output,
            [](Filter<Config>* filter,
               const T* keys,
               bool* results,
               size_t count,
               cudaStream_t stream) { filter->insertMany(keys, count, results, stream); }
        );
    } else {
        return executeOperation<true, false>(
            h_keys,
            n,
            nullptr,
            [](Filter<Config>* filter,
               const T* keys,
               bool* /*unused results*/,
               size_t count,
               cudaStream_t stream) { filter->insertMany(keys, count, nullptr, stream); }
        );
    }
}

◆ insertMany() [2/4]

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::insertMany ( const thrust::host_vector< T > &  h_keys)
inline

Inserts keys from a Thrust host vector without outputting results.

Parameters
h_keys  Vector of keys to insert.
Returns
size_t Total number of occupied slots.

Definition at line 637 of file CuckooFilterMultiGPU.cuh.

{
    return insertMany(thrust::raw_pointer_cast(h_keys.data()), h_keys.size(), nullptr);
}

◆ insertMany() [3/4]

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::insertMany ( const thrust::host_vector< T > &  h_keys,
thrust::host_vector< bool > &  h_output 
)
inline

Inserts keys from a Thrust host vector.

Parameters
h_keys  Vector of keys to insert.
h_output  Vector to store results (bool). Resized if necessary.
Returns
size_t Total number of occupied slots.

Definition at line 607 of file CuckooFilterMultiGPU.cuh.

{
    h_output.resize(h_keys.size());
    return insertMany(
        thrust::raw_pointer_cast(h_keys.data()),
        h_keys.size(),
        thrust::raw_pointer_cast(h_output.data())
    );
}

◆ insertMany() [4/4]

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::insertMany ( const thrust::host_vector< T > &  h_keys,
thrust::host_vector< uint8_t > &  h_output 
)
inline

Inserts keys from a Thrust host vector (uint8_t output).

Parameters
h_keys  Vector of keys to insert.
h_output  Vector to store results (uint8_t). Resized if necessary.
Returns
size_t Total number of occupied slots.

Definition at line 623 of file CuckooFilterMultiGPU.cuh.

{
    h_output.resize(h_keys.size());
    return insertMany(
        thrust::raw_pointer_cast(h_keys.data()),
        h_keys.size(),
        reinterpret_cast<bool*>(thrust::raw_pointer_cast(h_output.data()))
    );
}

◆ loadFactor()

template<typename Config >
float cuckoogpu::FilterMultiGPU< Config >::loadFactor ( ) const
inline

Calculates the global load factor.

Returns
float Load factor (total occupied / total capacity).

Definition at line 527 of file CuckooFilterMultiGPU.cuh.

{
    return static_cast<float>(totalOccupiedSlots()) / static_cast<float>(totalCapacity());
}
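
The load factor is a plain ratio of two totals. A minimal sketch of the arithmetic follows, where `loadFactorOf` is an illustrative free function. The zero-capacity guard is an addition of this sketch and is not present in the member function above.

```cpp
#include <cassert>
#include <cstddef>

// occupied / capacity as a float; returns 0 for a zero-capacity filter.
inline float loadFactorOf(std::size_t occupied, std::size_t capacity) {
    if (capacity == 0) {
        return 0.0f;  // guard added in this sketch only
    }
    return static_cast<float>(occupied) / static_cast<float>(capacity);
}
```

With 512 occupied slots out of a total capacity of 1024, this yields 0.5.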

◆ operator=()

template<typename Config >
FilterMultiGPU & cuckoogpu::FilterMultiGPU< Config >::operator= ( const FilterMultiGPU< Config > &  )
delete

◆ parallelForGPUs()

template<typename Config >
template<typename Func >
void cuckoogpu::FilterMultiGPU< Config >::parallelForGPUs ( Func  func) const
inline

Executes a function in parallel across all GPUs.

Spawns a thread for each GPU to run the provided function.

Template Parameters
Func  Type of the function to execute.
Parameters
func  The function to execute, taking the GPU index as an argument.

Definition at line 540 of file CuckooFilterMultiGPU.cuh.

{
    std::vector<std::thread> threads;
    for (size_t i = 0; i < numGPUs; ++i) {
        threads.emplace_back([=, this]() {
            CUDA_CALL(cudaSetDevice(gossipContext.get_device_id(i)));
            func(i);
        });
    }

    for (auto& t : threads) {
        t.join();
    }
}
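
The thread-per-GPU pattern can be tried on the host without CUDA. In this sketch, `parallelFor` is a hypothetical free function that mirrors the structure above but omits the `cudaSetDevice` call, so each thread simply receives its index.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Runs func(i) on its own thread for every i in [0, n) and waits for all of them.
template <typename Func>
void parallelFor(std::size_t n, Func func) {
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Capture i and func by value so each thread owns its own copies.
        threads.emplace_back([i, func]() mutable { func(i); });
    }
    for (auto& t : threads) {
        t.join();
    }
}
```

Since the calls run concurrently, the function passed in should write only to state indexed by its own `i`, or use atomics for anything shared.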

◆ sizeInBytes()

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::sizeInBytes ( ) const
inline

Definition at line 593 of file CuckooFilterMultiGPU.cuh.

{
    std::atomic<size_t> total(0);
    parallelForGPUs([&](size_t i) {
        total.fetch_add(filters[i]->sizeInBytes(), std::memory_order_relaxed);
    });
    return total.load();
}

◆ synchronizeAllGPUs()

template<typename Config >
void cuckoogpu::FilterMultiGPU< Config >::synchronizeAllGPUs ( )
inline

Synchronizes all GPU streams used by this filter.

Definition at line 557 of file CuckooFilterMultiGPU.cuh.

{
    gossipContext.sync_all_streams();
}

◆ totalCapacity()

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::totalCapacity ( ) const
inline

Returns the total capacity of the distributed filter.

Returns
size_t Total capacity.

Definition at line 585 of file CuckooFilterMultiGPU.cuh.

{
    std::atomic<size_t> total(0);
    parallelForGPUs([&](size_t i) {
        total.fetch_add(filters[i]->capacity(), std::memory_order_relaxed);
    });
    return total.load();
}

◆ totalOccupiedSlots()

template<typename Config >
size_t cuckoogpu::FilterMultiGPU< Config >::totalOccupiedSlots ( ) const
inline

Returns the total number of occupied slots across all GPUs.

Returns
size_t Total occupied slots.

Definition at line 565 of file CuckooFilterMultiGPU.cuh.

{
    std::atomic<size_t> total(0);
    parallelForGPUs([&](size_t i) {
        total.fetch_add(filters[i]->occupiedSlots(), std::memory_order_relaxed);
    });

    return total.load();
}
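
totalOccupiedSlots(), totalCapacity(), and sizeInBytes() share one reduction pattern: each per-GPU thread adds its value to a shared atomic counter, and the result is read after all threads have joined. Below is a host-only sketch, assuming a hypothetical `sumAcrossDevices` helper and a plain vector of per-device counts in place of the `filters[i]` calls.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Adds up one value per device, using one thread per device.
inline std::size_t sumAcrossDevices(const std::vector<std::size_t>& perDevice) {
    std::atomic<std::size_t> total(0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < perDevice.size(); ++i) {
        threads.emplace_back([&total, &perDevice, i]() {
            // Relaxed ordering suffices: only the final sum matters, and
            // join() below orders every add before the load.
            total.fetch_add(perDevice[i], std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return total.load();
}
```

The atomic is needed because the additions can happen at the same time; a plain `size_t` accumulator would be a data race.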

Member Data Documentation

◆ defaultMemoryFactor

template<typename Config >
constexpr float cuckoogpu::FilterMultiGPU< Config >::defaultMemoryFactor = 0.8f
static constexpr

Default fraction of free GPU memory to use for buffers (after filter allocation)

Definition at line 57 of file CuckooFilterMultiGPU.cuh.


The documentation for this class was generated from the following file: